
How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.



A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "Sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question which is then explored through background research. At this point, researchers then begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you expect to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of a folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse the idea of falsifiability with the idea that it means that something is false, which is not the case. What falsifiability means is that  if  something was false, then it is possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent variables and one dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable.

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."​
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than students who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a  correlational study  can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.


By Kendra Cherry, MSEd


How Doctors Generate Diagnostic Hypotheses: A Study of Radiological Diagnosis with Functional Magnetic Resonance Imaging


Affiliation Laboratory of Medical Informatics (LIM 01), Faculty of Medicine of the University of São Paulo, São Paulo, Brazil

Affiliation Department and Institute of Radiology (LIM 44), Faculty of Medicine of the University of São Paulo, São Paulo, Brazil

Affiliations Department and Institute of Radiology (LIM 44), Faculty of Medicine of the University of São Paulo, São Paulo, Brazil, Center for Mathematics, Computation and Cognition, Federal University of ABC, Santo André, Brazil

Affiliation Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom

  • Marcio Melo, 
  • Daniel J. Scarpin, 
  • Edson Amaro Jr, 
  • Rodrigo B. D. Passos, 
  • João R. Sato, 
  • Karl J. Friston, 
  • Cathy J. Price


  • Published: December 14, 2011
  • https://doi.org/10.1371/journal.pone.0028752


In medical practice, diagnostic hypotheses are often made by physicians in the first moments of contact with patients; sometimes even before they report their symptoms. We propose that generation of diagnostic hypotheses in this context is the result of cognitive processes subserved by brain mechanisms that are similar to those involved in naming objects or concepts in everyday life.

Methodology and Principal Findings

To test this proposal we developed an experimental paradigm with functional magnetic resonance imaging (fMRI) using radiological diagnosis as a model. Twenty-five radiologists diagnosed lesions in chest X-ray images and named non-medical targets (animals) embedded in chest X-ray images while being scanned in an fMRI session. Images were presented for 1.5 seconds; response times (RTs) and the ensuing cortical activations were assessed. The mean response time was 1.33 (SD ±0.14) seconds for diagnosing lesions and 1.23 (SD ±0.13) seconds for naming animals. Seventy-two percent of the radiologists reported cogitating differential diagnoses during the 3.5-second trials. The overall pattern of cortical activations was remarkably similar for both types of targets. However, within the neural systems shared by both stimuli, activation was significantly greater in the left inferior frontal sulcus and posterior cingulate cortex for lesions relative to animals.

Conclusions

Generation of diagnostic hypotheses and differential diagnoses made through the immediate visual recognition of clinical signs can be a fast and automatic process. The co-localization of significant brain activation for lesions and animals suggests that generating diagnostic hypotheses for lesions and naming animals are served by the same neuronal systems. Nevertheless, diagnosing lesions was cognitively more demanding and associated with more activation in higher order cortical areas. These results support the hypothesis that medical diagnoses based on prompt visual recognition of clinical signs and naming in everyday life are supported by similar brain systems.

Citation: Melo M, Scarpin DJ, Amaro E Jr, Passos RBD, Sato JR, Friston KJ, et al. (2011) How Doctors Generate Diagnostic Hypotheses: A Study of Radiological Diagnosis with Functional Magnetic Resonance Imaging. PLoS ONE 6(12): e28752. https://doi.org/10.1371/journal.pone.0028752

Editor: André Aleman, University of Groningen, Netherlands

Received: August 15, 2011; Accepted: November 14, 2011; Published: December 14, 2011

Copyright: © 2011 Melo et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study is part of the Cooperação Interinstitucional de Apoio à Pesquisa sobre o Cerebro (CINAPCE) funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil. Cathy J. Price and Karl J. Friston are funded by the Wellcome Trust, UK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

There is substantial and converging evidence that a significant part of the understanding of the environment that we have in our everyday lives is carried out by brain mechanisms that are fast, automatic, and effortless [1], [2], [3]. Possibly as a consequence of these processes, diagnostic hypotheses in medical practice are often made by physicians in the first moments of contact with patients, sometimes even before the report of symptoms [4], [5], [6], [7], [8]. To exemplify, when a doctor encounters a patient with pronounced jaundice, diagnostic hypotheses related to liver diseases immediately and automatically come to her/his awareness. This type of diagnosis has been ascribed to pattern recognition or non-analytical reasoning [9], [10].

We propose that the generation of diagnostic hypotheses in such circumstances is the result of neurocognitive processes that are similar to those involved in naming objects or concepts in everyday life. Conversely, recognition of objects in everyday life can be conceptualized as a diagnostic process [11] . A critical test of this proposal would be to compare the brain systems involved in diagnosing lesions with those involved in naming. To explore this hypothesis, radiological diagnosis was used as a model in the visual domain. We developed an experimental paradigm in which radiologists diagnosed lesions in chest X-ray images and named non-medical targets (animals) embedded in chest X-ray images, during functional magnetic resonance imaging (fMRI). We expected to show that diagnosing lesions and naming animals, presented in the same context, would produce a similar pattern of brain activations. Naming letters was introduced as a control task (see Figure 1 ).

Figure 1. *Arrows pointing to targets in the image; not present in the original images. https://doi.org/10.1371/journal.pone.0028752.g001

Mean response times (RTs), error rates, and hesitation rates are shown in Table 1.

Table 1. https://doi.org/10.1371/journal.pone.0028752.t001

When naming lesions, subjects reported becoming aware of a greater number of potential names than for animals and letters (as evidenced by the lexical semantic association indices in Table 1). The associated words were usually different names, i.e. synonyms, for the same target, e.g. ‘enlarged heart’ while diagnosing cardiomegaly, or the two-word name for the lesion, e.g. ‘mediastinal enlargement’. However, 18 (72%) participants reported that while diagnosing some lesions the names of alternative diagnoses came to mind, e.g. ‘bulla’ while diagnosing a cavitation; 15.8% of the lexical semantic associations for lesions were differential diagnoses. Twenty-two (88%) participants reported becoming aware of the names of other animals as alternatives to some of the animals they were naming, e.g. ‘dromedary’ while naming a camel; 64.0% of the lexical semantic associations in this category were the names of other animals.

In 8.00% of the correct responses for lesions, subjects responded with a one-word name other than the name learned during training (e.g. ‘condensation’ in response to pneumonia). Also in 5.13% of lesion trials, the correct responses had more than one word (e.g. ‘aortic elongation’ or ‘pleural effusion’).

The patterns of cortical activations observed when naming each category of stimulus (relative to a control baseline of null events) were strikingly similar in their anatomical deployment (Figure 2). When lexical semantic associations were not controlled, activation was higher for naming lesions than for naming animals and letters in the left inferior frontal sulcus and posterior cingulate cortex (Table 2 and Figure 3). Activation in the same areas was also higher for naming animals than naming letters (p<0.001 uncorrected). This decreasing order of activation, lesions>animals>letters (Figure 3), parallels a similar order of diminishing lexical semantic association indices (Table 1). Indeed, when lexical semantic associations were co-varied out in the second statistical analysis, there were no areas where activation was significantly higher for lesions than for animals and letters. This contrasts with the activation in more posterior regions, the posterior fusiform gyrus and posterolateral occipital cortex, which was higher for animals than lesions (Table 2 and Figure 2).

Figure 2. *Family-wise error rate corrected p<0.05. Statistical parametric maps rendered on an International Consortium of Brain Mapping individual brain. https://doi.org/10.1371/journal.pone.0028752.g002

Figure 3. *90% confidence interval. #Contrast [lesions>(animals and letters)] inclusively masked with lesions>baseline, lesions>animals, and lesions>letters at p = 0.001. https://doi.org/10.1371/journal.pone.0028752.g003

Table 2. https://doi.org/10.1371/journal.pone.0028752.t002

In summary, naming lesions, animals and letters activated the same set of distributed brain regions but to significantly different degrees. In high-order cortical regions (prefrontal and cingulate cortices), activation was proportional to the number of lexical semantic associations (lesions>animals>letters), while in visual cortices activation was higher for animal naming.

This investigation was conducted to test the proposition that generation of diagnostic hypotheses evoked by the immediate visual recognition of clinical signs engages neural systems that are recruited when naming objects in everyday life. The results support this hypothesis by showing very significant and similar activations in a circumscribed set of distributed cortical regions when naming radiological lesions and animals in the same context. However, diagnosing lesions was cognitively more demanding and associated with more activation in higher order cortical areas.

Higher mean RTs, error, and hesitation rates suggest that, on average, diagnosing lesions was more difficult than naming animals in our experimental setting. This could be related to the visual characteristics of lesions and/or the fact that low-frequency (in everyday language) lesion names are more difficult to recall, compared to high-frequency animal and letter names [12] . Lexical semantic associations, i.e. being aware of words/concepts other than that vocalized, were more frequent while diagnosing lesions ( Table 1 ), as compared to the other two categories, also indicating greater cognitive demand related to the selection of appropriate names in this particular task.

A relevant aspect of our results was that generation of diagnostic hypotheses can be very fast; the mean RT to diagnose lesions was 1.33 seconds. It is possible that the training before the fMRI experiment contributed to this performance. But very rapid identification of lesions (<1 second) has already been reported in radiology studies [13] , [14] , [15] .

An important finding was that radiologists were able to cogitate differential diagnoses during the 3.5 seconds of a trial. A similar awareness of alternative names of other animals was reported while naming animals. In a few cases even letters evoked associated words in subjects (Table 1). Participants were not instructed to make differential diagnoses or to think about alternative names for animals during the task. These results are compatible with a fast and automatic semantic association process, in which the recall of a diagnosis or the name of an animal occurs with the concomitant activation of semantically related concepts [16], [17], [18]. Clearly, a formal and definitive diagnosis cannot be made in seconds, but its core cognitive process, the generation of diagnostic hypotheses (the names of lesions in our study), is crucial for a final correct diagnosis [4], [5], [6], [7], [8].

Diagnosing lesions in radiological images can be conceptualized as a process that is similar to localizing and naming objects in a scene [19]. There are several fMRI studies of localization and recognition [20] or naming of objects [21], but we are unaware of studies in which all of these tasks are combined. We found very significant brain activations associated with the localization, recognition, and naming of targets in radiological images, be they lesions, animals, or letters (Figure 2). Activations were greater in the left inferior frontal sulcus and posterior cingulate cortex for naming lesions than animals, and also greater for naming animals than letters (Table 2 and Figure 3). When lexical semantic associations, more frequent for lesions (Table 1), were taken into account in an analysis of covariance, the difference in activation between those regions was no longer significant (at a corrected level). All the regions activated in the present study have also been reported in object naming in other studies [21]; therefore, there was no indication that the participating radiologists were naming animals differently from laypeople.

In agreement with our findings, activation in the left inferior frontal sulcus was reported in two fMRI studies of semantic verbal fluency - generating and vocalizing associated words in response to a word - in which there was greater lexical semantic demand in contrast to the comparison task, reading aloud [22] , [23] . This is consistent with increased cognitive control when it is necessary to make a choice between synonymous or competing concepts, e.g. synonym words or differential diagnoses in our study, respectively [24] .

There were regions more activated by naming animals relative to diagnosing lesions. These regions are usually associated with visual processing and recognition of stimuli; namely, posterior fusiform cortex and posterolateral occipital cortex ( Table 2 and Figure 2 ) [25] . We believe that differences in the visual characteristics of the stimulus categories we used are responsible for the observed differences in activations in these cortical areas.

The cognitive mechanisms underlying medical diagnosis have been studied with different conceptual strategies [10] , [26] , [27] . One important approach relevant to the present study considers it a classification process similar to categorization in everyday life and several authors investigated diagnostic processes with different categorization models [28] , [29] , [30] , [31] , [32] , [33] , [34] . These studies explored several aspects of the cognitive psychology of diagnostic reasoning but did not include a comparison of medical diagnosis tasks with, e.g., categorization of objects.

Interestingly, from a historical point of view, similarities between the classification of diseases and living creatures have been suggested in the past. In the 17th century, Thomas Sydenham, in his influential definition of diseases, proposed to conceptualize them as specific (ontological) entities similar to plant species [35]. Influenced by Sydenham's ideas, Boissier des Sauvages created a classification of diseases, a nosology, based on taxonomic principles used in botany, and his approach was followed by other physicians in the 18th century, including Carl Linnaeus [36].

Instead of using categorization, a concept with different meanings and diverse and conflicting models [37] , [38] , [39] , we chose naming as a conceptually more prudent and descriptive approach. Picture naming has been a model extensively used in cognitive psychology [40] , [41] , [42] , [43] , and functional neuroimaging [21] to investigate how objects are recognized and named.

Some cognitive processes underlying medical skills have been studied with fMRI: one study investigated the neural substrate of visuo-spatial skills in surgery residents [44] and the other compared brain activations in radiologists versus lay participants while viewing X-ray images [45] . However, the neural basis of the medical diagnostic processes per se has not been investigated before.

Our experimental design was not planned to assess the generation of differential diagnostic hypotheses. Taking into account the limitations of the cued retrospective recall employed in the study, the conclusions resulting from the lexical semantic associations data need to be replicated using other methodological approaches.

In contrast to naturalistic and observational studies, planned experiments are by definition artificial and reductionistic, owing to the need to limit and control independent variables. Radiologists in their usual practice do not vocalize their diagnostic hypotheses as they come to their awareness. Similarly, we do not habitually vocalize the names of objects as we recognize them in our everyday life.

Radiologists customarily verbalize their diagnosis using more than one word, e.g. pleural effusion in right hemithorax, in contrast to one-word names for animals. This difference in response length could introduce an important confounding variable [46], [47]. To circumvent it, we trained the participants to preferentially use one-word names to diagnose lesions (see Methods and Appendix S1 for details). Probably as a consequence of this experimental stratagem, participants reported that the complete names of the lesions came to their awareness, and these were counted as competing lexical semantic associations (see Results).

Under the blanket rubric ‘medical diagnosis’ there are different cognitive tasks and processes. Considering the case in point of radiological diagnosis: The immediate recognition and diagnosis of an obvious lesion probably recruits different neurocognitive processes as compared to the diagnosis of a subtle and ambiguous alteration with complex differential diagnoses requiring a detailed examination of the radiological image. To create our experimental design we had to limit the scope of the investigation and the conclusions of our study are restricted to the diagnosis of lesions that are immediately identified and diagnosed. It will be important to replicate these results with other approaches, e.g. electrophysiological methods such as electroencephalography or magnetoencephalography. The conceptual hypothesis also needs to be tested in other medical specialties in which diagnosis is strongly based on visual clinical data, e.g. dermatology.

This study is an attempt to investigate the brain mechanisms subserving medical diagnosis. We have demonstrated that differential diagnoses can be automatically elicited in a time frame of seconds in response to clinical signs. Our results support the hypothesis that a process similar to naming things in everyday life occurs when a physician promptly recognizes a characteristic and previously known lesion. In our experimental model, the diagnostic task was cognitively more taxing; more activation in higher order cortical areas was plausibly associated with demands related to the selection of appropriate names as compared to the control task.

The importance of non-analytical reasoning in medical diagnosis has been increasingly stressed [9] , [10] , [27] , [48] . Our study is a contribution to the understanding of its mechanisms. There are recent reviews proposing the application of the knowledge acquired in neuroscience to improve medical education methods [49] , [50] , [51] . An implication of our results is that information obtained from cognitive neuroscience studies on the recognition and naming of objects can be brought to bear on the improvement of diagnostic expertise in the visual domain. In addition, the conceptual hypothesis and the methodological approach described in the present investigation may open new ways to develop studies in medical diagnosis.

Materials and Methods

Participants.

Twenty-six radiologists participated in the investigation. One subject was excluded because the responses were not recorded due to technical problems. Inclusion criteria were completion of radiology residency, right-handedness (as assessed by a modified version of the Edinburgh Handedness Inventory [52] ), and Portuguese as the native language; exclusion criteria were neurological and psychiatric disorders. Sixteen participants were male. The mean age of subjects was 35.9 years (range: 27–55), with a mean of 11.6 years of radiological practice (range: 4–30).

Ethics statement

The protocol was approved by the research ethics committee of the Clinics Hospital, Faculty of Medicine of the University of São Paulo, Brazil. All participants gave written informed consent. They did not receive monetary compensation for their participation.

Radiological images

Our experimental design required the radiological images to have just one circumscribed visual target that could be named. Since many thoracic radiological lesions co-occur, e.g. cardiomegaly is commonly associated with radiological signs of pulmonary venous congestion, we embedded lesions in normal X-ray images using image editing software.

Twenty different types of thoracic radiological lesions, with six different exemplars of each, were created. We used clearly identifiable and easily diagnosable lesions to minimize expertise confounds at the between-subject level and to ensure ceiling performance (to preclude performance confounds). The face validity [53] of radiological images with lesions was assessed by two senior thoracic radiologists. To create non-medical targets line drawings of animals were superimposed on the radiological images. These targets were selected from the database of the International Picture Naming Project [54] . Each type of animal, with six different exemplars, was paired with one type of lesion. Finally, 20 different consonant letters, each with six exemplars from different fonts, were paired with each type of lesion. The resulting radiological images comprised six sets of 60 different stimuli: 20 with lesions, 20 with animals, and 20 with letters.

Longer words or naming targets with more than one word might be associated with longer response times and different patterns of brain activations in regions involved in language processing [46], [47]. To control for this confounder, we created a list of one-word names to diagnose lesions, e.g. ‘effusion’ for pleural effusion, and asked subjects to use these terms. In addition, the duration of vocalization of the radiological lesion names was matched to that of the animal names. Searching for lesions, animals, and letters, with the accompanying eye movements, was another important variable; we controlled for it by matching the locations of the three types of alterations in the chest X-ray images. The methodology used to create the radiological images is detailed in Appendix S1.

There are many subtleties in a veridical radiological lesion that cannot be reproduced with image editing software. For this reason, the lesions we created can be considered caricatures of true lesions, in the same way that line drawings are an iconic representation of animals.

The key differences between the three categories of stimuli were: (1) visual attributes: most lesions had simpler and more heterogeneous forms than the animals and letters, which had more defined and homogeneous contours (see Figure 1); and (2) word frequency: although quantitative data on word frequency in the medical domain are lacking, medical terms probably have a lower frequency in daily language and an older age of acquisition than animal and letter names.

The experiment

The creation of images with just one target and the short viewing time of the stimulus images were critical points of our experimental strategy; they were intended to block a more careful scanning of the radiological images, which normally occurs in radiological practice. The neurocognitive processes involved in detailed scanning of the image and images with different numbers of targets to name would be important experimental confounders.

Radiological stimuli were projected through a magnetically shielded glass window onto a screen inside the scanner room using a Dell 2400MP digital projector. The stimuli subtended 12.5° horizontal and 9.4° vertical visual angles. Each image was presented for 1.5 seconds. Radiologists were instructed simply to name the target (lesion, animal, or letter) as soon as it was recognized. The task implicitly involved localizing the target, recognizing it, retrieving its name, and articulating the response [21], [55], [56]. Every image presentation was followed by a black screen with a white (central) fixation cross for 2.0 seconds (i.e., 3.5 seconds per trial). This design was optimized during pilot testing to minimize trial duration while preserving near-ceiling performance. Participants were trained immediately before the scanning session with three different sets of images.

We used an event-related design: In each session, there were 60 trials (20 lesions, 20 animals, and 20 letters) and 20 null events (with just a fixation cross). There were three sessions per participant with three different sets of images to preclude perceptual learning, repetition suppression and other adaptation effects confounding the naming related responses. Three different sequences of stimuli presentation were optimized in terms of the efficiency to disclose fMRI responses using a genetic algorithm [57] . Each of the three sets of images was presented with one of the three optimal sequences. The order of the sequences was counterbalanced over subjects. The image sets and the order of their presentation for training and scanning were also counterbalanced between participants.

There were two control conditions: (1) naming letters, a high-level baseline that shares all the cognitive components of diagnosing lesions and naming animals except word retrieval; and (2) null events intermixed with the stimuli, consisting of a fixation cross shown for 3.5 seconds, during which participants had no task to execute.

Response time was defined as the time elapsed between stimulus onset (image presentation) and the beginning of the vocalization of the response. Errors were defined as incorrect responses or the absence of a response. Hesitations were defined as (1) beginning to vocalize one word and then changing to another, or (2) stammering at the beginning of the vocalization.

Words and concepts may have different numbers of other words and concepts semantically associated with them [18]. There are indications that the number of potential associates of a word may influence the pattern of brain activations in tasks involving word production [58]. To control for this variable, subjects were debriefed immediately after the experiment, following a standardized protocol, to assess lexical semantic associations for each type of stimulus, i.e. words other than the one vocalized that came to their awareness while naming the targets (see Appendix S1). Participants were not informed beforehand of the debriefing protocol.

Retrospective recall has limitations [59] but we could not monitor those associations during fMRI data acquisition. To instruct subjects to report them after each stimulus could induce participants to actively search for associated words and concepts and create an important confounder in the experimental design. However, classical memory studies investigating cued recall found very high recall rates in tasks more demanding than in our investigation in particular when there was semantic processing while encoding the stimuli [60] , [61] , [62] . Also, retrospective recall methods have been used as reliable assessments in the context of fMRI experiments [63] , [64] .

Data acquisition

MR images were acquired on a 3 Tesla Philips Achieva system with an 8-channel head coil. Blood oxygenation level-dependent (BOLD) sensitive T2*-weighted images were obtained using a SENSE gradient-echo echo-planar imaging pulse sequence with the following parameters: repetition time: 2500 ms, echo time: 30 ms, flip angle: 90°, field of view: 240 mm², in-plane voxel resolution: 3 mm². Fifty 3 mm axial slices were acquired, with a slice gap of 0.3 mm and a +30° image plane tilt to reduce artifacts in the inferior temporal lobe [65]. Functional sessions were preceded by 10.0 s of dummy scans to ensure steady-state magnetization. A T1-weighted structural image (voxel size: 1 mm³) was acquired after the functional sessions for coregistration with the fMRI data.

Stimulus presentation and response recording were performed with E-Prime 2.0 software (Psychology Software Tools). A plastic mouthpiece was anatomically adjusted to the mouth of each participant, making it possible to isolate the sound of the vocalized responses from the scanner's noise. The voice signal was conducted through a pneumatic system to a high-sensitivity microphone outside the scanner room, pre-amplified, and recorded. Barch et al. described a similar approach for recording overt verbal responses [66]. Response times were measured following a standardized protocol using Praat 5.1, software for phonetic analysis [67], after filtering out the background noise.

Statistical and data analyses

Data processing and statistical analyses were conducted using SPM8 software (Wellcome Trust Centre for Neuroimaging) [68] . Functional volumes were realigned, un-warped, coregistered to the structural image, normalized to the MNI space, and smoothed with an isotropic Gaussian kernel with 6 mm FWHM.

After preprocessing, a first-level analysis was conducted at the individual level to estimate category-specific activations at each voxel. Time series from each voxel were high-pass filtered with a cut-off frequency of 1/128 Hz (a cut-off period of 128 s) to remove signal drift and low-frequency noise. A gray-matter image resulting from the segmentation of the structural image was used as a mask in the analysis of functional activations. All category-specific (lesion, animal, or letter) trials were modeled as stick functions and convolved with a canonical hemodynamic response function. Trials with errors, hesitations, and outlying RTs (>2 standard deviations from the mean RT for the respective target type) were modeled as events of no interest. RTs were also included as nuisance variables to remove response time effects within conditions.
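The SPM8 implementation itself is not shown here, but the following minimal Python sketch illustrates the core idea of this step: building one condition regressor by convolving a stick function of trial onsets with a canonical double-gamma hemodynamic response function. The onset times, number of volumes, and HRF parameters below are illustrative assumptions, not values taken from the study; only the repetition time matches the acquisition described above.

    import numpy as np
    from scipy.stats import gamma

    TR = 2.5        # repetition time in seconds, as in the acquisition described above
    n_scans = 200   # hypothetical number of volumes in one session
    dt = 0.1        # fine time grid (s) used to build the regressor

    # Hypothetical onsets (s) of "lesion" trials within the session
    onsets = np.array([3.5, 14.0, 24.5, 38.5, 52.5, 70.0])

    # Stick (delta) function sampled on the fine time grid
    t = np.arange(0, n_scans * TR, dt)
    sticks = np.zeros_like(t)
    sticks[np.searchsorted(t, onsets)] = 1.0

    # A simple double-gamma HRF (peak minus a small undershoot), using commonly cited
    # default shape parameters rather than anything reported in the paper
    hrf_t = np.arange(0, 32, dt)
    hrf = gamma.pdf(hrf_t, 6) - gamma.pdf(hrf_t, 16) / 6.0
    hrf /= hrf.sum()

    # Convolve the events with the HRF, then resample at the scan times (one value per TR)
    regressor = np.convolve(sticks, hrf)[: len(t)]
    step = int(round(TR / dt))
    design_column = regressor[::step]   # this column would enter the GLM design matrix
    print(design_column.shape)          # (200,) -> one value per acquired volume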

Category-specific activations were estimated in the usual way using the appropriate t-contrast. The resulting subject-specific contrast images were then entered into a second-level analysis of covariance (ANCOVA). This (random effects) between-subject analysis was conducted without and with lexical semantic associations (the mean number of lexical semantic associations for each category made by every participant) as a nuisance variable. The differences in category-specific activation were then assessed using statistical parametric maps (SPMs) with a criterion of p <0.05 (corrected for multiple comparisons using random field theory).

To limit head movements during the experiment participants were trained before the scanning session. The un-warping of the functional images during preprocessing with SPM8 was an additional measure to compensate for movements during the vocalization of responses [69] . Head movements during fMRI sessions were generally small, with intra-session translation and rotation movements of less than 1.5 mm and 1.5° respectively.

Supporting information.

The methodology for the creation of radiological images with embedded targets, training and debriefing protocols are detailed in Appendix S1 .

Supporting Information

Appendix S1.

https://doi.org/10.1371/journal.pone.0028752.s001

Acknowledgments

We are grateful for the cooperation of the radiologists that participated in all stages of the study and to the staff of the Magnetic Resonance Service of the Institute of Radiology, Clinics Hospital of the Faculty of Medicine, University of São Paulo. We are also thankful for the collaboration of the following people: Maria da Graça Morais Martin contributed to the experimental design suggesting the insertion of objects in the radiological images; Claudio Lucarelli and Marcelo Buarque de Gusmão Funari were responsible for the validation of the lesion images. José Jamelão Macedo de Medeiros, Fernando Araujo Del Lama, Claudia da Costa Leite, and Eduardo Massad assisted the project in several ways.

Data sharing . The complete set of radiological images used in the study are available on request from the corresponding author.

Author Contributions

Conceived and designed the experiments: MM DJS EA RBDP KJF CJP. Performed the experiments: MM. Analyzed the data: MM CJP JRS EA KJF. Contributed reagents/materials/analysis tools: KJF. Wrote the paper: MM KJF CJP. Development of the conceptual model: MM. Creation of the stimuli (radiological images): DJS RBDP MM.

  • 4. Elstein AS, Shulman LS, Sprafka SA (1978) Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard University Press.
  • 17. Anderson JR (1983) The Architecture of Cognition. Cambridge, MA: Harvard University Press.
  • 25. Grill-Spector K (2009) What has fMRI taught us about object recognition? In: Dickinson SJ, Leonardis A, Schiele B, Tarr MJ, editors. Object Categorization: Computer and Human Vision Perspectives. Cambridge, U.K.: Cambridge University Press. pp. 102–128.
  • 27. Kassirer JP, Wong J, Kopelman RI (2010) Learning Clinical Reasoning. Baltimore, MD: Wolters Kluwer.
  • 35. Faber K (1930) Nosography: The Evolution of Clinical Medicine in Modern Times. New York: Paul B. Hoeber Inc.
  • 36. Bynum WF (1997) Nosology. In: Bynum WF, Porter R, editors. Companion Encyclopedia of the History of Medicine. London: Routledge. pp. 335–356.
  • 37. Smith EE, Medin DL (1981) Categories and Concepts. Cambridge, MA: Harvard University Press.
  • 39. Murphy GL (2004) The Big Book of Concepts. Cambridge, MA: MIT Press.
  • 53. Kerlinger FN, Lee HB (2000) Validity. In: Kerlinger FN, Lee HB, editors. Foundations of Behavioral Research. 4th ed. Belmont, CA: Cengage. pp. 665–688.
  • 59. Lockhart RS (2000) Methods of memory research. In: Tulving E, Craik FIM, editors. The Oxford Handbook of Memory. Oxford, U.K.: Oxford University Press. pp. 45–57.
  • 68. Friston KJ, Ashburner JT, Kiebel SJ, Nichols TE, Penny WD (2007) Statistical Parametric Mapping: the Analysis of Functional Brain Images. London: Academic Press.

Enago Academy

Quick Guide to Biostatistics in Clinical Research: Hypothesis Testing


In this article series, we will be looking at some of the important concepts of biostatistics in clinical trials and clinical research. Statistics is frequently used to analyze quantitative research data, and both clinical trials and clinical research rely on it heavily. Clinical trials proceed through many phases, and Contract Research Organizations (CROs) can be hired to conduct them. Clinical trials are an important step in deciding whether a treatment can be safely and effectively used in medical practice. Once the clinical trial phases are completed, biostatistics is used to analyze the results.

Research generally proceeds in an orderly fashion as shown below.

Research Process

Once you have identified the research question you need to answer, it is time to frame a good hypothesis. The hypothesis is the starting point for biostatistics and is usually based on a theory. Experiments are then designed to test the hypothesis. What is a hypothesis? A research hypothesis is a testable statement describing a relationship between two or more variables. A good hypothesis will be clear, specific, objective, relevant to the research question, and free of moral judgments. Above all, a hypothesis must be testable.

A simple hypothesis would contain one predictor and one outcome variable. For instance, if your hypothesis was, “Chocolate consumption is linked to type II diabetes” the predictor would be whether or not a person eats chocolate and the outcome would be developing type II diabetes. A good hypothesis would also be specific. This means that it should be clear which subjects and research methodology will be used to test the hypothesis. An example of a specific hypothesis would be, “Adults who consume more than 20 grams of milk chocolate per day, as measured by a questionnaire over the course of 12 months, are more likely to develop type II diabetes than adults who consume less than 10 grams of milk chocolate per day.”

Null and Alternative Hypothesis

In statistics, the null hypothesis (H0) states that there is no relationship between the predictor and the outcome variable in the population being studied. For instance, "There is no relationship between a family history of depression and the probability that a person will attempt suicide." The alternative hypothesis (H1) states that there is a relationship between the predictor (a history of depression) and the outcome (attempted suicide). It is impossible to prove a statement by making several observations, but it is possible to disprove a statement with a single observation. Having only ever seen red tulips is not proof that no other colors exist; however, seeing a single tulip that was not red would immediately prove that the statement "All tulips are red" is false. This is why statistics tests the null hypothesis, and why the alternative hypothesis cannot be tested directly.

The alternative hypothesis proposed in medical research may be one-tailed or two-tailed. A one-tailed alternative hypothesis predicts the direction of the effect. For example, a clinical study may have the alternative hypothesis that patients taking the study drug will have a lower cholesterol level than those taking a placebo. A two-tailed alternative hypothesis only states that there is an association, without specifying a direction. An example would be, "Patients who take the study drug will have a significantly different cholesterol level than patients taking a placebo"; this hypothesis does not state whether that level will be higher or lower than in those taking the placebo.
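As a minimal sketch of this distinction (with made-up cholesterol values, not data from any real trial), the alternative argument of scipy.stats.ttest_ind switches between a two-tailed and a one-tailed test:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical cholesterol levels (mg/dL), for illustration only
    drug_group = rng.normal(loc=185, scale=20, size=40)     # patients on the study drug
    placebo_group = rng.normal(loc=200, scale=20, size=40)  # patients on placebo

    # Two-tailed: H1 is simply "the group means differ"
    t_two, p_two = stats.ttest_ind(drug_group, placebo_group, alternative="two-sided")

    # One-tailed: H1 is "the drug group mean is lower than the placebo group mean"
    # (the `alternative` argument requires scipy >= 1.6)
    t_one, p_one = stats.ttest_ind(drug_group, placebo_group, alternative="less")

    print(f"two-tailed p = {p_two:.4f}")
    print(f"one-tailed p = {p_one:.4f}")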

The P-Value Approach to Hypothesis Testing

Once the hypothesis has been designed, statistical tests help you decide whether to reject or fail to reject the null hypothesis. Statistical tests determine the p-value associated with the research data. The p-value is the probability of obtaining the observed result, or one more extreme, by chance, assuming the null hypothesis (H0) is true. You reject the null hypothesis if the p-value of the data falls below the predetermined level of statistical significance. Usually, the level of statistical significance is set at 0.05. If the p-value is less than 0.05, you would reject the null hypothesis, which states that there is no relationship between the predictor and the outcome in the sample population.

However, if the p-value is greater than the predetermined level of significance, there is no statistically significant association between the predictor and the outcome variable. This does not necessarily mean that there is no association between the predictor and the outcome in the population; it only means that the observed association could plausibly have arisen by random chance.

For example, the null hypothesis (H0) might be: patients who take the study drug after a heart attack are no less likely to have a second heart attack over the next 24 months than patients who do not take it.

Suppose the data show that those who did not take the study drug were twice as likely to have a second heart attack, with a p-value of 0.08. This p-value indicates that, if the null hypothesis were true, there would be an 8% chance of observing a difference at least this large (people not on the drug being twice as likely to have a second heart attack) purely by random chance.
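A minimal sketch of this decision rule, using a chi-squared test on a hypothetical 2x2 table of outcomes (the counts below are invented for illustration and are not the trial data described above):

    import numpy as np
    from scipy.stats import chi2_contingency

    ALPHA = 0.05  # predetermined level of statistical significance

    # Hypothetical counts: rows = [study drug, placebo],
    # columns = [second heart attack, no second heart attack]
    observed = np.array([
        [ 8, 92],   # study drug group
        [16, 84],   # placebo group
    ])

    chi2, p_value, dof, expected = chi2_contingency(observed)

    print(f"p-value = {p_value:.3f}")
    if p_value < ALPHA:
        print("Reject the null hypothesis: the difference is statistically significant.")
    else:
        print("Fail to reject the null hypothesis: the difference could plausibly be due to chance.")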

The hypothesis is not a trivial part of the clinical research process. It is a key element in a good biostatistics plan regardless of the clinical trial phase. There are many other concepts that are important for analyzing data from clinical trials. In our next article in the series, we will examine hypothesis testing for one or many populations, as well as error types.


U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • v.19(7); 2019 Jul

Hypothesis tests

Key points

  • Hypothesis tests are used to assess whether a difference between two samples represents a real difference between the populations from which the samples were taken.
  • A null hypothesis of ‘no difference’ is taken as a starting point, and we calculate the probability that both sets of data came from the same population. This probability is expressed as a p-value.
  • When the null hypothesis is false, p-values tend to be small. When the null hypothesis is true, any p-value is equally likely.

Learning objectives

By reading this article, you should be able to:

  • Explain why hypothesis testing is used.
  • Use a table to determine which hypothesis test should be used for a particular situation.
  • Interpret a p-value.

A hypothesis test is a procedure used in statistics to assess whether a particular viewpoint is likely to be true. Hypothesis tests follow a strict protocol and generate a ‘p-value’, on the basis of which a decision is made about the truth of the hypothesis under investigation. All of the routine statistical ‘tests’ used in research—t-tests, χ² tests, Mann–Whitney tests, etc.—are hypothesis tests, and in spite of their differences they are all used in essentially the same way. But why do we use them at all?

Comparing the heights of two individuals is easy: we can measure their height in a standardised way and compare them. When we want to compare the heights of two small well-defined groups (for example two groups of children), we need to use a summary statistic that we can calculate for each group. Such summaries (means, medians, etc.) form the basis of descriptive statistics, and are well described elsewhere. 1 However, a problem arises when we try to compare very large groups or populations: it may be impractical or even impossible to take a measurement from everyone in the population, and by the time you do so, the population itself will have changed. A similar problem arises when we try to describe the effects of drugs—for example by how much on average does a particular vasopressor increase MAP?

To solve this problem, we use random samples to estimate values for populations. By convention, the values we calculate from samples are referred to as statistics and denoted by Latin letters (x̄ for sample mean; SD for sample standard deviation), while the unknown population values are called parameters and denoted by Greek letters (μ for population mean, σ for population standard deviation).

Inferential statistics describes the methods we use to estimate population parameters from random samples; how we can quantify the level of inaccuracy in a sample statistic; and how we can go on to use these estimates to compare populations.

Sampling error

There are many reasons why a sample may give an inaccurate picture of the population it represents: it may be biased, it may not be big enough, and it may not be truly random. However, even if we have been careful to avoid these pitfalls, there is an inherent difference between the sample and the population at large. To illustrate this, let us imagine that the actual average height of males in London is 174 cm. If I were to sample 100 male Londoners and take a mean of their heights, I would be very unlikely to get exactly 174 cm. Furthermore, if somebody else were to perform the same exercise, it would be unlikely that they would get the same answer as I did. The sample mean is different each time it is taken, and the way it differs from the actual mean of the population is described by the standard error of the mean (standard error, or SEM). The standard error is larger if there is a lot of variation in the population, and becomes smaller as the sample size increases. It is calculated thus:

SEM = SD / √n

where SD is the sample standard deviation, and n is the sample size.

As errors are normally distributed, we can use this to estimate a 95% confidence interval on our sample mean as follows:

95% CI = x̄ ± (1.96 × SEM)

We can interpret this as meaning ‘We are 95% confident that the actual mean is within this range.’

Some confusion arises at this point between the SD and the standard error. The SD is a measure of variation in the sample. The range x̄ ± (1.96 × SD) will normally contain 95% of all your data. It can be used to illustrate the spread of the data and shows what values are likely. In contrast, the standard error tells you about the precision of the mean and is used to calculate confidence intervals.
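A minimal sketch of these calculations, using simulated heights (the example above quotes only a true mean of 174 cm, so the standard deviation and the resulting numbers here are illustrative assumptions):

```python
# Simulated sketch of the calculations above: the sample is drawn from a
# population with a true mean of 174 cm; the standard deviation of 7 cm is
# an assumption, so the exact numbers printed are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=174, scale=7, size=100)      # heights of 100 male Londoners

mean = sample.mean()
sd = sample.std(ddof=1)             # sample SD: describes the spread of the data
sem = sd / np.sqrt(len(sample))     # standard error of the mean: SEM = SD / sqrt(n)

ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.1f} cm, SD = {sd:.1f} cm, SEM = {sem:.2f} cm")
print(f"95% CI for the population mean: {ci_low:.1f} to {ci_high:.1f} cm")
```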

One straightforward way to compare two samples is to use confidence intervals. If we calculate the mean height of two groups and find that the 95% confidence intervals do not overlap, this can be taken as evidence of a difference between the two means. This method of statistical inference is reasonably intuitive and can be used in many situations. 2 Many journals, however, prefer to report inferential statistics using p-values.

Inference testing using a null hypothesis

In 1925, the British statistician R.A. Fisher described a technique for comparing groups using a null hypothesis , a method which has dominated statistical comparison ever since. The technique itself is rather straightforward, but often gets lost in the mechanics of how it is done. To illustrate, imagine we want to compare the HR of two different groups of people. We take a random sample from each group, which we call our data. Then:

  • (i) Assume that both samples came from the same group. This is our ‘null hypothesis’.
  • (ii) Calculate the probability that an experiment would give us these data, assuming that the null hypothesis is true. We express this probability as a p-value, a number between 0 and 1, where 0 is ‘impossible’ and 1 is ‘certain’.
  • (iii) If the probability of the data is low, we reject the null hypothesis and conclude that there must be a difference between the two groups.

Formally, we can define a p-value as ‘the probability of finding the observed result or a more extreme result, if the null hypothesis were true.’ Standard practice is to set a cut-off at p < 0.05 (this cut-off is termed the alpha value). If the null hypothesis were true, a result such as this would only occur 5% of the time or less; this in turn would indicate that the null hypothesis itself is unlikely. Fisher described the process as follows: ‘Set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.’ 3 This probably remains the most succinct description of the procedure.
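To make the definition concrete, here is a toy example that is not from the article: testing whether a coin is fair after observing 9 heads in 10 tosses.

```python
# Toy example (not from the article): is a coin fair?
# Null hypothesis: P(heads) = 0.5.  Observed data: 9 heads in 10 tosses.
from scipy.stats import binomtest

result = binomtest(k=9, n=10, p=0.5, alternative="two-sided")
print(f"p = {result.pvalue:.3f}")
# p ≈ 0.021: the probability of a result at least this extreme if the coin
# really were fair. Since p < 0.05, we would reject the null hypothesis.
```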

A question which often arises at this point is ‘Why do we use a null hypothesis?’ The simple answer is that it is easy: we can readily describe what we would expect of our data under a null hypothesis, we know how data would behave, and we can readily work out the probability of getting the result that we did. It therefore makes a very simple starting point for our probability assessment. All probabilities require a set of starting conditions, in much the same way that measuring the distance to London needs a starting point. The null hypothesis can be thought of as an easy place to put the start of your ruler.

If a null hypothesis is rejected, an alternate hypothesis must be adopted in its place. The null and alternate hypotheses must be mutually exclusive, but must also between them describe all situations. If a null hypothesis is ‘no difference exists’ then the alternate should be simply ‘a difference exists’.

Hypothesis testing in practice

The components of a hypothesis test can be readily described using the acronym GOST: identify the Groups you wish to compare; define the Outcome to be measured; collect and Summarise the data; then evaluate the likelihood of the null hypothesis, using a Test statistic.

When considering groups, think first about how many. Is there just one group being compared against an audit standard, or are you comparing one group with another? Some studies may wish to compare more than two groups. Another situation may involve a single group measured at different points in time, for example before or after a particular treatment. In this situation each participant is compared with themselves, and this is often referred to as a ‘paired’ or a ‘repeated measures’ design. It is possible to combine these types of groups—for example a researcher may measure arterial BP on a number of different occasions in five different groups of patients. Such studies can be difficult, both to analyse and interpret.

In other studies we may want to see how a continuous variable (such as age or height) affects the outcomes. These techniques involve regression analysis, and are beyond the scope of this article.

The outcome measures are the data being collected. This may be a continuous measure, such as temperature or BMI, or it may be a categorical measure, such as ASA status or surgical specialty. Often, inexperienced researchers will strive to collect lots of outcome measures in an attempt to find something that differs between the groups of interest; if this is done, a ‘primary outcome measure’ should be identified before the research begins. In addition, the results of any hypothesis tests will need to be corrected for multiple measures.
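As a hedged sketch of what ‘correcting for multiple measures’ can look like in practice (the article does not specify a method; Bonferroni is used here purely as an illustration, with invented p-values):

```python
# Invented p-values for five secondary outcome measures, used to illustrate
# one simple correction for multiple testing (Bonferroni). Other methods
# (Holm, false discovery rate, etc.) exist; this is only a sketch.
p_values = [0.008, 0.030, 0.20, 0.55, 0.76]
alpha = 0.05
n_tests = len(p_values)

for raw in p_values:
    adjusted = min(raw * n_tests, 1.0)    # Bonferroni: multiply by the number of tests
    verdict = "significant" if adjusted < alpha else "not significant"
    print(f"raw p = {raw:.3f} -> adjusted p = {adjusted:.3f} ({verdict})")
# Only the smallest raw p-value (0.008 -> 0.040) survives the correction;
# 0.030 would have looked 'significant' on its own but does not survive.
```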

The summary and the test statistic will be defined by the type of data that have been collected. The test statistic is calculated then transformed into a p-value using tables or software. It is worth looking at two common tests in a little more detail: the χ² test and the t-test.

Categorical data: the χ² test

The χ² test of independence is a test for comparing categorical outcomes in two or more groups. For example, a number of trials have compared surgical site infections in patients who have been given different concentrations of oxygen perioperatively. In the PROXI trial 4 , 685 patients received oxygen 80% and 701 patients received oxygen 30%. In the 80% group there were 131 infections, while in the 30% group there were 141 infections. In this study, the groups were oxygen 80% and oxygen 30%, and the outcome measure was the presence of a surgical site infection.

The summary is a table (Table 1), and the hypothesis test compares this table (the ‘observed’ table) with the table that would be expected if the proportion of infections in each group was the same (the ‘expected’ table). The test statistic is χ², from which a p-value is calculated. In this instance the p-value is 0.64, which means that results like this would occur 64% of the time if the null hypothesis were true. We thus have no evidence to reject the null hypothesis; the observed difference probably results from sampling variation rather than from an inherent difference between the two groups.

Table 1

Summary of the results of the PROXI trial. Figures are numbers of patients.

                           Oxygen 80%    Oxygen 30%
Surgical site infection          131           141
No infection                     554           560
Total                            685           701
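The same comparison can be reproduced approximately in code. The counts are taken from the text; using no continuity correction is an assumption made here so that the result matches the quoted p ≈ 0.64.

```python
# Reproducing the PROXI comparison with scipy. The counts come from the text
# (131/685 infections with 80% oxygen, 141/701 with 30% oxygen). Turning off
# the Yates continuity correction is an assumption made here so that the
# output matches the quoted p-value of about 0.64.
from scipy.stats import chi2_contingency

#            infection, no infection
oxygen_80 = [      131,          554]
oxygen_30 = [      141,          560]

chi2, p, dof, expected = chi2_contingency([oxygen_80, oxygen_30], correction=False)
print(f"chi-square = {chi2:.2f}, p = {p:.2f}")   # p ≈ 0.64: no evidence against H0
```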

Continuous data: the t-test

The t-test is a statistical method for comparing means, and is one of the most widely used hypothesis tests. Imagine a study where we try to see if there is a difference in the onset time of a new neuromuscular blocking agent compared with suxamethonium. We could enlist 100 volunteers, give them a general anaesthetic, and randomise 50 of them to receive the new drug and 50 of them to receive suxamethonium. We then time how long it takes (in seconds) to have ideal intubation conditions, as measured by a quantitative nerve stimulator. Our data are therefore a list of times. In this case, the groups are ‘new drug’ and suxamethonium, and the outcome is time, measured in seconds. This can be summarised by using means; the hypothesis test will compare the means of the two groups, using a p-value calculated from a ‘t statistic’. Hopefully it is becoming obvious at this point that the test statistic is usually identified by a letter, and this letter is often cited in the name of the test.

The t-test comes in a number of guises, depending on the comparison being made. A single sample can be compared with a standard (Is the BMI of school leavers in this town different from the national average?); two samples can be compared with each other, as in the example above; or the same study subjects can be measured at two different times. The latter case is referred to as a paired t-test, because each participant provides a pair of measurements—such as in a pre- or postintervention study.
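A minimal sketch of such a two-sample t-test, using simulated onset times (the article gives no actual measurements, so the means and spread below are assumptions):

```python
# Simulated sketch of the onset-time study described above. The means and
# standard deviation are assumptions (the article gives no measurements),
# so the printed numbers are illustrative only.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
new_drug      = rng.normal(loc=55, scale=10, size=50)   # onset times in seconds
suxamethonium = rng.normal(loc=50, scale=10, size=50)

t_stat, p = ttest_ind(new_drug, suxamethonium)
print(f"t = {t_stat:.2f}, p = {p:.3f}")
# The t statistic is converted to a p-value; if p < 0.05 we would reject the
# null hypothesis that the two drugs have the same mean onset time.
```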

A large number of methods for testing hypotheses exist; the commonest ones and their uses are described in Table 2 . In each case, the test can be described by detailing the groups being compared ( Table 2 , columns) the outcome measures (rows), the summary, and the test statistic. The decision to use a particular test or method should be made during the planning stages of a trial or experiment. At this stage, an estimate needs to be made of how many test subjects will be needed. Such calculations are described in detail elsewhere. 5

Table 2

The principal types of hypothesis test. Tests comparing more than two samples can indicate that one group differs from the others, but will not identify which. Subsequent ‘post hoc’ testing is required if a difference is found.

Controversies surrounding hypothesis testing

Although hypothesis tests have been the basis of modern science since the middle of the 20th century, they have been plagued by misconceptions from the outset; this has led to what has been described as a crisis in science in the last few years: some journals have gone so far as to ban p-values outright. 6 This is not because of any flaw in the concept of a p-value, but because of a lack of understanding of what p-values mean.

Possibly the most pervasive misunderstanding is the belief that the p-value is the chance that the null hypothesis is true, or that the p-value represents the frequency with which you will be wrong if you reject the null hypothesis (i.e. claim to have found a difference). This interpretation has frequently made it into the literature, and is a very easy trap to fall into when discussing hypothesis tests. To avoid this, it is important to remember that the p-value is telling us something about our sample, not about the null hypothesis. Put in simple terms, we would like to know the probability that the null hypothesis is true, given our data. The p-value tells us the probability of getting these data if the null hypothesis were true, which is not the same thing. This fallacy is referred to as ‘flipping the conditional’; the probability of an outcome under certain conditions is not the same as the probability of those conditions given that the outcome has happened.

A useful example is to imagine a magic trick in which you select a card from a normal deck of 52 cards, and the performer reveals your chosen card in a surprising manner. If the performer were relying purely on chance, this would only happen on average once in every 52 attempts. On the basis of this, we conclude that it is unlikely that the magician is simply relying on chance. Although simple, we have just performed an entire hypothesis test. We have declared a null hypothesis (the performer was relying on chance); we have even calculated a p-value (1 in 52, ≈0.02); and on the basis of this low p-value we have rejected our null hypothesis. We would, however, be wrong to suggest that there is a probability of 0.02 that the performer is relying on chance—that is not what our figure of 0.02 is telling us.

To explore this further we can create two populations, and watch what happens when we use simulation to take repeated samples to compare these populations. Computers allow us to do this repeatedly, and to see what p-values are generated (see Supplementary online material). 7 Fig 1 illustrates the results of 100,000 simulated t-tests, generated in two sets of circumstances. In Fig 1a, we have a situation in which there is a difference between the two populations. The p-values cluster below the 0.05 cut-off, although there is a small proportion with p > 0.05. Interestingly, the proportion of comparisons where p < 0.05 is 0.8 or 80%, which is the power of the study (the sample size was specifically calculated to give a power of 80%).

Figure 1

The p-values generated when 100,000 t-tests are used to compare two samples taken from defined populations. (a) The populations have a difference and the p-values are mostly significant. (b) The samples were taken from the same population (i.e. the null hypothesis is true) and the p-values are distributed uniformly.

Figure 1b depicts the situation where repeated samples are taken from the same parent population (i.e. the null hypothesis is true). Somewhat surprisingly, all p-values occur with equal frequency, with p < 0.05 occurring exactly 5% of the time. Thus, when the null hypothesis is true, a type I error will occur with a frequency equal to the alpha significance cut-off.
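A small re-creation of this kind of simulation is sketched below; it is not the article's supplementary code, and the sample size and effect size are assumptions chosen to give roughly 80% power.

```python
# A small re-creation of the simulation behind Figure 1 (not the article's
# supplementary code). Sample size and effect size are assumptions chosen to
# give roughly 80% power; 10,000 repeats are used instead of 100,000.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n, repeats = 64, 10_000

def simulated_p_values(true_difference):
    p_values = []
    for _ in range(repeats):
        a = rng.normal(0, 10, n)
        b = rng.normal(true_difference, 10, n)
        p_values.append(ttest_ind(a, b).pvalue)
    return np.array(p_values)

p_h1 = simulated_p_values(5)    # the populations genuinely differ (as in Fig 1a)
p_h0 = simulated_p_values(0)    # the null hypothesis is true (as in Fig 1b)

print(f"p < 0.05 when a real difference exists: {np.mean(p_h1 < 0.05):.2f}")   # ~0.80 (power)
print(f"p < 0.05 when H0 is true:               {np.mean(p_h0 < 0.05):.2f}")   # ~0.05 (type I errors)
```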

Figure 1 highlights the underlying problem: when presented with a p-value < 0.05, is it possible, with no further information, to determine whether you are looking at something from Fig 1a or Fig 1b?

Finally, it cannot be stressed enough that although hypothesis testing identifies whether or not a difference is likely, it is up to us as clinicians to decide whether or not a statistically significant difference is also significant clinically.

Hypothesis testing: what next?

As mentioned above, some have suggested moving away from p-values, but it is not entirely clear what we should use instead. Some sources have advocated focussing more on effect size; however, without a measure of significance we have merely returned to our original problem: how do we know that our difference is not just a result of sampling variation?

One solution is to use Bayesian statistics. Until very recently, these techniques have been considered both too difficult and not sufficiently rigorous. However, recent advances in computing have led to the development of Bayesian equivalents of a number of standard hypothesis tests. 8 These generate a ‘Bayes Factor’ (BF), which tells us how much more (or less) likely the alternative hypothesis is after our experiment. A BF of 1.0 indicates that the likelihood of the alternate hypothesis has not changed. A BF of 10 indicates that the alternate hypothesis is 10 times more likely than we originally thought. A number of classifications for BF exist; a BF greater than 10 can be considered ‘strong evidence’, while a BF greater than 100 can be classed as ‘decisive’.
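As a generic, self-contained illustration of a Bayes factor (not a Bayesian equivalent of any particular test from the article), consider comparing ‘the coin is fair’ with ‘the probability of heads is unknown’ after 9 heads in 10 tosses; the uniform prior under the alternative is an assumption of this sketch.

```python
# Generic illustration of a Bayes factor, not taken from the article:
# H0: the coin is fair (P(heads) = 0.5)
# H1: P(heads) is unknown, with a uniform prior on [0, 1] (an assumption of this sketch)
# Data: 9 heads in 10 tosses.
from math import comb

k, n = 9, 10
marginal_h0 = comb(n, k) * 0.5**n   # P(data | H0)
marginal_h1 = 1 / (n + 1)           # P(data | H1): binomial likelihood averaged over the uniform prior
bayes_factor = marginal_h1 / marginal_h0

print(f"BF10 = {bayes_factor:.1f}")
# BF10 ≈ 9.3: the data make H1 about nine times more likely, relative to H0,
# than it was before the experiment, just short of the 'strong evidence' threshold of 10.
```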

Figures such as the BF can be quoted in conjunction with the traditional p-value, but it remains to be seen whether they will become mainstream.

Declaration of interest

The author declares that they have no conflict of interest.

The associated MCQs (to support CME/CPD activity) will be accessible at www.bjaed.org/cme/home by subscribers to BJA Education .

Jason Walker FRCA FRSS BSc (Hons) Math Stat is a consultant anaesthetist at Ysbyty Gwynedd Hospital, Bangor, Wales, and an honorary senior lecturer at Bangor University. He is vice chair of his local research ethics committee, and an examiner for the Primary FRCA.

Matrix codes: 1A03, 2A04, 3J03

Supplementary data to this article can be found online at https://doi.org/10.1016/j.bjae.2019.03.006 .


