AJNR Am J Neuroradiol. 2013 Sep;34(9).

The Scientific Method: A Need for Something Better?

Here is the last part of the triptych that started with the “Perspectives” on brainstorming and continued with the one on verbal overshadowing. I have saved this one for last because it deals with, and in many ways attempts to debunk, the use of the scientific method as the Holy Grail of research. Needless to say, the topic is controversial and will anger some.

In the “natural sciences,” advances occur through research that employs the scientific method. Just imagine trying to publish an original investigation or getting funds for a project without using it! Although research in the pure (fundamental) sciences (eg, biology, physics, and chemistry) must adhere to it, investigations pertaining to soft (a pejorative term) sciences (eg, sociology, economics, and anthropology) do not use it and yet produce valid ideas important enough to be published in peer-reviewed journals and even win Nobel Prizes.

The scientific method is better thought of as a set of “methods” or different techniques used to prove or disprove 1 or more hypotheses. A hypothesis is a proposed explanation for observed phenomena. These phenomena are, in general, empirical—that is, they are gathered by observation and/or experimentation. “Hypothesis” is a term often confused with “theory.” A theory is the end result of a previously tested hypothesis—a proven set of principles that explains observed phenomena. To avoid this confusion, a hypothesis is sometimes called a “working hypothesis.” A working hypothesis needs to be proved or disproved by investigation. The entire approach employed to validate a hypothesis is more broadly called the hypothetico-deductive method. Not all hypotheses are proved by empirical testing, and most of what we know and accept as truth about the economy and ancient civilizations is based solely on … observation and thought. Conversely, the deep thinkers in the non-natural disciplines see many things wrong with the scientific method because it does not entirely reflect the chaotic environment in which we live—that is, the scientific method is rigid and constrained in its design and produces results that are isolated from real environments and that address only specific issues.

One of the most important features of the scientific method is its repeatability. The experiments performed to prove a working hypothesis must clearly record all details so that others may replicate them and eventually allow the hypothesis to become widely accepted. Objectivity must be maintained in experiments to reduce bias. “Bias” refers to the inclination to favor one perspective over others. The opposite of bias is “neutrality,” and all experiments (and their peer review) need to be devoid of bias and neutral. In medicine, bias is also a component of conflict of interest and corrupts results. In medicine, conflict of interest is often due to relationships with the pharmaceutical/device industries. The American Journal of Neuroradiology (AJNR), like most other serious journals, requires that contributors fill out the standard disclosure form regarding conflict of interest proposed by the International Committee of Medical Journal Editors, and it publishes these disclosures at the end of articles. 1

Like many other scientific advances, the scientific method originated in the Muslim world. About 1000 years ago, the Iraqi mathematician Ibn al-Haytham was already using it. In the Western world, the scientific method was first welcomed by astronomers such as Galileo and Kepler, and after the 17th century, its use became widespread. As we now know it, the scientific method dates only from the 1930s. The first step in the scientific method is observation, from which one formulates a question. From that question, the hypothesis is generated. A hypothesis must be phrased in such a way that it can be proved or disproved (“falsifiable”). The so-called “null hypothesis” represents the default position. For example, if you are trying to prove the relationship between 2 phenomena, the null hypothesis may be a statement that there is no relationship between the observed phenomena. The next step is to test the hypothesis via 1 or more experiments. The best experiments, at least in medicine, are those that are blinded and accompanied by control groups (not subjected to the intervention under study). Third is the analysis of the data obtained. The results may support the working hypothesis or “falsify” (disprove) it, leading to the creation of a new hypothesis that is again tested scientifically. Not surprisingly, the structure of abstracts and articles published in AJNR and other scientific journals reflects the 4 steps of the scientific method (Background and Purpose, Materials and Methods, Results, and Conclusions). Another way in which our journals adhere to the scientific method is peer review—that is, every part of the article must be open to review by others who look for possible mistakes and biases. The last part of the modern scientific method is publication.
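
This sequence can be made concrete with a short sketch. The Python snippet below is not part of the editorial; the subject IDs and group sizes are invented purely for illustration. It shows one way to randomize hypothetical subjects to experimental and control groups and to mask the allocation so that whoever scores the outcome remains blinded.

```python
import random

def assign_and_blind(subject_ids, seed=42):
    """Randomly split subjects into experimental/control groups and return
    (a) the blinded code each rater sees and (b) the allocation key that
    stays sealed until the analysis is finished."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    allocation = {s: "experimental" for s in shuffled[:half]}
    allocation.update({s: "control" for s in shuffled[half:]})

    # Raters only ever see the neutral codes "A" and "B", never the group names.
    masked = {"experimental": "A", "control": "B"}
    blinded_view = {s: masked[g] for s, g in allocation.items()}
    return blinded_view, allocation

subjects = [f"subj{i:02d}" for i in range(1, 21)]   # 20 hypothetical subjects
blinded, key = assign_and_blind(subjects)
print(blinded["subj01"])   # the rater sees only "A" or "B"
```

Under such a design, the null hypothesis is simply that the two coded groups do not differ on the measured outcome; the allocation key is consulted only after the analysis is complete.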

Despite its rigid structure, the scientific method still depends on the most human of capabilities: creativity, imagination, and intelligence; without these, it cannot exist. Documentation of experiments is always flawed because everything cannot be recorded. One of the most significant problems with the scientific method is the lack of importance placed on observations that lie outside the main hypothesis (related to lateral thinking). No matter how carefully you record what you observe, if these observations are not also submitted to the method, they cannot be accepted. This is a common problem for paleontologists, who really have no way of testing their observations; yet many of their observations (primary and secondary) are accepted as valid. Also, think about the works of Sigmund Freud that led to improved understanding of psychological development and related disorders; most were based just on observations. Many argue that because the scientific method discards observations extraneous to it, it actually limits the growth of scientific knowledge. Because a hypothesis only reflects current knowledge, data that contradict it may be discarded, only to become important later.

Because the scientific method is basically a “trial-and-error” scheme, progress is slow. In older disciplines, there may not have been enough knowledge to develop good theories, which led to the creation of bad theories that significantly delayed progress. It can also be said that progress is often fortuitous: while one is trying to test a hypothesis, completely unexpected and often accidental results lead to new discoveries. Just imagine how many important data have been discarded because the results did not fit the initial hypothesis.

A lot of time goes into the trial-and-error phase of an experiment, so why do it when we already know perfectly well what to expect from the results? Just peruse AJNR, and most proposed hypotheses are proved true! Hypotheses proved false are never sexy, and journals are generally not interested in publishing such studies. In the scientific method, unexpected results are not trusted, while expected and understood ones are immediately trusted. The fact that we do “this” to observe “that” may be very misleading in the long run. 2 In reality, many controversies could have been avoided if, instead of calling it “The Scientific Method,” we had simply called it “A Scientific Method,” leaving space for the development of other methods and acceptance of those used by other disciplines. Some argue that it was called “scientific” because the ones who invented it were arrogant and pretentious.

The term “science” comes from the Latin “scientia,” meaning knowledge. Aristotle equated science with reliability because it could be rationally and logically explained. Curiously, science was, for many centuries, a part of the greater discipline of philosophy. In the 14th and 15th centuries, “natural philosophy” was born; by the start of the 17th century, it had become “natural sciences.” It was in the early 17th century that Francis Bacon popularized the inductive reasoning methods that would thereafter become known as the scientific method. Western reasoning is based on our faith in truth, many times absolute truth. Beginning assumptions that then become hypotheses are subjectively accepted as being true; thus, the scientific method took longer to be accepted by Eastern civilizations, whose concept of truth differs from ours. It is possible that the scientific method is the greatest unifying activity of the human race. Although medicine and philosophy have been separated from each other for centuries, there is a current trend to unite the two again.

The specialty of psychiatry did not become “scientific” until the widespread use of medications and therapeutic procedures offered the possibility of being examined by the scientific method. In the United States and Europe, the number of psychoanalysts has progressively declined; and, most surprisingly, philosophers are taking their place. 3 The benefits philosophy offers are that it puts patients first, supports new models of service delivery, and reconnects researchers in different disciplines (it is the advances in the neurosciences that demand answers to the more abstract questions that define a human “being”). Philosophy provides psychiatrists with much-needed generic thinking skills; and because philosophy is more widespread than psychiatry and recognizes its importance, it provides a more universal and open environment. 4 This is an example of a soft discipline merging with a hard one (medicine) for the improvement of us all. However, this is not the case in other areas.

For about 10 years, the National Science Foundation has sponsored the “Empirical Implications of Theoretical Models” initiative in political science. 5 A major complaint is that most political science literature consists of noncumulative empirical studies and that very few have a “formal” component. The formal part refers to the accumulation of data and the use of statistics to prove or disprove an observation (thus, the use of the scientific method). For academics in political science, the concern is that some journals no longer accept publications based on unproven theoretic models, and this poses a significant problem for the “non-natural” sciences. 6 In this case, the social sciences try to emulate the “hard” sciences, and this may not be the best approach. These academics and others think that using the scientific method in such instances emphasizes predictions rather than ideas, focuses learning on material activities rather than on a deep understanding of a subject, and lacks epistemic framing relevant to a discipline. 7 So, is there a better approach than the scientific method?

A provocative method called “model-based inquiry” respects the precepts of the scientific method (that knowledge is testable, revisable, explanatory, conjectural, and generative). 7 While the scientific method attempts to find patterns in natural phenomena, the model-based inquiry method attempts to develop defensible explanations. This new system sees models as tools for explanation rather than as explanations proper and allows investigators to go beyond the data; thus, new hypotheses, new concepts, and new predictions can be generated at any point along the inquiry, something not allowed within the rigidity of the traditional scientific method.

In a different approach, the National Science Foundation charged scientists, philosophers, and educators from the University of California at Berkeley with developing a “dynamic” alternative to the scientific method. 8 The proposed method accepts input from serendipitous occurrences and emphasizes that science is a dynamic process engaging many individuals and activities. Unlike the traditional scientific method, this new one accepts data that do not fit into organized and neat conclusions. Science is about discovery, not the justifications it seems to emphasize. 9

Obviously, I am not proposing that we immediately get rid of the traditional scientific method. Until another one is proved better, it should continue to be the cornerstone of our endeavors. However, in a world where information will grow more in the next 50 years than it did in the past 400 years, where the Internet has 1 trillion links, where 300 billion e-mail messages are generated every day, and where 200 million tweets are posted daily, ask yourself whether it is still valid to use the same scientific method that was invented nearly 400 years ago.

How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "Sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method , whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question that is then explored through background research. Only at that point do researchers begin to develop a testable hypothesis.
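
To make the steps above concrete, here is a minimal Python sketch of the sleep-deprivation example. The data are simulated, and the group means, sample sizes, and 0.05 threshold are assumptions chosen purely for illustration, not values from any actual study.

```python
import numpy as np
from scipy import stats

# Hypothesis: sleep-deprived people will perform worse on a test
# than people who are not sleep-deprived.
rng = np.random.default_rng(0)

# "Collecting data": simulated test scores for two groups of 30.
rested   = rng.normal(loc=75, scale=10, size=30)   # not sleep-deprived
deprived = rng.normal(loc=68, scale=10, size=30)   # sleep-deprived

# "Analyzing the results": compare the group means with a two-sample t-test.
t_stat, p_value = stats.ttest_ind(rested, deprived)

# "Drawing conclusions": reject the null hypothesis of no difference
# only if p falls below the conventional 0.05 threshold.
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f}: the data are consistent with the hypothesis.")
else:
    print(f"p = {p_value:.3f}: the data do not support the hypothesis.")
```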

Unless you are creating an exploratory study, your hypothesis should always explain what you  expect  to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of a folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the  journal articles you read . Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse falsifiability with the idea that a claim is false, which is not the case. Falsifiability means that if a claim were false, it would be possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs, as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.
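
As a rough illustration of operational definitions in practice, the short Python sketch below (all names and scoring rules are hypothetical) pins down "test anxiety" as the mean of several self-report items and "study habits" as logged minutes of studying, so that anyone applying the same definitions to the same responses obtains the same numbers.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    anxiety_items: list[float]   # self-report ratings during the exam, each 1-5
    study_minutes: float         # minutes of studying actually logged

def test_anxiety_score(p: Participant) -> float:
    """Operational definition: mean of the self-report anxiety items."""
    return sum(p.anxiety_items) / len(p.anxiety_items)

def study_habits_score(p: Participant) -> float:
    """Operational definition: total minutes of studying actually logged."""
    return p.study_minutes

p = Participant(anxiety_items=[4, 3, 5, 4], study_minutes=90)
print(test_anxiety_score(p), study_habits_score(p))   # 4.0 90
```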

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression ? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent variables and one dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the  dependent variable  if you change the  independent variable .

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."​
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than students who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what the researcher is studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a  correlational study  can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.
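
The difference between the two approaches can be sketched in code. In the simulated example below (the variable names and numbers are invented for illustration), the correlational analysis only measures how two observed variables move together, while the experimental analysis compares groups created by deliberately manipulating the independent variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Correlational (descriptive) approach: both variables are simply observed.
hours_slept = rng.normal(7, 1, 100)
test_scores = 60 + 3 * hours_slept + rng.normal(0, 5, 100)
r, p_corr = stats.pearsonr(hours_slept, test_scores)
print(f"Correlation r = {r:.2f} (p = {p_corr:.3f})")   # describes a relationship only

# Experimental approach: participants are randomly assigned to a sleep
# condition, so the independent variable is manipulated by the researcher
# and a group difference supports a causal interpretation.
normal_sleep     = rng.normal(80, 8, 50)
restricted_sleep = rng.normal(74, 8, 50)
t, p_exp = stats.ttest_ind(normal_sleep, restricted_sleep)
print(f"Group difference t = {t:.2f} (p = {p_exp:.3f})")
```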

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.

Thompson WH, Skau S. On the scope of scientific hypotheses .  R Soc Open Sci . 2023;10(8):230607. doi:10.1098/rsos.230607

Taran S, Adhikari NKJ, Fan E. Falsifiability in medicine: what clinicians can learn from Karl Popper [published correction appears in Intensive Care Med. 2021 Jun 17;:].  Intensive Care Med . 2021;47(9):1054-1056. doi:10.1007/s00134-021-06432-z

Eyler AA. Research Methods for Public Health . 1st ed. Springer Publishing Company; 2020. doi:10.1891/9780826182067.0004

Nosek BA, Errington TM. What is replication ?  PLoS Biol . 2020;18(3):e3000691. doi:10.1371/journal.pbio.3000691

Aggarwal R, Ranganathan P. Study designs: Part 2 - Descriptive studies .  Perspect Clin Res . 2019;10(1):34-36. doi:10.4103/picr.PICR_154_18

Nevid J. Psychology: Concepts and Applications. Wadsworth, 2013.


Biology LibreTexts

1.1: The Scientific Method

Laci M. Gerhart-Barley, College of Biological Sciences, UC Davis

Biologists, and other scientists, study the world using a formal process referred to as the scientific method. The scientific method was first documented by Sir Francis Bacon (1561–1626) of England and can be applied to almost all fields of study. The scientific method is founded upon observation, which leads to a question and the development of a hypothesis that answers that question. The scientist can then design an experiment to test the proposed hypothesis and make a prediction for the outcome of the experiment if the proposed hypothesis is true. In the following sections, we will work through a simple example of the scientific method based on the observation that the classroom is too warm.

Proposing a Hypothesis

A hypothesis is one possible answer to the question that arises from observations. In our example, the observation is that the classroom is too warm, and the question that arises from that observation is why the classroom is too warm. One (of many) hypotheses is “The classroom is warm because no one turned on the air conditioning.” Another hypothesis could be “The classroom is warm because the heating is set too high."

Once a hypothesis has been developed, the scientist then makes a prediction, which is similar to a hypothesis, but generally follows the format of “If . . . then . . . .” In our example, a prediction arising from the first hypothesis might be, “ If the air-conditioning is turned on, then the classroom will no longer be too warm.” The initial steps of the scientific method (observation to prediction) are outlined in Figure 1.1.1.

[Figure 1.1.1: The initial steps of the scientific method, from observation to prediction.]

Testing a Hypothesis

A valid hypothesis must be testable. It should also be falsifiable, meaning that it can be disproven by experimental results. Importantly, science does not claim to “prove” anything because scientific understandings are always subject to modification with further information. To test a hypothesis, a researcher will conduct one or more experiments designed to eliminate one or more of the hypotheses. Each experiment will have one or more variables and one or more controls. A variable is any part of the experiment that can vary or change during the experiment. The control group contains every feature of the experimental group except it is not given the manipulation that tests the hypothesis. Therefore, if the results of the experimental group differ from the control group, the difference must be due to the hypothesized manipulation, rather than some outside factor. Look for the variables and controls in the examples that follow. To test the first hypothesis, the student would find out if the air conditioning is on. If the air conditioning is turned on but does not work, then the hypothesis that the air conditioning was not turned on should be rejected. To test the second hypothesis, the student could check the settings of the classroom heating unit. If the heating unit is set at an appropriate temperature, then this hypothesis should also be rejected. Each hypothesis should be tested by carrying out appropriate experiments. Be aware that rejecting one hypothesis does not determine whether or not the other hypotheses can be accepted; it simply eliminates one hypothesis that is not valid. Using the scientific method, the hypotheses that are inconsistent with experimental data are rejected.

While this “warm classroom” example is based on observational results, other hypotheses and experiments might have clearer controls. For instance, a student might attend class on Monday and realize they had difficulty concentrating on the lecture. One hypothesis to explain this occurrence might be, “When I eat breakfast before class, I am better able to pay attention.” The student could then design an experiment with a control to test this hypothesis.
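
One way to see why the control matters is a simple permutation test on made-up attention scores: if eating breakfast truly had no effect, shuffling the group labels should often produce a difference as large as the one observed. The Python sketch below illustrates only that logic; the scores are invented, not data from a real class.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical attention scores (0-100) for students who did and did not eat breakfast.
breakfast    = np.array([72, 78, 81, 69, 75, 80, 77, 74])
no_breakfast = np.array([65, 70, 68, 72, 66, 71, 69, 64])

observed = breakfast.mean() - no_breakfast.mean()

# Permutation test: repeatedly shuffle the group labels and recompute the
# difference to see how unusual the observed difference would be if
# breakfast made no difference (the null hypothesis).
pooled = np.concatenate([breakfast, no_breakfast])
n = len(breakfast)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:n].mean() - pooled[n:].mean()
    if diff >= observed:
        count += 1

p_value = count / n_perm
print(f"Observed difference = {observed:.1f} points, permutation p = {p_value:.4f}")
```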

Exercise 1.1.1

In the example below, the scientific method is used to solve an everyday problem. Match each scientific method step (numbered items) with the corresponding part of solving the everyday problem (lettered items). Based on the results of the experiment, is the hypothesis correct? If it is incorrect, propose some alternative hypotheses.

1. Observation
2. Question
3. Hypothesis (answer)
4. Prediction
5. Experiment
6. Result

A. The car battery is dead.
B. If the battery is dead, then the headlights also will not turn on.
C. My car won't start.
D. I turn on the headlights.
E. The headlights work.
F. Why does the car not start?

Answer: 1-C, 2-F, 3-A, 4-B, 5-D, 6-E

The scientific method may seem overly rigid and structured; however, there is flexibility. Often, the process of science is not as linear as the scientific method suggests and experimental results frequently inspire a new approach, highlight patterns or themes in the study system, or generate entirely new and different observations and questions. In our warm classroom example, testing the air conditioning hypothesis could, for example, unearth evidence of faulty wiring in the classroom. This observation could then inspire additional questions related to other classroom electrical concerns such as inconsistent wireless internet access, faulty audio/visual equipment functioning, non-functional power outlets, flickering lighting, etc. Notice, too, that the scientific method can be applied to solving problems that aren’t necessarily scientific in nature.

This section was adapted from OpenStax Chapter 1:2 The Process of Science

Scientific Method

Science is an enormously successful human enterprise. The study of scientific method is the attempt to discern the activities by which that success is achieved. Among the activities often identified as characteristic of science are systematic observation and experimentation, inductive and deductive reasoning, and the formation and testing of hypotheses and theories. How these are carried out in detail can vary greatly, but characteristics like these have been looked to as a way of demarcating scientific activity from non-science, where only enterprises which employ some canonical form of scientific method or methods should be considered science (see also the entry on science and pseudo-science ). Others have questioned whether there is anything like a fixed toolkit of methods which is common across science and only science. Some reject privileging one view of method as part of rejecting broader views about the nature of science, such as naturalism (Dupré 2004); some reject any restriction in principle (pluralism).

Scientific method should be distinguished from the aims and products of science, such as knowledge, predictions, or control. Methods are the means by which those goals are achieved. Scientific method should also be distinguished from meta-methodology, which includes the values and justifications behind a particular characterization of scientific method (i.e., a methodology) — values such as objectivity, reproducibility, simplicity, or past successes. Methodological rules are proposed to govern method and it is a meta-methodological question whether methods obeying those rules satisfy given values. Finally, method is distinct, to some degree, from the detailed and contextual practices through which methods are implemented. The latter might range over: specific laboratory techniques; mathematical formalisms or other specialized languages used in descriptions and reasoning; technological or other material means; ways of communicating and sharing results, whether with other scientists or with the public at large; or the conventions, habits, enforced customs, and institutional controls over how and what science is carried out.

While it is important to recognize these distinctions, their boundaries are fuzzy. Hence, accounts of method cannot be entirely divorced from their methodological and meta-methodological motivations or justifications. Moreover, each aspect plays a crucial role in identifying methods. Disputes about method have therefore played out at the detail, rule, and meta-rule levels. Changes in beliefs about the certainty or fallibility of scientific knowledge, for instance (which is a meta-methodological consideration of what we can hope for methods to deliver), have meant different emphases on deductive and inductive reasoning, or on the relative importance attached to reasoning over observation (i.e., differences over particular methods). Beliefs about the role of science in society will affect the place one gives to values in scientific method.

The issue which has shaped debates over scientific method the most in the last half century is the question of how pluralist we need to be about method. Unificationists continue to hold out for one method essential to science; nihilism is a form of radical pluralism, which considers the effectiveness of any methodological prescription to be so context sensitive as to render it not explanatory on its own. Some middle degree of pluralism regarding the methods embodied in scientific practice seems appropriate. But the details of scientific practice vary with time and place, from institution to institution, across scientists and their subjects of investigation. How significant are the variations for understanding science and its success? How much can method be abstracted from practice? This entry describes some of the attempts to characterize scientific method or methods, as well as arguments for a more context-sensitive approach to methods embedded in actual scientific practices.

1. Overview and organizing themes

This entry could have been given the title Scientific Methods and gone on to fill volumes, or it could have been extremely short, consisting of a brief summary rejection of the idea that there is any such thing as a unique Scientific Method at all. Both unhappy prospects are due to the fact that scientific activity varies so much across disciplines, times, places, and scientists that any account which manages to unify it all will either consist of overwhelming descriptive detail, or trivial generalizations.

The choice of scope for the present entry is more optimistic, taking a cue from the recent movement in philosophy of science toward a greater attention to practice: to what scientists actually do. This “turn to practice” can be seen as the latest form of studies of methods in science, insofar as it represents an attempt at understanding scientific activity, but through accounts that are neither meant to be universal and unified, nor singular and narrowly descriptive. To some extent, different scientists at different times and places can be said to be using the same method even though, in practice, the details are different.

Whether the context in which methods are carried out is relevant, or to what extent, will depend largely on what one takes the aims of science to be and what one’s own aims are. For most of the history of scientific methodology the assumption has been that the most important output of science is knowledge and so the aim of methodology should be to discover those methods by which scientific knowledge is generated.

Science was seen to embody the most successful form of reasoning (but which form?) to the most certain knowledge claims (but how certain?) on the basis of systematically collected evidence (but what counts as evidence, and should the evidence of the senses take precedence, or rational insight?) Section 2 surveys some of the history, pointing to two major themes. One theme is seeking the right balance between observation and reasoning (and the attendant forms of reasoning which employ them); the other is how certain scientific knowledge is or can be.

Section 3 turns to 20th century debates on scientific method. In the second half of the 20th century the epistemic privilege of science faced several challenges and many philosophers of science abandoned the reconstruction of the logic of scientific method. Views changed significantly regarding which functions of science ought to be captured and why. For some, the success of science was better identified with social or cultural features. Historical and sociological turns in the philosophy of science were made, with a demand that greater attention be paid to the non-epistemic aspects of science, such as sociological, institutional, material, and political factors. Even outside of those movements there was an increased specialization in the philosophy of science, with more and more focus on specific fields within science. The combined upshot was very few philosophers arguing any longer for a grand unified methodology of science. Sections 3 and 4 survey the main positions on scientific method in 20th century philosophy of science, focusing on where they differ in their preference for confirmation or falsification or for waiving the idea of a special scientific method altogether.

In recent decades, attention has primarily been paid to scientific activities traditionally falling under the rubric of method, such as experimental design and general laboratory practice, the use of statistics, the construction and use of models and diagrams, interdisciplinary collaboration, and science communication. Sections 4–6 attempt to construct a map of the current domains of the study of methods in science.

As these sections illustrate, the question of method is still central to the discourse about science. Scientific method remains a topic for education, for science policy, and for scientists. It arises in the public domain where the demarcation or status of science is at issue. Some philosophers have recently returned, therefore, to the question of what it is that makes science a unique cultural product. This entry will close with some of these recent attempts at discerning and encapsulating the activities by which scientific knowledge is achieved.

2. Historical review: Aristotle to Mill

Attempting a history of scientific method compounds the vast scope of the topic. This section briefly surveys the background to modern methodological debates. What can be called the classical view goes back to antiquity, and represents a point of departure for later divergences. [ 1 ]

We begin with a point made by Laudan (1968) in his historical survey of scientific method:

Perhaps the most serious inhibition to the emergence of the history of theories of scientific method as a respectable area of study has been the tendency to conflate it with the general history of epistemology, thereby assuming that the narrative categories and classificatory pigeon-holes applied to the latter are also basic to the former. (1968: 5)

To see knowledge about the natural world as falling under knowledge more generally is an understandable conflation. Histories of theories of method would naturally employ the same narrative categories and classificatory pigeon holes. An important theme of the history of epistemology, for example, is the unification of knowledge, a theme reflected in the question of the unification of method in science. Those who have identified differences in kinds of knowledge have often likewise identified different methods for achieving that kind of knowledge (see the entry on the unity of science ).

Different views on what is known, how it is known, and what can be known are connected. Plato distinguished the realms of things into the visible and the intelligible ( The Republic , 510a, in Cooper 1997). Only the latter, the Forms, could be objects of knowledge. The intelligible truths could be known with the certainty of geometry and deductive reasoning. What could be observed of the material world, however, was by definition imperfect and deceptive, not ideal. The Platonic way of knowledge therefore emphasized reasoning as a method, downplaying the importance of observation. Aristotle disagreed, locating the Forms in the natural world as the fundamental principles to be discovered through the inquiry into nature ( Metaphysics Z , in Barnes 1984).

Aristotle is recognized as giving the earliest systematic treatise on the nature of scientific inquiry in the western tradition, one which embraced observation and reasoning about the natural world. In the Prior and Posterior Analytics , Aristotle reflects first on the aims and then the methods of inquiry into nature. A number of features can be found which are still considered by most to be essential to science. For Aristotle, empiricism, careful observation (but passive observation, not controlled experiment), is the starting point. The aim is not merely recording of facts, though. For Aristotle, science ( epistêmê ) is a body of properly arranged knowledge or learning—the empirical facts, but also their ordering and display are of crucial importance. The aims of discovery, ordering, and display of facts partly determine the methods required of successful scientific inquiry. Also determinant is the nature of the knowledge being sought, and the explanatory causes proper to that kind of knowledge (see the discussion of the four causes in the entry on Aristotle on causality ).

In addition to careful observation, then, scientific method requires a logic as a system of reasoning for properly arranging, but also inferring beyond, what is known by observation. Methods of reasoning may include induction, prediction, or analogy, among others. Aristotle’s system (along with his catalogue of fallacious reasoning) was collected under the title the Organon. This title would be echoed in later works on scientific reasoning, such as Novum Organon by Francis Bacon, and Novum Organon Renovatum by William Whewell (see below). In Aristotle’s Organon reasoning is divided primarily into two forms, a rough division which persists into modern times. The division, known most commonly today as deductive versus inductive method, appears in other eras and methodologies as analysis/synthesis, non-ampliative/ampliative, or even confirmation/verification. The basic idea is there are two “directions” to proceed in our methods of inquiry: one away from what is observed, to the more fundamental, general, and encompassing principles; the other, from the fundamental and general to instances or implications of principles.

The basic aim and method of inquiry identified here can be seen as a theme running throughout the next two millennia of reflection on the correct way to seek after knowledge: carefully observe nature and then seek rules or principles which explain or predict its operation. The Aristotelian corpus provided the framework for a commentary tradition on scientific method independent of science itself (cosmos versus physics). During the medieval period, figures such as Albertus Magnus (1206–1280), Thomas Aquinas (1225–1274), Robert Grosseteste (1175–1253), Roger Bacon (1214/1220–1292), William of Ockham (1287–1347), Andreas Vesalius (1514–1564), and Giacomo Zabarella (1533–1589) all worked to clarify the kind of knowledge obtainable by observation and induction, the source of justification of induction, and the best rules for its application. [ 2 ] Many of their contributions we now think of as essential to science (see also Laudan 1968). As Aristotle and Plato had employed a framework of reasoning either “to the forms” or “away from the forms”, medieval thinkers employed directions away from the phenomena or back to the phenomena. In analysis, a phenomenon was examined to discover its basic explanatory principles; in synthesis, explanations of a phenomenon were constructed from first principles.

During the Scientific Revolution these various strands of argument, experiment, and reason were forged into a dominant epistemic authority. The 16th–18th centuries were a period of not only dramatic advance in knowledge about the operation of the natural world—advances in mechanical, medical, biological, political, economic explanations—but also of self-awareness of the revolutionary changes taking place, and intense reflection on the source and legitimation of the method by which the advances were made. The struggle to establish the new authority included methodological moves. The Book of Nature, according to the metaphor of Galileo Galilei (1564–1642) or Francis Bacon (1561–1626), was written in the language of mathematics, of geometry and number. This motivated an emphasis on mathematical description and mechanical explanation as important aspects of scientific method. Through figures such as Henry More and Ralph Cudworth, a neo-Platonic emphasis on the importance of metaphysical reflection on nature behind appearances, particularly regarding the spiritual as a complement to the purely mechanical, remained an important methodological thread of the Scientific Revolution (see the entries on Cambridge platonists; Boyle; Henry More; Galileo).

In Novum Organum (1620), Bacon was critical of the Aristotelian method for leaping from particulars to universals too quickly. The syllogistic form of reasoning readily mixed those two types of propositions. Bacon aimed at the invention of new arts, principles, and directions. His method would be grounded in methodical collection of observations, coupled with correction of our senses (and particularly, directions for the avoidance of the Idols, as he called them, kinds of systematic errors to which naïve observers are prone.) The community of scientists could then climb, by a careful, gradual and unbroken ascent, to reliable general claims.

Bacon’s method has been criticized as impractical and too inflexible for the practicing scientist. Whewell would later criticize Bacon in his Philosophy of the Inductive Sciences for paying too little attention to the practices of scientists. It is hard to find convincing examples of Bacon’s method being put into practice in the history of science, but there are a few who have been held up as real examples of 17th century scientific, inductive method, even if not in the rigid Baconian mold: figures such as Robert Boyle (1627–1691) and William Harvey (1578–1657) (see the entry on Bacon).

It is to Isaac Newton (1642–1727), however, that historians of science and methodologists have paid greatest attention. Given the enormous success of his Principia Mathematica and Opticks, this is understandable. The study of Newton’s method has had two main thrusts: the implicit method of the experiments and reasoning presented in the Opticks, and the explicit methodological rules given as the Rules for Philosophising (the Regulae) in Book III of the Principia. [ 3 ] Newton’s law of gravitation, the linchpin of his new cosmology, broke with explanatory conventions of natural philosophy, first for apparently proposing action at a distance, but more generally for not providing “true”, physical causes. The argument for his System of the World (Principia, Book III) was based on phenomena, not reasoned first principles. This was viewed (mainly on the continent) as insufficient for proper natural philosophy. The Regulae counter this objection, re-defining the aims of natural philosophy by re-defining the method natural philosophers should follow. (See the entry on Newton’s philosophy.)

To his list of methodological prescriptions should be added Newton’s famous phrase “ hypotheses non fingo ” (commonly translated as “I frame no hypotheses”.) The scientist was not to invent systems but infer explanations from observations, as Bacon had advocated. This would come to be known as inductivism. In the century after Newton, significant clarifications of the Newtonian method were made. Colin Maclaurin (1698–1746), for instance, reconstructed the essential structure of the method as having complementary analysis and synthesis phases, one proceeding away from the phenomena in generalization, the other from the general propositions to derive explanations of new phenomena. Denis Diderot (1713–1784) and editors of the Encyclopédie did much to consolidate and popularize Newtonianism, as did Francesco Algarotti (1721–1764). The emphasis was often the same, as much on the character of the scientist as on their process, a character which is still commonly assumed. The scientist is humble in the face of nature, not beholden to dogma, obeys only his eyes, and follows the truth wherever it leads. It was certainly Voltaire (1694–1778) and du Chatelet (1706–1749) who were most influential in propagating the latter vision of the scientist and their craft, with Newton as hero. Scientific method became a revolutionary force of the Enlightenment. (See also the entries on Newton , Leibniz , Descartes , Boyle , Hume , enlightenment , as well as Shank 2008 for a historical overview.)

Not all 18th century reflections on scientific method were so celebratory. Famous also are George Berkeley’s (1685–1753) attack on the mathematics of the new science, as well as the over-emphasis of Newtonians on observation; and David Hume’s (1711–1776) undermining of the warrant offered for scientific claims by inductive justification (see the entries on: George Berkeley; David Hume; Hume’s Newtonianism and Anti-Newtonianism). Hume’s problem of induction motivated Immanuel Kant (1724–1804) to seek new foundations for empirical method, though as an epistemic reconstruction, not as any set of practical guidelines for scientists. Both Hume and Kant influenced the methodological reflections of the next century, such as the debate between Mill and Whewell over the certainty of inductive inferences in science.

The debate between John Stuart Mill (1806–1873) and William Whewell (1794–1866) has become the canonical methodological debate of the 19th century. Although often characterized as a debate between inductivism and hypothetico-deductivism, the role of the two methods on each side is actually more complex. On the hypothetico-deductive account, scientists work to come up with hypotheses from which true observational consequences can be deduced—hence, hypothetico-deductive. Because Whewell emphasizes both hypotheses and deduction in his account of method, he can be seen as a convenient foil to the inductivism of Mill. However, equally if not more important to Whewell’s portrayal of scientific method is what he calls the “fundamental antithesis”. Knowledge is a product of the objective (what we see in the world around us) and subjective (the contributions of our mind to how we perceive and understand what we experience, which he called the Fundamental Ideas). Both elements are essential according to Whewell, and he was therefore critical of Kant for too much focus on the subjective, and John Locke (1632–1704) and Mill for too much focus on the senses. Whewell’s fundamental ideas can be discipline relative. An idea can be fundamental even if it is necessary for knowledge only within a given scientific discipline (e.g., chemical affinity for chemistry). This distinguishes fundamental ideas from the forms and categories of intuition of Kant. (See the entry on Whewell.)

Clarifying fundamental ideas would therefore be an essential part of scientific method and scientific progress. Whewell called this process “Discoverer’s Induction”. It was induction, following Bacon or Newton, but Whewell sought to revive Bacon’s account by emphasizing the role of ideas in the clear and careful formulation of inductive hypotheses. Whewell’s induction is not merely the collecting of objective facts. The subjective plays a role through what Whewell calls the Colligation of Facts, a creative act of the scientist, the invention of a theory. A theory is then confirmed by testing, where more facts are brought under the theory, called the Consilience of Inductions. Whewell felt that this was the method by which the true laws of nature could be discovered: clarification of fundamental concepts, clever invention of explanations, and careful testing. Mill, in his critique of Whewell, and others who have cast Whewell as a forerunner of the hypothetico-deductivist view, seem to have underestimated the importance of this discovery phase in Whewell’s understanding of method (Snyder 1997a,b, 1999). Downplaying the discovery phase would come to characterize methodology of the early 20th century (see section 3).

Mill, in his System of Logic, put forward a narrower view of induction as the essence of scientific method. For Mill, induction is first the search for regularities among events. Among those regularities, some will continue to hold for further observations, eventually gaining the status of laws. One can also look for regularities among the laws discovered in a domain, i.e., for a law of laws. Which “law law” will hold is time- and discipline-dependent and open to revision. One example is the Law of Universal Causation, and Mill put forward specific methods for identifying causes—now commonly known as Mill’s methods. These five methods look for circumstances which are common among the phenomena of interest, those which are absent when the phenomena are absent, or those for which both vary together. Mill’s methods are still seen as capturing basic intuitions about experimental methods for finding the relevant explanatory factors (System of Logic (1843); see the entry on Mill). The methods advocated by Whewell and Mill, in the end, look similar. Both involve inductive generalization to covering laws. They differ dramatically, however, with respect to the necessity of the knowledge arrived at, that is, at the meta-methodological level (see the entries on Whewell and Mill).
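Read as a search procedure, the first two of Mill’s methods (agreement and difference) lend themselves to a toy algorithmic rendering. The sketch below, in Python, is only an illustration of the underlying intuition, not Mill’s own formulation; the case data and function names are invented for this example.

```python
# Toy rendering of Mill's methods of agreement and difference as searches
# over tabulated cases. Illustrative only; the data and names are invented.

def method_of_agreement(cases):
    """Circumstances present in every case in which the phenomenon occurs."""
    positives = [set(c["circumstances"]) for c in cases if c["phenomenon"]]
    return set.intersection(*positives) if positives else set()

def method_of_difference(cases):
    """Circumstances present in all positive cases and absent from all negative ones."""
    negatives = [set(c["circumstances"]) for c in cases if not c["phenomenon"]]
    return {c for c in method_of_agreement(cases)
            if all(c not in n for n in negatives)}

# Hypothetical observations: which circumstances accompany an illness.
cases = [
    {"circumstances": {"ate_oysters", "drank_wine", "hot_weather"}, "phenomenon": True},
    {"circumstances": {"ate_oysters", "hot_weather"}, "phenomenon": True},
    {"circumstances": {"drank_wine", "hot_weather"}, "phenomenon": False},
]

print(method_of_agreement(cases))   # {'ate_oysters', 'hot_weather'}
print(method_of_difference(cases))  # {'ate_oysters'}
```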

3. Logic of method and critical responses

The quantum and relativistic revolutions in physics in the early 20th century had a profound effect on methodology. Conceptual foundations of both theories were taken to show the defeasibility of even the most seemingly secure intuitions about space, time and bodies. Certainty of knowledge about the natural world was therefore recognized as unattainable. Instead, a renewed empiricism was sought which rendered science fallible but still rationally justifiable.

Analyses of the reasoning of scientists emerged, according to which the aspects of scientific method which were of primary importance were the means of testing and confirming theories. A distinction in methodology was made between the contexts of discovery and justification. The distinction could be used as a wedge between the particularities of where and how theories or hypotheses are arrived at, on the one hand, and the underlying reasoning scientists use (whether or not they are aware of it) when assessing theories and judging their adequacy on the basis of the available evidence, on the other. By and large, for most of the 20th century, philosophy of science focused on the second context, although philosophers differed on whether to focus on confirmation or refutation as well as on the many details of how confirmation or refutation could or could not be brought about. By the mid-20th century these attempts at defining the method of justification and the context distinction itself came under pressure. During the same period, philosophy of science developed rapidly, and from section 4 this entry will therefore shift from a primarily historical treatment of the scientific method towards a primarily thematic one.

Advances in logic and probability held out promise of the possibility of elaborate reconstructions of scientific theories and empirical method, the best example being Rudolf Carnap’s The Logical Structure of the World (1928). Carnap attempted to show that a scientific theory could be reconstructed as a formal axiomatic system—that is, a logic. That system could refer to the world because some of its basic sentences could be interpreted as observations or operations which one could perform to test them. The rest of the theoretical system, including sentences using theoretical or unobservable terms (like electron or force), would then either be meaningful because they could be reduced to observations, or have purely logical meanings (called analytic, like mathematical identities). This has been referred to as the verifiability criterion of meaning. According to the criterion, any statement not either analytic or verifiable was strictly meaningless. Although the view was endorsed by Carnap in 1928, he would later come to see it as too restrictive (Carnap 1956). Another familiar version of this idea is the operationalism of Percy Williams Bridgman. In The Logic of Modern Physics (1927), Bridgman asserted that every physical concept could be defined in terms of the operations one would perform to verify the application of that concept. Making good on the operationalization of a concept even as simple as length, however, can easily become enormously complex (for measuring very small lengths, for instance) or impractical (measuring large distances like light-years).

Carl Hempel’s (1950, 1951) criticisms of the verifiability criterion of meaning had enormous influence. He pointed out that universal generalizations, such as most scientific laws, were not strictly meaningful on the criterion. Verifiability and operationalism both seemed too restrictive to capture standard scientific aims and practice. The tenuous connection between these reconstructions and actual scientific practice was criticized in another way. In both approaches, scientific methods are instead recast in methodological roles. Measurements, for example, were looked to as ways of giving meanings to terms. The aim of the philosopher of science was not to understand the methods per se, but to use them to reconstruct theories, their meanings, and their relation to the world. When scientists perform these operations, however, they will not report that they are doing them to give meaning to terms in a formal axiomatic system. This disconnect between methodology and the details of actual scientific practice would seem to violate the empiricism the Logical Positivists and Bridgman were committed to. The view that methodology should correspond to practice (to some extent) has been called historicism, or intuitionism. We turn to these criticisms and responses in section 3.4. [4]

Positivism also had to contend with the recognition that a purely inductivist approach, along the lines of Bacon-Newton-Mill, was untenable. There was no pure observation, for starters. All observation was theory-laden. Theory is required to make any observation; therefore, not all theory can be derived from observation alone. (See the entry on theory and observation in science.) Even granting an observational basis, Hume had already pointed out that one could not deductively justify inductive conclusions without begging the question by presuming the success of the inductive method. Likewise, positivist attempts at analyzing how a generalization can be confirmed by observations of its instances were subject to a number of criticisms. Goodman (1965) and Hempel (1965) both point to paradoxes inherent in standard accounts of confirmation. Recent attempts at explaining how observations can serve to confirm a scientific theory are discussed in section 4 below.

The standard starting point for a non-inductive analysis of the logic of confirmation is known as the Hypothetico-Deductive (H-D) method. In its simplest form, a sentence of a theory which expresses some hypothesis is confirmed by its true consequences. As noted in section 2, this method had been advanced by Whewell in the 19th century, as well as by Nicod (1924) and others in the 20th century. Often, Hempel’s (1966) description of the H-D method, illustrated by the case of Semmelweis’s inferential procedures in establishing the cause of childbed fever, has been presented as a key account of H-D as well as a foil for criticism of the H-D account of confirmation (see, for example, Lipton’s (2004) discussion of inference to the best explanation; also the entry on confirmation). Hempel described Semmelweis’s procedure as examining various hypotheses explaining the cause of childbed fever. Some hypotheses conflicted with observable facts and could be rejected as false immediately. Others needed to be tested experimentally by deducing which observable events should follow if the hypothesis were true (what Hempel called the test implications of the hypothesis), then conducting an experiment and observing whether or not the test implications occurred. If the experiment showed the test implication to be false, the hypothesis could be rejected. If the experiment showed the test implications to be true, however, this did not prove the hypothesis true. The confirmation of a test implication does not verify a hypothesis, though Hempel did allow that “it provides at least some support, some corroboration or confirmation for it” (Hempel 1966: 8). The degree of this support then depends on the quantity, variety and precision of the supporting evidence.

Another approach that took off from the difficulties with inductive inference was Karl Popper’s critical rationalism or falsificationism (Popper 1959, 1963). Falsification is deductive and similar to H-D in that it involves scientists deducing observational consequences from the hypothesis under test. For Popper, however, the important point was not the degree of confirmation that successful prediction offered to a hypothesis. The crucial thing was the logical asymmetry between confirmation, based on inductive inference, and falsification, which can be based on a deductive inference. (This simple opposition was later questioned, by Lakatos among others. See the entry on historicist theories of scientific rationality.)

Popper stressed that, regardless of the amount of confirming evidence, we can never be certain that a hypothesis is true without committing the fallacy of affirming the consequent. Instead, Popper introduced the notion of corroboration as a measure for how well a theory or hypothesis has survived previous testing—but without implying that this is also a measure for the probability that it is true.
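The asymmetry can be written out schematically: inferring a hypothesis H from one of its observed consequences O is the invalid inference of affirming the consequent, whereas inferring the falsity of H from the failure of a consequence is the valid modus tollens.

```latex
% Confirmation vs. falsification, schematically
% (H = hypothesis, O = a predicted observation)
\[
\frac{H \rightarrow O \qquad O}{H}
\;\;\text{(affirming the consequent, invalid)}
\qquad
\frac{H \rightarrow O \qquad \neg O}{\neg H}
\;\;\text{(modus tollens, valid)}
\]
```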

Popper was also motivated by his doubts about the scientific status of theories like the Marxist theory of history or psycho-analysis, and so wanted to demarcate between science and pseudo-science. Popper saw this as an importantly different distinction from that between science and metaphysics. The latter demarcation was the primary concern of many logical empiricists. Popper used the idea of falsification to draw a line instead between pseudo-science and proper science. Science was science because its method involved subjecting theories to rigorous tests which offered a high probability of failing and thus refuting the theory.

A commitment to the risk of failure was important. Avoiding falsification could be done all too easily. If a consequence of a theory is inconsistent with observations, an exception can be added by introducing auxiliary hypotheses designed explicitly to save the theory, so-called ad hoc modifications. This Popper saw done in pseudo-science, where ad hoc theories appeared capable of explaining anything in their field of application. In contrast, science is risky. If observations showed the predictions from a theory to be wrong, the theory would be refuted. Hence, scientific hypotheses must be falsifiable. Not only must there exist some possible observation statement which could falsify the hypothesis or theory were it observed (Popper called these the hypothesis’s potential falsifiers), it is crucial to the Popperian scientific method that such falsifications be sincerely attempted on a regular basis.

The more potential falsifiers of a hypothesis, the more falsifiable it would be, and the more the hypothesis claimed. Conversely, hypotheses without falsifiers claimed very little or nothing at all. Originally, Popper thought that this meant the introduction of ad hoc hypotheses only to save a theory should not be countenanced as good scientific method. These would undermine the falsifiability of a theory. However, Popper later came to recognize that the introduction of modifications (immunizations, he called them) was often an important part of scientific development. Responding to surprising or apparently falsifying observations often generated important new scientific insights. Popper’s own example was the observed motion of Uranus which originally did not agree with Newtonian predictions. The ad hoc hypothesis of an outer planet explained the disagreement and led to further falsifiable predictions. Popper sought to reconcile the view by blurring the distinction between falsifiable and not falsifiable, and speaking instead of degrees of testability (Popper 1985: 41f.).

From the 1960s on, sustained meta-methodological criticism emerged that drove philosophical focus away from scientific method. A brief look at those criticisms follows, with recommendations for further reading at the end of the entry.

Thomas Kuhn’s The Structure of Scientific Revolutions (1962) begins with a well-known shot across the bow for philosophers of science:

History, if viewed as a repository for more than anecdote or chronology, could produce a decisive transformation in the image of science by which we are now possessed. (1962: 1)

The image Kuhn thought needed transforming was the a-historical, rational reconstruction sought by many of the Logical Positivists, though Carnap and other positivists were actually quite sympathetic to Kuhn’s views. (See the entry on the Vienna Circle.) Kuhn shared with others of his contemporaries, such as Feyerabend and Lakatos, a commitment to a more empirical approach to philosophy of science. Namely, the history of science provides important data, and necessary checks, for philosophy of science, including any theory of scientific method.

The history of science reveals, according to Kuhn, that scientific development occurs in alternating phases. During normal science, the members of the scientific community adhere to the paradigm in place. Their commitment to the paradigm means a commitment to the puzzles to be solved and the acceptable ways of solving them. Confidence in the paradigm remains so long as steady progress is made in solving the shared puzzles. Method in this normal phase operates within a disciplinary matrix (Kuhn’s later concept of a paradigm) which includes standards for problem solving, and defines the range of problems to which the method should be applied. An important part of a disciplinary matrix is the set of values which provide the norms and aims for scientific method. The main values that Kuhn identifies are prediction, problem solving, simplicity, consistency, and plausibility.

An important by-product of normal science is the accumulation of puzzles which cannot be solved with the resources of the current paradigm. Once the accumulation of these anomalies has reached a critical mass, it can trigger a communal shift to a new paradigm and a new phase of normal science. Importantly, the values that provide the norms and aims for scientific method may have transformed in the meantime. Method may therefore be relative to discipline, time or place.

Feyerabend also identified the aim of science as progress, but argued that any methodological prescription would only stifle that progress (Feyerabend 1988). His arguments are grounded in re-examining accepted “myths” about the history of science. Heroes of science, like Galileo, are shown to be just as reliant on rhetoric and persuasion as they are on reason and demonstration. Others, like Aristotle, are shown to be far more reasonable and far-reaching in their outlooks than they are given credit for. As a consequence, the only rule that could provide what he took to be sufficient freedom was the vacuous “anything goes”. More generally, even the methodological restriction that science is the best way to pursue knowledge, and to increase knowledge, is too restrictive. Feyerabend suggested instead that science might, in fact, be a threat to a free society, because it and its myth had become so dominant (Feyerabend 1978).

An even more fundamental kind of criticism was offered by several sociologists of science from the 1970s onwards who rejected the methodology of providing philosophical accounts of the rational development of science and sociological accounts of its irrational mistakes. Instead, they adhered to a symmetry thesis on which any causal explanation of how scientific knowledge is established needs to be symmetrical in explaining truth and falsity, rationality and irrationality, success and mistakes, by the same causal factors (see, e.g., Barnes and Bloor 1982, Bloor 1991). Movements in the Sociology of Science, like the Strong Programme, or in the social dimensions and causes of knowledge more generally, led to extended and close examination of detailed case studies in contemporary science and its history. (See the entries on the social dimensions of scientific knowledge and social epistemology.) Well-known examinations by Latour and Woolgar (1979/1986), Knorr-Cetina (1981), Pickering (1984), and Shapin and Schaffer (1985) seem to bear out that it was social ideologies (on a macro-scale) or individual interactions and circumstances (on a micro-scale) which were the primary causal factors in determining which beliefs gained the status of scientific knowledge. As they saw it, therefore, explanatory appeals to scientific method were not empirically grounded.

A late, and largely unexpected, criticism of scientific method came from within science itself. Beginning in the early 2000s, a number of scientists attempting to replicate the results of published experiments could not do so. There may be a close conceptual connection between reproducibility and method. For example, if reproducibility means that the same scientific methods ought to produce the same result, and all scientific results ought to be reproducible, then whatever it takes to reproduce a scientific result ought to be called scientific method. Space limits us to the observation that, insofar as reproducibility is a desired outcome of proper scientific method, it is not strictly a part of scientific method. (See the entry on reproducibility of scientific results.)

By the close of the 20th century the search for the scientific method was flagging. Nola and Sankey (2000b) could introduce their volume on method by remarking that “For some, the whole idea of a theory of scientific method is yester-year’s debate …”.

Despite the many difficulties that philosophers encountered in trying to provide a clear methodology of confirmation (or refutation), important progress has still been made on understanding how observation can provide evidence for a given theory. Work in statistics has been crucial for understanding how theories can be tested empirically, and in recent decades a huge literature has developed that attempts to recast confirmation in Bayesian terms. Here these developments can be covered only briefly, and we refer to the entry on confirmation for further details and references.

Statistics has come to play an increasingly important role in the methodology of the experimental sciences from the 19th century onwards. At that time, statistics and probability theory took on a methodological role as an analysis of inductive inference, and attempts to ground the rationality of induction in the axioms of probability theory have continued throughout the 20th century and into the present. Developments in the theory of statistics itself, meanwhile, have had a direct and immense influence on the experimental method, including methods for measuring the uncertainty of observations such as the Method of Least Squares developed by Legendre and Gauss in the early 19th century, criteria for the rejection of outliers proposed by Peirce by the mid-19th century, and the significance tests developed by Gosset (a.k.a. “Student”), Fisher, Neyman & Pearson and others in the 1920s and 1930s (see, e.g., Swijtink 1987 for a brief historical overview; and also the entry on C.S. Peirce).
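For illustration, the least-squares criterion in its simplest textbook form (a standard statement, not a reconstruction of Legendre’s or Gauss’s own presentations): for observations (x_i, y_i) and a straight-line model y = a + bx, the parameters are chosen to minimize the sum of squared residuals.

```latex
% Least-squares fit of a line y = a + b x to n observations (x_i, y_i)
\[
(\hat{a}, \hat{b}) \;=\; \arg\min_{a,\, b} \; \sum_{i=1}^{n} \bigl( y_i - a - b\, x_i \bigr)^{2}
\]
```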

These developments within statistics then in turn led to a reflective discussion among both statisticians and philosophers of science on how to perceive the process of hypothesis testing: whether it was a rigorous statistical inference that could provide a numerical expression of the degree of confidence in the tested hypothesis, or if it should be seen as a decision between different courses of action that also involved a value component. This led to a major controversy between Fisher on the one side and Neyman and Pearson on the other (see especially Fisher 1955, Neyman 1956 and Pearson 1955, and for analyses of the controversy, e.g., Howie 2002, Marks 2000, Lenhard 2006). On Fisher’s view, hypothesis testing was a methodology for when to accept or reject a statistical hypothesis, namely that a hypothesis should be rejected by evidence if this evidence would be unlikely relative to other possible outcomes, given that the hypothesis were true. In contrast, on Neyman and Pearson’s view, the consequence of error also had to play a role when deciding between hypotheses. Introducing the distinction between the error of rejecting a true hypothesis (type I error) and accepting a false hypothesis (type II error), they argued that it depends on the consequences of the error whether it is more important to avoid rejecting a true hypothesis or accepting a false one. Hence, Fisher aimed for a theory of inductive inference that enabled a numerical expression of confidence in a hypothesis. To him, the important point was the search for truth, not utility. In contrast, the Neyman-Pearson approach provided a strategy of inductive behaviour for deciding between different courses of action. Here, the important point was not whether a hypothesis was true, but whether one should act as if it was.
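The contrast can be made concrete with a small, invented coin-flipping example. The sketch below is not drawn from the authors discussed here; it simply juxtaposes a Fisher-style report of a p-value with a Neyman-Pearson-style decision rule whose type I and type II error rates are fixed in advance against a specific alternative.

```python
# Toy contrast between a Fisher-style significance test and a
# Neyman-Pearson-style decision rule for a coin-flipping example.
# All numbers and names here are illustrative.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n, observed = 20, 15          # 15 heads in 20 flips
p_null, p_alt = 0.5, 0.8      # null: fair coin; alternative used by Neyman-Pearson

# Fisher: report how improbable data at least this extreme would be under the null.
p_value = binom_tail(observed, n, p_null)
print(f"Fisher-style p-value under H0: {p_value:.4f}")

# Neyman-Pearson: fix a rejection region in advance to control error rates.
alpha = 0.05
cutoff = next(c for c in range(n + 1) if binom_tail(c, n, p_null) <= alpha)
type_I = binom_tail(cutoff, n, p_null)        # rejecting H0 although it is true
type_II = 1 - binom_tail(cutoff, n, p_alt)    # accepting H0 although the alternative is true
decision = "reject H0" if observed >= cutoff else "accept H0"
print(f"Reject if >= {cutoff} heads; type I = {type_I:.4f}, "
      f"type II = {type_II:.4f}; decision: {decision}")
```

On the same data the two procedures can agree on the verdict while answering different questions: how improbable the data are under the null hypothesis, versus which action to take given error rates fixed in advance.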

Similar discussions are found in the philosophical literature. On the one side, Churchman (1948) and Rudner (1953) argued that because scientific hypotheses can never be completely verified, a complete analysis of the methods of scientific inference includes ethical judgments in which the scientists must decide whether the evidence is sufficiently strong, or the probability sufficiently high, to warrant the acceptance of the hypothesis; this again will depend on the importance of making a mistake in accepting or rejecting the hypothesis. Others, such as Jeffrey (1956) and Levi (1960), disagreed and instead defended a value-neutral view of science on which scientists should bracket their attitudes, preferences, temperament, and values when assessing the correctness of their inferences. For more details on this value-free ideal in the philosophy of science and its historical development, see Douglas (2009) and Howard (2003). For a broad set of case studies examining the role of values in science, see e.g. Elliott & Richards 2017.

In recent decades, philosophical discussions of the evaluation of probabilistic hypotheses by statistical inference have largely focused on Bayesianism, which understands probability as a measure of a person’s degree of belief in an event, given the available information, and frequentism, which instead understands probability as a long-run frequency of a repeatable event. Hence, for Bayesians probabilities refer to a state of knowledge, whereas for frequentists probabilities refer to frequencies of events (see, e.g., Sober 2008, chapter 1 for a detailed introduction to Bayesianism and frequentism as well as to likelihoodism). Bayesianism aims at providing a quantifiable, algorithmic representation of belief revision, where belief revision is a function of prior beliefs (i.e., background knowledge) and incoming evidence. Bayesianism employs a rule based on Bayes’ theorem, a theorem of the probability calculus which relates conditional probabilities. The probability that a particular hypothesis is true is interpreted as a degree of belief, or credence, of the scientist. There will also be a probability and a degree of belief that a hypothesis will be true conditional on a piece of evidence (an observation, say) being true. Bayesianism prescribes that it is rational for the scientist to update their belief in the hypothesis to that conditional probability should it turn out that the evidence is, in fact, observed (see, e.g., Sprenger & Hartmann 2019 for a comprehensive treatment of Bayesian philosophy of science). Originating in the work of Neyman and Pearson, frequentism aims at providing the tools for reducing long-run error rates, such as the error-statistical approach developed by Mayo (1996) that focuses on how experimenters can avoid both type I and type II errors by building up a repertoire of procedures that detect errors if and only if they are present. Both Bayesianism and frequentism have developed over time, they are interpreted in different ways by their various proponents, and their relations to previous criticisms of attempts at defining scientific method are seen differently by proponents and critics. The literature, surveys, reviews and criticism in this area are vast and the reader is referred to the entries on Bayesian epistemology and confirmation.
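As a minimal sketch of what such belief revision looks like, the following toy example updates a discrete prior over three candidate values of a coin’s bias by Bayes’ theorem; the hypotheses, prior and data are invented for illustration.

```python
# Minimal sketch of Bayesian belief revision for a coin of unknown bias,
# updating a discrete prior by Bayes' theorem. Illustrative values only.
from math import comb

hypotheses = [0.3, 0.5, 0.7]             # candidate values of the bias
prior = {h: 1 / 3 for h in hypotheses}   # equal prior credence in each

def likelihood(heads, flips, h):
    """Probability of the observed data given the hypothesis."""
    return comb(flips, heads) * h**heads * (1 - h)**(flips - heads)

def update(prior, heads, flips):
    """Posterior(h) is proportional to prior(h) * likelihood(data | h)."""
    unnormalized = {h: prior[h] * likelihood(heads, flips, h) for h in prior}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

posterior = update(prior, heads=8, flips=10)
for h, credence in posterior.items():
    print(f"bias {h}: credence {credence:.3f}")
```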

5. Method in Practice

Attention to scientific practice, as we have seen, is not itself new. However, the recent turn to practice in the philosophy of science can be seen as a correction to the pessimism with respect to method in philosophy of science in later parts of the 20th century, and as an attempted reconciliation between sociological and rationalist explanations of scientific knowledge. Much of this work sees method as detailed and context-specific problem-solving procedures, and methodological analyses as at the same time descriptive, critical and advisory (see Nickles 1987 for an exposition of this view). The following subsections survey some of these foci on practice. From this section on, we turn fully to topics rather than chronology.

A problem with the distinction between the contexts of discovery and justification that figured so prominently in philosophy of science in the first half of the 20th century (see section 3) is that no such distinction can be clearly seen in scientific activity (see Arabatzis 2006). Thus, in recent decades, it has been recognized that the study of conceptual innovation and change should not be confined to psychology and sociology of science, but is also an important aspect of scientific practice which philosophy of science should address (see also the entry on scientific discovery). Looking for the practices that drive conceptual innovation has led philosophers to examine both the reasoning practices of scientists and the wide realm of experimental practices that are not directed narrowly at testing hypotheses, that is, exploratory experimentation.

Examining the reasoning practices of historical and contemporary scientists, Nersessian (2008) has argued that new scientific concepts are constructed as solutions to specific problems by systematic reasoning, and that analogy, visual representation and thought-experimentation are among the important reasoning practices employed. These ubiquitous forms of reasoning are reliable—but also fallible—methods of conceptual development and change. On her account, model-based reasoning consists of cycles of construction, simulation, evaluation and adaptation of models that serve as interim interpretations of the target problem to be solved. Often, this process will lead to modifications or extensions, and a new cycle of simulation and evaluation. However, Nersessian also emphasizes that

creative model-based reasoning cannot be applied as a simple recipe, is not always productive of solutions, and even its most exemplary usages can lead to incorrect solutions. (Nersessian 2008: 11)

Thus, while on the one hand she agrees with many previous philosophers that there is no logic of discovery, discoveries can derive from reasoned processes, such that a large and integral part of scientific practice is

the creation of concepts through which to comprehend, structure, and communicate about physical phenomena …. (Nersessian 1987: 11)

Similarly, work on heuristics for discovery and theory construction by scholars such as Darden (1991) and Bechtel & Richardson (1993) presents science as problem solving and investigates scientific problem solving as a special case of problem-solving in general. Drawing largely on cases from the biological sciences, much of their focus has been on reasoning strategies for the generation, evaluation, and revision of mechanistic explanations of complex systems.

Addressing another aspect of the context distinction, namely the traditional view that the primary role of experiments is to test theoretical hypotheses according to the H-D model, other philosophers of science have argued for additional roles that experiments can play. The notion of exploratory experimentation was introduced to describe experiments driven by the desire to obtain empirical regularities and to develop concepts and classifications in which these regularities can be described (Steinle 1997, 2002; Burian 1997; Waters 2007). However, the difference between theory-driven experimentation and exploratory experimentation should not be seen as a sharp distinction. Theory-driven experiments are not always directed at testing hypotheses, but may also be directed at various kinds of fact-gathering, such as determining numerical parameters. Vice versa, exploratory experiments are usually informed by theory in various ways and are therefore not theory-free. Instead, in exploratory experiments phenomena are investigated without first limiting the possible outcomes of the experiment on the basis of extant theory about the phenomena.

The development of high-throughput instrumentation in molecular biology and neighbouring fields has given rise to a special type of exploratory experimentation that collects and analyses very large amounts of data. These new ‘omics’ disciplines are often said to represent a break with the ideal of hypothesis-driven science (Burian 2007; Elliott 2007; Waters 2007; O’Malley 2007) and are instead described as data-driven research (Leonelli 2012; Strasser 2012) or as a special kind of “convenience experimentation” in which many experiments are done simply because they are extraordinarily convenient to perform (Krohs 2012).

5.2 Computer methods and ‘new ways’ of doing science

The field of omics just described is possible because of the ability of computers to process, in a reasonable amount of time, the huge quantities of data required. Computers allow for more elaborate experimentation (higher speed, better filtering, more variables, sophisticated coordination and control), but, through modelling and simulations, they might also constitute a form of experimentation themselves. Here, too, we can pose a version of the general question of method versus practice: does the practice of using computers fundamentally change scientific method, or merely provide a more efficient means of implementing standard methods?

Because computers can be used to automate measurements, quantifications, calculations, and statistical analyses where, for practical reasons, these operations cannot be otherwise carried out, many of the steps involved in reaching a conclusion on the basis of an experiment are now made inside a “black box”, without the direct involvement or awareness of a human. This has epistemological implications, regarding what we can know, and how we can know it. To have confidence in the results, computer methods are therefore subjected to tests of verification and validation.

The distinction between verification and validation is easiest to characterize in the case of computer simulations. In a typical computer simulation scenario computers are used to numerically integrate differential equations for which no analytic solution is available. The equations are part of the model the scientist uses to represent a phenomenon or system under investigation. Verifying a computer simulation means checking that the equations of the model are being correctly approximated. Validating a simulation means checking that the equations of the model are adequate for the inferences one wants to make on the basis of that model.
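A stylized sketch may help fix the distinction. In the toy example below, an equation with a known analytic solution is integrated with Euler steps: verification asks whether the numerical scheme approximates the model’s equation well (here checkable against the analytic solution), while validation asks whether the model is adequate to the system studied, which requires comparison with observations (stood in for here by invented ‘measurements’).

```python
# Stylized illustration of verification vs. validation for a simulation that
# integrates dy/dt = -k*y with Euler steps. Model, parameters and the
# "measured" data are all invented for illustration.
import math

def simulate(y0, k, dt, steps):
    """Euler integration of dy/dt = -k*y."""
    y, trajectory = y0, [y0]
    for _ in range(steps):
        y = y + dt * (-k * y)
        trajectory.append(y)
    return trajectory

y0, k, dt, steps = 1.0, 0.5, 0.01, 400
numeric = simulate(y0, k, dt, steps)

# Verification: is the model's equation being correctly approximated?
# Here the exact solution y(t) = y0 * exp(-k*t) is available for comparison.
analytic_end = y0 * math.exp(-k * dt * steps)
print(f"verification error vs analytic solution: {abs(numeric[-1] - analytic_end):.2e}")

# Validation: is the model adequate for the system being studied?
# That requires comparison with observations, e.g. hypothetical measurements.
measured = {0.0: 1.02, 2.0: 0.39, 4.0: 0.12}   # time -> observed value
for t, obs in measured.items():
    idx = int(round(t / dt))
    print(f"t={t}: simulated {numeric[idx]:.3f}, measured {obs:.3f}")
```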

A number of issues related to computer simulations have been raised. The identification of verification and validation as the testing methods has been criticized. Oreskes et al. (1994) raise concerns that “validation”, because it suggests deductive inference, might lead to over-confidence in the results of simulations. The distinction itself is probably too clean, since actual practice in the testing of simulations mixes and moves back and forth between the two (Weissart 1997; Parker 2008a; Winsberg 2010). Computer simulations do seem to have a non-inductive character, given that the principles by which they operate are built in by the programmers, and any results of the simulation follow from those in-built principles in such a way that those results could, in principle, be deduced from the program code and its inputs. The status of simulations as experiments has therefore been examined (Kaufmann and Smarr 1993; Humphreys 1995; Hughes 1999; Norton and Suppe 2001). This literature considers the epistemology of these experiments: what we can learn by simulation, and also the kinds of justifications which can be given in applying that knowledge to the “real” world (Mayo 1996; Parker 2008b). As pointed out, part of the advantage of computer simulation derives from the fact that huge numbers of calculations can be carried out without requiring direct observation by the experimenter/simulator. At the same time, many of these calculations are approximations to the calculations which would be performed first-hand in an ideal situation. Both factors introduce uncertainties into the inferences drawn from what is observed in the simulation.

For many of the reasons described above, computer simulations do not seem to belong clearly to either the experimental or theoretical domain. Rather, they seem to crucially involve aspects of both. This has led some authors, such as Fox Keller (2003: 200), to argue that we ought to consider computer simulation a “qualitatively different way of doing science”. The literature in general tends to follow Kaufmann and Smarr (1993) in referring to computer simulation as a “third way” for scientific methodology (theoretical reasoning and experimental practice are the first two ways). It should also be noted that the debates around these issues have tended to focus on the form of computer simulation typical in the physical sciences, where models are based on dynamical equations. Other forms of simulation might not have the same problems, or have problems of their own (see the entry on computer simulations in science).

In recent years, the rapid development of machine learning techniques has prompted some scholars to suggest that the scientific method has become “obsolete” (Anderson 2008, Carrol and Goodstein 2009). This has resulted in an intense debate on the relative merits of data-driven and hypothesis-driven research (for samples, see e.g. Mazzocchi 2015 or Succi and Coveney 2018). For a detailed treatment of this topic, we refer to the entry on scientific research and big data.

6. Discourse on scientific method

Despite philosophical disagreements, the idea of the scientific method still figures prominently in contemporary discourse on many different topics, both within science and in society at large. Often, reference to scientific method is used in ways that either convey the legend of a single, universal method characteristic of all science, or grant a particular method or set of methods privileged status as a special ‘gold standard’, often with reference to particular philosophers to vindicate the claims. Discourse on scientific method also typically arises when there is a need to distinguish between science and other activities, or for justifying the special status conveyed to science. In these areas, the philosophical attempts at identifying a set of methods characteristic for scientific endeavors are closely related to the philosophy of science’s classical problem of demarcation (see the entry on science and pseudo-science) and to the philosophical analysis of the social dimension of scientific knowledge and the role of science in democratic society.

One of the settings in which the legend of a single, universal scientific method has been particularly strong is science education (see, e.g., Bauer 1992; McComas 1996; Wivagg & Allchin 2002). [5] Often, ‘the scientific method’ is presented in textbooks and educational web pages as a fixed four- or five-step procedure starting from observations and description of a phenomenon and progressing over formulation of a hypothesis which explains the phenomenon, designing and conducting experiments to test the hypothesis, analyzing the results, and ending with drawing a conclusion. Such references to a universal scientific method can be found in educational material at all levels of science education (Blachowicz 2009), and numerous studies have shown that the idea of a general and universal scientific method often forms part of both students’ and teachers’ conception of science (see, e.g., Aikenhead 1987; Osborne et al. 2003). In response, it has been argued that science education needs to focus more on teaching about the nature of science, although views have differed on whether this is best done through student-led investigations, contemporary cases, or historical cases (Allchin, Andersen & Nielsen 2014).

Although occasionally phrased with reference to the H-D method, important historical roots of the legend in science education of a single, universal scientific method are the American philosopher and psychologist John Dewey’s account of inquiry in How We Think (1910) and the British mathematician Karl Pearson’s account of science in The Grammar of Science (1892). On Dewey’s account, inquiry is divided into the five steps of

(i) a felt difficulty, (ii) its location and definition, (iii) suggestion of a possible solution, (iv) development by reasoning of the bearing of the suggestions, (v) further observation and experiment leading to its acceptance or rejection. (Dewey 1910: 72)

Similarly, on Pearson’s account, scientific investigations start with measurement of data and observation of their correlation and sequence, from which scientific laws can be discovered with the aid of creative imagination. These laws have to be subject to criticism, and their final acceptance will have equal validity for “all normally constituted minds”. Both Dewey’s and Pearson’s accounts should be seen as generalized abstractions of inquiry and not restricted to the realm of science—although both Dewey and Pearson referred to their respective accounts as ‘the scientific method’.

Occasionally, scientists make sweeping statements about a simple and distinct scientific method, as exemplified by Feynman’s simplified version of a conjectures-and-refutations method presented, for example, in the last of his 1964 Cornell Messenger lectures. [6] However, just as often scientists have come to the same conclusion as recent philosophy of science: that there is no unique, easily described scientific method. For example, the physicist and Nobel Laureate Weinberg described in the paper “The Methods of Science … And Those By Which We Live” (1995) how

The fact that the standards of scientific success shift with time does not only make the philosophy of science difficult; it also raises problems for the public understanding of science. We do not have a fixed scientific method to rally around and defend. (1995: 8)

Interview studies with scientists on their conception of method show that scientists often find it hard to figure out whether available evidence confirms their hypothesis, and that there are no direct translations between general ideas about method and specific strategies to guide how research is conducted (Schickore & Hangel 2019, Hangel & Schickore 2017).

Reference to the scientific method has also often been used to argue for the scientific nature or special status of a particular activity. Philosophical positions that argue for a simple and unique scientific method as a criterion of demarcation, such as Popperian falsification, have often attracted practitioners who felt that they had a need to defend their domain of practice. For example, references to conjectures and refutation as the scientific method are abundant in much of the literature on complementary and alternative medicine (CAM)—alongside the competing position that CAM, as an alternative to conventional biomedicine, needs to develop its own methodology different from that of science.

Also within mainstream science, reference to the scientific method is used in arguments regarding the internal hierarchy of disciplines and domains. A frequently seen argument is that research based on the H-D method is superior to research based on induction from observations because in deductive inferences the conclusion follows necessarily from the premises. (See, e.g., Parascandola 1998 for an analysis of how this argument has been made to downgrade epidemiology compared to the laboratory sciences.) Similarly, based on an examination of the practices of major funding institutions such as the National Institutes of Health (NIH), the National Science Foundation (NSF) and the Biotechnology and Biological Sciences Research Council (BBSRC) in the UK, O’Malley et al. (2009) have argued that funding agencies seem to have a tendency to adhere to the view that the primary activity of science is to test hypotheses, while descriptive and exploratory research is seen as merely preparatory activities that are valuable only insofar as they fuel hypothesis-driven research.

In some areas of science, scholarly publications are structured in a way that may convey the impression of a neat and linear process of inquiry from stating a question, devising the methods by which to answer it, collecting the data, to drawing a conclusion from the analysis of data. For example, the codified format of publications in most biomedical journals known as the IMRAD format (Introduction, Methods, Results, and Discussion) is explicitly described by the journal editors as “not an arbitrary publication format but rather a direct reflection of the process of scientific discovery” (see the so-called “Vancouver Recommendations”, ICMJE 2013: 11). However, scientific publications do not in general reflect the process by which the reported scientific results were produced. For example, under the provocative title “Is the scientific paper a fraud?”, Medawar argued that scientific papers generally misrepresent how the results have been produced (Medawar 1963/1996). Similar views have been advanced by philosophers, historians and sociologists of science (Gilbert 1976; Holmes 1987; Knorr-Cetina 1981; Schickore 2008; Suppe 1998) who have argued that scientists’ experimental practices are messy and often do not follow any recognizable pattern. Publications of research results, they argue, are retrospective reconstructions of these activities that often do not preserve the temporal order or the logic of these activities, but are instead often constructed in order to screen off potential criticism (see Schickore 2008 for a review of this work).

Philosophical positions on the scientific method have also made it into the court room, especially in the US where judges have drawn on philosophy of science in deciding when to confer special status to scientific expert testimony. A key case is Daubert vs Merrell Dow Pharmaceuticals (92–102, 509 U.S. 579, 1993). In this case, the Supreme Court argued in its 1993 ruling that trial judges must ensure that expert testimony is reliable, and that in doing this the court must look at the expert’s methodology to determine whether the proffered evidence is actually scientific knowledge. Further, referring to works of Popper and Hempel the court stated that

ordinarily, a key question to be answered in determining whether a theory or technique is scientific knowledge … is whether it can be (and has been) tested. (Justice Blackmun, Daubert v. Merrell Dow Pharmaceuticals; see Other Internet Resources for a link to the opinion)

But as argued by Haack (2005a,b, 2010) and by Foster & Huber (1999), by equating the question of whether a piece of testimony is reliable with the question of whether it is scientific as indicated by a special methodology, the court was producing an inconsistent mixture of Popper’s and Hempel’s philosophies, and this has later led to considerable confusion in subsequent case rulings that drew on the Daubert case (see Haack 2010 for a detailed exposition).

The difficulties around identifying the methods of science are also reflected in the difficulties of identifying scientific misconduct in the form of improper application of the method or methods of science. One of the first and most influential attempts at defining misconduct in science was the US definition from 1989 that defined misconduct as

fabrication, falsification, plagiarism, or other practices that seriously deviate from those that are commonly accepted within the scientific community . (Code of Federal Regulations, part 50, subpart A., August 8, 1989, italics added)

However, the “other practices that seriously deviate” clause was heavily criticized because it could be used to suppress creative or novel science. For example, the National Academy of Sciences stated in its report Responsible Science (1992) that it

wishes to discourage the possibility that a misconduct complaint could be lodged against scientists based solely on their use of novel or unorthodox research methods. (NAS: 27)

This clause was therefore later removed from the definition. For an entry into the key philosophical literature on conduct in science, see Shamoo & Resnick (2009).

The question of the source of the success of science has been at the core of philosophy since the beginning of modern science. If viewed as a matter of epistemology more generally, scientific method is a part of the entire history of philosophy. Over that time, science and whatever methods its practitioners may employ have changed dramatically. Today, many philosophers have taken up the banners of pluralism or of practice to focus on what are, in effect, fine-grained and contextually limited examinations of scientific method. Others hope to shift perspectives in order to provide a renewed general account of what characterizes the activity we call science.

One such perspective has been offered recently by Hoyningen-Huene (2008, 2013), who argues from the history of philosophy of science that after three lengthy phases of characterizing science by its method, we are now in a phase where the belief in the existence of a positive scientific method has eroded and what has been left to characterize science is only its fallibility. First was a phase from Plato and Aristotle up until the 17th century where the specificity of scientific knowledge was seen in its absolute certainty established by proof from evident axioms; next was a phase up to the mid-19th century in which the means to establish the certainty of scientific knowledge had been generalized to include inductive procedures as well. In the third phase, which lasted until the last decades of the 20th century, it was recognized that empirical knowledge was fallible, but it was still granted a special status due to its distinctive mode of production. But now in the fourth phase, according to Hoyningen-Huene, historical and philosophical studies have shown how “scientific methods with the characteristics as posited in the second and third phase do not exist” (2008: 168) and there is no longer any consensus among philosophers and historians of science about the nature of science. For Hoyningen-Huene, this is too negative a stance, and he therefore urges us to pose the question about the nature of science anew. His own answer to this question is that “scientific knowledge differs from other kinds of knowledge, especially everyday knowledge, primarily by being more systematic” (Hoyningen-Huene 2013: 14). Systematicity can have several different dimensions: among them are more systematic descriptions, explanations, predictions, defense of knowledge claims, epistemic connectedness, ideal of completeness, knowledge generation, representation of knowledge and critical discourse. Hence, what characterizes science is the greater care in excluding possible alternative explanations, the more detailed elaboration with respect to data on which predictions are based, the greater care in detecting and eliminating sources of error, the more articulate connections to other pieces of knowledge, etc. On this position, what characterizes science is not that the methods employed are unique to science, but that the methods are more carefully employed.

Another, similar approach has been offered by Haack (2003). Like Hoyningen-Huene, she sets off from a dissatisfaction with the recent clash between what she calls Old Deferentialism and New Cynicism. The Old Deferentialist position is that science progressed inductively by accumulating true theories confirmed by empirical evidence or deductively by testing conjectures against basic statements, while the New Cynics’ position is that science has no epistemic authority and no uniquely rational method and is merely politics. Haack insists that contrary to the views of the New Cynics, there are objective epistemic standards, and there is something epistemologically special about science, even though the Old Deferentialists pictured this in a wrong way. Instead, she offers a new Critical Commonsensist account on which standards of good, strong, supportive evidence and well-conducted, honest, thorough and imaginative inquiry are not exclusive to the sciences, but the standards by which we judge all inquirers. In this sense, science does not differ in kind from other kinds of inquiry, but it may differ in the degree to which it requires broad and detailed background knowledge and a familiarity with a technical vocabulary that only specialists may possess.

  • Aikenhead, G.S., 1987, “High-school graduates’ beliefs about science-technology-society. III. Characteristics and limitations of scientific knowledge”, Science Education , 71(4): 459–487.
  • Allchin, D., H.M. Andersen and K. Nielsen, 2014, “Complementary Approaches to Teaching Nature of Science: Integrating Student Inquiry, Historical Cases, and Contemporary Cases in Classroom Practice”, Science Education , 98: 461–486.
  • Anderson, C., 2008, “The end of theory: The data deluge makes the scientific method obsolete”, Wired Magazine, 16(7).
  • Arabatzis, T., 2006, “On the inextricability of the context of discovery and the context of justification”, in Revisiting Discovery and Justification , J. Schickore and F. Steinle (eds.), Dordrecht: Springer, pp. 215–230.
  • Barnes, J. (ed.), 1984, The Complete Works of Aristotle, Vols I and II , Princeton: Princeton University Press.
  • Barnes, B. and D. Bloor, 1982, “Relativism, Rationalism, and the Sociology of Knowledge”, in Rationality and Relativism , M. Hollis and S. Lukes (eds.), Cambridge: MIT Press, pp. 1–20.
  • Bauer, H.H., 1992, Scientific Literacy and the Myth of the Scientific Method , Urbana: University of Illinois Press.
  • Bechtel, W. and R.C. Richardson, 1993, Discovering complexity , Princeton, NJ: Princeton University Press.
  • Berkeley, G., 1734, The Analyst in De Motu and The Analyst: A Modern Edition with Introductions and Commentary , D. Jesseph (trans. and ed.), Dordrecht: Kluwer Academic Publishers, 1992.
  • Blachowicz, J., 2009, “How science textbooks treat scientific method: A philosopher’s perspective”, The British Journal for the Philosophy of Science , 60(2): 303–344.
  • Bloor, D., 1991, Knowledge and Social Imagery, Chicago: University of Chicago Press, 2nd edition.
  • Boyle, R., 1682, New experiments physico-mechanical, touching the air , Printed by Miles Flesher for Richard Davis, bookseller in Oxford.
  • Bridgman, P.W., 1927, The Logic of Modern Physics , New York: Macmillan.
  • Burian, R., 1997, “Exploratory Experimentation and the Role of Histochemical Techniques in the Work of Jean Brachet, 1938–1952”, History and Philosophy of the Life Sciences , 19(1): 27–45.
  • –––, 2007, “On microRNA and the need for exploratory experimentation in post-genomic molecular biology”, History and Philosophy of the Life Sciences , 29(3): 285–311.
  • Carnap, R., 1928, Der logische Aufbau der Welt , Berlin: Bernary, transl. by R.A. George, The Logical Structure of the World , Berkeley: University of California Press, 1967.
  • –––, 1956, “The methodological character of theoretical concepts”, Minnesota studies in the philosophy of science , 1: 38–76.
  • Carrol, S., and D. Goodstein, 2009, “Defining the scientific method”, Nature Methods , 6: 237.
  • Churchman, C.W., 1948, “Science, Pragmatics, Induction”, Philosophy of Science , 15(3): 249–268.
  • Cooper, J. (ed.), 1997, Plato: Complete Works , Indianapolis: Hackett.
  • Darden, L., 1991, Theory Change in Science: Strategies from Mendelian Genetics, Oxford: Oxford University Press.
  • Dewey, J., 1910, How we think , New York: Dover Publications (reprinted 1997).
  • Douglas, H., 2009, Science, Policy, and the Value-Free Ideal , Pittsburgh: University of Pittsburgh Press.
  • Dupré, J., 2004, “Miracle of Monism”, in Naturalism in Question, Mario De Caro and David Macarthur (eds.), Cambridge, MA: Harvard University Press, pp. 36–58.
  • Elliott, K.C., 2007, “Varieties of exploratory experimentation in nanotoxicology”, History and Philosophy of the Life Sciences , 29(3): 311–334.
  • Elliott, K. C., and T. Richards (eds.), 2017, Exploring inductive risk: Case studies of values in science , Oxford: Oxford University Press.
  • Falcon, Andrea, 2005, Aristotle and the science of nature: Unity without uniformity , Cambridge: Cambridge University Press.
  • Feyerabend, P., 1978, Science in a Free Society, London: New Left Books.
  • –––, 1988, Against Method, London: Verso, 2nd edition.
  • Fisher, R.A., 1955, “Statistical Methods and Scientific Induction”, Journal of The Royal Statistical Society. Series B (Methodological) , 17(1): 69–78.
  • Foster, K. and P.W. Huber, 1999, Judging Science. Scientific Knowledge and the Federal Courts , Cambridge: MIT Press.
  • Fox Keller, E., 2003, “Models, Simulation, and ‘computer experiments’”, in The Philosophy of Scientific Experimentation , H. Radder (ed.), Pittsburgh: Pittsburgh University Press, 198–215.
  • Gilbert, G., 1976, “The transformation of research findings into scientific knowledge”, Social Studies of Science , 6: 281–306.
  • Gimbel, S., 2011, Exploring the Scientific Method , Chicago: University of Chicago Press.
  • Goodman, N., 1965, Fact , Fiction, and Forecast , Indianapolis: Bobbs-Merrill.
  • Haack, S., 1995, “Science is neither sacred nor a confidence trick”, Foundations of Science , 1(3): 323–335.
  • –––, 2003, Defending science—within reason , Amherst: Prometheus.
  • –––, 2005a, “Disentangling Daubert: an epistemological study in theory and practice”, Journal of Philosophy, Science and Law, 5, available online. doi:10.5840/jpsl2005513
  • –––, 2005b, “Trial and error: The Supreme Court’s philosophy of science”, American Journal of Public Health , 95: S66-S73.
  • –––, 2010, “Federal Philosophy of Science: A Deconstruction-and a Reconstruction”, NYUJL & Liberty , 5: 394.
  • Hangel, N. and J. Schickore, 2017, “Scientists’ conceptions of good research practice”, Perspectives on Science , 25(6): 766–791
  • Harper, W.L., 2011, Isaac Newton’s Scientific Method: Turning Data into Evidence about Gravity and Cosmology , Oxford: Oxford University Press.
  • Hempel, C., 1950, “Problems and Changes in the Empiricist Criterion of Meaning”, Revue Internationale de Philosophie , 41(11): 41–63.
  • –––, 1951, “The Concept of Cognitive Significance: A Reconsideration”, Proceedings of the American Academy of Arts and Sciences , 80(1): 61–77.
  • –––, 1965, Aspects of scientific explanation and other essays in the philosophy of science , New York–London: Free Press.
  • –––, 1966, Philosophy of Natural Science , Englewood Cliffs: Prentice-Hall.
  • Holmes, F.L., 1987, “Scientific writing and scientific discovery”, Isis , 78(2): 220–235.
  • Howard, D., 2003, “Two left turns make a right: On the curious political career of North American philosophy of science at midcentury”, in Logical Empiricism in North America , G.L. Hardcastle & A.W. Richardson (eds.), Minneapolis: University of Minnesota Press, pp. 25–93.
  • Hoyningen-Huene, P., 2008, “Systematicity: The nature of science”, Philosophia , 36(2): 167–180.
  • –––, 2013, Systematicity. The Nature of Science , Oxford: Oxford University Press.
  • Howie, D., 2002, Interpreting probability: Controversies and developments in the early twentieth century , Cambridge: Cambridge University Press.
  • Hughes, R., 1999, “The Ising Model, Computer Simulation, and Universal Physics”, in Models as Mediators , M. Morgan and M. Morrison (eds.), Cambridge: Cambridge University Press, pp. 97–145
  • Hume, D., 1739, A Treatise of Human Nature , D. Fate Norton and M.J. Norton (eds.), Oxford: Oxford University Press, 2000.
  • Humphreys, P., 1995, “Computational science and scientific method”, Minds and Machines , 5(1): 499–512.
  • ICMJE, 2013, “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals”, International Committee of Medical Journal Editors, available online , accessed August 13 2014
  • Jeffrey, R.C., 1956, “Valuation and Acceptance of Scientific Hypotheses”, Philosophy of Science , 23(3): 237–246.
  • Kaufmann, W.J., and L.L. Smarr, 1993, Supercomputing and the Transformation of Science , New York: Scientific American Library.
  • Knorr-Cetina, K., 1981, The Manufacture of Knowledge , Oxford: Pergamon Press.
  • Krohs, U., 2012, “Convenience experimentation”, Studies in History and Philosophy of Biological and BiomedicalSciences , 43: 52–57.
  • Kuhn, T.S., 1962, The Structure of Scientific Revolutions , Chicago: University of Chicago Press
  • Latour, B. and S. Woolgar, 1986, Laboratory Life: The Construction of Scientific Facts , Princeton: Princeton University Press, 2 nd edition.
  • Laudan, L., 1968, “Theories of scientific method from Plato to Mach”, History of Science , 7(1): 1–63.
  • Lenhard, J., 2006, “Models and statistical inference: The controversy between Fisher and Neyman-Pearson”, The British Journal for the Philosophy of Science , 57(1): 69–91.
  • Leonelli, S., 2012, “Making Sense of Data-Driven Research in the Biological and the Biomedical Sciences”, Studies in the History and Philosophy of the Biological and Biomedical Sciences , 43(1): 1–3.
  • Levi, I., 1960, “Must the scientist make value judgments?”, Philosophy of Science , 57(11): 345–357
  • Lindley, D., 1991, Theory Change in Science: Strategies from Mendelian Genetics , Oxford: Oxford University Press.
  • Lipton, P., 2004, Inference to the Best Explanation , London: Routledge, 2 nd edition.
  • Marks, H.M., 2000, The progress of experiment: science and therapeutic reform in the United States, 1900–1990 , Cambridge: Cambridge University Press.
  • Mazzochi, F., 2015, “Could Big Data be the end of theory in science?”, EMBO reports , 16: 1250–1255.
  • Mayo, D.G., 1996, Error and the Growth of Experimental Knowledge , Chicago: University of Chicago Press.
  • McComas, W.F., 1996, “Ten myths of science: Reexamining what we think we know about the nature of science”, School Science and Mathematics , 96(1): 10–16.
  • Medawar, P.B., 1963/1996, “Is the scientific paper a fraud”, in The Strange Case of the Spotted Mouse and Other Classic Essays on Science , Oxford: Oxford University Press, 33–39.
  • Mill, J.S., 1963, Collected Works of John Stuart Mill , J. M. Robson (ed.), Toronto: University of Toronto Press
  • NAS, 1992, Responsible Science: Ensuring the integrity of the research process , Washington DC: National Academy Press.
  • Nersessian, N.J., 1987, “A cognitive-historical approach to meaning in scientific theories”, in The process of science , N. Nersessian (ed.), Berlin: Springer, pp. 161–177.
  • –––, 2008, Creating Scientific Concepts , Cambridge: MIT Press.
  • Newton, I., 1726, Philosophiae naturalis Principia Mathematica (3 rd edition), in The Principia: Mathematical Principles of Natural Philosophy: A New Translation , I.B. Cohen and A. Whitman (trans.), Berkeley: University of California Press, 1999.
  • –––, 1704, Opticks or A Treatise of the Reflections, Refractions, Inflections & Colors of Light , New York: Dover Publications, 1952.
  • Neyman, J., 1956, “Note on an Article by Sir Ronald Fisher”, Journal of the Royal Statistical Society. Series B (Methodological) , 18: 288–294.
  • Nickles, T., 1987, “Methodology, heuristics, and rationality”, in Rational changes in science: Essays on Scientific Reasoning , J.C. Pitt (ed.), Berlin: Springer, pp. 103–132.
  • Nicod, J., 1924, Le problème logique de l’induction , Paris: Alcan. (Engl. transl. “The Logical Problem of Induction”, in Foundations of Geometry and Induction , London: Routledge, 2000.)
  • Nola, R. and H. Sankey, 2000a, “A selective survey of theories of scientific method”, in Nola and Sankey 2000b: 1–65.
  • –––, 2000b, After Popper, Kuhn and Feyerabend. Recent Issues in Theories of Scientific Method , London: Springer.
  • –––, 2007, Theories of Scientific Method , Stocksfield: Acumen.
  • Norton, S., and F. Suppe, 2001, “Why atmospheric modeling is good science”, in Changing the Atmosphere: Expert Knowledge and Environmental Governance , C. Miller and P. Edwards (eds.), Cambridge, MA: MIT Press, 88–133.
  • O’Malley, M., 2007, “Exploratory experimentation and scientific practice: Metagenomics and the proteorhodopsin case”, History and Philosophy of the Life Sciences , 29(3): 337–360.
  • O’Malley, M., C. Haufe, K. Elliot, and R. Burian, 2009, “Philosophies of Funding”, Cell , 138: 611–615.
  • Oreskes, N., K. Shrader-Frechette, and K. Belitz, 1994, “Verification, Validation and Confirmation of Numerical Models in the Earth Sciences”, Science , 263(5147): 641–646.
  • Osborne, J., S. Simon, and S. Collins, 2003, “Attitudes towards science: a review of the literature and its implications”, International Journal of Science Education , 25(9): 1049–1079.
  • Parascandola, M., 1998, “Epidemiology—2 nd -Rate Science”, Public Health Reports , 113(4): 312–320.
  • Parker, W., 2008a, “Franklin, Holmes and the Epistemology of Computer Simulation”, International Studies in the Philosophy of Science , 22(2): 165–83.
  • –––, 2008b, “Computer Simulation through an Error-Statistical Lens”, Synthese , 163(3): 371–84.
  • Pearson, K. 1892, The Grammar of Science , London: J.M. Dents and Sons, 1951
  • Pearson, E.S., 1955, “Statistical Concepts in Their Relation to Reality”, Journal of the Royal Statistical Society , B, 17: 204–207.
  • Pickering, A., 1984, Constructing Quarks: A Sociological History of Particle Physics , Edinburgh: Edinburgh University Press.
  • Popper, K.R., 1959, The Logic of Scientific Discovery , London: Routledge, 2002
  • –––, 1963, Conjectures and Refutations , London: Routledge, 2002.
  • –––, 1985, Unended Quest: An Intellectual Autobiography , La Salle: Open Court Publishing Co..
  • Rudner, R., 1953, “The Scientist Qua Scientist Making Value Judgments”, Philosophy of Science , 20(1): 1–6.
  • Rudolph, J.L., 2005, “Epistemology for the masses: The origin of ‘The Scientific Method’ in American Schools”, History of Education Quarterly , 45(3): 341–376
  • Schickore, J., 2008, “Doing science, writing science”, Philosophy of Science , 75: 323–343.
  • Schickore, J. and N. Hangel, 2019, “‘It might be this, it should be that…’ uncertainty and doubt in day-to-day science practice”, European Journal for Philosophy of Science , 9(2): 31. doi:10.1007/s13194-019-0253-9
  • Shamoo, A.E. and D.B. Resnik, 2009, Responsible Conduct of Research , Oxford: Oxford University Press.
  • Shank, J.B., 2008, The Newton Wars and the Beginning of the French Enlightenment , Chicago: The University of Chicago Press.
  • Shapin, S. and S. Schaffer, 1985, Leviathan and the air-pump , Princeton: Princeton University Press.
  • Smith, G.E., 2002, “The Methodology of the Principia”, in The Cambridge Companion to Newton , I.B. Cohen and G.E. Smith (eds.), Cambridge: Cambridge University Press, 138–173.
  • Snyder, L.J., 1997a, “Discoverers’ Induction”, Philosophy of Science , 64: 580–604.
  • –––, 1997b, “The Mill-Whewell Debate: Much Ado About Induction”, Perspectives on Science , 5: 159–198.
  • –––, 1999, “Renovating the Novum Organum: Bacon, Whewell and Induction”, Studies in History and Philosophy of Science , 30: 531–557.
  • Sober, E., 2008, Evidence and Evolution. The logic behind the science , Cambridge: Cambridge University Press
  • Sprenger, J. and S. Hartmann, 2019, Bayesian philosophy of science , Oxford: Oxford University Press.
  • Steinle, F., 1997, “Entering New Fields: Exploratory Uses of Experimentation”, Philosophy of Science (Proceedings), 64: S65–S74.
  • –––, 2002, “Experiments in History and Philosophy of Science”, Perspectives on Science , 10(4): 408–432.
  • Strasser, B.J., 2012, “Data-driven sciences: From wonder cabinets to electronic databases”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences , 43(1): 85–87.
  • Succi, S. and P.V. Coveney, 2018, “Big data: the end of the scientific method?”, Philosophical Transactions of the Royal Society A , 377: 20180145. doi:10.1098/rsta.2018.0145
  • Suppe, F., 1998, “The Structure of a Scientific Paper”, Philosophy of Science , 65(3): 381–405.
  • Swijtink, Z.G., 1987, “The objectification of observation: Measurement and statistical methods in the nineteenth century”, in The probabilistic revolution. Ideas in History, Vol. 1 , L. Kruger (ed.), Cambridge MA: MIT Press, pp. 261–285.
  • Waters, C.K., 2007, “The nature and context of exploratory experimentation: An introduction to three case studies of exploratory research”, History and Philosophy of the Life Sciences , 29(3): 275–284.
  • Weinberg, S., 1995, “The methods of science… and those by which we live”, Academic Questions , 8(2): 7–13.
  • Weissert, T., 1997, The Genesis of Simulation in Dynamics: Pursuing the Fermi-Pasta-Ulam Problem , New York: Springer Verlag.
  • William H., 1628, Exercitatio Anatomica de Motu Cordis et Sanguinis in Animalibus , in On the Motion of the Heart and Blood in Animals , R. Willis (trans.), Buffalo: Prometheus Books, 1993.
  • Winsberg, E., 2010, Science in the Age of Computer Simulation , Chicago: University of Chicago Press.
  • Wivagg, D. & D. Allchin, 2002, “The Dogma of the Scientific Method”, The American Biology Teacher , 64(9): 645–646

Null and Alternative Hypotheses | Definitions & Examples

Published on 5 October 2022 by Shaun Turney . Revised on 6 December 2022.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis (H₀): There’s no effect in the population.
  • Alternative hypothesis (Hₐ): There’s an effect in the population.

The effect is usually the effect of the independent variable on the dependent variable .

Table of contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Frequently asked questions about null and alternative hypotheses

The null and alternative hypotheses offer competing answers to your research question. When the research question asks “Does the independent variable affect the dependent variable?”, the null hypothesis (H₀) answers “No, there’s no effect in the population.” On the other hand, the alternative hypothesis (Hₐ) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample.

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypotheses. However, the hypotheses can also be phrased in a general way that applies to any test.

The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.
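To make the decision rule concrete, here is a minimal sketch in Python using SciPy; the two samples and the 0.05 significance level are invented for illustration, not drawn from any real study.

```python
# Minimal sketch: the reject / fail-to-reject decision for a two-sample comparison.
# The numbers below are hypothetical; replace them with real measurements.
from scipy import stats

alpha = 0.05  # significance level, chosen before looking at the data
treatment = [23.1, 25.4, 24.8, 26.0, 25.2, 24.1, 25.9, 26.3]
control = [22.7, 23.9, 24.0, 22.5, 23.3, 24.2, 23.1, 23.8]

# Independent two-sample t-test:
# H0: the population means are equal; Ha: they differ.
t_stat, p_value = stats.ttest_ind(treatment, control)

if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} > {alpha}: fail to reject the null hypothesis")
```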

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept. Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect”, “no difference”, or “no relationship”. When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p₁ = p₂.

The alternative hypothesis (Hₐ) is the other answer to your research question. It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect”, “a difference”, or “a relationship”. When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes > or <). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question
  • They both make claims about the population
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.

To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

The only things you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does [independent variable] affect [dependent variable]?

  • Null hypothesis (H₀): [Independent variable] does not affect [dependent variable].
  • Alternative hypothesis (Hₐ): [Independent variable] affects [dependent variable].

Test-specific

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Note: The template sentences above are phrased for two-tailed tests, which are appropriate for most studies. Use a one-tailed test only when you have a strong directional prediction and no interest in an effect in the opposite direction.

The null hypothesis is often abbreviated as H₀. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as Hₐ or H₁. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘ x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.



The scientific method


Introduction

  • Make an observation.
  • Ask a question.
  • Form a hypothesis , or testable explanation.
  • Make a prediction based on the hypothesis.
  • Test the prediction.
  • Iterate: use the results to make new hypotheses or predictions.

Scientific method example: Failure to toast

1. Make an observation.

2. Ask a question.

3. Propose a hypothesis.

4. Make predictions.

5. Test the predictions.

  • If the toaster does toast, then the hypothesis is supported—likely correct.
  • If the toaster doesn't toast, then the hypothesis is not supported—likely wrong.

Logical possibility

Practical possibility

Building a body of evidence

6. Iterate.

  • If the hypothesis was supported, we might do additional tests to confirm it, or revise it to be more specific. For instance, we might investigate why the outlet is broken.
  • If the hypothesis was not supported, we would come up with a new hypothesis. For instance, the next hypothesis might be that there's a broken wire in the toaster.



7.2: Null and Alternative Hypotheses



The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

\(H_0\): The null hypothesis: It is a statement of no difference between the variables—they are not related. The null hypothesis often represents the status quo, and rejecting it usually calls for some action.

\(H_a\): The alternative hypothesis: It is a claim about the population that is contradictory to \(H_0\) and what we conclude when we reject \(H_0\). This is usually what the researcher is trying to prove.

Since the null and alternative hypotheses are contradictory, you must examine the evidence to decide whether it is sufficient to reject the null hypothesis. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are "reject \(H_0\)" if the sample information favors the alternative hypothesis or "do not reject \(H_0\)" or "decline to reject \(H_0\)" if the sample information is insufficient to reject the null hypothesis.

\(H_{0}\) always has a symbol with an equal in it. \(H_{a}\) never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

Example \(\PageIndex{1}\)

  • \(H_{0}\): No more than 30% of the registered voters in Santa Clara County voted in the primary election. \(p \leq 0.30\)
  • \(H_{a}\): More than 30% of the registered voters in Santa Clara County voted in the primary election. \(p > 0.30\)
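One way this voter-turnout example could be tested in practice is sketched below in Python; the vote count and sample size are hypothetical, and the `proportions_ztest` helper from statsmodels is used only as one convenient option.

```python
# Sketch: one-proportion z-test of H0: p <= 0.30 against Ha: p > 0.30.
# The counts are hypothetical, not real Santa Clara County data.
from statsmodels.stats.proportion import proportions_ztest

voted = 340      # hypothetical number of sampled voters who voted in the primary
sampled = 1000   # hypothetical sample size
z_stat, p_value = proportions_ztest(count=voted, nobs=sampled,
                                    value=0.30, alternative='larger')

alpha = 0.05
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```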

Exercise \(\PageIndex{1}\)

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

  • \(H_{0}\): The drug reduces cholesterol by 25%. \(p = 0.25\)
  • \(H_{a}\): The drug does not reduce cholesterol by 25%. \(p \neq 0.25\)

Example \(\PageIndex{2}\)

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:

  • \(H_{0}: \mu = 2.0\)
  • \(H_{a}: \mu \neq 2.0\)

Exercise \(\PageIndex{2}\)

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol \((=, \neq, \geq, <, \leq, >)\) for the null and alternative hypotheses.

  • \(H_{0}: \mu \_ 66\)
  • \(H_{a}: \mu \_ 66\)
  • \(H_{0}: \mu = 66\)
  • \(H_{a}: \mu \neq 66\)

Example \(\PageIndex{3}\)

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:

  • \(H_{0}: \mu \geq 5\)
  • \(H_{a}: \mu < 5\)
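A minimal sketch of how this one-sided test might be run in Python follows; the years-to-graduation sample is invented for illustration.

```python
# Sketch: one-sample, one-sided t-test of H0: mu >= 5 against Ha: mu < 5.
# The sample of years to graduation is hypothetical.
from scipy import stats

years_to_graduate = [4.2, 4.8, 5.1, 4.5, 4.9, 4.4, 5.3, 4.6, 4.7, 4.3]

# alternative='less' matches the one-sided alternative Ha: mu < 5
t_stat, p_value = stats.ttest_1samp(years_to_graduate, popmean=5, alternative='less')

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```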

Exercise \(\PageIndex{3}\)

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • \(H_{0}: \mu \_ 45\)
  • \(H_{a}: \mu \_ 45\)
  • \(H_{0}: \mu \geq 45\)
  • \(H_{a}: \mu < 45\)

Example \(\PageIndex{4}\)

In an issue of U. S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.

  • \(H_{0}: p \leq 0.066\)
  • \(H_{a}: p > 0.066\)

Exercise \(\PageIndex{4}\)

On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (\(=, \neq, \geq, <, \leq, >\)) for the null and alternative hypotheses.

  • \(H_{0}: p \_ 0.40\)
  • \(H_{a}: p \_ 0.40\)
  • \(H_{0}: p = 0.40\)
  • \(H_{a}: p > 0.40\)

COLLABORATIVE EXERCISE

Bring to class a newspaper, some news magazines, and some Internet articles . In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.

In a hypothesis test , sample data is evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we:

  • Evaluate the null hypothesis , typically denoted with \(H_{0}\). The null is not rejected unless the hypothesis test shows otherwise. The null statement must always contain some form of equality \((=, \leq \text{or} \geq)\)
  • Always write the alternative hypothesis , typically denoted with \(H_{a}\) or \(H_{1}\), using less than, greater than, or not equals symbols, i.e., \((\neq, >, \text{or} <)\).
  • If we reject the null hypothesis, then we can assume there is enough evidence to support the alternative hypothesis.
  • Never state that a claim is proven true or false. Keep in mind the underlying fact that hypothesis testing is based on probability laws; therefore, we can talk only in terms of non-absolute certainties.

Formula Review

\(H_{0}\) and \(H_{a}\) are contradictory.

  • If \(\alpha \leq p\)-value, then do not reject \(H_{0}\).
  • If \(\alpha > p\)-value, then reject \(H_{0}\).

\(\alpha\) is preconceived. Its value is set before the hypothesis test starts. The \(p\)-value is calculated from the data.


What is a scientific hypothesis?

It's the initial building block in the scientific method.


A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method . Many describe it as an "educated guess" based on prior knowledge and observation. While this is true, a hypothesis is more informed than a guess. While an "educated guess" suggests a random prediction based on a person's expertise, developing a hypothesis requires active observation and background research. 

The basic idea of a hypothesis is that there is no predetermined outcome. For a solution to be termed a scientific hypothesis, it has to be an idea that can be supported or refuted through carefully crafted experimentation or observation. This concept, called falsifiability and testability, was advanced in the mid-20th century by Austrian-British philosopher Karl Popper in his famous book "The Logic of Scientific Discovery" (Routledge, 1959).

A key function of a hypothesis is to derive predictions about the results of future experiments and then perform those experiments to see whether they support the predictions.

A hypothesis is usually written in the form of an if-then statement, which gives a possibility (if) and explains what may happen because of the possibility (then). The statement could also include "may," according to California State University, Bakersfield .

Here are some examples of hypothesis statements:

  • If garlic repels fleas, then a dog that is given garlic every day will not get fleas.
  • If sugar causes cavities, then people who eat a lot of candy may be more prone to cavities.
  • If ultraviolet light can damage the eyes, then maybe this light can cause blindness.

A useful hypothesis should be testable and falsifiable. That means that it should be possible to prove it wrong. A theory that can't be proved wrong is nonscientific, according to Karl Popper's 1963 book " Conjectures and Refutations ."

An example of an untestable statement is, "Dogs are better than cats." That's because the definition of "better" is vague and subjective. However, an untestable statement can be reworded to make it testable. For example, the previous statement could be changed to this: "Owning a dog is associated with higher levels of physical fitness than owning a cat." With this statement, the researcher can take measures of physical fitness from dog and cat owners and compare the two.

Types of scientific hypotheses


In an experiment, researchers generally state their hypotheses in two ways. The null hypothesis predicts that there will be no relationship between the variables tested, or no difference between the experimental groups. The alternative hypothesis predicts the opposite: that there will be a difference between the experimental groups. This is usually the hypothesis scientists are most interested in, according to the University of Miami .

For example, a null hypothesis might state, "There will be no difference in the rate of muscle growth between people who take a protein supplement and people who don't." The alternative hypothesis would state, "There will be a difference in the rate of muscle growth between people who take a protein supplement and people who don't."

If the results of the experiment show a relationship between the variables, then the null hypothesis has been rejected in favor of the alternative hypothesis, according to the book “Research Methods in Psychology” (BCcampus, 2015).

There are other ways to describe an alternative hypothesis. The alternative hypothesis above does not specify a direction of the effect, only that there will be a difference between the two groups. That type of prediction is called a two-tailed hypothesis. If a hypothesis specifies a certain direction — for example, that people who take a protein supplement will gain more muscle than people who don't — it is called a one-tailed hypothesis, according to William M. K. Trochim , a professor of Policy Analysis and Management at Cornell University.

Sometimes, errors take place during an experiment. These errors can happen in one of two ways. A type I error is when the null hypothesis is rejected when it is true. This is also known as a false positive. A type II error occurs when the null hypothesis is not rejected when it is false. This is also known as a false negative, according to the University of California, Berkeley . 
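The connection between the significance level and the type I error rate can be illustrated with a short simulation; this is only a sketch, built on the assumption that the null hypothesis is true by construction because both groups are drawn from the same distribution.

```python
# Sketch: simulate many experiments in which H0 is true and count how often
# a t-test rejects it at alpha = 0.05. The rejection rate is the type I
# (false positive) error rate and should come out close to alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_experiments = 5000
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(loc=0.0, scale=1.0, size=30)  # both groups share the same
    b = rng.normal(loc=0.0, scale=1.0, size=30)  # distribution, so H0 is true
    _, p_value = stats.ttest_ind(a, b)
    if p_value <= alpha:
        false_positives += 1

print(f"Observed type I error rate: {false_positives / n_experiments:.3f}")
```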

A hypothesis can be rejected or modified, but it can never be proved correct 100% of the time. For example, a scientist can form a hypothesis stating that if a certain type of tomato has a gene for red pigment, that type of tomato will be red. During research, the scientist then finds that each tomato of this type is red. Though the findings support the hypothesis, there may be a tomato of that type somewhere in the world that isn't red. Thus, the hypothesis is supported by the evidence, but it is not proved true in every case.

Scientific theory vs. scientific hypothesis

The best hypotheses are simple. They deal with a relatively narrow set of phenomena. But theories are broader; they generally combine multiple hypotheses into a general explanation for a wide range of phenomena, according to the University of California, Berkeley . For example, a hypothesis might state, "If animals adapt to suit their environments, then birds that live on islands with lots of seeds to eat will have differently shaped beaks than birds that live on islands with lots of insects to eat." After testing many hypotheses like these, Charles Darwin formulated an overarching theory: the theory of evolution by natural selection.

"Theories are the ways that we make sense of what we observe in the natural world," Tanner said. "Theories are structures of ideas that explain and interpret facts." 

Additional resources

  • Read more about writing a hypothesis, from the American Medical Writers Association.
  • Find out why a hypothesis isn't always necessary in science, from The American Biology Teacher.
  • Learn about null and alternative hypotheses, from Prof. Essa on YouTube .

Bibliography

Encyclopedia Britannica. Scientific Hypothesis. Jan. 13, 2022. https://www.britannica.com/science/scientific-hypothesis

Karl Popper, "The Logic of Scientific Discovery," Routledge, 1959.

California State University, Bakersfield, "Formatting a testable hypothesis." https://www.csub.edu/~ddodenhoff/Bio100/Bio100sp04/formattingahypothesis.htm  

Karl Popper, "Conjectures and Refutations," Routledge, 1963.

Price, P., Jhangiani, R., & Chiang, I., "Research Methods of Psychology — 2nd Canadian Edition," BCcampus, 2015.‌

University of Miami, "The Scientific Method" http://www.bio.miami.edu/dana/161/evolution/161app1_scimethod.pdf  

William M.K. Trochim, "Research Methods Knowledge Base," https://conjointly.com/kb/hypotheses-explained/  

University of California, Berkeley, "Multiple Hypothesis Testing and False Discovery Rate" https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf  

University of California, Berkeley, "Science at multiple levels" https://undsci.berkeley.edu/article/0_0_0/howscienceworks_19



What Is a Hypothesis? (Science)

If...,Then...


A hypothesis (plural hypotheses) is a proposed explanation for an observation. The definition depends on the subject.

In science, a hypothesis is part of the scientific method. It is a prediction or explanation that is tested by an experiment. Observations and experiments may disprove a scientific hypothesis, but can never entirely prove one.

In the study of logic, a hypothesis is an if-then proposition, typically written in the form, "If X , then Y ."

In common usage, a hypothesis is simply a proposed explanation or prediction, which may or may not be tested.

Writing a Hypothesis

Most scientific hypotheses are proposed in the if-then format because it's easy to design an experiment to see whether or not a cause and effect relationship exists between the independent variable and the dependent variable . The hypothesis is written as a prediction of the outcome of the experiment.

Null Hypothesis and Alternative Hypothesis

Statistically, it's easier to show there is no relationship between two variables than to support their connection. So, scientists often propose the null hypothesis . The null hypothesis assumes changing the independent variable will have no effect on the dependent variable.

In contrast, the alternative hypothesis suggests changing the independent variable will have an effect on the dependent variable. Designing an experiment to test this hypothesis can be trickier because there are many ways to state an alternative hypothesis.

For example, consider a possible relationship between getting a good night's sleep and getting good grades. The null hypothesis might be stated: "The number of hours of sleep students get is unrelated to their grades" or "There is no correlation between hours of sleep and grades."

An experiment to test this hypothesis might involve collecting data on each student's average hours of sleep and their grades. If students who get eight hours of sleep generally do better than students who get four hours or 10 hours of sleep, the null hypothesis might be rejected.

But the alternative hypothesis is harder to propose and test. The most general statement would be: "The amount of sleep students get affects their grades." The hypothesis might also be stated as "If you get more sleep, your grades will improve" or "Students who get nine hours of sleep have better grades than those who get more or less sleep."

In an experiment, you can collect the same data, but the statistical analysis is less likely to give you a high confidence limit.

Usually, a scientist starts out with the null hypothesis. From there, it may be possible to propose and test an alternative hypothesis, to narrow down the relationship between the variables.
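As a rough sketch of how the sleep-and-grades null hypothesis could be examined, the following Python snippet runs a Pearson correlation test on hypothetical paired observations.

```python
# Sketch: test H0 "there is no correlation between hours of sleep and grades"
# with a Pearson correlation. The paired observations are hypothetical.
from scipy import stats

hours_of_sleep = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
grades = [68, 70, 74, 72, 78, 80, 85, 83, 84, 79]

r, p_value = stats.pearsonr(hours_of_sleep, grades)

alpha = 0.05
print(f"r = {r:.2f}, p = {p_value:.4f}")
if p_value <= alpha:
    print("Reject the null hypothesis: the data suggest a correlation.")
else:
    print("Fail to reject the null hypothesis.")
```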

Example of a Hypothesis

Examples of a hypothesis include:

  • If you drop a rock and a feather, (then) they will fall at the same rate.
  • Plants need sunlight in order to live. (if sunlight, then life)
  • Eating sugar gives you energy. (if sugar, then energy)


Hypothesis Examples

A hypothesis is a prediction of the outcome of a test. It forms the basis for designing an experiment in the scientific method . A good hypothesis is testable, meaning it makes a prediction you can check with observation or experimentation. Here are different hypothesis examples.

Null Hypothesis Examples

The null hypothesis (H₀) is also known as the zero-difference or no-difference hypothesis. It predicts that changing one variable (the independent variable) will have no effect on the variable being measured (the dependent variable). Here are null hypothesis examples:

  • Plant growth is unaffected by temperature.
  • Increasing the temperature does not affect the solubility of salt.
  • Incidence of skin cancer is unrelated to ultraviolet light exposure.
  • All brands of light bulb last equally long.
  • Cats have no preference for the color of cat food.
  • All daisies have the same number of petals.

Sometimes you state the null hypothesis even when you suspect a relationship between two variables. For example, if you think plant growth is affected by temperature, you state the null hypothesis: “Plant growth is not affected by temperature.” Why do you do this, rather than say “If you change temperature, plant growth will be affected”? The answer is that it is easier to apply a statistical test that shows, with a stated level of confidence, whether the data justify rejecting the null hypothesis.

Research Hypothesis Examples

A research hypothesis (H₁) is a type of hypothesis used to design an experiment. This type of hypothesis is often written as an if-then statement because it’s easy to identify the independent and dependent variables and to see how one affects the other. If-then statements explore cause and effect. In other cases, the hypothesis shows a correlation between two variables. Here are some research hypothesis examples:

  • If you leave the lights on, then it takes longer for people to fall asleep.
  • If you refrigerate apples, they last longer before going bad.
  • If you keep the curtains closed, then you need less electricity to heat or cool the house (the electric bill is lower).
  • If you leave a bucket of water uncovered, then it evaporates more quickly.
  • Goldfish lose their color if they are not exposed to light.
  • Workers who take vacations are more productive than those who never take time off.

Is It Okay to Disprove a Hypothesis?

Yes! You may even choose to write your hypothesis in such a way that it can be disproved because it’s easier to prove a statement is wrong than to prove it is right. In other cases, if your prediction is incorrect, that doesn’t mean the science is bad. Revising a hypothesis is common. It demonstrates you learned something you did not know before you conducted the experiment.

Test yourself with a Scientific Method Quiz .



What is a Hypothesis – Types, Examples and Writing Guide


What is a Hypothesis

Definition:

A hypothesis is an educated guess or proposed explanation for a phenomenon, based on some initial observations or data. It is a tentative statement that can be tested and potentially proven or disproven through further investigation and experimentation.

Hypotheses are often used in scientific research to guide the design of experiments and the collection and analysis of data. A hypothesis is an essential element of the scientific method, as it allows researchers to make predictions about the outcome of their experiments and to test those predictions to determine their accuracy.

Types of Hypothesis

The main types of hypothesis are as follows:

Research Hypothesis

A research hypothesis is a statement that predicts a relationship between variables. It is usually formulated as a specific statement that can be tested through research, and it is often used in scientific research to guide the design of experiments.

Null Hypothesis

The null hypothesis is a statement that assumes there is no significant difference or relationship between variables. It is often used as a starting point for testing the research hypothesis, and if the results of the study reject the null hypothesis, it suggests that there is a significant difference or relationship between variables.

Alternative Hypothesis

An alternative hypothesis is a statement that assumes there is a significant difference or relationship between variables. It is often used as an alternative to the null hypothesis and is tested against the null hypothesis to determine which statement is more accurate.

Directional Hypothesis

A directional hypothesis is a statement that predicts the direction of the relationship between variables. For example, a researcher might predict that increasing the amount of exercise will result in a decrease in body weight.

Non-directional Hypothesis

A non-directional hypothesis is a statement that predicts the relationship between variables but does not specify the direction. For example, a researcher might predict that there is a relationship between the amount of exercise and body weight, but they do not specify whether increasing or decreasing exercise will affect body weight.

Statistical Hypothesis

A statistical hypothesis is a statement that assumes a particular statistical model or distribution for the data. It is often used in statistical analysis to test the significance of a particular result.

Composite Hypothesis

A composite hypothesis is a statement that assumes more than one condition or outcome. It can be divided into several sub-hypotheses, each of which represents a different possible outcome.

Empirical Hypothesis

An empirical hypothesis is a statement that is based on observed phenomena or data. It is often used in scientific research to develop theories or models that explain the observed phenomena.

Simple Hypothesis

A simple hypothesis is a statement that assumes only one outcome or condition. It is often used in scientific research to test a single variable or factor.

Complex Hypothesis

A complex hypothesis is a statement that assumes multiple outcomes or conditions. It is often used in scientific research to test the effects of multiple variables or factors on a particular outcome.

Applications of Hypothesis

Hypotheses are used in various fields to guide research and make predictions about the outcomes of experiments or observations. Here are some examples of how hypotheses are applied in different fields:

  • Science : In scientific research, hypotheses are used to test the validity of theories and models that explain natural phenomena. For example, a hypothesis might be formulated to test the effects of a particular variable on a natural system, such as the effects of climate change on an ecosystem.
  • Medicine : In medical research, hypotheses are used to test the effectiveness of treatments and therapies for specific conditions. For example, a hypothesis might be formulated to test the effects of a new drug on a particular disease.
  • Psychology : In psychology, hypotheses are used to test theories and models of human behavior and cognition. For example, a hypothesis might be formulated to test the effects of a particular stimulus on the brain or behavior.
  • Sociology : In sociology, hypotheses are used to test theories and models of social phenomena, such as the effects of social structures or institutions on human behavior. For example, a hypothesis might be formulated to test the effects of income inequality on crime rates.
  • Business : In business research, hypotheses are used to test the validity of theories and models that explain business phenomena, such as consumer behavior or market trends. For example, a hypothesis might be formulated to test the effects of a new marketing campaign on consumer buying behavior.
  • Engineering : In engineering, hypotheses are used to test the effectiveness of new technologies or designs. For example, a hypothesis might be formulated to test the efficiency of a new solar panel design.

How to write a Hypothesis

Here are the steps to follow when writing a hypothesis:

Identify the Research Question

The first step is to identify the research question that you want to answer through your study. This question should be clear, specific, and focused. It should be something that can be investigated empirically and that has some relevance or significance in the field.

Conduct a Literature Review

Before writing your hypothesis, it’s essential to conduct a thorough literature review to understand what is already known about the topic. This will help you to identify the research gap and formulate a hypothesis that builds on existing knowledge.

Determine the Variables

The next step is to identify the variables involved in the research question. A variable is any characteristic or factor that can vary or change. There are two types of variables: independent and dependent. The independent variable is the one that is manipulated or changed by the researcher, while the dependent variable is the one that is measured or observed as a result of the independent variable.

Formulate the Hypothesis

Based on the research question and the variables involved, you can now formulate your hypothesis. A hypothesis should be a clear and concise statement that predicts the relationship between the variables. It should be testable through empirical research and based on existing theory or evidence.

Write the Null Hypothesis

The null hypothesis is the opposite of the alternative hypothesis, which is the hypothesis that you are testing. The null hypothesis states that there is no significant difference or relationship between the variables. It is important to write the null hypothesis because it allows you to compare your results with what would be expected by chance.

Refine the Hypothesis

After formulating the hypothesis, it’s important to refine it and make it more precise. This may involve clarifying the variables, specifying the direction of the relationship, or making the hypothesis more testable.

Examples of Hypothesis

Here are a few examples of hypotheses in different fields:

  • Psychology : “Increased exposure to violent video games leads to increased aggressive behavior in adolescents.”
  • Biology : “Higher levels of carbon dioxide in the atmosphere will lead to increased plant growth.”
  • Sociology : “Individuals who grow up in households with higher socioeconomic status will have higher levels of education and income as adults.”
  • Education : “Implementing a new teaching method will result in higher student achievement scores.”
  • Marketing : “Customers who receive a personalized email will be more likely to make a purchase than those who receive a generic email.”
  • Physics : “An increase in temperature will cause an increase in the volume of a gas, assuming all other variables remain constant.”
  • Medicine : “Consuming a diet high in saturated fats will increase the risk of developing heart disease.”

Purpose of Hypothesis

The purpose of a hypothesis is to provide a testable explanation for an observed phenomenon or a prediction of a future outcome based on existing knowledge or theories. A hypothesis is an essential part of the scientific method and helps to guide the research process by providing a clear focus for investigation. It enables scientists to design experiments or studies to gather evidence and data that can support or refute the proposed explanation or prediction.

The formulation of a hypothesis is based on existing knowledge, observations, and theories, and it should be specific, testable, and falsifiable. A specific hypothesis helps to define the research question, which is important in the research process as it guides the selection of an appropriate research design and methodology. Testability of the hypothesis means that it can be proven or disproven through empirical data collection and analysis. Falsifiability means that the hypothesis should be formulated in such a way that it can be proven wrong if it is incorrect.

In addition to guiding the research process, the testing of hypotheses can lead to new discoveries and advancements in scientific knowledge. When a hypothesis is supported by the data, it can be used to develop new theories or models to explain the observed phenomenon. When a hypothesis is not supported by the data, it can help to refine existing theories or prompt the development of new hypotheses to explain the phenomenon.

When to use Hypothesis

Here are some common situations in which hypotheses are used:

  • In scientific research , hypotheses are used to guide the design of experiments and to help researchers make predictions about the outcomes of those experiments.
  • In social science research , hypotheses are used to test theories about human behavior, social relationships, and other phenomena.
  • In business, hypotheses can be used to guide decisions about marketing, product development, and other areas. For example, a hypothesis might be that a new product will sell well in a particular market, and this hypothesis can be tested through market research.

Characteristics of Hypothesis

Here are some common characteristics of a hypothesis:

  • Testable : A hypothesis must be able to be tested through observation or experimentation. This means that it must be possible to collect data that will either support or refute the hypothesis.
  • Falsifiable : A hypothesis must be able to be proven false if it is not supported by the data. If a hypothesis cannot be falsified, then it is not a scientific hypothesis.
  • Clear and concise : A hypothesis should be stated in a clear and concise manner so that it can be easily understood and tested.
  • Based on existing knowledge : A hypothesis should be based on existing knowledge and research in the field. It should not be based on personal beliefs or opinions.
  • Specific : A hypothesis should be specific in terms of the variables being tested and the predicted outcome. This will help to ensure that the research is focused and well-designed.
  • Tentative: A hypothesis is a tentative statement or assumption that requires further testing and evidence to be confirmed or refuted. It is not a final conclusion or assertion.
  • Relevant : A hypothesis should be relevant to the research question or problem being studied. It should address a gap in knowledge or provide a new perspective on the issue.

Advantages of Hypothesis

Hypotheses have several advantages in scientific research and experimentation:

  • Guides research: A hypothesis provides a clear and specific direction for research. It helps to focus the research question, select appropriate methods and variables, and interpret the results.
  • Predictive power: A hypothesis makes predictions about the outcome of research, which can be tested through experimentation. This allows researchers to evaluate the validity of the hypothesis and make new discoveries.
  • Facilitates communication: A hypothesis provides a common language and framework for scientists to communicate with one another about their research. This helps to facilitate the exchange of ideas and promotes collaboration.
  • Efficient use of resources: A hypothesis helps researchers to use their time, resources, and funding efficiently by directing them towards specific research questions and methods that are most likely to yield results.
  • Provides a basis for further research: A hypothesis that is supported by data provides a basis for further research and exploration. It can lead to new hypotheses, theories, and discoveries.
  • Increases objectivity: A hypothesis can help to increase objectivity in research by providing a clear and specific framework for testing and interpreting results. This can reduce bias and increase the reliability of research findings.

Limitations of Hypothesis

Some Limitations of the Hypothesis are as follows:

  • Limited to observable phenomena: Hypotheses are limited to observable phenomena and cannot account for unobservable or intangible factors. This means that some research questions may not be amenable to hypothesis testing.
  • May be inaccurate or incomplete: Hypotheses are based on existing knowledge and research, which may be incomplete or inaccurate. This can lead to flawed hypotheses and erroneous conclusions.
  • May be biased: Hypotheses may be biased by the researcher’s own beliefs, values, or assumptions. This can lead to selective interpretation of data and a lack of objectivity in research.
  • Cannot prove causation: A hypothesis can only show a correlation between variables, but it cannot prove causation. This requires further experimentation and analysis.
  • Limited to specific contexts: Hypotheses are limited to specific contexts and may not be generalizable to other situations or populations. This means that results may not be applicable in other contexts or may require further testing.
  • May be affected by chance : Hypotheses may be affected by chance or random variation, which can obscure or distort the true relationship between variables.


What Are The Steps Of The Scientific Method?


Science is not just knowledge. It is also a method for obtaining knowledge. Scientific understanding is organized into theories.

The scientific method is a step-by-step process used by researchers and scientists to determine if there is a relationship between two or more variables. Psychologists use this method to conduct psychological research, gather data, process information, and describe behaviors.

It involves careful observation, asking questions, formulating hypotheses, experimental testing, and refining hypotheses based on experimental findings.

How it is Used

The scientific method can be applied broadly in science across many different fields, such as chemistry, physics, geology, and psychology. In a typical application of this process, a researcher will develop a hypothesis, test this hypothesis, and then modify the hypothesis based on the outcomes of the experiment.

The process is then repeated with the modified hypothesis until the results align with the observed phenomena. Detailed steps of the scientific method are described below.

Keep in mind that the scientific method does not have to follow this fixed sequence of steps; rather, these steps represent a set of general principles or guidelines.

7 Steps of the Scientific Method

Psychology uses an empirical approach.

Empiricism (founded by John Locke) states that the only source of knowledge comes through our senses – e.g., sight, hearing, touch, etc.

Empirical evidence does not rely on argument or belief. Thus, empiricism is the view that all knowledge is based on or may come from direct observation and experience.

The empiricist approach of gaining knowledge through experience quickly became the scientific approach and greatly influenced the development of physics and chemistry in the 17th and 18th centuries.

Steps of the Scientific Method

Step 1: Make an Observation (Theory Construction)

Every researcher starts at the very beginning. Before diving in and exploring something, one must first determine what they will study – it seems simple enough!

By making observations, researchers can establish an area of interest. Once this topic of study has been chosen, a researcher should review existing literature to gain insight into what has already been tested and determine what questions remain unanswered.

This assessment will provide helpful information about what is already understood about the specific topic, what questions remain, and whether those questions can realistically be answered.

Specifically, a literature review might involve examining a substantial amount of documented material, from academic journals to books dating back decades. The most relevant information gathered by the researcher is typically summarized in the introduction section or abstract of the published study.

The background material and knowledge will help the researcher with the first significant step in conducting a psychology study, which is formulating a research question.

This is the inductive phase of the scientific process. Observations yield information that is used to formulate theories as explanations. A theory is a well-developed set of ideas that propose an explanation for observed phenomena.

Inductive reasoning moves from specific premises to a general conclusion. It starts with observations of phenomena in the natural world and derives a general law.

Step 2: Ask a Question

Once a researcher has made observations and conducted background research, the next step is to ask a scientific question. A scientific question must be defined, testable, and measurable.

A useful approach to develop a scientific question is: “What is the effect of…?” or “How does X affect Y?”

To answer an experimental question, a researcher must identify two variables: the independent and dependent variables.

The independent variable is the variable manipulated (the cause), and the dependent variable is the variable being measured (the effect).

An example of a research question could be, “Is handwriting or typing more effective for retaining information?” Answering the research question and proposing a relationship between the two variables is discussed in the next step.

Step 3: Form a Hypothesis (Make Predictions)

A hypothesis is an educated guess about the relationship between two or more variables. A hypothesis is an attempt to answer your research question based on prior observation and background research. Theories tend to be too complex to be tested all at once; instead, researchers create hypotheses to test specific aspects of a theory.

For example, a researcher might ask about the connection between sleep and educational performance. Do students who get less sleep perform worse on tests at school?

It is crucial to think about different questions one might have about a particular topic to formulate a reasonable hypothesis. One should also consider how the suspected causal relationships could be investigated.

It is important that the hypothesis is both testable against reality and falsifiable. This means that it can be tested through an experiment and can be proven wrong.

The falsification principle, proposed by Karl Popper , is a way of demarcating science from non-science. It suggests that for a theory to be considered scientific, it must be able to be tested and conceivably proven false.

To test a hypothesis, we first assume that there is no difference between the populations from which the samples were taken. This is known as the null hypothesis and predicts that the independent variable will not influence the dependent variable.

Examples of “if…then…” Hypotheses:

  • If one gets less than 6 hours of sleep, then one will do worse on tests than if one obtains more rest.
  • If one drinks lots of water before going to bed, one will have to use the bathroom often at night.
  • If one practices exercising and lifting weights, then one’s body will begin to build muscle.

The research hypothesis is often called the alternative hypothesis and predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and that they are significant in terms of supporting the theory being investigated.

Although one could state and write a scientific hypothesis in many ways, hypotheses are usually built like “if…then…” statements.

Step 4: Run an Experiment (Gather Data)

The next step in the scientific method is to test your hypothesis and collect data. A researcher will design an experiment to test the hypothesis and gather data that will either support or refute the hypothesis.

The exact research methods used to examine a hypothesis depend on what is being studied. A psychologist might utilize two primary forms of research, experimental research, and descriptive research.

The scientific method is objective in that researchers do not let preconceived ideas or biases influence the collection of data and is systematic in that experiments are conducted in a logical way.

Experimental Research

Experimental research is used to investigate cause-and-effect associations between two or more variables. This type of research systematically controls an independent variable and measures its effect on a specified dependent variable.

Experimental research involves manipulating an independent variable and measuring the effect(s) on the dependent variable. Repeating the experiment multiple times is important to confirm that your results are accurate and consistent.

One of the significant advantages of this method is that it permits researchers to determine whether changes in one variable cause changes in another.

While experiments in psychology typically have many moving parts (and can be relatively complex), even a simple experiment allows researchers to examine cause-and-effect associations between variables.

Most simple experiments use a control group, which involves those who do not receive the treatment, and an experimental group, which involves those who do receive the treatment.

An example of experimental research would be when a pharmaceutical company wants to test a new drug. They give one group a placebo (control group) and the other the actual pill (experimental group).

Descriptive Research

Descriptive research is generally used when it is challenging or even impossible to control the variables in question. Examples of descriptive analysis include naturalistic observation, case studies , and correlation studies .

One example of descriptive research includes phone surveys that marketers often use. While they typically do not allow researchers to identify cause and effect, correlational studies are quite common in psychology research. They make it possible to spot associations between distinct variables and measure the solidity of those relationships.

Step 5: Analyze the Data and Draw Conclusions

Once a researcher has designed and conducted the investigation and collected sufficient data, it is time to inspect this gathered information and judge what has been found. Researchers can summarize the data, interpret the results, and draw conclusions based on this evidence using analyses and statistics.

Upon completion of the experiment, you can collect your measurements and analyze the data using statistics. Based on the outcomes, you will either reject or confirm your hypothesis.

Analyze the Data

So, how does a researcher determine what the results of their study mean? Statistical analysis can either support or refute a researcher’s hypothesis and can also be used to determine if the conclusions are statistically significant.

When outcomes are said to be “statistically significant,” it is improbable that these results are due to luck or chance. Based on these observations, investigators must then determine what the results mean.
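As a hedged illustration of what “unlikely to be due to chance” means, the Python sketch below runs a simple permutation test on simulated data: group labels are shuffled thousands of times, and the p-value is the fraction of shuffles that produce a difference at least as large as the one observed. The data, group sizes, and number of permutations are all assumptions for demonstration.

```python
# Sketch of a permutation test: how often does random shuffling of group
# labels produce a difference as large as the observed one? All data here
# are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
treated = rng.normal(5.0, 2.0, size=25)
control = rng.normal(4.0, 2.0, size=25)

observed_diff = treated.mean() - control.mean()
pooled = np.concatenate([treated, control])

n_permutations = 10_000
n_extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)  # in-place shuffle of the pooled scores
    diff = pooled[:25].mean() - pooled[25:].mean()
    if abs(diff) >= abs(observed_diff):
        n_extreme += 1

p_value = n_extreme / n_permutations
print(f"observed difference = {observed_diff:.2f}, permutation p-value = {p_value:.4f}")
```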

An experiment may support a hypothesis in some circumstances but fail to do so in others.

What occurs if the findings of a psychology investigation do not endorse the researcher’s hypothesis? It does not mean that the study was worthless. Simply because the findings fail to defend the researcher’s hypothesis does not mean that the examination is not helpful or instructive.

This kind of research plays a vital role in supporting scientists in developing unexplored questions and hypotheses to investigate in the future. After decisions have been made, the next step is to communicate the results with the rest of the scientific community.

This is an integral part of the process because it contributes to the general knowledge base and can assist other scientists in finding new research routes to explore.

If the hypothesis is not supported, a researcher should acknowledge the experiment’s results, formulate a new hypothesis, and develop a new experiment.

We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist that could refute a theory.

Draw Conclusions and Interpret the Data

When the empirical observations disagree with the hypothesis, a number of possibilities must be considered. It might be that the theory is incorrect, in which case it needs altering, so it fully explains the data.

Alternatively, it might be that the hypothesis was poorly derived from the original theory, in which case the scientists were expecting the wrong thing to happen.

It might also be that the research was poorly conducted, or used an inappropriate method, or there were factors in play that the researchers did not consider. This will begin the process of the scientific method again.

If the hypothesis is supported, the researcher can find more evidence to support their hypothesis or look for counter-evidence to strengthen their hypothesis further.

In either scenario, the researcher should share their results with the greater scientific community.

Step 6: Share Your Results

One of the final stages of the research cycle involves the publication of the research. Once the report is written, the researcher(s) may submit the work for publication in an appropriate journal.

Usually, this is done by writing up a study description and publishing the article in a professional or academic journal. The studies and conclusions of psychological work can be seen in peer-reviewed journals such as  Developmental Psychology , Psychological Bulletin, the  Journal of Social Psychology, and numerous others.

Scientists should report their findings by writing up a description of their study and any subsequent findings. This enables other researchers to build upon the present research or replicate the results.

As outlined by the American Psychological Association (APA), there is a typical structure of a journal article that follows a specified format. In these articles, researchers:

  • Supply a brief narrative and background on previous research
  • Give their hypothesis
  • Specify who participated in the study and how they were chosen
  • Provide operational definitions for each variable
  • Explain the measures and methods used to collect data
  • Describe how the data collected was interpreted
  • Discuss what the outcomes mean

A detailed record of psychological studies, and indeed of all scientific studies, is vital to clearly explain the steps and procedures used throughout the study, so that other researchers can repeat the experiment and replicate the results.

The editorial process utilized by academic and professional journals guarantees that each submitted article undergoes a thorough peer review to help assure that the study is scientifically sound. Once published, the investigation becomes another piece of the current puzzle of our knowledge “base” on that subject.

This last step is important because all results, whether they supported or did not support the hypothesis, can contribute to the scientific community. Publication of empirical observations leads to more ideas that are tested against the real world, and so on. In this sense, the scientific process is circular.

By replicating studies, psychologists can reduce errors, validate theories, and gain a stronger understanding of a particular topic.

Step 7: Repeat the Scientific Method (Iteration)

Now, if one’s hypothesis turns out to be accurate, find more evidence or find counter-evidence. If one’s hypothesis is false, create a new hypothesis or try again.

One may wish to revise the first hypothesis in order to design a more focused experiment or to test a different, more specific question.

The beauty of the scientific method is that it is a comprehensive and straightforward process that scientists, and everyone else, can use over and over again.

So, draw conclusions and repeat because the scientific method is never-ending, and no result is ever considered perfect.

The scientific method is a process of:

  • Making an observation.
  • Forming a hypothesis.
  • Making a prediction.
  • Experimenting to test the hypothesis.

The procedure of repeating the scientific method is crucial to science and all fields of human knowledge.

Further Information

  • Karl Popper – Falsification
  • Thomas Kuhn – Paradigm Shift
  • Positivism in Sociology: Definition, Theory & Examples
  • Is Psychology a Science?
  • Psychology as a Science (PDF)

List the 6 steps of the scientific method in order

  • Make an observation (theory construction)
  • Ask a question. A scientific question must be defined, testable, and measurable.
  • Form a hypothesis (make predictions)
  • Run an experiment to test the hypothesis (gather data)
  • Analyze the data and draw conclusions
  • Share your results so that other researchers can make new hypotheses

What is the first step of the scientific method?

The first step of the scientific method is making an observation. This involves noticing and describing a phenomenon or group of phenomena that one finds interesting and wishes to explain.

Observations can occur in a natural setting or within the confines of a laboratory. The key point is that the observation provides the initial question or problem that the rest of the scientific method seeks to answer or solve.

What is the scientific method?

The scientific method is a step-by-step process that investigators can follow to determine if there is a causal connection between two or more variables.

Psychologists and other scientists regularly propose explanations for human behavior. On a more casual level, people judge other people’s intentions, incentives, and actions daily.

While our standard assessments of human behavior are subjective and anecdotal, researchers use the scientific method to study psychology objectively and systematically.

All of them utilize the scientific method to study distinct aspects of people’s thinking and behavior. This process allows scientists to analyze and understand various psychological phenomena, and it also provides investigators and others a way to disseminate and debate the results of their studies.

The outcomes of these studies are often noted in popular media, which leads numerous to think about how or why researchers came to the findings they did.

Why Use the Six Steps of the Scientific Method

The goal of scientists is to understand better the world that surrounds us. Scientific research is the most critical tool for navigating and learning about our complex world.

Without it, we would be compelled to rely solely on intuition, other people’s authority, and luck. Through methodical scientific research, we can set aside our preconceived notions and superstitions and gain an objective understanding of ourselves and our world.

All psychological studies aim to explain, predict, and even control or impact mental behaviors or processes. So, psychologists use and repeat the scientific method (and its six steps) to perform and record essential psychological research.

So, psychologists focus on understanding behavior and the cognitive (mental) and physiological (body) processes underlying behavior.

In the real world, people use other means to understand the behavior of others, such as intuition and personal experience. The hallmark of scientific research, by contrast, is evidence to support a claim.

Scientific knowledge is empirical, meaning it is grounded in objective, tangible evidence that can be observed repeatedly, regardless of who is watching.

The scientific method is crucial because it minimizes the impact of bias or prejudice on the experimenter. Regardless of how hard one tries, even the best-intentioned scientists can’t escape bias.

Bias stems from personal opinions and cultural beliefs, meaning that everyone filters information through their own experience. Sadly, this “filtering” process can cause a scientist to favor one outcome over another.

For an everyday person trying to solve a minor issue at home or work, succumbing to these biases is not such a big deal; in fact, most of the time it goes unnoticed.

But in the scientific community, where results must be scrutinized and reproduced, bias must be avoided.

When to Use the Six Steps of the Scientific Method?

One can use the scientific method anytime, anywhere! From the smallest conundrum to solving global problems, it is a process that can be applied to any science and any investigation.

Even if you are not considered a “scientist,” you will be surprised to know that people of all disciplines use it for all kinds of dilemmas.

Try to catch yourself next time you come by a question and see how you subconsciously or consciously use the scientific method.

What is the Scientific Method: How does it work and why is it important?

The scientific method is a systematic process involving steps like defining questions, forming hypotheses, conducting experiments, and analyzing data. It minimizes biases and enables replicable research, leading to groundbreaking discoveries like Einstein's theory of relativity, penicillin, and the structure of DNA. This ongoing approach promotes reason, evidence, and the pursuit of truth in science.

Beginning in elementary school, we are exposed to the scientific method and taught how to put it into practice. As a tool for learning, it prepares children to think logically and use reasoning when seeking answers to questions.

Rather than jumping to conclusions, the scientific method gives us a recipe for exploring the world through observation and trial and error. We use it regularly, sometimes knowingly in academics or research, and sometimes subconsciously in our daily lives.

In this article we will refresh our memories on the particulars of the scientific method, discussing where it comes from, which elements comprise it, and how it is put into practice. Then, we will consider the importance of the scientific method, who uses it and under what circumstances.

What is the scientific method?

The scientific method is a dynamic process that involves objectively investigating questions through observation and experimentation . Applicable to all scientific disciplines, this systematic approach to answering questions is more accurately described as a flexible set of principles than as a fixed series of steps.

The following representations of the scientific method illustrate how it can be both condensed into broad categories and also expanded to reveal more and more details of the process. These graphics capture the adaptability that makes this concept universally valuable as it is relevant and accessible not only across age groups and educational levels but also within various contexts.

[Figure: a graph of the scientific method]

Steps in the scientific method

While the scientific method is versatile in form and function, it encompasses a collection of principles that create a logical progression to the process of problem solving:

  • Define a question : Constructing a clear and precise problem statement that identifies the main question or goal of the investigation is the first step. The wording must lend itself to experimentation by posing a question that is both testable and measurable.
  • Gather information and resources : Researching the topic in question to find out what is already known and what types of related questions others are asking is the next step in this process. This background information is vital to gaining a full understanding of the subject and in determining the best design for experiments. 
  • Form a hypothesis : Composing a concise statement that identifies specific variables and potential results, which can then be tested, is a crucial step that must be completed before any experimentation. An imperfection in the composition of a hypothesis can result in weaknesses to the entire design of an experiment.
  • Perform the experiments : Testing the hypothesis by performing replicable experiments and collecting resultant data is another fundamental step of the scientific method. By controlling some elements of an experiment while purposely manipulating others, cause and effect relationships are established.
  • Analyze the data : Interpreting the experimental process and results by recognizing trends in the data is a necessary step for comprehending its meaning and supporting the conclusions. Drawing inferences through this systematic process lends substantive evidence for either supporting or rejecting the hypothesis.
  • Report the results : Sharing the outcomes of an experiment, through an essay, presentation, graphic, or journal article, is often regarded as a final step in this process. Detailing the project's design, methods, and results not only promotes transparency and replicability but also adds to the body of knowledge for future research.
  • Retest the hypothesis : Repeating experiments to see if a hypothesis holds up in all cases is a step that is manifested through varying scenarios. Sometimes a researcher immediately checks their own work or replicates it at a future time, or another researcher will repeat the experiments to further test the hypothesis.

[Figure: a chart of the scientific method]

Where did the scientific method come from?

Oftentimes, ancient peoples attempted to answer questions about the unknown by:

  • Making simple observations
  • Discussing the possibilities with others deemed worthy of a debate
  • Drawing conclusions based on dominant opinions and preexisting beliefs

For example, take Greek and Roman mythology. Myths were used to explain everything from the seasons and stars to the sun and death itself.

However, as societies began to grow through advancements in agriculture and language, ancient civilizations like Egypt and Babylonia shifted to a more rational analysis for understanding the natural world. They increasingly employed empirical methods of observation and experimentation that would one day evolve into the scientific method . 

In the 4th century BCE, Aristotle, considered the Father of Science by many, suggested these elements, which closely resemble the contemporary scientific method, as part of his approach for conducting science:

  • Study what others have written about the subject.
  • Look for the general consensus about the subject.
  • Perform a systematic study of everything even partially related to the topic.

[Figure: a pyramid of the scientific method]

By continuing to emphasize systematic observation and controlled experiments, scholars such as Al-Kindi and Ibn al-Haytham helped expand this concept throughout the Islamic Golden Age . 

In his 1620 treatise, Novum Organum , Sir Francis Bacon codified the scientific method, arguing not only that hypotheses must be tested through experiments but also that the results must be replicated to establish a truth. Coming at the height of the Scientific Revolution, this text made the scientific method accessible to European thinkers like Galileo and Isaac Newton who then put the method into practice.

As science modernized in the 19th century, the scientific method became more formalized, leading to significant breakthroughs in fields such as evolution and germ theory. Today, it continues to evolve, underpinning scientific progress in diverse areas like quantum mechanics, genetics, and artificial intelligence.

Why is the scientific method important?

The history of the scientific method illustrates how the concept developed out of a need to find objective answers to scientific questions by overcoming biases based on fear, religion, power, and cultural norms. This still holds true today.

By implementing this standardized approach to conducting experiments, the impacts of researchers’ personal opinions and preconceived notions are minimized. The organized manner of the scientific method prevents these and other mistakes while promoting the replicability and transparency necessary for solid scientific research.

The importance of the scientific method is best observed through its successes, for example: 

  • “ Albert Einstein stands out among modern physicists as the scientist who not only formulated a theory of revolutionary significance but also had the genius to reflect in a conscious and technical way on the scientific method he was using.” Devising a hypothesis based on the prevailing understanding of Newtonian physics eventually led Einstein to devise the theory of general relativity .
  • Howard Florey: “Perhaps the most useful lesson which has come out of the work on penicillin has been the demonstration that success in this field depends on the development and coordinated use of technical methods.” After discovering a mold that prevented the growth of Staphylococcus bacteria, Dr. Alexander Fleming designed experiments to identify and reproduce it in the lab, thus leading to the development of penicillin.
  • James D. Watson “Every time you understand something, religion becomes less likely. Only with the discovery of the double helix and the ensuing genetic revolution have we had grounds for thinking that the powers held traditionally to be the exclusive property of the gods might one day be ours. . . .” By using wire models to conceive a structure for DNA, Watson and Crick crafted a hypothesis for testing combinations of amino acids, X-ray diffraction images, and the current research in atomic physics, resulting in the discovery of DNA’s double helix structure .

Final thoughts

As the cases exemplify, the scientific method is never truly completed, but rather started and restarted. It gave these researchers a structured process that was easily replicated, modified, and built upon. 

While the scientific method may “end” in one context, it never literally ends. When a hypothesis, design, methods, and experiments are revisited, the scientific method simply picks up where it left off. Each time a researcher builds upon previous knowledge, the scientific method is restored with the pieces of past efforts.

By guiding researchers towards objective results based on transparency and reproducibility, the scientific method acts as a defense against bias, superstition, and preconceived notions. As we embrace the scientific method's enduring principles, we ensure that our quest for knowledge remains firmly rooted in reason, evidence, and the pursuit of truth.

  • Open access
  • Published: 30 May 2024

A strategy for differential abundance analysis of sparse microbiome data with group-wise structured zeros

  • Fentaw Abegaz 1 , 2 ,
  • Davar Abedini 1 ,
  • Fred White 1 ,
  • Alessandra Guerrieri 1 ,
  • Anouk Zancarini 3 ,
  • Lemeng Dong 1 ,
  • Johan A. Westerhuis 1 ,
  • Fred van Eeuwijk 2 ,
  • Harro Bouwmeester 1 &
  • Age K. Smilde 1  

Scientific Reports volume 14, Article number: 12433 (2024)

Comparing the abundance of microbial communities between different groups or obtained under different experimental conditions using count sequence data is a challenging task due to various issues such as inflated zero counts, overdispersion, and non-normality. Several methods and procedures based on counts, their transformation and compositionality have been proposed in the literature to detect differentially abundant species in datasets containing hundreds to thousands of microbial species. Despite efforts to address the large numbers of zeros present in microbiome datasets, even after careful data preprocessing, the performance of existing methods is impaired by the presence of inflated zero counts and group-wise structured zeros (i.e. all zero counts in a group). We propose and validate using extensive simulations an approach combining two differential abundance testing methods, namely DESeq2-ZINBWaVE and DESeq2, to address the issues of zero-inflation and group-wise structured zeros, respectively. This combined approach was subsequently successfully applied to two plant microbiome datasets that revealed a number of taxa as interesting candidates for further experimental validation.

Introduction

The plant and soil microbiomes, comprising a diverse community of beneficial and harmful microbes, play an important role in plant growth and health 1 , 2 , 3 . To understand the mechanisms that govern plant-microbiome interactions, high-throughput sequencing methods have been dramatically advanced 4 . Amplicon sequencing (e.g. 16S rRNA gene) and whole genome shotgun sequencing are the two major methods for the identification of microbial communities by utilizing sequence data 5 , 6 . The resulting microbiome data consisting of counts and compositional data, are typically characterized by sparsity (zero-inflation), overdispersion (variance that is much higher than expected), high-dimensionality (number of taxa is much higher than the number of samples), non-normality, and variable sequencing depth among samples. Such characteristics of the microbiome data make its analysis challenging 7 . However, failure to consider these special characteristics of microbiome count data during statistical analysis can result in false positive results and irreproducible relationships 8 .

One important aspect of microbiome data analysis that attracts significant scientific interest is the detection of differential abundance among microbial species across two or more conditions or treatments. However, microbiomes consist of hundreds to thousands of distinct taxa (a term used to refer to OTUs: operational taxonomic units or ASVs: amplicon sequencing variants), with just a small percentage likely to be differentially abundant 9 . Several methods and procedures based on counts, compositionality, and transformation perspectives have been introduced in the literature for discovering differentially abundant species (see Refs. 10 , 11 and references therein for a list of methods used in differential abundance analysis). Current practices for detecting differential abundance in microbiome data involve careful data pre-processing (filtering and normalization) and the use of suitable statistical tools that consider the special characteristics of the data 12 , 13 . However, there is a continued debate over the appropriate approaches for assessing differential abundance in microbiome data 12 .

Despite strict quality control and contaminant removal utilizing QIIME 14 and DADA2 15 software, microbiome data still contain many rare and low-prevalence taxa and are thus highly zero-inflated 16 , which makes microbiome data analysis challenging. In microbiome data, typically between 80 and 95 percent of the counts are zero 12 , while the number of zeros varies significantly across taxa, ranging from none or a few to many zeros. Zero counts in the sample can simply reflect absence but also presence with low frequency that was not detected due to technical detection limits. In particular, zeros in sequence count data can be either biological zeros which indicate the true absence of taxa under specific environmental conditions, or non-biological zeros which can arise from various factors such as sequencing errors, limited sequencing depth, uneven sampling depth, and PCR amplification bias 14 , 17 , 18 , 19 , 20 . Unfortunately, without prior biological knowledge or spike-in controls, distinguishing biological and non-biological zeros in sequence count data is difficult 17 . In spite of this, reviews cited in Ref. 10 demonstrated that many rare taxa are a result of sequencing artifact contamination, and/or sequencing errors 14 are not informative in the analysis and have no influence on the scientific conclusions. The inclusion of a filtering step to remove potentially uninformative taxa before performing statistical tests can reduce the burden of adjusting for multiple tests, which considerably improves detecting differentially abundant taxa 11 . Several ad hoc and principled filtering algorithms have been presented in this respect. The choice of filtering strategy will influence the results of the subsequent analysis 12 .

While filtering reduces the complexity of microbiome data, the highly-inflated-zeros that remain even after filtering can result in significant reductions in statistical power if not adequately modeled 15 . Several statistical tools have been introduced to properly deal with the analysis of zero-inflation, including zero-inflated Gaussian 16 and zero-inflated negative-binomial 21 models, as well as observation weights-based strategies for using popular RNA-seq tools such as DESeq2, edgeR, and limma-voom 22 . The inflated-zeros, on the other hand, directly contribute to another statistical issue, the problem of perfect separation 23 or group-wise structured zeros 24 , which arises when many non-zero counts are in one group and all zero counts in the other. When there is enough evidence to believe that biological factors caused the occurrence of taxa with group-wise structured zeros they can be identified as structural zeros and labelled as significant without further differential abundance testing because they are abundant in one group but not at all present in the other group 24 . On the other hand, because it is difficult to identify biological from non-biological zeros in sequence count data, it would not be fair to relate group-wise structured zeros to only biological zeros. For instance, the presence of group-wise structured zeros is more likely to be noticeable in small samples but it is possible that group-wise structured zeros may not be present in the population as a whole or with more sampled data. In such circumstances, the occurrence of group-wise structured zeros needs to be explored to see if it is caused by an inherent biological factor or sampling variability. To this end, implementing standard likelihood inference in the presence of taxa with perfect separation or group-wise structured zeros, hampers statistical inference. This is because it results in large or infinite parameter estimates of effects coupled with extremely inflated standard errors, causing such taxa to be nonsignificant 23 .

In the statistical literature, a penalized likelihood strategy that provides finite parameter estimates has been suggested as a solution to the issues that perfect separation or group-wise structured zeros present when using maximum likelihood-based techniques 23 , 25 . Furthermore, penalized likelihood ratio-based tests help in providing appropriate significance test results for perfectly separated taxa 23 , 25 . However, no or little efforts have been made to include penalized likelihood inference into many of the existing differential abundance techniques. Among the differential abundance techniques, the estimation approach implemented in DESeq2 by combining ridge type penalized likelihood estimation with a likelihood ratio based test 26 has the potential to address the problem of perfect separation or group-wise structured zeros.

Differential abundance analysis is generally conducted using a single or combination of methods adopted from bulk RNA sequencing analysis, single-cell RNA sequencing analysis, or particularly developed for microbiome data 16 . There have been a few large-scale benchmarking studies that look at the adequacy of using these methods 10 , 11 , 27 . According to the benchmarking studies, the various differential abundance analysis methods are inadequate at controlling false discovery rates at nominal levels, and there was no consistency in the opinion of what the right approach is 21 . Here we focus mainly on popular tools that take into consideration the count nature of the data, such as edgeR 28 , DESeq2 26 , limma-voom 29 , as well as their weighted counterparts referred to as edgeR-ZINBWaVE, DESeq2-ZINBWaVE and limma-voom-ZINBWaVE 22 , which utilize a weighting mechanism based on the ZINBWaVE model 30 . Some of these and other approaches were evaluated on a vast variety of microbiome datasets 10 . However, many of these techniques treat taxa with group-wise structured zeros differently in differential abundance testing. To ensure a fair comparison of these techniques, taxa with group-wise structured zeros were identified and excluded from the differential abundance testing in our simulated comparisons.

Another major challenge in analyzing microbiome datasets generated by high-throughput sequencing is compositionality, as the sequencing procedure generates a total sequence read count, also known as sequencing depth or library size, that varies between samples 24 , 31 , 32 , 33 , 34 . Various strategies have been employed to deal with the compositional nature of microbiome data. One strategy is to use compositionally aware differential abundance tools that rely on log-ratio transformations such as ALDEx2 35 and ANCOM 31 . An alternative strategy, as used in this work, is to use normalization methods to mitigate the impact of compositionality on count-based differential abundance analysis 36 , 37 . Some examples of normalization include: trimmed mean of M-values (TMM) normalization used in edgeR 28 , median-of-ratios method normalization used in DESeq2 26 . Interestingly, some normalization methods used in count-based differential abundance analysis, such as the median-of-ratios method, trimmed mean of M-values and Wrench 37 normalization, use similar mathematical expressions as a compositionally aware transformation, such as centered log-ratio, to determine size factors based on sequencing depth and a method-specific compositional scale factor derived from the ratio of proportions 37 . However, the use of these normalization techniques, which involve log-ratio calculations in compositional methods and geometric means or ratios in count-based methods, is complicated by the presence of zeros in microbiome data. To address the problem with zeros in microbiome data normalizations, several approaches have been considered, including adding pseudo-counts (1, 0.5, or small fractions) to the abundance data, replacing zeros with multivariate imputation 36 , 38 and deep learning methods 38 , using only non-zero counts in the computations 39 and developing methods that specifically address zero-inflation such as geometric mean of pairwise ratios 40 and Wrench normalizations 37 . The choice of normalization method can have a significant impact on the results; thus, it is important to carefully consider the purpose of data analysis, data characteristics, and the assumptions underlying each normalization method 38 , 41 .
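To illustrate one of the transformations mentioned above, the sketch below computes a centered log-ratio (CLR) transform of a small count matrix after adding a pseudo-count of 1 to handle zeros. This is a generic illustration of the idea, with made-up data, not the specific implementation used in ALDEx2 or ANCOM.

```python
# Sketch of a centered log-ratio (CLR) transform with a pseudo-count of 1:
# clr(x_k) = log(x_k) - mean_over_taxa(log(x)), computed per sample.
# The count matrix is a toy example (samples in rows, taxa in columns).
import numpy as np

counts = np.array([
    [0, 12, 4, 30],
    [5,  0, 9, 40],
])

pseudo = counts + 1                                  # handle zeros with a pseudo-count
log_counts = np.log(pseudo)
clr = log_counts - log_counts.mean(axis=1, keepdims=True)
print(np.round(clr, 3))
```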

In this work, first, we considered a simulation strategy similar to Mallick et al. 8 that mimics experimental plant microbiome data using the simulation model SparseDOSSA 6 with several performance metrics to compare weighted and unweighted differential abundance methods; second, we implemented a combination of differential abundance tools that include (i) DESeq2-ZINBWaVE: ZINBWaVE-weighted methods to address the problem of zero inflation and control false discovery rate and (ii) DESeq2: penalized likelihood ratio based method to properly address the analysis of taxa with perfect separation or group-wise structured zeros; third, we created a comprehensive pipeline for differential abundance analysis of microbiome data that includes data pre-processing and identification of differentially abundant taxa using the combined approach DESeq2-ZINBWaVE and DESeq2; finally, we applied the pipeline for detecting differential abundance on two experimentally obtained plant microbiome 16S rRNA gene sequencing datasets from the MiCRop (Microbial Imprinting for Crop Resilience; www.microp.org ) project.

Methods and materials

Pipeline design

The combined approach, DESeq2-ZINBWaVE-DESeq2, is designed to perform a thorough assessment of microbial abundance differences while accounting for zero-inflation and group-wise structured zeros. Figure  1 depicts the essential stages in the implementation of DESeq2-ZINBWaVE-DESeq2. The input data includes an abundance table, a taxonomy table, and a metadata table in any standard microbiome data format. The initial stages of data processing involve filtering and normalization. For the analysis of a single treatment with two factor levels, the data pre-processing step is readily followed by categorizing taxa as having group-wise structured zeros or not. Here we note that if there is sufficient evidence to assume that taxa with group-wise structured zeros are due to biological causes, they can be labelled as significant without further differential abundance testing 24 . Otherwise, we will proceed as follows. For each category, differential abundance analysis is done independently (Analysis Part A and B). In Analysis Part A, DESeq2 likelihood ratio test (LRT) is used to perform differential abundance testing for taxa with group-wise structured zeros while in Analysis Part B, DESeq2-ZINBWaVE-based LRT is utilized for taxa without group-wise structured zeros. Finally, we collect significant taxa from both analyses for diagnostic purposes and biological interpretation. On the other hand, when the identification of group-wise structured zeros becomes more difficult, for example, in the presence of multiple categorical factors, the structural zero grouping step can be skipped. Without much loss of power, the entire filtered and normalized dataset can be analyzed using both DESeq2 (which is appropriate in detecting differential abundance for taxa with group-wise structured zeros) and DESeq2-ZINBWaVE (which is more powerful in detecting differential abundance for taxa with no group-wise structured zeros) based LRTs. This is followed by collecting unique significant taxa from both analyses for diagnostic purposes and biological interpretation.

figure 1

Flowchart for the microbiome data differential abundance analysis pipeline. Data: the input microbiome data include the abundance table, taxonomy table, and metadata table. Pre-processing step: includes filtering and normalization. Grouping taxa by group-wise structured zeros for a single covariate with two factor levels. Differential abundance testing can be performed in two parts. Analysis Part A: differential abundance testing for taxa with group-wise structured zeros using DESeq2 likelihood ratio test (LRT). Analysis Part B: differential abundance testing for taxa without group-wise structured zeros using DESeq2-ZINBWaVE with LRT. Diagnostics: collect significant taxa from both analyses for diagnostic purposes and biological interpretation. When there are multiple covariates, skip the structural zero grouping step and apply both DESeq2 and DESeq2-ZINBWaVE on the whole filtered and normalized data (dashed arrows).
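The final “collect significant taxa from both analyses” step can be sketched as follows. This is an assumption-laden illustration rather than the authors’ code: p-values from the two analysis branches are pooled and adjusted with a Benjamini–Hochberg false discovery rate correction; the taxon names, p-values, and the decision to adjust across the pooled set are all hypothetical.

```python
# Illustrative sketch (not the authors' implementation): pool p-values from
# Analysis Part A (taxa with group-wise structured zeros) and Part B (all
# other taxa), then apply a Benjamini-Hochberg FDR adjustment. The taxon
# names and p-values are hypothetical.
from statsmodels.stats.multitest import multipletests

pvals_part_a = {"taxon_03": 0.0004, "taxon_11": 0.0210}
pvals_part_b = {"taxon_01": 0.0010, "taxon_07": 0.3000, "taxon_09": 0.0470}

pooled = {**pvals_part_a, **pvals_part_b}
taxa = list(pooled.keys())
reject, p_adj, _, _ = multipletests(list(pooled.values()), alpha=0.05, method="fdr_bh")

for taxon, padj, significant in zip(taxa, p_adj, reject):
    print(f"{taxon}: adjusted p = {padj:.4f}, significant = {significant}")
```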

Data preparation

Following raw sequence read processing with DADA2 or QIIME, the sequencing data is typically presented in two table formats: an abundance table for counts and a taxonomy table for phylogenetic information. Differential abundance analysis generally requires the use of three input datasets: (i) Abundance data table: taxa count abundance across samples; (ii) Taxonomy data table: taxonomy information across taxa (required for higher hierarchical levels of analysis or interpretation of results); and (iii) Metadata table: information on treatments, phenotypes or covariates of interest across samples (Fig.  1 ). It is important to note that, as part of the data preparation step, we must examine the abundance and metadata tables for missing data values. We considered two template datasets differing in the level of zero-inflation for evaluating the performance of popular unweighted and weighted microbiome differential abundance tools (see details in the “Data analysis” section).

Data filtering

Because microbiome data sets are usually sparse, it is necessary to filter the data set by removing low-quality or uninformative taxa to improve downstream statistical analysis. Several filtering approaches are implemented in R/Bioconductor packages that include filter_taxa in phyloseq 39 , filterByExpr in edgeR 28 , and simultaneous or permutation based statistical filtering in PERFect 12 . The edgeR filterByExpr function allows filtering based on replications or treatment groups. Unless indicated, our filtering to retain a taxon for differential abundance analysis is based on a minimum of two counts in at least two samples per treatment group. However, depending on the nature of the count data, the minimum thresholds can be set at a higher value. Setting the threshold as low as two may help to ensure that important taxa, including relevant rare taxa with low counts, remain in the dataset, but it will increase the number of zeros and low counts, which has an impact on statistical analysis. The proposed analysis strategy in this work addresses excess zeros and low expected counts 22 .
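A minimal sketch of this filtering rule is given below. It assumes a samples-by-taxa count matrix, one group label per sample, and the interpretation that the minimum of two counts in at least two samples must hold within every treatment group; the orientation, toy data, and that interpretation are assumptions made for illustration.

```python
# Sketch of the filtering rule: retain a taxon only if it has at least
# `min_count` counts in at least `min_samples` samples within each treatment
# group. Matrix orientation (samples x taxa) and the toy data are assumptions.
import numpy as np

counts = np.array([
    [0, 3, 10, 0],
    [2, 5,  0, 1],
    [0, 4,  8, 0],
    [3, 6,  9, 2],
])  # 4 samples x 4 taxa
groups = np.array(["A", "A", "B", "B"])

min_count, min_samples = 2, 2
keep = np.ones(counts.shape[1], dtype=bool)
for g in np.unique(groups):
    in_group = counts[groups == g]
    # number of samples in this group where the taxon reaches the minimum count
    n_passing = (in_group >= min_count).sum(axis=0)
    keep &= n_passing >= min_samples

print("taxa retained:", np.where(keep)[0])   # here: only taxon 1
filtered_counts = counts[:, keep]
```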

Taxa grouping

Following the filtering stage, taxa are grouped based on whether they have group-wise structured zeros or not. Taxa with group-wise structured zeros have zero counts in all samples of one of the groups.
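For a single treatment with two levels, identifying these taxa amounts to a simple check, sketched below under the assumption of a samples-by-taxa count matrix with one group label per sample; the data are made up for illustration.

```python
# Sketch: flag taxa with group-wise structured zeros, i.e. taxa whose counts
# are zero in every sample of at least one group. Toy data; orientation
# (samples x taxa) is an assumption.
import numpy as np

counts = np.array([
    [0, 7, 3],
    [0, 9, 0],
    [4, 2, 0],
    [6, 5, 0],
])  # 4 samples x 3 taxa
groups = np.array(["treated", "treated", "control", "control"])

all_zero_in_group = np.vstack([
    (counts[groups == g] == 0).all(axis=0) for g in np.unique(groups)
])
structured_zero = all_zero_in_group.any(axis=0)

print("taxa with group-wise structured zeros:", np.where(structured_zero)[0])
# Such taxa would go to Analysis Part A (DESeq2 LRT); the remaining taxa
# to Analysis Part B (DESeq2-ZINBWaVE LRT), as in the pipeline described above.
```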

Data normalization

Normalization is another important step in microbiome sequencing data analysis that is used to remove any bias caused by compositional effect or differences in sequencing depths or library sizes between samples. For microbiome studies, several forms of normalization have been used 41 : rarefying, scaling 9 , 42 , log- transformation, zero-inflation based normalization and compositionally aware normalization. Rarefying is subsampling to equal sequencing depth without replacement 9 . We did not use rarefying-based normalization since its application in differential abundance analysis is debatable 11 , 43 . Scaling based normalization: this is to acquire a scaling factor that can be used to adjust the read counts to produce normalized counts or to produce normalized library sizes 40 . Normalized library sizes are used as offsets in count-based regression models such as DESeq2 and edgeR and their weighted counterparts to remove biases caused by uneven sequencing depths in differential abundance analysis 40 . Some commonly used scaling-based normalization procedures adopted from RNA-Seq data include Cumulative-Sum Scaling (CSS) implemented in metagenomeSeq 16 , median-of-ratios method in DESeq2 26 , Upper Quartile (UQ) in limma-voom 29 , and Trimmed Mean of M-values (TMM) in edgeR 28 . These normalization procedures were developed primarily for RNA-Seq data that do not contain a large number of zeros. To address zero-inflation in normalization, geometric mean of pairwise ratios (GMPR) 40 , Wrench normalization 37 , geometric mean of positive counts (poscounts) and deconvolution 44 methods were introduced; however, normalization methods for zero-inflated microbiome data are still under development. The compositionally aware normalizations that are in common use include centered log-ratio transformation (CLR) in ALDEx2 35 and additive log-ratio transformation in ANCOM 31 . Log-transformation based normalization is used in MaAsLin2 8 .

The use of log-ratio or log-transformation normalization in compositional methods (such as CLR), as well as the use of geometric means and ratios in count-based approaches (such as the median-of-ratios method and TMM), presents difficulties when dealing with zeros. Several strategies have been used to address the zero problem in normalization, including the addition of pseudo-counts (1, 0.5, or small fractions) and the replacement of zeros using multivariate imputation 36 , 38 (used in the R packages zCompositions 45 and robCompositions 46 ), as well as deep learning methods 38 . A detailed comparison of these replacement strategies in compositional methods has been provided, with recommendations for an appropriate method based on the purpose of the analysis, the data characteristics (such as dimension and extent of zero-inflation), and the time available for zero replacement 38 . Similarly, in count-based approaches, normalizations such as TMM in edgeR and the median-of-ratios method in DESeq2 either ignore zeros or add a pseudo-count of 1 to the abundance data when calculating pairwise ratios or geometric means. However, the arbitrary choice of pseudo-counts for zero replacement, or the removal of features with zero counts, in normalizing microbiome data has an impact on downstream analysis 40 , 42 .

As a solution, new strategies such as GMPR and Wrench normalization have been developed specifically to handle zeros more effectively when normalizing zero-inflated sequencing data, such as microbiome sequencing data. GMPR exploits the fact that pairs of samples usually share many features, allowing it to use more information than normalization methods that exclude features with zero values. GMPR first computes, for every pair of samples, the median of the ratios of counts over features that are non-zero in both samples, and then uses the geometric mean of these median values as the normalizing factor of a sample. While GMPR addresses some of the shortcomings of existing normalization methods, its dependence on pairwise comparisons may limit its applicability. Wrench normalization is a reference-based compositional correction method for sparse count data that employs an empirical Bayes approach to borrow information from multiple features and samples when modeling taxon-wise ratios of abundance proportions using a hurdle log-normal model. Wrench avoids the problem of zeros in ratio computation by using the averaged relative abundances (proportions) of each feature across the dataset as the reference.
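
To make the GMPR computation concrete, the following is an illustrative re-implementation of the idea described above (the published GMPR package 40 should be preferred in practice; function and object names here are assumptions):

```r
# Sketch of GMPR-style size factors: for each sample, take the median of count
# ratios to every other sample over features non-zero in both, then the
# geometric mean of those medians.
gmpr_size_factors <- function(counts) {
  n <- ncol(counts)
  sf <- numeric(n)
  for (j in seq_len(n)) {
    med <- vapply(setdiff(seq_len(n), j), function(k) {
      shared <- counts[, j] > 0 & counts[, k] > 0
      if (!any(shared)) return(NA_real_)
      median(counts[shared, j] / counts[shared, k])
    }, numeric(1))
    sf[j] <- exp(mean(log(med), na.rm = TRUE))  # geometric mean of the medians
  }
  sf / exp(mean(log(sf)))  # rescale so the size factors have geometric mean 1
}
```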

As demonstrated in several studies, the detection of differentially abundant features is strongly affected by the choice of normalization strategy and statistical test 10 , 27 , 40 . Normalization methods tend to control the FDR and increase statistical power best when paired with their underlying statistical models or tests and with specific datasets; they are less successful when combined with other statistical models, tests, or datasets 41 . In our analysis we therefore used TMM, the default normalization in edgeR and limma-voom, as well as poscounts (geometric mean of positive counts), a recommended 22 , 39 normalization for analyzing sparse data with DESeq2 26 . The poscounts normalization is implemented in the DESeq2 estimateSizeFactors function with option type = “poscounts”. To deal with zero values, TMM computes pairwise log-ratios using a trimmed set of features for each sample, excluding features with zero counts in that sample, while poscounts calculates the geometric mean for each feature using only its positive counts. Notably, both TMM and poscounts allow features with zero counts to contribute to normalization through their non-zero counts. These normalizations, moreover, have been shown to reduce the effect of compositionality on differential abundance analysis results 37 . We applied the same normalizations to the ZINB-WaVE-weighted counterparts. We also used GMPR, which was developed specifically for zero-inflated microbiome data and has been shown to improve power and false-positive control in differential abundance analysis 40 . Furthermore, MaAsLin2, metagenomeSeq, and ANCOM-BC use log-transformed abundances by default, with a pseudo-count of 1 added to the abundance table to account for zero values.
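
In R, the two normalizations used here can be obtained directly from edgeR and DESeq2 (a hedged sketch; `counts` and `group` are assumed objects, and only standard package calls are shown):

```r
library(edgeR)
library(DESeq2)

# TMM: normalization factors combine with library sizes to act as offsets
# in the edgeR/limma-voom models.
dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge, method = "TMM")

# poscounts: size factors from the geometric mean of positive counts,
# recommended for sparse data with DESeq2.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = data.frame(group = group),
                              design = ~ group)
dds <- estimateSizeFactors(dds, type = "poscounts")
```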

Data exploratory analysis

Evaluating the extent of zero inflation.

One of the primary challenges in microbiome differential abundance analyses is the presence of excessively inflated zero counts 14 . We used a graphical depiction of the biological coefficient of variation 22 , the square root of the estimated negative binomial dispersion parameter, to examine the extent of zero-inflation and how it influences downstream differential abundance analyses. Taxa with few counts or many zeros give very high dispersion estimates, which appear as striped patterns on the biological coefficient of variation plot. The increased dispersion due to inflated zeros hampers the capacity to detect differential abundance with negative binomial-based models such as edgeR and DESeq2 22 , which do not account for excess zeros.
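
Such a plot can be produced with edgeR's dispersion estimation and BCV plotting functions (a minimal sketch, assuming the filtered count matrix `counts` and grouping factor `group`):

```r
library(edgeR)

dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge, method = "TMM")
design <- model.matrix(~ group)      # group is assumed to be a factor
dge <- estimateDisp(dge, design)
# Striped bands of very high BCV typically correspond to taxa with many
# zeros or very low counts.
plotBCV(dge)
```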

Type I error control

Another way to assess the suitability of a differential abundance method is to evaluate its type I error rate control using mock or model-based simulated data. On the one hand, we generated mock samples from real null data containing no taxa whose abundance differed between two groups (i.e., no differentially abundant taxa), by randomly assigning each sample to one of the two groups. On the other hand, we generated 100 datasets using the SparseDOSSA model with an effect size of zero, i.e., under the null hypothesis of no differentially abundant taxa between the two groups. Filtering, grouping by group-wise structured zeros, and normalization were applied to the mock and simulated datasets. Taxa containing group-wise structured zeros were removed from differential abundance testing in the simulation studies. Then, using several methods, we conducted differential abundance analyses between the two groups of the covariate and obtained the p-values. The observed type I error rate is the proportion of p-values below the commonly used nominal 5% level. For each differential abundance approach, the observed type I error rates were averaged across the mock or model-based simulated datasets.
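
Given the per-dataset p-values from any of the methods, the observed type I error rate follows directly (a trivial sketch; `pval_list`, a list of p-value vectors over the null datasets, is an assumed object):

```r
# Observed type I error: fraction of p-values below the nominal level,
# averaged over mock or simulated null datasets.
observed_type1 <- function(pval_list, alpha = 0.05) {
  mean(vapply(pval_list, function(p) mean(p < alpha, na.rm = TRUE), numeric(1)))
}
```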

Differential abundance testing

Many methods for analyzing differential abundance in microbiome data are adapted from methods developed for bulk RNA-Seq or single-cell RNA-Seq data 10 . Differential abundance approaches tailored to microbiome data have also been developed. In our comparative study, we considered several common approaches for detecting differential abundance that have previously been tested on multiple real microbiome datasets 10 , 11 , 47 and on synthetic abundance data 6 . We considered microbiome differential abundance approaches adapted from RNA-Seq analysis techniques, such as edgeR, DESeq2, and limma-voom, as well as their weighted counterparts DESeq2-ZINBWaVE, edgeR-ZINBWaVE, and limma-voom-ZINBWaVE, recently introduced for single-cell RNA-Seq, which employ weights generated using the ZINBWaVE model 10 , 22 (Text S1 ). We also included ANCOM-BC from the compositional approaches and MaAsLin2 and metagenomeSeq from the transformation-based microbiome data analysis approaches (see details in Text S1 ).
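
For the weighted variants, observational weights that down-weight excess zeros are estimated with ZINB-WaVE and passed to the RNA-Seq engines. The sketch below follows the published ZINB-WaVE observational-weights workflow for DESeq2 22 rather than the authors' exact pipeline; the objects `counts` and `group`, and tuning choices such as `K = 0` and `epsilon`, are assumptions:

```r
library(zinbwave)
library(DESeq2)
library(SummarizedExperiment)

se <- SummarizedExperiment(assays = list(counts = counts),
                           colData = data.frame(group = factor(group)))

# Fit the ZINB-WaVE model and store observational weights in the assays;
# K = 0 means no latent factors beyond the known design.
zinb <- zinbwave(se, X = ~ group, K = 0, epsilon = 1e12,
                 observationalWeights = TRUE)

# Recent DESeq2 versions pick up the "weights" assay automatically.
dds <- DESeqDataSet(zinb, design = ~ group)
dds <- DESeq(dds, test = "LRT", reduced = ~ 1,
             sfType = "poscounts", minmu = 1e-6)
res <- results(dds)
```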

Plant microbiome experimental datasets

In this work, we used plant microbiome datasets with varying degrees of zero-inflation and count patterns, obtained from two experimental studies, to assess several differential abundance techniques. The collection of plant material complies with the guidelines and legislation of the University of Amsterdam, The Netherlands, and with international guidelines and legislation. This study protocol also complies with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on International Trade in Endangered Species of Wild Fauna and Flora.

Plant material and DNA extraction

In the N-P starvation experiment, the impact of nitrogen (N) and phosphate (P) starvation on the bacterial composition of tomato ( Solanum lycopersicum ) roots was investigated. S. lycopersicum cv. Moneymaker (SS687) seeds were obtained from the UvA greenhouse (University of Amsterdam, The Netherlands) and surface-sterilized sequentially with 70% EtOH (2 min), 20% bleach (20 min), 10 mM HCl (10 min) and milli-Q water (5 times). The seeds were then pregerminated on sterilized, moistened filter paper for 3 days in a climate room at 24 °C in the dark. The germinated seeds were transferred to soil-filled baskets in the greenhouse at 22 °C under an 8/16 h dark/light regime. Ten days after germination, the baskets were transferred to a custom-made aeroponics system in which the roots were sprayed with 1/4-strength Hoagland solution for 15 s every 10 min. For the N starvation treatment, NH4NO3 was removed from the solution. For the P starvation treatment, K2HPO4 was replaced with 0.8 M KCl, which compensated for the loss of K. The roots were collected immediately, transferred to liquid nitrogen and stored at − 80 °C for further analysis.

In the Forest-Potting soil experiment, the effect of different soil types on the bacterial community of tomato roots was investigated. To this end, tomato ( S. lycopersicum cv. Moneymaker) seeds were surface-sterilized as described above and then sown in different soils with or without the addition of 10% forest soil. After four weeks, the roots were collected and used for microbial DNA extraction.

Bacterial DNA isolation, 16S rDNA sequencing and preprocessing

The frozen root tissues were ground into a fine powder using liquid nitrogen and a mortar and pestle. Microbial DNA was extracted from 200 mg of the powdered tissue using the PowerSoil DNA Isolation kit following the manufacturer’s instructions. The quality and quantity of the isolated DNA were assessed using a Qubit 2.0 Fluorometer (dsDNA high-sensitivity assay kit; Invitrogen). For samples from the N-P starvation experiment, the metabarcoding analysis was conducted at BaseClear B.V. (Leiden, The Netherlands) on an Illumina NovaSeq6000 SP platform with a 2 × 250 bp paired-end sequencing approach. For the amplification of bacterial 16S rDNA, the V3 and V4 regions were targeted using the 341F (5′-CNTACGGGNGGCWGCAG-3′) and 805R (5′-GACTACHVGGGTWTCTAATCC-3′) primers. The sequencing reads were obtained as demultiplexed reads and, after a quality check, the primers were trimmed from the reads using cutadapt (v.1.9.1). For samples from the Forest-Potting soil experiment, the metabarcoding analysis was conducted on an Illumina MiSeq PE250 platform at the Génome Québec Innovation Centre (Montréal, QC, Canada). The sequencing data were processed using the DADA2 package and Quantitative Insights Into Microbial Ecology version 2 (QIIME2) to generate an OTU count table. The reads were filtered to retain 220 bp and 200 bp for the forward and reverse reads, respectively. After merging the denoised paired-end sequences, chimeric sequences were removed. The resulting Operational Taxonomic Units (OTUs) were taxonomically assigned based on the SILVA database (v138), and an OTU count table was created for each dataset.

The N-P starvation dataset consisted of 15 samples and 2945 taxa with at least one count per treatment group. There were three treatment groups, N-starvation, P-starvation and control, each with 5 samples. We filtered out low-abundance taxa using the criterion of at least two counts in at least two samples per treatment group, which resulted in 829 taxa for differential abundance analysis. The percentage of zero counts was around 55% after filtering. The Forest-Potting soils dataset included a total of 28 samples and 1796 taxa with at least one count in each of the two soil types, Forest and Potting soils, with 16 and 12 observations, respectively. The Forest soil type comprises different soils with the addition of at least 10% forest soil. We filtered out low-abundance taxa, leaving 244 taxa with a minimum of two counts in at least two samples of each soil type for differential abundance analysis. Even after filtering, the fraction of zero counts remained high (84%), indicating that the Forest-Potting soils dataset is considerably more zero-inflated than the N-P starvation dataset. Moreover, very low taxon counts were another feature distinguishing the Forest-Potting soils dataset from the N-P starvation dataset.

Simulation studies

We considered both model-based data simulation 8 and mock datasets 10 created by permuting the samples.

Model-based simulation studies

To evaluate the performance of our method and others in differential abundance detection for microbiome data, we used the SparseDOSSA model 6 to generate synthetic microbiome data, similar to Mallick et al. 8 , that mimics experimental plant microbiome data. This model has the benefit of generating realistic simulated data by parameterizing real-world template microbial datasets and targeting the main characteristics of microbiome data, such as counts, compositionality, zero-inflation, and over-dispersion 6 . SparseDOSSA uses a Bayesian hierarchical model to estimate taxa-specific parameters from a template dataset, which are then used to generate synthetic taxa from a zero-inflated, truncated log-normal distribution. The Forest-Potting soils and N-P starvation datasets were used as template datasets for the simulation.

Simulating null synthetic taxa abundance data

In the model-based simulation scenario, we first generated null abundance data for taxa with no true differential abundance using the sparseDOSSA2 6 package. This was done independently of metadata features, using data templates from plant microbial communities for both the Forest-Potting soils and N-P starvation experimental datasets. Along with the template sequence count datasets, additional simulation parameters were defined as follows. The sample sizes varied from small to large (10, 20, 50, 100, 200) and were equal in both groups. The number of simulated taxa was determined by the size of the less strictly filtered Forest-Potting soils and N-P starvation template datasets, which contained 303 and 928 taxa, respectively. In addition, median sequencing depths of 1883 and 215,356 from the Forest-Potting soils and N-P starvation datasets were used to generate realistic variation in library size in both datasets. With this simulation scenario, we generated 100 null synthetic taxa abundance datasets that were used to evaluate the performance of differential abundance methods in controlling type I error rates.

Simulating synthetic metadata

For synthetic metadata generation, we used a single factor with two groups (control and treatment) in an experimental design setting, with a focus on microbial differential abundance analysis. The metadata was generated by randomly assigning a value of 1 to half of the samples (treatment group) and a value of 0 to the other half (control group).
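
As a trivial sketch of this step (the total sample size `n` is an assumption), the binary covariate can be generated as:

```r
# One binary covariate with equal group sizes; 1 = treatment, 0 = control.
n <- 20
metadata <- data.frame(group = sample(rep(c(1, 0), each = n / 2)))
```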

Simulating spike-in taxa to describe synthetic taxa-metadata relationships

To introduce differences in abundance between the two groups (control and treatment) for spike-in microbial taxa, varying effect sizes or log-fold changes were used in a generalized linear model setting based on the single factor covariate with two levels 6 . Specifically, using the template data as a basis, we generated synthetic data with spike-in taxa as follows. First, we selected about 10% (30 and 100) of the taxa from the Forest-Potting soils and N-P starvation template datasets, which included 303 and 928 taxa, respectively, with a relative abundance of at least 20% to be spiked-in or differentially abundant between the two groups in the metadata, with known effect sizes or log-fold differences. The effect sizes were varied, with half of the spiked features at (0.5, 1, 2) indicating an increase in abundance under the treated condition and the other half at (− 0.5, − 1, − 2) indicating a decrease in abundance under the treated condition. The sample and library sizes were as specified above. Then, the sparseDOSSA2 R/Bioconductor package was used to generate 100 simulated abundance datasets for each effect and sample size combination, which included about 10% spiked-in (“true” positives) taxa based on their effect sizes and the remaining 90% null taxa that were not differentially abundant (“true” negatives). These synthetic datasets with "true" or known effect sizes were used to evaluate the effectiveness of microbiome analysis methods in detecting differentially abundant taxa.

Mock dataset-based studies

To evaluate how the differential abundance methods control the type I error rate, we conducted mock group comparisons by permuting samples from the plant microbiome datasets. For the Forest-Potting soils dataset, each sample was randomly assigned to one of two experimental groups, potting or forest soil, and the process was repeated 1000 times to produce 1000 mock datasets. Similarly, 1000 mock datasets were generated for the N-P starvation dataset, with each sample from the N-starvation and control groups assigned randomly to one of these two groups. Because samples are assigned at random, no true differential abundance exists 10 ; therefore, all differential abundance discoveries should be considered false 27 , which can be used to assess type I error rates.
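
The permutation itself is a one-liner; the sketch below (assumed object names) produces the mock label sets while the count matrix is kept fixed:

```r
# Mock (null) datasets: permute the group labels so no taxon is truly
# differentially abundant.
make_mock_labels <- function(group, n_mock = 1000) {
  replicate(n_mock, sample(group), simplify = FALSE)
}

# mock_labels <- make_mock_labels(metadata$soil_type)
```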

Performance evaluation

Two performance indicators are mainly used for evaluation: statistical power (sensitivity) and the false discovery rate (FDR). These are computed from false positives (FPs: taxa not spiked-in but found significant), true positives (TPs: taxa spiked-in and found significant), true negatives (TNs: taxa not spiked-in and not found significant), and false negatives (FNs: taxa spiked-in but not found significant).
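
Given the spike-in truth and the adjusted p-values, power and FDR follow directly (a minimal sketch; `padj` and the logical vector `spiked` are assumed inputs):

```r
# Power = TP / (TP + FN); FDR = FP / (TP + FP), defined as 0 when nothing
# is called significant. Significance is taken at BH-adjusted p < alpha.
eval_performance <- function(padj, spiked, alpha = 0.05) {
  sig <- !is.na(padj) & padj < alpha
  TP <- sum(sig & spiked); FP <- sum(sig & !spiked); FN <- sum(!sig & spiked)
  c(power = TP / (TP + FN),
    FDR = if ((TP + FP) > 0) FP / (TP + FP) else 0)
}
```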

Pipeline implementation

The pipeline (Fig.  1 ) provides a comprehensive differential abundance analysis of microbiome data, including data preparation, filtering, normalization, differential abundance testing and diagnostic plots. The entire pipeline is written in R. Several R and Bioconductor packages are used in the pipeline to analyze and visualize differential abundance detection using DESeq2, edgeR, limma-voom, and their ZINBWaVE-weighted counterparts DESeq2-ZINBWaVE, edgeR-ZINBWaVE, and limma-voom-ZINBWaVE.

In the pipeline, after all filtered taxa have been tested for differential abundance using one of the listed methods, log-fold changes, p-values, and adjusted p-values corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure are provided in a summary table. Plots depicting statistically significant differentially abundant taxa are also generated. A variety of summary and diagnostic plots are also provided to visualize significant results in the pipeline: (i) plots of significant taxa vs. log-fold change; (ii) plots of log-fold change vs. average log CPM (counts per million) for all taxa; (iii) count plots to evaluate significant taxa; and (iv) heatmap plots of relative abundances for significant taxa.

Zero-inflation and perfect separation or group-wise structured zeros

Popular differential abundance tools handle perfect separation or group-wise structured zeros differently.

To investigate how group-wise structured zeros are handled and how they impact differential abundance detection, we analyzed the N-P starvation plant microbiome dataset using DESeq2 and ZINBWaVE-weighted DESeq2. Following data filtering, poscounts normalization and identification of taxa with group-wise structured zeros based on observed zero counts (see the “ Methods and materials ” section), we first performed differential abundance testing using DESeq2 and DESeq2-ZINBWaVE on the entire set of taxa. We observed a substantial difference in the number and type of significantly differentially abundant taxa identified by DESeq2 and DESeq2-ZINBWaVE (Fig.  2 ). Figure  2 highlights taxa according to their statistical significance and whether or not they are involved in perfect separation or group-wise structured zeros. Results from DESeq2 (Fig.  2 A) show that many taxa with group-wise structured zeros were found significant, with substantial log-fold changes that place them on the boundary of the plot (cyan dots in Fig.  2 A). In contrast, using DESeq2-ZINBWaVE, which down-weights excess zeros, few or no taxa with group-wise structured zeros were found to be significant for the N-P starvation data, as displayed in Fig.  2 B. This is not always the case with DESeq2-ZINBWaVE, since we found many taxa with group-wise structured zeros to be significant after reanalyzing the Arctic-soil data, which contains a large number of samples and was investigated in Ref. 8 (Fig. S1 B).

figure 2

Comparing differential abundance detection tools in the presence of perfect separation or group-wise structured zeros for the N-P starvation dataset, comparing nitrogen starvation to control. SigDown: significant taxa with a negative log-fold change; SigUp: significant taxa with a positive log-fold change; NotSig: not significant; StrZeroSig: significant taxa with group-wise structured zeros; StrZeroNotSig: non-significant taxa with group-wise structured zeros. ( A ) Analysis with DESeq2: taxa with group-wise structured zeros were found to be significant, with relatively large log-fold changes, and lie on the boundary of the plot (cyan); ( B ) Analysis with DESeq2-ZINBWaVE: taxa with group-wise structured zeros were not found to be significant (purple) because of the down-weighting of excess zeros. The number of significant taxa identified by DESeq2 and DESeq2-ZINBWaVE differed considerably due to the presence of taxa with group-wise structured zeros. ( C ) Analysis with DESeq2 after excluding taxa with group-wise structured zeros; ( D ) Analysis with DESeq2-ZINBWaVE after excluding taxa with group-wise structured zeros.

Because there was a large difference in the number of significant taxa, which was mostly attributed to taxa with group-wise structured zeros in this case, we reanalyzed the data using DESeq2 and DESeq2-ZINBWaVE after excluding taxa with group-wise structured zeros. The findings shown in Fig.  2 C and D demonstrate that DESeq2 and DESeq2-ZINBWaVE detect a comparable number of taxa (red and blue dots), which may occur when zero-inflation is not a serious concern 22 . As a result, comparative benchmarking studies on microbiome differential abundance analysis tools that do not account for group-wise structured zeros could lead to unsatisfactory comparative conclusions.

We assessed the type I error rate using mock samples and model-based synthetic data derived from the two real template datasets under the null hypothesis of no differentially abundant taxa between two groups (see the “ Methods and materials ” section).

In the model-based simulation with SparseDOSSA2 under the null hypothesis, we used a log-fold change of zero, implying that no taxa were differentially abundant between the two groups. We then performed differential abundance analysis and recorded the p-values for the methods under consideration. The observed type I error rates were calculated as the fraction of p-values smaller than the nominal 5% level. The resulting plots, shown in Fig.  3 for the model-based simulations and Fig. S2 for the mock samples, reveal how each method controls type I error under the null hypothesis of no differentially abundant taxa. In simulations based on the N-P starvation template dataset with moderately inflated zeros (Fig.  3 A), DESeq2, edgeR, MaAsLin2, DESeq2-ZINBWaVE (for moderate and large samples), edgeR-ZINBWaVE (for small samples) and limma-voom (for small samples) demonstrated effective control of the type I error rate at the nominal level. DESeq2-ZINBWaVE (for small samples) and edgeR-ZINBWaVE (for large samples), on the other hand, had slightly higher observed type I error rates. In contrast, simulations using the Forest-Potting soils template dataset with highly inflated zeros (Fig.  3 B) showed that DESeq2-ZINBWaVE, edgeR-ZINBWaVE and MaAsLin2 controlled the type I error rate at the nominal level. On the other hand, DESeq2 and edgeR were conservative tests with extremely small observed type I error rates, which might impede true discoveries. In comparison to the other methods, the observed type I error rate of limma-voom-ZINBWaVE was much higher, indicating poor control of the type I error rate on both synthetic datasets. The performance of DESeq2-ZINBWaVE (for small samples) and edgeR-ZINBWaVE (for large samples) in controlling the type I error rate improved on average as zero-inflation increased.

figure 3

Model-based simulations: controlling type I error rates using several differential abundance tools. Unweighted and weighted differential abundance methods were evaluated for type I error control based on synthetic plant microbiome data with varying zero-inflation rates and sample sizes. Boxplots of observed type I error rates are colored by total sample size. ( A ) N-P starvation template dataset with 55% zeros. ( B ) Forest-Potting soils template dataset with 84% zeros. Weighted differential abundance approaches, with the exception of limma-voom-ZINBWaVE, demonstrated acceptable control of type I error rates for moderately and highly zero-inflated datasets.

Similar results were observed based on 1000 mock datasets generated from N-P starvation and Forest-Potting soils datasets (Fig S2 ). On the other hand, the observed type I error rates for limma-voom-ZINBWaVE, ANCOM-BC, and metagenomeSeq were high on both mock datasets, potentially leading to a large number of false discoveries (Fig S2 A and B). As a result, ANCOM-BC and metagenomeSeq were not included in the model-based simulation results presented above. Moreover, using GMPR instead of poscounts and TMM normalizations had a comparable impact on controlling type I error rate.

Differential abundance methods benchmarking using synthetic data

We utilized synthetic count data with spiked-in taxa between two defined groups of samples to evaluate the performance of differential abundance analysis tools (see the “ Methods and materials ” section). We simulated 100 datasets, each with a single treatment with two factor levels and a fixed number of truly differential features selected from taxa with at least 20% relative abundance, for varying effect sizes (0.5, 1.0, 2.0) and total sample sizes (10, 20, 50, 100, 200). The biological coefficient of variation plots computed from the real template data and the simulated data are shown in Fig. S3 to demonstrate how closely the data-generating process resembles the structure of the N-P starvation template dataset.

We then used the false discovery rate (FDR) and power (sensitivity) as performance indicators to assess the ability of different microbiome differential abundance analysis approaches to recover the relationship between spike-in taxa and a two-level treatment factor. In the simulation analysis, following filtering, normalization and removal of taxa with group-wise structured zeros (to place the methods in a comparable context), tests of differential abundance were performed for each of the 100 datasets using each of the methods under consideration. Based on the adjusted p-values (Benjamini–Hochberg), we identified true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) and computed the performance measures power and FDR. The power and FDR of each method are displayed in Figs.  4 and 5 .

figure 4

FDR and statistical power (sensitivity) for weighted and unweighted differential abundance detection methods evaluated on the highly zero-inflated Forest-Potting soils template plant microbiome data. Boxplots of FDR ( A – C ) and power ( D – F ) with effect sizes of 0.5 ( A , D ), 1 ( B , E ) and 2 ( C , F ) are colored by total sample size. DESeq2-ZINBWaVE demonstrated an FDR close to the nominal 5% level while maintaining power at a reasonably high level, except for very small samples. The power of limma-voom-ZINBWaVE was high but at the expense of a very high FDR. The performance of the unweighted methods DESeq2, edgeR and limma-voom was very poor for highly zero-inflated data.

figure 5

FDR and statistical power for weighted and unweighted differential abundance detection methods evaluated on moderately zero-inflated synthetic N-P starvation plant microbiome data. Boxplots of FDR ( A – C ) and power ( D – F ) with effect sizes of 0.5 ( A , D ), 1 ( B , E ) and 2 ( C , F ) are colored by total sample size. Unweighted methods demonstrated performance comparable with their weighted counterparts. The FDR of unweighted methods was on average slightly lower than that of their weighted counterparts, but at a slight cost to power. The comparable power of limma-voom and limma-voom-ZINBWaVE was overshadowed by a very high FDR.

Weighted methods outperform in highly zero-inflated data

For the highly zero-inflated Forest-Potting soils synthetic datasets, the simulation performance of the differential abundance approaches in terms of FDR and statistical power is shown in Fig.  4 . Except for limma-voom-ZINBWaVE, DESeq2-ZINBWaVE and edgeR-ZINBWaVE, all methods evaluated had very poor statistical power (very few or no true positives found significant) for small sample sizes across all effect sizes. Moreover, even with large sample and effect sizes, DESeq2, edgeR and limma-voom exhibited low power. In general, the power of all methods increased with increasing sample and effect sizes. The FDR values of DESeq2 and edgeR were mainly either 0 (no false positives were found) or 1 (all significant results were false positives). DESeq2-ZINBWaVE, on the other hand, yielded FDRs that were on average close to the nominal 5%, except for small samples and small effect sizes, and its FDR decreased with increasing sample and effect sizes. For modest sample sizes, edgeR-ZINBWaVE demonstrated FDR values on average close to the nominal level, with a tendency to rise with sample size. Limma-voom-ZINBWaVE, on the other hand, demonstrated high power even for small sample sizes. However, in all sample and effect size settings, the FDR of limma-voom-ZINBWaVE exceeded the nominal threshold by a wide margin: with a nominal FDR of 5%, on average more than 60% of the features identified as significant by limma-voom-ZINBWaVE were false positives (Fig.  4 ). Across all sample and effect sizes, DESeq2-ZINBWaVE outperformed the other approaches in terms of power while maintaining reasonable FDR control.

Better power of weighted methods even in moderately zero-inflated data

We also examined the FDR control and power behavior of the differential abundance detection methods for the moderately zero-inflated N-P starvation synthetic datasets. Except for limma-voom, limma-voom-ZINBWaVE and MaAsLin2, the FDRs of all methods were close to the nominal 5% level (Fig.  5 ) for large effect and sample sizes. The FDR of unweighted methods was on average slightly lower than that of their weighted counterparts, but at a slight cost to power. DESeq2-ZINBWaVE showed relatively better power than edgeR-ZINBWaVE for small sample sizes. For large samples, on the other hand, edgeR-ZINBWaVE had better power than DESeq2-ZINBWaVE, but at the expense of an increased FDR.

The potential of utilizing DESeq2-ZINBWaVE-DESeq2 approach for differential abundance analysis

To assess the performance of the combined DESeq2-ZINBWaVE-DESeq2 approach, we reanalyzed the metagenome shotgun sequencing data from the Human Microbiome Project (HMP-2012), which included 5 supragingival and 5 subgingival plaque samples from the oral cavity 10 , 48 (see details in Text S1 ). Using enrichment analysis, DESeq2-ZINBWaVE-DESeq2 discovered many accurate enrichments compared with the other differential abundance tools (Fig. S5 ).

Analysis of experimental datasets

Forest-potting soils microbiome analysis.

Out of 244 filtered taxa occurring in at least two replicates from one of the two soil types and with a minimum of two counts per taxon, 135 had group-wise structured zeros and 109 did not. DESeq2-ZINBWaVE-DESeq2 was applied to detect differentially abundant taxa between forest and potting soils. We used DESeq2 on 135 taxa that had group-wise structured zeros and DESeq2-ZINBWaVE on 109 taxa that did not have group-wise structured zeros. Figures  6 and S4 depict the taxa identified as differentially abundant between potting and forest soils (adjusted p-value < 0.05). In Fig.  6 , the relative abundances are colored, with higher relative abundances represented by dark red. The figures show that mainly specific Massilia and Streptomyces ASVs were more abundant in potting and forest soils, respectively.

figure 6

Heatmap of differentially abundant taxa between potting and forest soils. Higher relative abundances represented by darker red.

N-P starvation microbiome analysis

We also applied the combined DESeq2-ZINBWaVE-DESeq2 approach to detect differentially abundant microbes in tomato grown under nitrogen (N) and phosphate (P) deficiency compared with a control receiving complete Hoagland solution (C). To use the likelihood ratio test in DESeq2 and DESeq2-ZINBWaVE with a multi-categorical variable, we modified the design matrix so that the generated dummy variables could be included directly in the full and reduced model structures, comparable to the likelihood ratio implementation in edgeR-ZINBWaVE 30 . The full model structure thus contains the intercept, N versus C, and P versus C, while the reduced model contains the intercept and either N versus C or P versus C. In the comparison of N versus C, there were 451 taxa with group-wise structured zeros and 320 without. Similarly, for P versus C, there were 385 taxa with group-wise structured zeros and 444 without. Figure  7 depicts the taxa identified as differentially abundant for N versus C and Fig.  8 for P versus C, using DESeq2 for taxa with group-wise structured zeros and DESeq2-ZINBWaVE for taxa without group-wise structured zeros (adjusted p-value < 0.05). Under N-starvation, taxa belonging to the families Mycobacteriaceae, Caulobacteraceae, Comamonadaceae, Bdellovibrionaceae, Rhizobiaceae, Streptomycetaceae, Reyranellaceae, Hyphomonadaceae, Candidatus_Kaiserbacteria and Candidatus_Nomurabacteria were more abundant, while taxa belonging to the families Acetobacteraceae, Acidobacteriaceae, Microbacteriaceae, Burkholderiaceae and Rhodanobacteraceae were less abundant. Under P-starvation, taxa belonging to the families Sphingomonadaceae, Acetobacteraceae, Bdellovibrionaceae, Caulobacteraceae and Candidatus_Kaiserbacteria were more abundant, while taxa belonging to the families Microbacteriaceae, Intrasporangiaceae, Xanthobacteraceae and Rhizobiales were less abundant.
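
The likelihood-ratio set-up described here can be sketched with DESeq2's LRT interface; the dummy-variable names and the DESeq2 settings below are assumptions rather than the authors' exact code:

```r
library(DESeq2)

# Treatment has three levels: control (C), N-starvation (N), P-starvation (P).
treatment <- factor(colData(dds)$treatment, levels = c("C", "N", "P"))
mm <- model.matrix(~ treatment)        # columns: intercept, N vs C, P vs C
colData(dds)$NvsC <- mm[, "treatmentN"]
colData(dds)$PvsC <- mm[, "treatmentP"]

# Full model: intercept + N vs C + P vs C; to test N vs C, drop that dummy
# variable in the reduced model (and analogously for P vs C).
design(dds) <- ~ NvsC + PvsC
dds_N <- DESeq(dds, test = "LRT", full = ~ NvsC + PvsC, reduced = ~ PvsC,
               sfType = "poscounts", minmu = 1e-6)
res_N <- results(dds_N)
```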

figure 7

Differentially abundant taxa in comparing N-starvation with the control.

figure 8

Differentially abundant taxa in comparing P-starvation with the control.

Discussion and conclusion

This study aimed to contribute to the search for best count-based differential abundance practices for microbiome data. In microbiome research, experimental datasets usually have only a small to moderate number of replicates or a small total sample size. Moreover, in addition to inflated zeros, microbiome count data may contain many taxa with group-wise structured zeros. The causes of group-wise structured zeros may be biological or non-biological, but identifying them in sequence count data is difficult. As a result, applying differential abundance methods in the presence of taxa with group-wise structured zeros makes statistical inference problematic due to extremely inflated standard errors, leaving many taxa non-significant. We included a pre-processing step to distinguish, as far as possible, taxa with and without group-wise structured zeros and to analyze them separately. We implemented the DESeq2-ZINBWaVE-DESeq2 approach, a combination of differential abundance tools in which DESeq2-ZINBWaVE is used for taxa without group-wise structured zeros and DESeq2 for taxa with group-wise structured zeros.

With the zero-inflated microbiome data, we found a considerable difference in the number and type of significantly differentially abundant taxa between unweighted (DESeq2, edgeR, limma-voom) and weighted (DESeq2-ZINBWaVE, edgeR-ZINBWaVE, limma-voom-ZINBWaVE) differential abundance analysis techniques. In this regard, the handling of taxa with group-wise structured zeros in the various tools had a significant influence. The likelihood ratio-based tests in the unweighted methods produced many significant taxa with group-wise structured zeros, whereas the weighted counterparts produced none or only a few, which may be attributed to down-weighting the excess zeros that drive perfect separation. In both the weighted and unweighted approaches considered, little or no attention was given to directly addressing the issue of group-wise structured zeros. As a result, comparative benchmarking studies on microbiome differential abundance analysis tools that do not account for group-wise structured zeros may yield incorrect results.

We next investigated the impact of zero-inflation on the power and FDR control of weighted and unweighted approaches in microbiome differential abundance analysis using mock samples and model-based synthetic datasets. Rather than relying on the distributional assumptions of the methods under consideration, the model-based simulation experiments were designed to mimic real microbiome data using the SparseDOSSA model. We used two experimental plant microbiome datasets with moderately and highly inflated zeros. Noting the inconsistencies in how the various tools handled taxa with group-wise structured zeros (all zero counts in one of the groups), and to create a common platform for comparison, taxa with group-wise structured zeros were excluded from differential abundance testing in the simulation analysis. In our simulation study, we mainly compared frequently used differential abundance methods such as DESeq2, edgeR, and limma-voom with their ZINBWaVE-weighted counterparts DESeq2-ZINBWaVE, edgeR-ZINBWaVE, and limma-voom-ZINBWaVE 10 , 22 . ANCOM-BC 24 , metagenomeSeq 16 , and MaAsLin2 8 were also examined to some extent. We investigated the finite-sample properties of these methods, focusing on false discovery control and detection power, over a range of sample and effect sizes.

According to our simulation assessment, ZINBWaVE-weighted DESeq2 or edgeR demonstrated reasonable power for detecting differential abundance in substantially zero-inflated microbiome data with moderate to large sample and effect sizes. Thus, using weights to down-weight excess zeros in the popular RNA-Seq methods is a useful strategy for analyzing microbiome data. However, like the other differential abundance tools, the ZINBWaVE-weighted approaches had low power for detecting spike-in taxa with small effects and small sample sizes. This highlights the need for power analysis when planning microbiome investigations, which helps in determining sample sizes while keeping the required power in mind 8 , 10 . Moreover, inaccurate estimates of the weights might have a negative impact on differential abundance detection. Here we considered ZINBWaVE-based weighting, which uses a zero-inflated negative binomial model, but alternative weighting schemes could be considered.

On the other hand, consistent with findings in previous studies, (i) type I error control was satisfactory under the null hypothesis of no differential abundance for both the mock samples and the model-based simulated datasets when inflated zeros were properly accounted for; and (ii) the FDRs of all the methods we considered for identifying spiked-in taxa were on average higher than the nominal level in the analysis of zero-inflated microbiome data 8 . In this regard, while our simulation findings showed that employing weighted techniques to discover differential abundance is a step forward, controlling the FDR remains a challenge in microbiome data analysis, necessitating continued refinement of existing methods or the development of new ones.

Furthermore, the simulation experiments demonstrated that increasing the sample size increased the FDR for some of the differential abundance analysis methods evaluated. In particular, the FDR control of the edgeR-based methods did not improve when larger sample sizes were combined with large effect sizes. This could be a result of bias in the estimated effects, which increases with sample size 24 . To this end, implementing methods based on penalized likelihood inference for model parameter estimation could help to alleviate the problem of inflated FDR.

Another possible solution for reducing the inflated FDR would be proper exploitation of the hierarchical nature of microbiome data. Recent findings in the literature offer several strategies for leveraging hierarchical structure to boost the identification of differentially abundant species. In this regard, methods have been introduced that smooth p-values according to the phylogeny 13 or a correlation tree 5 and that use hierarchically adjusted p-values 5 , 49 , 50 . However, the inclusion of phylogenetic information in microbiome differential abundance analysis has generated inconsistent results in terms of detection power and FDR control 5 , 13 , necessitating the development of novel tools.

The performance of the combined approach was assessed using enrichment analysis. In comparison to the independent analyses performed by DESeq2-ZINBWaVE and edgeR-ZINBWaVE, the combined technique DESeq2-ZINBWaVE-DESeq2 discovered many correct enrichments.

Finally, DESeq2-ZINBWaVE-DESeq2 was applied to investigate the two plant microbiome datasets utilized as templates for the data simulation. Our new approach identified many potentially important taxa that might be further explored in terms of effect sizes and abundance in order to prioritize them for biological validation. In conclusion, the combined method DESeq2-ZINBWaVE-DESeq2 described in this study provides a promising development in the analysis of microbiome datasets displaying zero-inflation and group-wise structured zeros.

Data availability

All datasets generated and analyzed in the current study, and the R code used to analyze the data, are available in the public GitHub repository https://github.com/fenab/MicrobDifAb.git .

Song, C., Zhu, F., Carrión, V. J. & Cordovez, V. Beyond plant microbiome composition: Exploiting microbial functions and plant traits via integrated approaches. Front. Bioeng. Biotechnol. 8 , 896 (2020).

Abedini, D., Jaupitre, S., Bouwmeester, H. & Dong, L. Metabolic interactions in beneficial microbe recruitment by plants. Curr. Opin. Biotechnol. 70 , 241–247 (2021).

Trivedi, P., Leach, J. E., Tringe, S. G., Sa, T. & Singh, B. K. Plant–microbiome interactions: From community assembly to plant health. Nat. Rev. Microbiol. 18 , 607–621 (2020).

Turner, T. R., James, E. K. & Poole, P. S. The plant microbiome. Genome Biol. 14 , 209 (2013).

Bichat, A., Plassais, J., Ambroise, C. & Mariadassou, M. Incorporating phylogenetic information in microbiome differential abundance studies has no effect on detection power and FDR control. Front. Microbiol. https://doi.org/10.3389/fmicb.2020.00649 (2020).

Ma, S. et al. A statistical model for describing and simulating microbial community profiles. PLoS Comput. Biol. 17 , e1008913 (2021).

Cao, K.-A.L. et al. MixMC: A multivariate statistical framework to gain insight into microbial communities. PLoS One 11 , e0160169 (2016).

Mallick, H. et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Comput. Biol. 17 , e1009442 (2021).

Lin, H. & Peddada, S. D. Analysis of microbial compositions: A review of normalization and differential abundance analysis. NPJ Biofilms Microbiomes 6 , 1–13 (2020).

Calgaro, M., Romualdi, C., Waldron, L., Risso, D. & Vitulo, N. Assessment of statistical methods from single cell, bulk RNA-seq, and metagenomics applied to microbiome data. Genome Biol. 21 , 191 (2020).

Nearing, J. T. et al. Microbiome differential abundance methods produce different results across 38 datasets. Nat. Commun. 13 , 342 (2022).

Smirnova, E., Huzurbazar, S. & Jafari, F. PERFect: PERmutation filtering test for microbiome data. Biostatistics 20 , 615–631 (2019).

Xiao, J., Cao, H. & Chen, J. False discovery rate control incorporating phylogenetic tree increases detection power in microbiome-wide multiple testing. Bioinform. Oxf. Engl. 33 , 2873–2881 (2017).

Cao, Q. et al. Effects of rare microbiome taxa filtering on statistical analysis. Front. Microbiol. 11 , 3203 (2021).

Jonsson, V., Österlund, T., Nerman, O. & Kristiansson, E. Modelling of zero-inflation improves inference of metagenomic gene count data. Stat. Methods Med. Res. 28 , 3712–3728 (2019).

Paulson, J. N., Stine, O. C., Bravo, H. C. & Pop, M. Differential abundance analysis for microbial marker-gene surveys. Nat. Methods 10 , 1200–1202 (2013).

Jiang, R., Sun, T., Song, D. & Li, J. J. Statistics or biology: The zero-inflation controversy about scRNA-seq data. Genome Biol. 23 , 31 (2022).

Zeng, Y., Li, J., Wei, C., Zhao, H. & Tao, W. mbDenoise: Microbiome data denoising using zero-inflated probabilistic principal components analysis. Genome Biol. 23 , 94 (2022).

Silverman, J. D., Roche, K., Mukherjee, S. & David, L. A. Naught all zeros in sequence count data are the same. Comput. Struct. Biotechnol. J. 18 , 2789–2798 (2020).

Jiang, R., Li, W. V. & Li, J. J. mbImpute: An accurate and robust imputation method for microbiome data. Genome Biol. 22 , 192 (2021).

Zhang, X., Mallick, H. & Yi, N. Zero-inflated negative binomial regression for differential abundance testing in microbiome studies. J. Bioinform. Genom. https://doi.org/10.18454/jbg.2016.2.2.1 (2016).

Van den Berge, K. et al. Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications. Genome Biol. 19 , 24 (2018).

Heinze, G., Schemper, M., Heinze, G. & Schemper, M. A solution to the problem of separation in logistic regression. Stat. Med. 21 , 2409–2419 (2002).

Lin, H. & Peddada, S. D. Analysis of compositions of microbiomes with bias correction. Nat. Commun. 11 , 3514 (2020).

Puhr, R., Heinze, G., Nold, M., Lusa, L. & Geroldinger, A. Firth’s logistic regression with rare events: Accurate effect estimates and predictions?. Stat. Med. 36 , 2302–2317 (2017).

Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15 , 550 (2014).

Hawinkel, S., Mattiello, F., Bijnens, L. & Thas, O. A broken promise: Microbiome differential abundance methods do not control the false discovery rate. Brief. Bioinform. 20 , 210–221 (2019).

Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26 , 139–140 (2010).

Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15 , R29 (2014).

Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J.-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9 , 284 (2018).

Mandal, S. et al. Analysis of composition of microbiomes: A novel method for studying microbial composition. Microb. Ecol. Health Dis. https://doi.org/10.3402/mehd.v26.27663 (2015).

Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V. & Egozcue, J. J. Microbiome datasets are compositional: And this is not optional. Front. Microbiol. https://doi.org/10.3389/fmicb.2017.02224 (2017).

Greenacre, M., Martínez-Álvaro, M. & Blasco, A. Compositional data analysis of microbiome and any-omics datasets: A validation of the additive logratio transformation. Front. Microbiol. https://doi.org/10.3389/fmicb.2021.727398 (2021).

Brill, B., Amir, A. & Heller, R. Testing for differential abundance in compositional counts data, with application to microbiome studies. Ann. Appl. Stat. 16 , 2648–2671 (2022).

Fernandes, A. D. et al. Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2 , 15 (2014).

Zhou, H., He, K., Chen, J. & Zhang, X. LinDA: Linear models for differential abundance analysis of microbiome compositional data. Genome Biol. https://doi.org/10.1186/s13059-022-02655-5 (2021).

Kumar, M. S. et al. Analysis and correction of compositional bias in sparse sequencing count data. BMC Genom. 19 , 799 (2018).

Lubbe, S., Filzmoser, P. & Templ, M. Comparison of zero replacement strategies for compositional data with large numbers of zeros. Chemom. Intell. Lab. Syst. 210 , 104248 (2021).

McMurdie, P. J. & Holmes, S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One 8 , e61217 (2013).

Chen, L. et al. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data. PeerJ 6 , e4600 (2018).

Xia, Y. Statistical normalization methods in microbiome data with application to microbiome cancer research. Gut Microbes 15 , 2244139 (2023).

Weiss, S. et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 5 , 27 (2017).

McMurdie, P. J. & Holmes, S. Waste not, want not: Why rarefying microbiome data is inadmissible. PLOS Comput. Biol. 10 , e1003531 (2014).

Lun, L., Bach, A. T. K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17 , 75 (2016).

Palarea-Albaladejo, J. & Martín-Fernández, J. A. zCompositions—R package for multivariate imputation of left-censored data under a compositional approach. Chemom. Intell. Lab. Syst. 143 , 85–96 (2015).

Hron, K., Templ, M. & Filzmoser, P. Imputation of missing values for compositional data using classical and robust methods. Comput. Stat. Data Anal. 54 , 3095–3107 (2010).

Thorsen, J. et al. Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies. Microbiome 4 , 62 (2016).

The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486 , 207–214 (2012).

Yekutieli, D. Hierarchical false discovery rate-controlling methodology. J. Am. Stat. Assoc. 103 , 309–316 (2008).

Hu, J. et al. A two-stage microbial association mapping framework with advanced FDR control. Microbiome 6 , 131 (2018).

Acknowledgements

We acknowledge funding by the Dutch Research Council (NWO/OCW) for the MiCRop Consortium program, Harnessing the second genome of plants (Grant number 024.004.014; to HB, LD, DA, AKS, JAW, FVE and FA), the Dutch Research Council (NWO-TTW grant 16873 Holland Innovative Potato; to HB, LD and DA), the ERC (Advanced grant CHEMCOMRHIZO, 670211; to HB, AZ and AG) and the Data Science Centre of the University of Amsterdam (to FW).

Author information

Authors and affiliations.

Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands

Fentaw Abegaz, Davar Abedini, Fred White, Alessandra Guerrieri, Lemeng Dong, Johan A. Westerhuis, Harro Bouwmeester & Age K. Smilde

Biometris, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands

Fentaw Abegaz & Fred van Eeuwijk

IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France

Anouk Zancarini

Contributions

FA, AKS, JAW and FVE conceptualized the methodology and study design. FA implemented the method, performed the analysis and wrote the draft manuscript. AKS, JAW and FVE provided feedback throughout the manuscript preparation. DA, FW, AG, AZ, LD and HB designed the plant experiments; processed, sequenced and provided the microbiome data. All authors provided critical revisions and approved the final manuscript.

Corresponding author

Correspondence to Fentaw Abegaz .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Abegaz, F., Abedini, D., White, F. et al. A strategy for differential abundance analysis of sparse microbiome data with group-wise structured zeros. Sci Rep 14 , 12433 (2024). https://doi.org/10.1038/s41598-024-62437-w

Received : 28 August 2023

Accepted : 16 May 2024

Published : 30 May 2024

DOI : https://doi.org/10.1038/s41598-024-62437-w

  • Open access
  • Published: 24 May 2024

Rosace : a robust deep mutational scanning analysis framework employing position and mean-variance shrinkage

  • Jingyou Rao 1 ,
  • Ruiqi Xin 2   na1 ,
  • Christian Macdonald 3   na1 ,
  • Matthew K. Howard 3 , 4 , 5 ,
  • Gabriella O. Estevam 3 , 4 ,
  • Sook Wah Yee 3 ,
  • Mingsen Wang 6 ,
  • James S. Fraser 3 , 7 ,
  • Willow Coyote-Maestas 3 , 7 &
  • Harold Pimentel   ORCID: orcid.org/0000-0001-8556-2499 1 , 8 , 9  

Genome Biology volume 25, Article number: 138 (2024)

Deep mutational scanning (DMS) measures the effects of thousands of genetic variants in a protein simultaneously. The small sample size renders classical statistical methods ineffective. For example, p -values cannot be correctly calibrated when treating variants independently. We propose Rosace , a Bayesian framework for analyzing growth-based DMS data. Rosace leverages amino acid position information to increase power and control the false discovery rate by sharing information across parameters via shrinkage. We also developed Rosette for simulating the distributional properties of DMS. We show that Rosace is robust to the violation of model assumptions and is more powerful than existing tools.

Understanding how protein function is encoded at the residue level is a central challenge in modern protein science. Mutations can cause diseases and drive evolution by perturbing protein function in a myriad of ways, such as by altering its conformational ensemble and stability or its interaction with ligands and binding partners. In these contexts, mutations may result in a loss of function, gain of function, or a neutral phenotype (i.e., no discernible effects). Mutations also often exert effects across multiple phenotypes, and these perturbations can ultimately propagate to alter complex processes in cell biology and physiology. Reverse genetics approaches offer a powerful handle for researchers to investigate biology by introducing mutations and observing the resulting phenotypic changes.

Deep mutational scanning (DMS) is a technique for systematically determining the individual effects of a large library of mutations on a phenotype of interest by performing pooled assays and measuring the relative effects of each variant (Fig.  1 A) [ 1 , 2 , 3 ]. It has improved clinical variant interpretation [ 4 ] and provided insights into the biophysical and mechanistic modeling of genetic variants [ 5 ]. Taking enzymes as an example, these phenotypes could include catalytic activity [ 6 ] or stability [ 7 , 8 ]. For a transcription factor, the phenotype could be DNA binding specificity or transcriptional activity [ 9 ]. The relevant phenotype for a membrane transporter might be folding and trafficking or substrate transport [ 10 ]. These phenotypes are often captured by growth-based [ 7 , 10 , 11 , 12 , 13 , 14 , 15 , 16 ], binding-based [ 9 , 17 , 18 ], or fluorescence-based assays [ 8 , 10 , 19 ]. These assays are designed differently and merit separate analysis frameworks. In growth-based assays, the relative growth rates of cells are of interest. In binding-based assays, the selection probabilities are of interest. In fluorescence-based assays, changes to the distribution of reporter gene expression are measured. In this paper, we focus solely on growth-based screens.

Figure 1. Deep mutational scanning and overview of the Rosace framework. A In deep mutational scanning, each amino acid of the selected protein sequence is mutated to other amino acids. B Cells carrying different variants are grown in the same pool under selection pressure. At each time point, cells are sequenced to output the count table. Replicates can be generated either pre-transfection or post-transfection. C Rosace is an R package that accepts the raw sequencing count table as input and outputs the posterior distribution of functional scores

In a growth-based DMS experiment, we grow a pool of cells carrying different variants under a selective pressure linked to gene function. At set intervals, we sequence the cells to identify each variant’s frequency in the pool. The change in the frequency over the course of the experiment, from initial frequencies to subsequent measurements, serves as a metric of the variant’s functional effects (Fig.  1 B). The functional score is often computed for each variant in the DMS screen and compared against those of synonymous mutations or wild-type cells to display the relative functional change of the protein caused by the mutation. Thus, reliable inference of functional scores is crucial to understanding both individual mutations and at which residue location variants tend to have significant functional effects.
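To make the scoring step concrete, here is a minimal sketch of a naive per-variant functional score computed as the slope of log2 frequency against time; the count table, column layout, and pseudo-count are assumptions for illustration, not the estimator used by any particular tool.

```r
# Minimal sketch (assumed layout): naive functional score as the slope of
# log2 variant frequency versus time. `counts` holds raw sequencing counts,
# one row per variant and one column per time point.
naive_score <- function(counts) {
  freq <- sweep(as.matrix(counts), 2, colSums(counts), "/")  # per-time-point frequencies
  logf <- log2(freq + 1e-6)                                  # small pseudo-count avoids log(0)
  tps  <- seq_len(ncol(logf)) - 1                            # time points 0, 1, 2, ...
  apply(logf, 1, function(y) unname(coef(lm(y ~ tps))[2]))   # slope per variant
}

counts <- data.frame(t0 = c(100, 120, 90), t1 = c(80, 240, 95), t2 = c(60, 480, 100),
                     row.names = c("A10G", "L55P", "S7S"))
naive_score(counts)
```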

The main challenge of functional score inference is that even under the simplest model, at least two estimators are required for each mutation (the mean and variance of the functional change), and in practice it is rare to have more than three replicates. As a result, it has been posited that the naïve estimators commonly employed likely suffer from issues with the false discovery rate and with the statistical power of detecting mutations that significantly change the function of the protein [ 20 ]. In any case, incorporating domain-specific assumptions is required to make inference tractable with few samples and thousands of parameters.

To alleviate the small-sample-size inference problem in DMS, four commonly used methods have been developed: dms_tools [ 21 ], Enrich2 [ 18 ], DiMSum [ 20 ], and EMPIRIC [ 22 ]. dms_tools uses Bayesian inference for reliable inference. However, rather than giving a score to each variant, dms_tools generates a score for each amino acid at each position, assuming linear addition of multiple mutation effects and ignoring epistatic coupling. Thus, dms_tools is not directly comparable to other methods and is excluded from our benchmarking analysis. Enrich2 simplifies the variance estimator by assuming that counts are Poisson-distributed (the variance being equal to the mean) and combines the replicates using a random-effect model. DiMSum , however, argues that the assumption in Enrich2 is not enough to control type-I error. As a result, DiMSum builds upon Enrich2 and includes additional variance terms to model the over-dispersion of sequencing counts. However, as presented in Faure et al. 2020 [ 20 ], this ratio-based method only applies to DMS screens with one round of selection, while many DMS screens have multiple rounds of selection (i.e., sampling at more than two time points) [ 10 , 11 , 23 ]. Alternatively, EMPIRIC fits a Bayesian model that infers each variant separately with a non-informative uniform prior on all parameters, and thus does not shrink the estimates to robustly correct the variance in estimates due to the small sample size. Further, the model does not accommodate multiple replicates. In addition, mutscan [ 24 ], a recently developed R package for DMS analysis, employs two established statistical models, edgeR and limma-voom . However, these two methods were originally designed for RNA-seq data, and the data generation process for DMS is very different. One of the key differences is consistency among replicates. In RNA-seq, gene expression is relatively consistent across replicates under the same condition, while in DMS, counts of variants can vary greatly because the initial representation of variants in the library can be vastly inconsistent among replicates.

While these methods provide reasonable regularization of the score’s variance, additional information can further improve the prior. One solution is incorporating residue position information. It has been noted that amino acids in particular regions have an outsized effect on the protein’s function, and other frameworks have incorporated positions for various purposes. In the form of hidden Markov models (HMMs) and position-specific scoring matrices (PSSMs), this is the basis for the sensitive detection of homology in protein sequences [ 25 ]. These results directly imply that variants at the same position likely share some similarities in their behavior, and thus that incorporating local information into modeling might produce more robust inferences. However, no existing method has yet incorporated residue position information into its model.

To overcome these limitations, we present Rosace , the first growth-based DMS analysis method that incorporates local positional information to improve inference performance. Rosace implements a hierarchical model that parameterizes each variant’s effect as a function of the positional effect, thus providing a way to incorporate both position-specific information and shrinkage into the model. Additionally, we developed Rosette , a simulation framework that attempts to simulate several properties of DMS such as bimodality, similarities in behavior across similar substitutions, and the overdispersion of counts. Compared to previous simulation frameworks such as the one in Enrich2 , Rosette uses parameters directly inferred from the specific input experiment and generates counts that reflect the true level of noise in the real experiment. We use Rosette to simulate several screening modalities and show that our inference method, Rosace , exhibits higher power and controls the false discovery rate (FDR) better on average than existing methods. Importantly, Rosace and Rosette are not two views of the same model— Rosette is based on a set of assumptions that are different from or even opposite to those of Rosace . Rosace ’s ability to accommodate data generated under different assumptions shows its robustness. Finally, we ran Rosace on real datasets, where it shows a much lower FDR than existing methods while maintaining similar power on experimentally validated positive controls.

Overview of Rosace  framework

Rosace is a Bayesian framework for analyzing growth-based deep mutational scanning data, producing variant-level estimates from sequencing counts. The full (position-aware) method requires as input the raw sequencing counts and the position labels of variants. It outputs the posterior distribution of variants’ functional scores, which can be further evaluated for hypothesis testing, plotting, and other downstream analyses (Fig.  1 C). If position labels are hard to acquire with heuristics, for example in the case of random multiple-mutation data, the position-unaware Rosace model can be run without position labels. Rosace is available as an R package. To generate the input of Rosace from sequencing reads, we share a Snakemake workflow dubbed Dumpling for short-read-based experiments in the GitHub repository described in the “ Methods ” section. Additionally, Rosace supports input count data processed from Enrich2 [ 18 ] for other protocols such as barcoded sequencing libraries.
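As an illustration of this workflow, the sketch below shows the shape of the expected input (a count table with position and mutation-type labels plus one count column per replicate and time point); the commented-out calls use hypothetical placeholder names (`rosace_from_counts`, `run_rosace`, `test_scores`), not the package's actual API, which is documented in the repository.

```r
# Conceptual sketch of the workflow described above. The input is a count table
# with one row per variant, position and mutation-type labels, and one count
# column per (replicate, time point). The commented-out calls are hypothetical
# placeholders, not the actual Rosace interface.
count_tbl <- data.frame(
  variant  = c("M1A", "M1V", "K2R", "K2K"),
  position = c(1, 1, 2, 2),
  type     = c("missense", "missense", "missense", "synonymous"),
  r1_t0 = c(120, 95, 110, 100),
  r1_t1 = c(60, 100, 240, 105),
  r1_t2 = c(30, 105, 480, 98)
)

# obj    <- rosace_from_counts(count_tbl, pos_col = "position")  # hypothetical constructor
# fit    <- run_rosace(obj)                                      # hypothetical wrapper around Stan/NUTS sampling
# scores <- test_scores(fit)                                     # hypothetical: posterior mean, sd, and lfsr per variant
```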

Rosace  hierarchical model with positional information and score shrinkage

Here, we begin by motivating the use of positional information. Next, we describe the intuition behind how we use the positional information. Finally, we describe the remaining dimensions of shrinkage, which yield robust estimates when few experimental replicates are available.

A variant is herein defined as the amino acid identity at a position in a protein, where that identity may differ from the wild-type sequence. In this context, synonymous, missense, nonsense, and indel variants are all considered and can be processed by Rosace (see the “ Methods ” section for details). The sequence position of a variant, p ( v ), carries information about the variant’s functional effect on the protein. We define the position-level functional score \(\phi _{p(v)}\) as the mean functional score of all variants at a given position.

To motivate the use of positional information, we take the posterior distribution of the position-level functional score estimated from a real DMS experiment, a cytotoxicity-based growth screen of a human transporter, OCT1 (Fig.  2 A). In this experiment, variants with decreased activity are expected to increase in abundance, as they lose the ability to import a cytotoxic substrate during selection, and variants with increased activity will decrease in abundance similarly. We observe that most position-level score estimates \(\widehat{\phi }_{p(v)}\) significantly deviate from the mean, implying that position has material idiosyncratic variation and thus carries information about the protein’s functional architecture.

Figure 2. Rosace shares information at the same position to inform variant effects. A Smoothed position-specific score (sliding window = 5) across positions from the OCT1 cytotoxicity screen. Red dotted lines at score = 0 (neutral position). B A conceptual view of the Rosace generative model. Each position has an overall effect, from which variant effects are conferred. Note the prior is wide enough to allow effects that do not follow the mean. The wild-type score distribution is assumed to be at 0. C Plate model representation of Rosace . See the “ Methods ” section for the description of parameters

To incorporate the positional information into our model, we introduce a position-specific score \(\phi _{p(v)}\) where p ( v ) maps variant v to its amino acid position. The variant-specific score \(\beta _v\) is regularized and controlled by the value of \(\phi _{p(v)}\) . To illustrate the point, we conceptually categorize position into three types: positively selected ( \(\phi _{p(v)} \gg 0\) ), (nearly) neutral ( \(\phi _{p(v)} \approx 0\) ), and negatively selected ( \(\phi _{p(v)} \ll 0\) ) (Fig.  2 B). Variants in a positively selected position tend to have scores centered around the positive mean estimate of \(\phi _{p(v)}\) , and vice versa for the negatively selected position. Variants in a neutral position tend to be statistically non-significant as the region might not be important to the measured phenotype.

Regularization of the score’s variance is achieved mainly by sharing information across variants within a position and by asserting weakly informative priors on the parameters (Fig.  2 C). Functional scores of the variants within a position are drawn from the same set of parameters \(\phi _{p(v)}\) and \(\sigma _{p(v)}\) . The error term \(\epsilon _{g(v)}\) in the linear regression on normalized counts is also shared within the mean count group (see the “ Methods ” section) to prevent biased estimation of the error and to incorporate the mean-variance relationship commonly modeled in RNA-seq [ 26 , 27 ]. Importantly, while we use the position information to center the prior, the prior is weak enough to allow variants at a position to deviate from the mean. For example, we show that nonsense variants indeed deviate from the positional mean (Additional file 1: Fig. S3). The variant-level intercept \(b_v\) is given a strong prior with a tight distribution centered at 0 to prevent over-fitting.

Rosace  performance on various datasets

To test the performance of Rosace , we ran Rosace along with Enrich2 , mutscan (both limma-voom and edgeR ), DiMSum , and simple linear regression (the naïve method) on the OCT1 cytotoxicity screen. DiMSum cannot analyze data with three selection rounds, so we ran DiMSum with only the first two time points. The data are pre-processed with wild-type normalization for all methods. The analysis is done on all non-empty subsets of the three replicates ( \(\{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\) ).

While we do not have a set of true negative control variants, we assume most synonymous mutations do not change the phenotype, and thus we use synonymous mutations as a proxy for negative controls. We compute the percentage of synonymous mutations called significant by the hypothesis testing as one representation of the false discovery rate (FDR). The variants are ranked based on the hypothesis testing statistics from each method ( p -value for frequentist methods and local false sign rate [ 28 ], or lfsr , for Bayesian methods). In an ideal scenario with no noise, the line of ranked variants by FDR is flat at 0 and slowly rises after all true variants with effects are called. Rosace has a very flat segment among the top 25% of the ranked variants compared to DiMSum , Enrich2 , and the naïve method, and keeps the FDR lower than mutscan(limma) and mutscan(edgeR) until the end (Fig.  3 A). Importantly, we note that the Rosace curve moves only slightly from 1 replicate to 3 replicates, while the other methods shift more, implying that the change in the number of synonymous mutations called is minor for Rosace even with fewer replicates (Fig.  3 A).
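The rank-based FDR proxy described here is simple to compute once each method has produced a per-variant test statistic; below is a minimal sketch, assuming a results table with a significance statistic (p-value or lfsr, smaller = more significant) and a flag for synonymous variants.

```r
# Sketch: empirical FDR proxy = running fraction of synonymous variants among
# the top-k ranked variants. `res` is assumed to have columns `stat` (p-value
# or lfsr; smaller = more significant) and `is_syn` (logical).
rank_fdr <- function(res) {
  res <- res[order(res$stat), ]
  data.frame(rank = seq_len(nrow(res)),
             fdr_proxy = cumsum(res$is_syn) / seq_len(nrow(res)))
}

set.seed(1)
res <- data.frame(stat   = runif(1000),
                  is_syn = sample(c(TRUE, FALSE), 1000, replace = TRUE, prob = c(0.2, 0.8)))
head(rank_fdr(res))
```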

Figure 3. False discovery rate and sensitivity on OCT1 cytotoxicity data. A Percent of synonymous mutations called (false discovery rate) versus variants ranked by hypothesis testing. The left panel shows the mean over analyses of the three individual replicates. Ideally, the line would be flat at 0 until all the variants with true effects are discovered. B Number of validated variants called (out of 10) versus number of replicates. If only 1 or 2 replicates are used, we iterate through all possible combinations. For example, the three points for Rosace on 2 replicates use Replicates \(\{1, 2\}\) , \(\{1, 3\}\) , and \(\{2, 3\}\) , respectively. ( DiMSum can only process two time points and thus is disadvantaged in experiments such as OCT1)

While lower FDR may result in lower power in the method, we show that Rosace is consistently powerful in detecting the OCT1-positive control variants. Yee et al. [ 10 ] conducted lower-throughput radioligand uptake experiments in HEK293T cells and validated 10 variants that have a loss-of-function or gain-of-function phenotype. We use the number of validated variants to approximate the power of the method. As shown in Fig.  3 B, Rosace has comparable power to Enrich2 , mutscan(limma) , and mutscan(edgeR) regardless of the number of replicates, while the naïve method is unable to detect anything in the case of one replicate. Rosace calls significantly fewer synonymous mutations than every other method while maintaining high power, showing that Rosace is robust in real data.

In OCT1, loss of function leads to enrichment rather than depletion, which is relatively uncommon. To complement findings on OCT1, we conducted a similar analysis on the kinase MET data [ 11 ] (3 replicates, 3 selection rounds), whose loss of function leads to depletion. Applied to this dataset, Rosace and its position-unaware version have comparable power to Enrich2 , mutscan(limma) , and mutscan(edgeR) with any number of replicates used, and the naïve method remains less powerful than other methods, especially with one replicate only. Consistent with OCT1, Rosace again calls fewer synonymous mutations and better controls the false discovery rate. The results are visualized in the Supplementary Figures (Additional file 1: Figs. S12-15).

To test Rosace performance on diverse datasets, we also ran all methods on the CARD11 data [ 14 ] (5 replicates, 1 selection round), the MSH2 data [ 12 ] (3 replicates, 1 selection round), the BRCA1 data [ 13 ] (2 replicates, 2 selection rounds), and the BRCA1-RING data [ 23 ] (6 replicates, 5 selection rounds) (Table S1). In addition to those human protein datasets, we also applied Rosace to a bacterial protein, Cohesin [ 29 ] (1 replicate, 1 selection round) (Table S1). We use the pathogenic and benign variants in ClinVar [ 30 ], EVE [ 31 ], and AlphaMissense [ 32 ] as a proxy for positive and negative control variants. Rosace consistently shows high sensitivity in detecting the positive control variants in these datasets while controlling the false discovery rate (Additional file 1: Figs. S5-S11). Noting that the number of clinically verified variants is limited and that those identified by the prediction models usually have extreme effects, we do not observe a large difference among the methods’ performance.

To alleviate a potential concern that the position-level shrinkage given by Rosace is too strong, we plot the functional scores calculated by Rosace against those by Enrich2 across several DMS datasets (Additional file 1: Figs. S2-4). We find that the synonymous variants’ functional scores are similar in magnitude to those of other variants, so synonymous variants are not shrunk too strongly toward zero. We also find that stop codon and indel variants have consistently significant effect scores, implying that the position-level shrinkage is not so strong that those variants’ effects are neutralized. This result implies that the position prior benefits the model mainly through a more stable standard-error estimate, which improves prioritization by local false sign rate or other posterior ranking criteria that depend on the variance.

Rosette : DMS data simulation which matches marginal distributions from real DMS data

To further benchmark the performance of Rosace and other related methods, we propose a new simulation framework called Rosette , which generates DMS data using parameters directly inferred from the real experiment to gain the flexibility of mimicking the overall structure of most growth-based DMS screen data (Fig.  4 A).

Figure 4. Rosette simulation framework preserves the overall structure of growth-based DMS screens. The plots show the result of using OCT1 data as input. A Rosette generates summary statistics from real data and simulates the sequencing count. B Generative model for Rosette simulation. C The distribution of real and predicted functional scores is similar. D , E Five summary statistics are needed for Rosette

Intuitively, if we construct a simulation that closely follows the assumptions of our model, our model should have outstanding performance. To facilitate a fair comparison with other methods, the simulation presented here is not aligned with the assumptions made in Rosace . In fact, the central assumption that variant position carries information is violated by construction to showcase the robustness of Rosace .

To clarify the terminology used throughout this paper, “mutant” refers to the substitution, insertion, or deletion of amino acids. A position-mutant pair is considered a variant. Mutants are categorized into mutant groups with hierarchical clustering schemes or predefined criteria (our model uses the former, which is expected to align with the biophysical properties of amino acids). Variants are grouped in two ways: (1) by their functional change to the protein, namely neutral, loss-of-function (LOF), or gain-of-function (GOF), referred to as “variant groups,” and (2) by the mean of the raw sequencing counts across replicates, referred to as “variant mean groups.”

Rosette calculates two summary statistics from the raw sequencing counts (dispersion of the sequencing count \(\eta\) and dispersion of the variant library \(\eta _0\) ) (Fig.  4 D) and three others from the score estimates (the proportion of each mutant group \(\varvec{p}\) , the functional score’s distribution of each variant group \(\varvec{\theta }\) , and the weight of each variant group \(\varvec{\alpha }\) ) (Fig.  4 E). Since we are only learning the distribution of the scores instead of the functional characteristics of individual variants, the score estimates can be naïve (e.g., simple linear regression) or more complicated (e.g.,  Rosace ).

The dispersion of the sequencing counts \(\eta\) measures how much variability in variant representation there is in the entire experimental procedure, during both cell culture and sequencing. As \(\eta\) goes to infinity, the sequencing count approaches the expected true cell count (no over-dispersion). When \(\eta\) is small, the sequencing count is over-dispersed. In an ideal experiment with no over-dispersion, the proportion of synonymous mutations should be invariant to time due to the absence of functional changes. However, in the real data, we have observed large variability in proportion changes within the synonymous mutations at different selection rounds, which is attributed to over-dispersion and cannot be explained by the simple multinomial distribution used in existing simulation frameworks (Additional file 1: Fig. S1). Indeed, all methods, including the naïve method, achieve near-perfect performance in the Enrich2 simulations with a correlation score greater than 0.99 (Additional file 1: Fig. S27). Therefore, we choose to model the sequencing step with a Dirichlet-Multinomial distribution that includes \(\eta\) as the dispersion parameter.

The dispersion of variant library \(\eta _0\) measures how much variability already exists in variant representation before the cell selection. Theoretically, each variant would have around the same number of cells at the initial time point. However, due to the imbalance during the variant library generation process and the cell culture of the initial population that might already be under selection, we sometimes see a wide dispersion of counts across variants. To estimate this dispersion, we fit a Dirichlet-Multinomial distribution under the assumption that the variants in the cell pool at the initial time point should have equal proportions.

The distribution and the structure of the underlying true functional score across variants are controlled by the rest of the summary statistics. We make a few assumptions here. First, the functional score distribution of a mutant across positions (a row in the heatmap, Fig.  4 A) differs between mutants, but within a mutant group the mutants are independent and identically distributed (or exchangeable). We estimate the mutant groups by hierarchical clustering with distance defined by the empirical Jensen-Shannon divergence and record their proportions \(\hat{\varvec{p}}\) . Second, each variant belongs to the neutral hypothesis (score close to 0, similar to synonymous mutations) or the alternative hypothesis (away from 0, different from synonymous mutations). The number of variant groups can range from 1 to 3 (neutral, GOF, and LOF) based on the number of modes in the marginal functional score distribution, and the variants within a variant group are exchangeable. We estimate the boundaries of the variant groups by Gaussian mixture clustering and fit the distribution parameters \(\hat{\varvec{\theta }}\) . Finally, we assume that the positions are independent. While this is a simplifying assumption, considering the relationship between positions would require additional assumptions about the functional regions of the protein. As a result, we treat the positions as exchangeable and model the proportion of variant group identities (neutral, GOF, LOF) in each mutant group by a Dirichlet distribution with parameter \(\hat{\varvec{\alpha }}\) .

To simulate the sequencing count from the summary statistics, we use a generative model that mimics the experiment process and is completely different from the Rosace inference model for fair benchmarking. We first draw the functional score of each variant \(\beta _v\) from the structure described in the summary statistics and the ones in the neutral group are set to be 0. Then, we map the functional score to its latent functional parameters: the cell growth rate in the growth screen. Next, we generate the cell count at a particular time point \(N_{v,t,r}\) by the cell count at the previous time point \(N_{v,t-1,r}\) and the latent functional parameters. Finally, the sequencing count is generated from a Dirichlet-Multinomial distribution with the summarized dispersion parameter and the cell count.

The simulation results show that the simulated functional score distribution is comparable to the real experimental data (Fig.  4 C). We also demonstrate that the simulation is not particularly favorable to models containing positional information such as Rosace . From Fig.  4 E, we observe that in the simulation, the position-level score is not as widespread as in the real data. In addition, the positions with extreme scores (very positive scores in the OCT1 dataset) have reduced standard deviation in the real data, but not in the simulation (Additional file 1: Figs. S18d, S19d, S20d). As a result, we would expect the performance of Rosace to be better on real data than in the simulation.

Testing Rosace  false discovery control with Rosette  simulation

To test the performance of Rosace , we generate simulated data using Rosette from two distinctive growth-based assays: the transporter OCT1 data, where LOF variants are positively selected [ 10 ], and the kinase MET data, where LOF variants are negatively selected [ 11 ]. We further included the results of a saturation genome editing dataset, CARD11 [ 14 ], in Additional file 1: Figs. S17-23. The OCT1 DMS screen measures the impact of variants on uptake of the cytotoxic drug SM73 mediated by the transporter OCT1. If a mutation causes the transporter protein to have decreased activity, the cells in the pool will import less substrate and thus die more slowly than wild-type cells or those with synonymous mutations, so the LOF variants are positively selected. In the MET DMS screen, the kinase drives proliferation and cell growth in the BA/F3 mammalian cell line upon IL-3 (interleukin-3) withdrawal. If the variant protein fails to function, the cells will die faster than the wild-type cells, so the LOF variants are negatively selected. Both datasets have a clear separation of two modes in the functional score distribution (neutral and LOF) (Additional file 1: Figs. S18a, S19a). We benchmark Rosace against Enrich2 , mutscan(edgeR) , mutscan(limma) , and the naïve method in scenarios where we use 1 or all 3 replicates and 1 or all 3 selection rounds. DiMSum is benchmarked only when there is one round of selection because it is not designed to handle multiple rounds. Each scenario is repeated 10 times. The results of all methods show similar correlations with the latent growth rates (Additional file 1: Fig. S21), and thus, for benchmarking purposes, we focus on hypothesis testing.

We compare methods from a variant-ranking point of view, in terms of the number of false discoveries for any given number of variants selected as LOF. This is because Rosace is a Bayesian framework that uses lfsr instead of p -values as the metric for variant selection, and it is hard to translate lfsr into FDR for a hard threshold. Variants are ranked by adjusted p -values or lfsr (ascending). Methods that perform well will rank the truly LOF variants in the simulation ahead of non-LOF variants. In an ideal scenario with no noise, we would expect the line of ranked variants by FDR to be flat at 0 and to rise slowly only after all LOF variants are called. The results in Fig.  5 show that even though the position assumption is violated in the Rosette simulation, Rosace is robust enough to maintain a relatively low FDR in all simulation conditions.

Figure 5. Benchmark of false discovery control on the Rosette simulation. Variants are ranked by hypothesis testing (adjusted p -values or lfsr ). The false discovery rate at each rank is computed as the proportion of neutral variants, assuming all variants up to the rank cutoff are called significant. R is the number of replicates and T is the number of selection rounds. MET data is used for negative selection and OCT1 data for positive selection. Ideally, the line would be flat at 0 until the rank at which all variants with true effects have been discovered. ( DiMSum can only process two time points and thus is disadvantaged in experiments with more than two time points, i.e., more than one selection round)

Testing Rosace  power with Rosette  simulation

Next, we investigate the sensitivity of the benchmarked methods at different FDR or lfsr cutoffs. It is important to keep in mind that Rosace uses the raw lfsr from the sampling result, while all other methods use the Benjamini-Hochberg procedure to control the false discovery rate. As a result, the cutoff for Rosace is on a different scale.

Rosace is the only method that displays high sensitivity in all conditions with a low false discovery rate. In the case of one selection round and three replicates ( \(T = 1\) and \(R = 3\) ), mutscan(edgeR) and mutscan(limma) do not have the power to detect any significant variants with the FDR threshold at 0.1. The same scenario occurs with DiMSum at negative selection and the naïve method at \(T = 3\) and \(R = 1\) (Fig.  6 ). The naïve method in general has very low power, while Enrich2 has a very inflated FDR.

Figure 6. Benchmark of sensitivity versus FDR. The upper row is simulated from a modified version of the Rosette simulation that favors position-informed models. The bottom row shows the results from standard Rosette . Circles, triangles, squares, and crosses represent LOF variant selection at adjusted p -values or lfsr of 0.001, 0.01, 0.05, and 0.10, respectively. Variants with the opposite sign of selection are then excluded. Ideally, for all methods besides Rosace , each symbol would lie directly above the corresponding symbol on the x-axis indicating true FDR. For Rosace , lfsr has no direct translation to FDR, so the cutoff represented by the shape is theoretically on a different scale. ( DiMSum can only process two time points, and thus is disadvantaged in experiments with more than two time points, i.e., more than one selection round)

We benchmark Rosace on both the standard Rosette simulation, which inherently violates the position assumption, and a modified version of Rosette that favors the position-informed model. We show that model misspecification does increase the false discovery rate of Rosace , but Rosace is robust enough to outperform all other methods (except DiMSum with \(T = 1\) , \(R = 3\) , and positive selection) even when the position assumption is strongly violated (Fig.  6 ).

One of Rosace ’s contributions is accounting for positional information in DMS analysis. The model encodes the prior information that variants at the same position have similar functional effects, resulting in higher sensitivity and better FDR control. Furthermore, Rosace is also capable of incorporating other types of prior information on the similarity of variants.

Despite the value of positional information in statistical inference as demonstrated in this paper, it is unclear how multiple random mutations should be position-labeled. In this case, simple position heuristics are often unsatisfying, and one might argue that a position scalar should not cluster the variants in random mutagenesis experiments with large-scale in-frame insertion and deletion, such as those on viruses. These types of experiments are not the focus of this paper, but are still very important and require careful future research.

Another critique of Rosace is the extent of bias we introduce into the score inference through the position prior. While it is certainly possible to introduce a large bias, Rosace was developed to be a robust model, ensuring near-unbiased inference or prediction even when its assumptions are not precisely met or are violated outright. We demonstrate the robustness of Rosace through our data simulation framework, Rosette . The generative procedures of Rosette explicitly violate the prior assumptions made by Rosace , but even with Rosette ’s data, Rosace can learn important information. We also show using real data that the position-level shrinkage is not strong, further demonstrating the robustness of Rosace .

The development of DMS simulation frameworks such as Rosette can also drive experimental design. For example, to select the best number of time points and replicates with regard to the trade-off between statistical robustness and costs of the experiment, an experimentalist can conduct a pilot experiment and use its data to infer summary statistics through Rosette . Rosette will then generate simulations close to a real experiment. Experimentalists can find the optimal tool for data analysis given an experimental design by applying candidate tools to the simulation data. Similarly, given a data analysis framework, experimentalists can choose from multiple experiment designs by using Rosace to simulate all those experiments and observe if any designs have enough power to detect most of the LOF or GOF variants with a low false discovery rate.

This paper only applies our tool to growth screens, one of several functional phenotyping methods enabled by DMS techniques. Another possibility is the binding experiment, where a portion of cells is selected at each time point. In this case, the expectation of the functional scores computed by Rosace is a log transformation of the variant’s selection proportion [ 18 ], and one could potentially use Rosace for DMS analysis as in Enrich2 . A third method is fluorescence-activated cell sorting (FACS-seq): a branch of the literature uses binned FACS-seq screens to sort variant libraries based on protein phenotypes. Since the experiment has multiple bins, one can potentially capture distributional changes in molecular properties beyond a mean shift [ 8 , 10 , 19 , 33 ]. Although of different design, FACS-seq-based screens can also be analyzed using a framework similar to Rosace . Building such frameworks incorporating prior information for experiments beyond growth screens enables the community to exploit a wider range of experimental data.

As the function of a protein is rarely one-dimensional, one can measure multiple phenotypes of a variant in a set of experiments [ 10 , 16 , 34 ]. For example, the OCT1 data mentioned earlier [ 10 ] measures both transporter surface expression with a FACS-seq screen and drug cytotoxicity with a growth screen. Multi-phenotype DMS experiments also call for analysis frameworks that accommodate multidimensional outcomes by modeling the interaction or the correlation of phenotypes for each variant. One successful attempt models the causal biophysical mechanism of protein folding and binding [ 35 ], and there are many more protein properties beyond those two. A unifying framework for multi-phenotype analysis remains unsolved and challenging. One needs to account for different experimental designs to directly compare scores between phenotypes, and to carefully select inferred features most relevant to the scientific questions, requiring effort from both the experimental and computational sides. Nevertheless, we believe that multi-phenotype analysis will eventually guide us to develop better mechanistic or probabilistic models for how mutations drive proteins in evolution, how they lead to malfunction and disease, and how to better engineer new proteins.

Conclusions

We present Rosace , a Bayesian framework for analyzing growth-based deep mutational scanning data. In addition, we develop Rosette , a simulation framework that recapitulates the properties of actual DMS experiments, but relies on an orthogonal data generation process from Rosace . From both simulation and real data analysis, we show that Rosace has better FDR control and higher sensitivity compared to existing methods and that it provides reliable estimates for downstream analyses.

Pipeline: raw read to sequencing count

To facilitate the broader adoption of the Rosace framework for DMS experiments, we have developed a sequencing pipeline for short-read-based experiments using Snakemake which we dub Dumpling [ 36 ]. This pipeline handles directly sequenced single-variant libraries containing synonymous, missense, nonsense, and multi-length indel mutations, going from raw reads to final scores and quality control metrics. Raw sequencing data in the form of fastq files is first obtained as demultiplexed paired-end files. The user then defines the experimental architecture using a csv file defining the conditions, replicates, and time points corresponding to each file, which is parsed along with a configuration file. The reads are processed for quality and contaminants using BBDuk, and then the paired reads are error-corrected using BBMerge. The cleaned reads are then mapped onto the reference sequence using BBMap [ 37 ]. Variants in the resulting SAM file are called and counted using the AnalyzeSaturationMutagenesis tool in GATK v4 [ 38 ]. This tool provides a direct count of the number of times each distinct genotype is detected in an experiment. We generate various QC metrics throughout the process and combine them using MultiQC for an easy-to-read final overview [ 39 ].

Due to the degeneracy of indel alignments, the genotyping of codon-level deletions sometimes does not match the reading frame because of left-alignment. Additionally, due to errors in oligo synthesis, assembly, in vivo passaging, or sequencing, some genotypes that were not designed as part of the library may be introduced. A fundamental assumption of DMS is the independence of individual variants, so to reduce noise and eliminate error, our pipeline removes genotypes that were not part of the planned design and renames variants to be consistent at the amino acid level before exporting the variant counts in a format readable by Rosace .

Pre-processing of sequencing count

In a growth DMS screen with V variants, we define v to be the variant index. A function p ( v ) maps the variant v to its position label. T indicates the number of selection rounds and index t is an integer ranging from 0 to T . A total of R replicates are measured, with r as the replicate index. We denote \(c_{v,t,r}\) the raw sequencing count of cells with variant v at time point t in replicate r .

In addition, “mutant” refers to substitution with one of the 20 amino acids, insertion of an amino acid, or deletion. Thus, a variant is uniquely identified by its mutant and the position where the mutant occurs ( p ( v )).

The default pre-processing pipeline of Rosace includes four steps: variant filtering, count imputation, count normalization, and replicate integration. First, variants with more than 50% missing count data are filtered out in each replicate. Then, variants with less than 50% missing data are imputed using either K-nearest-neighbor averaging ( K = 10) or filled with 0. Next, imputed raw counts are log-transformed with an added pseudo-count of 1/2 and normalized by the counts of wild-type cells or by the sum of sequencing counts of synonymous mutations. This step, proposed in Enrich2 , ensures that the computed functional score of wild-type cells is approximately 0. Additionally, the normalized counts for each variant are shifted to 0 at the pre-selection time point for simple prior specification of the intercept.
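A minimal sketch of this normalization arithmetic (log transform with a pseudo-count of 1/2, normalization against the summed counts of synonymous variants, and shifting each variant to 0 at the pre-selection time point); the matrix layout is assumed for illustration, and the actual package additionally handles filtering, imputation, and replicate integration.

```r
# Sketch of the normalization step described above. `counts` is a raw count
# matrix (variants x time points) and `is_syn` flags synonymous variants;
# this is an illustrative reimplementation, not the package code.
normalize_counts <- function(counts, is_syn) {
  logc <- log(counts + 0.5)                                   # pseudo-count of 1/2
  ref  <- log(colSums(counts[is_syn, , drop = FALSE]) + 0.5)  # synonymous reference per time point
  norm <- sweep(logc, 2, ref, "-")                            # normalize against the synonymous sum
  sweep(norm, 1, norm[, 1], "-")                              # shift each variant to 0 at t = 0
}

counts <- matrix(c(100, 80, 60,
                   120, 240, 480,
                   90, 95, 100),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A10G", "L55P", "S7S"), c("t0", "t1", "t2")))
normalize_counts(counts, is_syn = c(FALSE, FALSE, TRUE))
```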

Previous papers suggest the usage of other methods such as total-count normalization when the wild-type is incorrectly estimated or subject to high levels of error [ 18 , 20 ]. We include this in Rosace as an option. Finally, replicates in the same experiment are joined together for the input of the hierarchical model. If a variant is dropped out in some but not all replicates, Rosace imputes the missing replicate data with the mean of the other replicates.

Rosace : hierarchical model and functional score inference

Rosace assumes that the aligned counts are generated by the following time-dependent linear function. Let \(\beta _v\) be the defined functional score or slope, \(b_v\) be the intercept, and \(\epsilon _{g(v)}\) be the error term. The core of Rosace is a linear regression:

where g ( v ) maps the variant v to its mean group—the grouping method will be explained below.
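Written out under the notation just defined, a plausible form of this regression (with \(y_{v,t,r}\) denoting the normalized count of variant v at time point t in replicate r; this reconstruction is an assumption and the published specification may differ in detail) is:

```latex
% Hedged reconstruction of the core regression; y_{v,t,r} (normalized count),
% the error parameterization, and the normality assumption are inferred from
% the surrounding text rather than reproduced from the original display equation.
\[
  y_{v,t,r} = b_v + \beta_v \, t + \epsilon_{g(v)}, \qquad
  \epsilon_{g(v)} \sim \mathcal{N}\!\bigl(0, \sigma^2_{g(v)}\bigr)
\]
```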

p ( v ) is the function that maps a variant v to its amino acid position. If the information of variants’ mutation types is given, Rosace will assign synonymous variants to many artificial “control” positions. The number of synonymous variants per control position is determined by the maximum number of non-synonymous variants per position. Assigning synonymous variants to control positions incorporates the extra information while not giving too strong a shrinkage to synonymous variants (Additional file 1: Figs. S2-S4). In addition, we regroup positions with fewer than 10 variants together to avoid having too few variants in a position. For example, if the DMS screen has fewer than 10 mutants per position, adjacent positions will be grouped to form one position label. Also, the position of a continuous indel variant is labeled as a mutation of the leftmost amino acid residue (e.g., an insertion between positions 99 and 100 is labeled as position 99 and a deletion of positions 100 through 110 is labeled as position 100).
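As a small illustration of the regrouping rule for sparsely populated positions, here is a sketch that merges adjacent positions until each merged label holds at least 10 variants; the exact merging logic in the package may differ.

```r
# Sketch: merge adjacent positions until each merged label holds at least
# `min_vars` variants, as described above. Illustrative only; the package's
# grouping rule may differ in detail.
regroup_positions <- function(pos, min_vars = 10) {
  counts <- table(pos)                       # variants per position, in position order
  label  <- integer(length(counts)); cur <- 1; acc <- 0
  for (i in seq_along(counts)) {
    label[i] <- cur
    acc <- acc + counts[i]
    if (acc >= min_vars) { cur <- cur + 1; acc <- 0 }
  }
  label[match(as.character(pos), names(counts))]
}

pos <- rep(1:6, times = c(3, 4, 12, 2, 5, 9))   # variants per position: 3, 4, 12, 2, 5, 9
table(regroup_positions(pos))                    # merged position labels and their sizes
```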

We assume that the variants at the same position are more likely to share similar functional effects. Thus, we build the layer above \(\beta _v\) using position-level parameters \(\phi _{p(v)}\) and \(\sigma _{p(v)}\) .

The mean and precision parameters are given weakly informative normal priors, and the variance parameters are given weakly informative inverse-gamma priors.
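A hedged sketch of the hierarchy implied by this description (variant scores drawn around a position-level mean, with weakly informative hyperpriors; the specific hyperparameters are placeholders, not the published specification):

```latex
% Hedged sketch of the position-level hierarchy; tau, a, and b are generic
% hyperparameters assumed for illustration.
\[
  \beta_v \sim \mathcal{N}\!\bigl(\phi_{p(v)}, \sigma^2_{p(v)}\bigr), \qquad
  \phi_{p(v)} \sim \mathcal{N}(0, \tau^2), \qquad
  \sigma^2_{p(v)} \sim \text{Inv-Gamma}(a, b)
\]
```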

We further cluster variants into mean groups of 25 based on their mean count across time points and replicates. The mapping between a variant and its mean group is denoted g ( v ). We thus model the mean-variance relationship by assuming that variants with a lower mean have larger error terms in the linear regression, and vice versa.

Stan [ 40 ] is used in Rosace for Bayesian inference over our model. We use the default inference method, the No-U-Turn sampler (NUTS), a variant of the Hamiltonian Monte Carlo (HMC) algorithm. Compared to other widely used Monte Carlo samplers, for example the Metropolis-Hastings algorithm, HMC has reduced correlation between successive samples, so fewer samples are needed to reach a similar level of accuracy [ 41 ]. NUTS further improves HMC by automatically determining the number of steps in each iteration of HMC sampling to sample more efficiently from the posterior [ 42 ].

Both the lower bound on the number of variants per position label \(|\{v|p(v)=i\}|\) (default 10) and the size of the variant mean group \(g_p\) (default 25) can be changed.

Rosette : the OCT1 and MET datasets

We use the following datasets as input of the Rosette simulation: the OCT1 dataset by Yee et al. [ 10 ] as an example of positive selection and the MET dataset by Estevam et al . [ 11 ] as an example of negative selection. Specifically, we use replicate 2 of the cytotoxicity selection screen in the OCT1 dataset for both score distribution and raw count dispersion. For the MET dataset, we select the experiment with IL-3 withdrawal under wild-type genetic background (without exon 14 skipping). Raw counts are extracted from replicate 1 but the scores are calculated from all three replicates because of the frequent dropouts at the initial time point.

The sequencing reads and the resulting sequencing counts are processed in the default pipeline described in the previous method sections. Scores are then computed using simple linear regression (the naïve method). The naïve method is used as the Rosette input because we are trying to learn the global distribution of the scores instead of identifying individual variants and, while uncalibrated, naïve estimates are unbiased.

Rosette : summary statistics from real data

Summary statistics inferred by Rosette can be categorized into two types: one for the dispersion of sequencing counts and the other for the dispersion of score distribution.

First, we estimate dispersion \(\eta\) in the sequencing count. We assume the sequencing count at time point 0 reflects the true variant library before selection. Since the functional scores of synonymous variants are approximately 0, the proportion of synonymous mutations in the population should approximately be the same after selection. Let the set of indices of synonymous mutations be \(\textbf{v}_s = \{v_{s1}, v_{s2}, \dots \}\) . The count of each synonymous mutation at time point t is \(\textbf{c}_{\textbf{v}_s, t} = (c_{v_{s1}, t}, c_{v_{s2}, t}, \dots )\) . The model we use to fit \(\eta\) is thus

from which we find the maximum likelihood estimation \(\hat{\eta }\) .

Dispersion of the initial variant library \(\eta _0\) is estimated similarly by fitting a Dirichlet-Multinomial distribution on the sequencing counts of the initial time point assuming that in an ideal experiment, the proportion of each variant in the library should be the same. Similar to above, the indices of all mutations are \(\textbf{v} = \{1, 2, \dots , V\}\) , and the count of each mutation at time point 0 is \(\textbf{c}_{\textbf{v}, 0} = (c_{1, 0}, c_{2, 0}, \dots , c_{V, 0})\) . From the following model

we can again find the maximum likelihood estimate of the variant library dispersion \(\hat{\eta _0}\) . Notice that \(\hat{\eta }_0\) is usually much smaller than \(\hat{\eta }\) (i.e., more overdispersed) because \(\hat{\eta }_0\) reflects both the dispersion of the variant library and that of the sequencing step.
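As an illustration of how such a dispersion parameter can be fit, here is a sketch of a maximum-likelihood fit of a symmetric Dirichlet-Multinomial concentration η (equal expected proportions, hand-written log-likelihood); the exact parameterization used by Rosette may differ.

```r
# Sketch: maximum-likelihood fit of a symmetric Dirichlet-Multinomial
# dispersion parameter eta (equal expected proportions). This parameterization
# is an assumption for illustration; the multinomial coefficient is omitted
# from the log-likelihood because it is constant in eta.
dirmult_loglik <- function(eta, x) {
  n <- sum(x); k <- length(x); a <- rep(eta / k, k)
  lgamma(sum(a)) - lgamma(n + sum(a)) + sum(lgamma(x + a) - lgamma(a))
}

fit_eta <- function(x) {
  optimize(function(e) -dirmult_loglik(e, x), interval = c(1e-3, 1e6))$minimum
}

set.seed(1)
x <- rmultinom(1, size = 5000, prob = rep(1 / 50, 50))[, 1]  # near-multinomial counts
fit_eta(x)                                                   # large eta: little over-dispersion
```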

To characterize the distribution of functional scores, we first cluster mutants into groups, as mutants often have different properties and exert different influences on protein function. We calculate the empirical Jensen-Shannon divergence (JSD) to measure the distance between two mutants, using bins of width 0.1 to obtain the empirical probability density function. Ideally, a clustering scheme should produce a grouping that reflects the inherent properties of an amino acid that are independent of position. Thus, we are more concerned with the general shape of the distribution than with the similarity between paired observations, which leads to our preference for JSD over Euclidean distance as the clustering metric. To cluster mutants into four mutant groups \(g_{m} = \{1, 2, 3, 4\}\) , we use hierarchical clustering (the “hclust” function with the complete linkage method in R), and we record the proportions \(\widehat{\varvec{p}}\) so that any number of mutants can be simulated (the number of mutant groups can also be changed). The underlying assumption is that mutants in each mutant group are very similar and can be treated as interchangeable. We define \(f_1(v)\) as the function that maps a variant to its corresponding mutant group \(g_{m}\) .
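A compact sketch of this clustering step (empirical densities binned at width 0.1, pairwise Jensen-Shannon divergence, complete-linkage tree cut into four groups); the score matrix here is simulated purely for illustration.

```r
# Sketch: cluster mutants by Jensen-Shannon divergence between their empirical
# score distributions (bin width 0.1), then complete-linkage hierarchical
# clustering into four mutant groups. Scores are simulated for illustration.
jsd <- function(p, q) {
  m  <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

set.seed(1)
scores <- replicate(21, rnorm(200, mean = sample(c(-1, 0), 1), sd = 0.5))  # 200 positions x 21 mutants
breaks <- seq(floor(min(scores)), ceiling(max(scores)), by = 0.1)
dens   <- apply(scores, 2, function(s) tabulate(findInterval(s, breaks), nbins = length(breaks)) / length(s))

d <- matrix(0, ncol(scores), ncol(scores))
for (i in seq_len(ncol(scores))) {
  for (j in seq_len(ncol(scores))) d[i, j] <- jsd(dens[, i], dens[, j])
}

groups <- cutree(hclust(as.dist(d), method = "complete"), k = 4)  # four mutant groups
table(groups)
```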

Then, we cluster the variants into different variant groups. In our examples, the score distribution is not unimodal but bimodal. The OCT1 screen has a LOF mode on the right (positive selection) and the MET screen has a LOF mode on the left (negative selection). While it is possible to observe both GOF and LOF variants, in our datasets GOF variants are so rare that they do not constitute a mode in the mixture distribution, resulting in a bimodal distribution. To cluster the non-synonymous variants into groups \(g_{v}\) , we use a Gaussian mixture model with two components to decide the cutoffs between groups, and we then fit a Gaussian distribution to each variant group to learn the parameters of its distribution. The synonymous variants form their own group, labeled as control. Let \(f_2(v)\) denote the function that maps a variant to its corresponding variant group \(g_{v}\) . The simulation results show that even synonymous mutations with scores close to 0 can show large negative effects due to random dropout. Thus, we later set the effect of the control and the neutral group to a constant 0 and still observe a distribution similar to that seen in the real data. For each variant, we have one of the models below, depending on whether the variant results in LOF or has no effect:

We use \(\widehat{\varvec{\theta }}\) to denote the collection of estimated distributional parameters for all variant groups.

Finally, we define the number of variants in each variant group at each position

For each position p , we can thus find the count of variants belonging to any mutant-variant group \(\varvec{o}_{p} \in \textbf{N}^{\Vert g_m \Vert \Vert g_v \Vert }\) . Treating each position as an observation, we fit a Dirichlet distribution to characterize the distribution of variant group identities among mutants at any position:

The final summary statistics are \(\hat{\eta }\) , \(\hat{\eta _0}\) , \(\hat{\varvec{p}}\) , \(\hat{\varvec{\theta }}\) , and \(\hat{\varvec{\alpha }}\) . We also need T , the number of selection rounds, to map \(\beta _v\) into the latent functional parameter \(\mu _v\) in growth screens.

Rosette : data generative model

We simulate the same number of mutants M , positions P , and variants V ( \(M \times P\) ) as in the real experiment. The important hyperparameters that need to be specified are the average number of reads per variant D (100, also referred to as the sequencing depth), the initial cell population count \(P_0\) (200 V ), and the wild-type doubling rate \(\delta\) between time points ( \(-2\) or 2). One also needs to specify the number of replicates R and selection rounds T .

The simulation largely consists of two major steps: (1) generating latent growth rates \(\mu _v\) and (2) generating cell counts \(N_{v,t,r}\) and sequencing counts \(c_{v,t,r}\) .

In step 1, the mutant group and variant group labeling of each variant is first generated. Specifically, we assign a mutant to the mutant group \(g_m\) by the proportion \(\hat{\varvec{p}}\) and then assign a variant to the variant group \(g_v\) by drawing \(\varvec{o}_p\) from Dirichlet distribution with parameter \(\hat{\varvec{\alpha }}\) (Eq.  10 ). Using \(\hat{\varvec{\theta }}\) , we randomly generate \(\beta _v\) for each variant based on its \(g_v\) (Eq.  8 ). The mapping between \(\beta _v\) and \(\mu _v\) requires an understanding of the generative model, so it will be defined after we present the cell growth model.

In step 2, the starting cell population \(N_{v,r,0}\) is drawn from a Dirichlet-Multinomial distribution using \(\hat{\eta }_0\) and we assume that replicates are biological replicates:

where \(P_0\) is the total cell population. The cells are growing exponentially and we determine the cell count by a Poisson distribution

where \(\Delta t\) is the pseudo-passing time. It differs from index t and will be defined in the next paragraph. Similar to how we define \(\textbf{c}_{\textbf{v}, t, r}\) , we define the true cell count of each variant at time point t and replicate r to be \(\textbf{N}_{\textbf{v}, t, r} = (N_{1, t, r}, \dots , N_{V, t, r})\) . The sequencing count for each variant is

where D is the sequencing depth per variant. Empirically, we can set the input \(\hat{\eta }\) and \(\hat{\eta }_0\) slightly higher than the estimated summary statistics, because the estimated values encompass all the noise in the experiment, while the true values represent only the noise from the sequencing step.

To find the mapping between \(\beta _v\) and \(\mu _v\) , we define \(\delta\) to be the wild-type doubling rate and naturally compute \(\Delta t:= \frac{\delta \log 2}{\mu _{wt}}\) , the pseudo-passing time in each round. Then we can compute the expectation of \(\beta _v\) with the linear regression model. For simplicity, we omit the replicate index r and assume r is fixed in the next set of equations.

The final mapping between simulated \(\beta _v\) and \(\mu _v\) is then described in the following

with \(\mu _{wt}\) set to be \(\text {sgn}(\delta )\) .
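Putting these pieces together, below is a compact sketch of the generative loop for a single replicate (plain multinomial sequencing in place of the full Dirichlet-Multinomial, and an assumed fraction of slower-growing LOF variants, purely for illustration):

```r
# Sketch of the Rosette-style generative process for one replicate:
# exponential growth with Poisson noise, then sequencing modeled as multinomial
# sampling. (The full framework uses Dirichlet-Multinomial sampling and
# estimated dispersions; a plain multinomial and hard-coded growth rates are
# used here for brevity.)
set.seed(1)
V <- 200; T_rounds <- 3; depth <- 100; P0 <- 200 * V
delta <- 2                                      # wild-type doubling rate per round
mu_wt <- sign(delta)
dt    <- delta * log(2) / mu_wt                 # pseudo-passing time per round

mu  <- ifelse(runif(V) < 0.3, 0.4, 1) * mu_wt   # ~30% of variants grow more slowly (assumed LOF)
N   <- rmultinom(1, P0, rep(1 / V, V))[, 1]     # initial cell counts, equal expected proportions
cts <- matrix(NA_integer_, V, T_rounds + 1)
cts[, 1] <- rmultinom(1, depth * V, N / sum(N))[, 1]

for (t in seq_len(T_rounds)) {
  N <- rpois(V, N * exp(mu * dt))                            # exponential growth with Poisson noise
  cts[, t + 1] <- rmultinom(1, depth * V, N / sum(N))[, 1]   # sequencing counts at round t
}
head(cts)
```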

Modified Rosette that favors position-informed models

In the original, position-agnostic version of Rosette , a \(\Vert g_m \Vert \Vert g_v \Vert\) -dimensional vector is drawn from the same Dirichlet distribution for each position. The vector can be regarded as a quota for each mutant-variant group. Variants at each position are assigned their mutant-variant group according to the quota. As a result, at one position, variants from all variant groups (neutral, LOF, and GOF) would exist, and this violates the assumption in Rosace that variants at one position would have similar functional effects (strong LOF and GOF variants are very unlikely to be at the same position). To show that Rosace could indeed take advantage of the position information when it exists in the data, we create a modified version of Rosette where variants at one position could only belong to one variant group. Specifically, a position can have either neutral, LOF, or GOF variants, but not a mixture among any variant groups.

Benchmarking

The naïve method (simple linear regression) is run with the “lm” function in R on the processed data. For each variant, normalized counts are regressed against time. Raw two-sided p -values are computed from the t -statistics given by the “lm” function and are then adjusted using the Benjamini-Hochberg procedure.
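A minimal sketch of this baseline on a toy normalized-count matrix (the data here are simulated solely to show the mechanics):

```r
# Sketch of the naive baseline: per-variant simple linear regression of
# normalized counts on time, two-sided p-value from the t-statistic of the
# slope, then Benjamini-Hochberg adjustment across variants.
naive_test <- function(y_mat, tps) {
  res <- t(apply(y_mat, 1, function(y) {
    fit <- summary(lm(y ~ tps))$coefficients
    c(score = fit["tps", "Estimate"], p = fit["tps", "Pr(>|t|)"])
  }))
  data.frame(res, p_adj = p.adjust(res[, "p"], method = "BH"))
}

set.seed(1)
tps   <- 0:3                                       # time points
y_mat <- rbind(neutral = rnorm(4, 0, 0.1),         # toy normalized counts
               lof     = -0.8 * tps + rnorm(4, 0, 0.1))
naive_test(y_mat, tps)
```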

For Enrich2 , we use the built-in variant filtering and wild-type (“wt”) normalization. All analyses use a random-effect model as presented in the paper. When there is more than one selection round, we use weighted linear regression. Otherwise, a simple ratio test is performed. The resulting p -values are adjusted using the Benjamini-Hochberg Procedure.

DiMSum requires the variant labels to be DNA sequences, so we generate dummy sequences. It is applied to all simulations with one selection round using the default settings. The z -statistics are computed as the variant’s mean estimate divided by the estimated standard deviation, and the adjusted p -values are computed from the z -scores with the Benjamini-Hochberg procedure. DiMSum only processes data with one selection round (two time points) and thus may be disadvantaged when analyzing datasets with multiple selection rounds.

mutscan is an end-to-end pipeline that requires the input to be sequencing reads. Conversely, Rosette only generates sequencing counts, which can be calculated from sequencing reads but cannot be used to recover sequencing reads. To facilitate benchmarking, we use a SummarizedExperiment object to feed the Rosette output to their function “calculateRelativeFC,” which does take sequencing counts as input. We benchmark both mutscan(edgeR) and mutscan(limma) with default normalization and hyperparameters as provided in the function. We use the “logFC_shrunk” and “FDR” columns in mutscan(edgeR) output and the “logFC” and “adj.P.Val” columns in mutscan(limma) output.

We run Rosace with the position information of variants and the labeling of synonymous mutations. Because Rosace is a Bayesian framework, it does not compute an FDR like the frequentist methods above; all Rosace power/FDR calculations are instead done under the Bayesian local false sign rate (lfsr) setting [28]. In the simulations, we therefore present the rank-FDR curve and the FDR-sensitivity curve as metrics rather than imposing hard thresholds on FDR and lfsr. In the real data benchmarking, both the FDR and lfsr thresholds are set to 0.05.
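
For reference, the local false sign rate of a variant can be computed directly from posterior draws of its functional score, as in this sketch (the draws below are simulated placeholders, not Rosace output):

```r
# Sketch: local false sign rate (lfsr) from posterior draws of a variant's score.
# Following Stephens (2017), lfsr is the smaller of the posterior probabilities
# that the effect is non-negative or non-positive, i.e., the chance that the
# reported sign is wrong.
lfsr <- function(draws) {
  min(mean(draws >= 0), mean(draws <= 0))
}

# Example with hypothetical posterior draws:
set.seed(1)
draws <- rnorm(4000, mean = -0.4, sd = 0.2)
lfsr(draws)   # small value: the score is confidently negative
```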

Rosace without the position label, denoted Rosace (nopos) in Additional file 1: Figs. S5–S15, S19–S23, and S25, removes the position layer in Fig. 2C and keeps only the variant and replicate layers. Its test statistics and model evaluation are presented identically to those of the full Rosace model.

Availability of data and materials

Rosace is implemented as an R package and is distributed on GitHub (https://github.com/pimentellab/rosace) under the MIT open-source license. The package also includes functions for Rosette simulation. An archived version of Rosace is available on Zenodo [43].
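
Installation from GitHub can be done with standard tooling, for example as below (a sketch assuming the R package sits at the repository root and keeps the repository's name):

```r
# Sketch: install the development version of Rosace from GitHub.
# install.packages("remotes")
remotes::install_github("pimentellab/rosace")
library(rosace)   # package name assumed to match the repository name
```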

The integrated sequencing pipeline for short-read-based experiments is available on GitHub (https://github.com/odcambc/dumpling).

Scripts and pre-processed public datasets used to perform the data analysis and generate the figures for the paper are also available on GitHub (https://github.com/roserao/rosace-paper-script).

The protein datasets we used are as follows: OCT1 [10], MET [11], CARD11 [14], MSH2 [12], BRCA1 [13], BRCA1-RING [23], and Cohesin [29]. OCT1 and MET are available on NIH NCBI BioProject with accession codes PRJNA980726 and PRJNA993160. CARD11, BRCA1, and Cohesin are available as supplementary files to their respective publications. MSH2 is available on Gene Expression Omnibus with accession code GSE162130. BRCA1-RING is available on MaveDB with accession code mavedb:00000003-a-1.

The benchmarking datasets are EVE [31] (evemodel.org), ClinVar [30] (gnomad.broadinstitute.org), and AlphaMissense [32] (alphamissense.hegelab.org).

References

1. Fowler DM, Stephany JJ, Fields S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat Protoc. 2014;9(9):2267–84. https://doi.org/10.1038/nprot.2014.153
2. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801–7. https://doi.org/10.1038/nmeth.3027
3. Araya CL, Fowler DM. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol. 2011;29(9):435–42. https://doi.org/10.1016/j.tibtech.2011.04.003
4. Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M. Scalable functional assays for the interpretation of human genetic variation. Annu Rev Genet. 2022;56(1):441–65. https://doi.org/10.1146/annurev-genet-072920-032107
5. Stein A, Fowler DM, Hartmann-Petersen R, Lindorff-Larsen K. Biophysical and mechanistic models for disease-causing protein variants. Trends Biochem Sci. 2019;44(7):575–88. https://doi.org/10.1016/j.tibs.2019.01.003
6. Romero PA, Tran TM, Abate AR. Dissecting enzyme function with microfluidic-based deep mutational scanning. Proc Natl Acad Sci USA. 2015;112:7159–64. https://doi.org/10.1073/PNAS.1422285112
7. Chen JZ, Fowler DM, Tokuriki N. Comprehensive exploration of the translocation, stability and substrate recognition requirements in VIM-2 lactamase. eLife. 2020;9:1–31.
8. Matreyek KA, Starita LM, Stephany JJ, Martin B, Chiasson MA, Gray VE, et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet. 2018;50(6):874–82. https://doi.org/10.1038/s41588-018-0122-z
9. Leander M, Liu Z, Cui Q, Raman S. Deep mutational scanning and machine learning reveal structural and molecular rules governing allosteric hotspots in homologous proteins. eLife. 2022;11. https://doi.org/10.7554/ELIFE.79932
10. Yee SW, Macdonald C, Mitrovic D, Zhou X, Koleske ML, Yang J, et al. The full spectrum of OCT1 (SLC22A1) mutations bridges transporter biophysics to drug pharmacogenomics. bioRxiv. 2023. https://doi.org/10.1101/2023.06.06.543963
11. Estevam GO, Linossi EM, Macdonald CB, Espinoza CA, Michaud JM, Coyote-Maestas W, et al. Conserved regulatory motifs in the juxtamembrane domain and kinase N-lobe revealed through deep mutational scanning of the MET receptor tyrosine kinase domain. eLife. 2023. https://doi.org/10.7554/elife.91619.1
12. Jia X, Burugula BB, Chen V, Lemons RM, Jayakody S, Maksutova M, et al. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. Am J Hum Genet. 2021;108:163–75. https://doi.org/10.1016/J.AJHG.2020.12.003
13. Findlay GM, Daza RM, Martin B, Zhang MD, Leith AP, Gasperini M, et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562(7726):217–22. https://doi.org/10.1038/s41586-018-0461-z
14. Meitlis I, Allenspach EJ, Bauman BM, Phan IQ, Dabbah G, Schmitt EG, et al. Multiplexed functional assessment of genetic variants in CARD11. Am J Hum Genet. 2020;107:1029–43. https://doi.org/10.1016/J.AJHG.2020.10.015
15. Flynn JM, Rossouw A, Cote-Hammarlof P, Fragata I, Mavor D, Hollins C III, et al. Comprehensive fitness maps of Hsp90 show widespread environmental dependence. eLife. 2020;9:e53810. https://doi.org/10.7554/eLife.53810
16. Steinberg B, Ostermeier M. Shifting fitness and epistatic landscapes reflect trade-offs along an evolutionary pathway. J Mol Biol. 2016;428(13):2730–43. https://doi.org/10.1016/j.jmb.2016.04.033
17. Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, et al. High-resolution mapping of protein sequence-function relationships. Nat Methods. 2010;7(9):741–6. https://doi.org/10.1038/nmeth.1492
18. Rubin AF, Gelman H, Lucas N, Bajjalieh SM, Papenfuss AT, Speed TP, et al. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 2017;18:1–15. https://doi.org/10.1186/S13059-017-1272-5/FIGURES/7
19. Coyote-Maestas W, Nedrud D, He Y, Schmidt D. Determinants of trafficking, conduction, and disease within a K+ channel revealed through multiparametric deep mutational scanning. eLife. 2022;11:e76903. https://doi.org/10.7554/eLife.76903
20. Faure AJ, Schmiedel JM, Baeza-Centurion P, Lehner B. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol. 2020;21:1–23. https://doi.org/10.1186/S13059-020-02091-3/TABLES/2
21. Bloom JD. Software for the analysis and visualization of deep mutational scanning data. BMC Bioinformatics. 2015;16:1–13. https://doi.org/10.1186/S12859-015-0590-4/FIGURES/6
22. Bank C, Hietpas RT, Wong A, Bolon DN, Jensen JD. A Bayesian MCMC approach to assess the complete distribution of fitness effects of new mutations: uncovering the potential for adaptive walks in challenging environments. Genetics. 2014;196:841–52. https://doi.org/10.1534/GENETICS.113.156190/-/DC1
23. Starita LM, Young DL, Islam M, Kitzman JO, Gullingsrud J, Hause RJ, et al. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics. 2015;200(2):413–22. https://doi.org/10.1534/genetics.115.175802
24. Soneson C, Bendel AM, Diss G, Stadler MB. mutscan - a flexible R package for efficient end-to-end analysis of multiplexed assays of variant effect data. Genome Biol. 2023;12(24):1–22. https://doi.org/10.1186/S13059-023-02967-0/FIGURES/6
25. Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol. 2011;7(10):1–16. https://doi.org/10.1371/journal.pcbi.1002195
26. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2009;26(1):139–40. https://doi.org/10.1093/bioinformatics/btp616
27. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:1–21.
28. Stephens M. False discovery rates: a new deal. Biostatistics. 2017;18:275–94. https://doi.org/10.1093/BIOSTATISTICS/KXW041
29. Kowalsky CA, Whitehead TA. Determination of binding affinity upon mutation for type I dockerin-cohesin complexes from Clostridium thermocellum and Clostridium cellulolyticum using deep sequencing. Proteins Struct Funct Bioinforma. 2016;84(12):1914–28.
30. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–7.
31. Frazer J, Notin P, Dias M, Gomez A, Min JK, Brock K, et al. Disease variant prediction with deep generative models of evolutionary data. Nature. 2021;599(7883):91–5.
32. Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381(6664):eadg7492.
33. Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell. 2020;182:1295-1310.e20. https://doi.org/10.1016/J.CELL.2020.08.012
34. Stiffler M, Hekstra D, Ranganathan R. Evolvability as a function of purifying selection in TEM-1 beta-lactamase. Cell. 2015;160(5):882–92. https://doi.org/10.1016/j.cell.2015.01.035
35. Faure AJ, Domingo J, Schmiedel JM, Hidalgo-Carcedo C, Diss G, Lehner B. Mapping the energetic and allosteric landscapes of protein binding domains. Nature. 2022;604(7904):175–83. https://doi.org/10.1038/s41586-022-04586-4
36. Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, et al. Sustainable data analysis with Snakemake. F1000Research. 2021;10:33. https://f1000research.com/articles/10-33/v2
37. Bushnell B. BBTools software package. 2014. https://sourceforge.net/projects/bbmap. Accessed 11 June 2021.
38. Van der Auwera GA, O'Connor BD. Genomics in the cloud: using Docker, GATK, and WDL in Terra. Sebastopol: O'Reilly Media; 2020.
39. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–8. https://doi.org/10.1093/bioinformatics/btw354
40. Stan Development Team. RStan: the R interface to Stan. 2023. R package version 2.21.8. https://mc-stan.org/. Accessed 22 May 2024.
41. Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434. 2017. https://arxiv.org/abs/1701.02434
42. Hoffman MD, Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J Mach Learn Res. 2014;15(47):1593–623.
43. Rao J. pimentellab/rosace. 2023. Zenodo. https://doi.org/10.5281/zenodo.10814911

Review history

The review history is available as Additional file 2.

Peer review information

Andrew Cosgrove was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Author information

Ruiqi Xin and Christian Macdonald contributed equally to this work.

Authors and Affiliations

Department of Computer Science, UCLA, Los Angeles, CA, USA

Jingyou Rao & Harold Pimentel

Computational and Systems Biology Interdepartmental Program, UCLA, Los Angeles, CA, USA

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA

Christian Macdonald, Matthew K. Howard, Gabriella O. Estevam, Sook Wah Yee, James S. Fraser & Willow Coyote-Maestas

Tetrad Graduate Program, UCSF, San Francisco, CA, USA

Matthew K. Howard & Gabriella O. Estevam

Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA, USA

Matthew K. Howard

Department of Mathematics, Baruch College, CUNY, New York, NY, USA

Mingsen Wang

Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA

James S. Fraser & Willow Coyote-Maestas

Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA

Harold Pimentel

Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA

Contributions

JR, CM, WCM, and HP jointly conceived the project. JR and HP developed the statistical model and the simulation framework. JR, MW, and RX wrote the software and its support. JR performed the data analysis and benchmarking. CM wrote the sequencing pipeline. SWY and CM performed the OCT1 experiment and GOE performed the MET experiment. JR and HP wrote the manuscript with input from MW, CM, WCM, MH, and JSF. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Willow Coyote-Maestas or Harold Pimentel .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

JSF has consulted for Octant Bio, a company that develops multiplexed assays of variant effects. The other authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Supplementary figures and tables.

Additional file 2: Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Rao, J., Xin, R., Macdonald, C. et al. Rosace: a robust deep mutational scanning analysis framework employing position and mean-variance shrinkage. Genome Biol 25, 138 (2024). https://doi.org/10.1186/s13059-024-03279-7

Download citation

Received: 31 October 2023

Accepted: 14 May 2024

Published: 24 May 2024

DOI: https://doi.org/10.1186/s13059-024-03279-7
