How to Write a Research Hypothesis: Good & Bad Examples

What is a research hypothesis?

A research hypothesis is an attempt at explaining a phenomenon or the relationships between phenomena/variables in the real world. Hypotheses are sometimes called “educated guesses”, but they are in fact (or at least should be) based on previous observations, existing theories, scientific evidence, and logic. A research hypothesis is also not a prediction; rather, predictions are (or should be) based on clearly formulated hypotheses. For example, “We tested the hypothesis that KLF2 knockout mice would show deficiencies in heart development” is an assumption or prediction, not a hypothesis.

The research hypothesis at the basis of this prediction is “the product of the KLF2 gene is involved in the development of the cardiovascular system in mice”—and this hypothesis is probably (hopefully) based on a clear observation, such as that mice with low levels of Kruppel-like factor 2 (which KLF2 codes for) seem to have heart problems. From this hypothesis, you can derive the idea that a mouse in which this particular gene does not function cannot develop a normal cardiovascular system, and then make the prediction that we started with. 

What is the difference between a hypothesis and a prediction?

You might think that these are very subtle differences, and you will certainly come across many publications that do not contain an actual hypothesis or do not make these distinctions correctly. But considering that the formulation and testing of hypotheses is an integral part of the scientific method, it is good to be aware of the concepts underlying this approach. The two hallmarks of a scientific hypothesis are falsifiability (an evaluation standard that was introduced by the philosopher of science Karl Popper in 1934) and testability: if you cannot use experiments or data to decide whether an idea is true or false, then it is not a hypothesis (or at least it is a very bad one).

So, in a nutshell, you (1) look at existing evidence/theories, (2) come up with a hypothesis, (3) make a prediction that allows you to (4) design an experiment or data analysis to test it, and (5) come to a conclusion. Of course, not all studies have hypotheses (there is also exploratory or hypothesis-generating research), and you do not necessarily have to state your hypothesis as such in your paper. 

But for the sake of understanding the principles of the scientific method, let’s first take a closer look at the different types of hypotheses that research articles refer to and then give you a step-by-step guide for how to formulate a strong hypothesis for your own paper.

Types of Research Hypotheses

Hypotheses can be simple, which means they describe the relationship between one single independent variable (the one you observe variations in or plan to manipulate) and one single dependent variable (the one you expect to be affected by the variations/manipulation). If there are more variables on either side, you are dealing with a complex hypothesis. You can also distinguish hypotheses according to the kind of relationship between the variables you are interested in (e.g., causal or associative). But apart from these variations, we are usually interested in what is called the “alternative hypothesis” and, in contrast to that, the “null hypothesis”. If you think these two should be listed the other way round, then you are right, logically speaking: the alternative should surely come second. However, since this is the hypothesis we (as researchers) are usually interested in, let’s start from there.

Alternative Hypothesis

If you predict a relationship between two variables in your study, then the research hypothesis that you formulate to describe that relationship is your alternative hypothesis (usually H1 in statistical terms). The goal of your hypothesis testing is thus to demonstrate that there is sufficient evidence that supports the alternative hypothesis, rather than evidence for the possibility that there is no such relationship. The alternative hypothesis is usually the research hypothesis of a study and is based on the literature, previous observations, and widely known theories. 

Null Hypothesis

The hypothesis that describes the other possible outcome, that is, that your variables are not related, is the null hypothesis (H0). Based on your findings, you choose between the two hypotheses. Usually that means that if your prediction was correct, you reject the null hypothesis and accept the alternative. Make sure, however, that you are not getting lost at this step of the thinking process: If your prediction is that there will be no difference or change, then you are trying to find support for the null hypothesis and reject H1.

Directional Hypothesis

While the null hypothesis is obviously “static”, the alternative hypothesis can specify a direction for the observed relationship between variables—for example, that mice with higher expression levels of a certain protein are more active than those with lower levels. This is then called a one-tailed hypothesis. 

Another example of a directional (one-tailed) alternative hypothesis would be:

H1: Attending private classes before important exams has a positive effect on performance. 

Your null hypothesis would then be that

H0: Attending private classes before important exams has no effect or a negative effect on performance.
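To make the one-tailed logic concrete, here is a minimal Python sketch of how such a directional pair could be tested. It is only an illustration under stated assumptions: the exam scores are made up, the function name is hypothetical, and a permutation test is just one of several methods a real study might choose. It uses only the standard library.

```python
import random
from statistics import mean


def one_tailed_permutation_test(treated, control, n_permutations=10_000, seed=0):
    """Estimate the p-value for H1: mean(treated) > mean(control).

    H0 is that the treatment has no effect or a negative effect.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_large += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical exam scores (invented numbers, for illustration only).
with_classes = [78, 85, 90, 88, 76, 92, 81, 87]
without_classes = [70, 75, 80, 72, 68, 79, 74, 77]

p_value = one_tailed_permutation_test(with_classes, without_classes)
print(f"one-tailed p-value: {p_value:.4f}")
```

Note that only differences in the predicted direction count as evidence here. A large difference in the opposite direction would not lead you to reject H0 in a one-tailed test.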

Nondirectional Hypothesis

A nondirectional hypothesis does not specify the direction of the potentially observed effect, only that there is a relationship between the studied variables. This is called a two-tailed hypothesis. For instance, suppose you are studying a new drug that has shown some effects in vitro on pathways involved in a certain condition (e.g., anxiety). You can’t say for sure whether it will have the same effects in an animal model, or whether it might induce other effects or side effects that you can’t predict and potentially increase anxiety levels instead. In that case, you could state the two hypotheses like this:

H1: The drug (so far tested only in the lab) affects anxiety levels, in one direction or the other, in an anxiety mouse model.

You then test this nondirectional alternative hypothesis against the null hypothesis:

H0: The drug (so far tested only in the lab) has no effect on anxiety levels in an anxiety mouse model.
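For comparison with the one-tailed case, the following Python sketch shows what "two-tailed" means in practice: differences in either direction count as evidence against H0. The anxiety scores, group sizes, and function name are assumptions made up for this illustration, and an exact permutation test is just one possible method (it is only feasible for small groups).

```python
from itertools import combinations


def two_tailed_exact_permutation_test(group_a, group_b):
    """Exact p-value for H1: the group means differ in either direction."""
    pooled = list(group_a) + list(group_b)
    n_a, n_total = len(group_a), len(pooled)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # Absolute difference between the two group means for a given split.
        return abs(sum_a / n_a - (total - sum_a) / (n_total - n_a))

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(n_total), n_a):
        count += 1
        sum_a = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum_a) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical anxiety scores (invented numbers, for illustration only).
drug = [12, 15, 11, 14, 10, 13]
placebo = [18, 20, 17, 19, 16, 21]

p_value = two_tailed_exact_permutation_test(drug, placebo)
print(f"two-tailed p-value: {p_value:.4f}")
```

Because the absolute difference is used, the result is the same whichever group is passed first, which mirrors the fact that the nondirectional H1 makes no claim about the sign of the effect.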

How to Write a Hypothesis for a Research Paper

Now that we understand the important distinctions between different kinds of research hypotheses, let’s look at a simple process of how to write a hypothesis.

Writing a Hypothesis Step 1:

Ask a question, based on earlier research. Research always starts with a question, but one that takes into account what is already known about a topic or phenomenon. For example, if you are interested in whether people who have pets are happier than those who don’t, do a literature search and find out what has already been demonstrated. You will probably realize that yes, there is quite a bit of research that shows a relationship between happiness and owning a pet, and there are even studies that show that owning a dog is more beneficial than owning a cat! Let’s say you are so intrigued by this finding that you wonder:

What is it that makes dog owners even happier than cat owners? 

Let’s move on to Step 2 and find an answer to that question.

Writing a Hypothesis Step 2:

Formulate a strong hypothesis by answering your own question. Again, you don’t want to make things up, take unicorns into account, or repeat/ignore what has already been done. Looking at the dog-vs-cat papers your literature search returned, you see that most studies are based on self-report questionnaires on personality traits, mental health, and life satisfaction. What you don’t find is any data on actual (mental or physical) health measures, and no experiments. You therefore come up with the carefully thought-through hypothesis that it may be the lifestyle of the dog owners, which includes walking their dog several times per day, engaging in fun and healthy activities such as agility competitions, and taking them on trips, that gives them that extra boost in happiness. You could therefore answer your question in the following way:

Dog owners are happier than cat owners because of the dog-related activities they engage in.

Now you have to verify that your hypothesis fulfills the two requirements we introduced at the beginning of this resource article: falsifiability and testability. If it can’t be wrong and can’t be tested, it’s not a hypothesis. We are lucky, however, because yes, we can test whether owning a dog but not engaging in any of those activities leads to lower levels of happiness or well-being than owning a dog and playing and running around with them or taking them on trips.

Writing a Hypothesis Step 3:

Make your predictions and define your variables. We have verified that we can test our hypothesis, but now we have to define all the relevant variables, design our experiment or data analysis, and make precise predictions. You could, for example, decide to study dog owners (not surprising at this point), let them fill in questionnaires about their lifestyle as well as their life satisfaction (as other studies did), and then compare two groups of active and inactive dog owners. Alternatively, if you want to go beyond the data that earlier studies produced and analyzed and directly manipulate the activity level of your dog owners to study the effect of that manipulation, you could invite them to your lab, select groups of participants with similar lifestyles, make them change their lifestyle (e.g., couch potato dog owners start agility classes, very active ones have to refrain from any fun activities for a certain period of time) and assess their happiness levels before and after the intervention. In both cases, your independent variable would be “level of engagement in fun activities with dog” and your dependent variable would be happiness or well-being.
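As a rough sketch of how the variables in the intervention design could be recorded and summarized, consider the Python snippet below. Everything in it is hypothetical: the class name, the group labels, the 1-10 happiness scale, and the ratings themselves are invented for illustration and do not come from any real study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DogOwner:
    group: str               # independent variable: assigned activity change
    happiness_before: float  # dependent variable before the intervention
    happiness_after: float   # dependent variable after the intervention


def mean_change_by_group(owners):
    """Return the average change in happiness for each activity group."""
    changes = {}
    for owner in owners:
        change = owner.happiness_after - owner.happiness_before
        changes.setdefault(owner.group, []).append(change)
    return {group: mean(values) for group, values in changes.items()}


# Made-up ratings on a hypothetical 1-10 happiness scale.
owners = [
    DogOwner("starts_activities", 5.0, 7.0),
    DogOwner("starts_activities", 6.0, 7.0),
    DogOwner("stops_activities", 8.0, 6.0),
    DogOwner("stops_activities", 7.0, 6.0),
]

result = mean_change_by_group(owners)
print(result)
```

Keeping the independent variable (group) and the dependent variable (happiness before and after) as separate, named fields makes it explicit what is manipulated and what is measured, which is the point of this step.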

Examples of a Good and Bad Hypothesis

Let’s look at a few examples of good and bad hypotheses to get you started.

Good Hypothesis Examples

Bad Hypothesis Examples

Tips for Writing a Research Hypothesis

If you understood the distinction between a hypothesis and a prediction we made at the beginning of this article, then you will have no problem formulating your hypotheses and predictions correctly. To refresh your memory: We have to (1) look at existing evidence, (2) come up with a hypothesis, (3) make a prediction, and (4) design an experiment. For example, you could summarize your dog/happiness study like this:

(1) While research suggests that dog owners are happier than cat owners, there are no reports on what factors drive this difference. (2) We hypothesized that it is the fun activities that many dog owners (but very few cat owners) engage in with their pets that increase their happiness levels. (3) We thus predicted that preventing very active dog owners from engaging in such activities for some time and making very inactive dog owners take up such activities would lead to a decrease and an increase in their overall self-ratings of happiness, respectively. (4) To test this, we invited dog owners into our lab, assessed their mental and emotional well-being through questionnaires, and then assigned them to an “active” and an “inactive” group, depending on…

Note that you use “we hypothesize” only for your hypothesis, not for your experimental prediction, and “would” or “if – then” only for your prediction, not your hypothesis. A hypothesis that states that something “would” affect something else sounds as if you don’t have enough confidence to make a clear statement, in which case you can’t expect your readers to believe in your research either. Write in the present tense, don’t use modal verbs that express varying degrees of certainty (such as may, might, or could), and remember that you are not drawing a conclusion while trying not to exaggerate but making a clear statement that you then, in a way, try to disprove. And if that happens, that is not something to fear but an important part of the scientific process.

Similarly, don’t use “we hypothesize” when you explain the implications of your research or make predictions in the conclusion section of your manuscript, since these are clearly not hypotheses in the true sense of the word. As we said earlier, you will find that many authors of academic articles do not seem to care too much about these rather subtle distinctions, but thinking very clearly about your own research will not only help you write better but also ensure that even that infamous Reviewer 2 will find fewer reasons to nitpick about your manuscript. 

How to Write a Research Hypothesis

Since grade school, we've all been familiar with hypotheses. The hypothesis is an essential step of the scientific method. But what makes an effective research hypothesis, how do you create one, and what types of hypotheses are there? We answer these questions and more.

Updated on April 27, 2022

What is a research hypothesis?

General hypothesis

Since grade school, we've all been familiar with the term “hypothesis.” A hypothesis is a fact-based guess or prediction that has not been proven. It is an essential step of the scientific method. The hypothesis of a study drives the experimentation that will either support or dispute it.

Research hypothesis

A research hypothesis is more specific than a general hypothesis. It is an educated, expected prediction of the outcome of a study that is testable.

What makes an effective research hypothesis?

A good research hypothesis is a clear statement of the relationship between a dependent variable(s) and independent variable(s) relevant to the study that can be disproven.

Research hypothesis checklist

Once you've written a possible hypothesis, make sure it checks the following boxes:

  • It must be testable: You need a means to test your hypothesis. If you can't test it, it's not a hypothesis.
  • It must include a dependent and independent variable: At least one independent variable (cause) and one dependent variable (effect) must be included.
  • The language must be easy to understand: Be as clear and concise as possible. Nothing should be left to interpretation.
  • It must be relevant to your research topic: You probably shouldn't be talking about cats and dogs if your research topic is outer space. Stay relevant to your topic.

How to create an effective research hypothesis

Pose it as a question first

Start your research hypothesis from a journalistic approach. Ask one of the five W's: Who, what, when, where, or why.

A possible initial question could be: Why is the sky blue?

Do the preliminary research

Once you have a question in mind, read research around your topic. Collect research from academic journals.

If you're looking for information about the sky and why it is blue, research information about the atmosphere, weather, space, the sun, etc.

Write a draft hypothesis

Once you're comfortable with your subject and have preliminary knowledge, create a working hypothesis. Don't stress much over this. Your first hypothesis is not permanent. Look at it as a draft.

Your first draft of a hypothesis could be: Certain molecules in the Earth's atmosphere are responsible for the sky being blue.

Make your working draft perfect

Take your working hypothesis and make it perfect. Narrow it down to include only the information listed in the “Research hypothesis checklist” above.

Now that you've written your working hypothesis, narrow it down. Your new hypothesis could be: Light from the sun hitting oxygen molecules in the sky makes the color of the sky appear blue.

Write a null hypothesis

Your null hypothesis should be the opposite of your research hypothesis. It should be able to be disproven by your research.

In this example, your null hypothesis would be: Light from the sun hitting oxygen molecules in the sky does not make the color of the sky appear blue.

Why is it important to have a clear, testable hypothesis?

One of the main reasons a manuscript can be rejected by a journal is a weak hypothesis. “Poor hypothesis, study design, methodology, and improper use of statistics are other reasons for rejection of a manuscript,” say Dr. Ish Kumar Dhammi and Dr. Rehan-Ul-Haq in Indian Journal of Orthopaedics.

According to Dr. James M. Provenzale in American Journal of Roentgenology, “The clear declaration of a research question (or hypothesis) in the Introduction is critical for reviewers to understand the intent of the research study. It is best to clearly state the study goal in plain language (for example, ‘We set out to determine whether condition x produces condition y.’) An insufficient problem statement is one of the more common reasons for manuscript rejection.”

Characteristics that make a hypothesis weak include:

  • Unclear variables
  • Unoriginality
  • Too general
  • Too specific

A weak hypothesis leads to weak research and methods. The goal of a paper is to support or refute a hypothesis, or to reject (or fail to reject) a null hypothesis. If the hypothesis is not tied to the variables being studied, the paper's methods should come into question.

A strong hypothesis is essential to the scientific method. A hypothesis states an assumed relationship between at least two variables and the experiment then proves or disproves that relationship with statistical significance. Without a proven and reproducible relationship, the paper feeds into the reproducibility crisis. Learn more about writing for reproducibility.

In a study published in The Journal of Obstetrics and Gynecology of India, Dr. Suvarna Satish Khadilkar reviewed 400 rejected manuscripts to see why they were rejected. Her study revealed that poor methodology was a top reason for rejection.

Aside from publication chances, Dr. Gareth Dyke believes a clear hypothesis helps efficiency.

“Developing a clear and testable hypothesis for your research project means that you will not waste time, energy, and money with your work,” said Dyke. “Refining a hypothesis that is both meaningful, interesting, attainable, and testable is the goal of all effective research.”

Types of research hypotheses

There can be overlap in these types of hypotheses.

Simple hypothesis

A simple hypothesis is a hypothesis in its most basic form. It shows the relationship between one independent and one dependent variable.

Example: Drinking soda (independent variable) every day leads to obesity (dependent variable).

Complex hypothesis

A complex hypothesis shows the relationship of two or more independent and dependent variables.

Example: Drinking soda (independent variable) every day leads to obesity (dependent variable) and heart disease (dependent variable).

Directional hypothesis

A directional hypothesis guesses which way the results of an experiment will go. It uses words like increase, decrease, higher, lower, positive, negative, more, or less. It is also frequently used in statistics.

Example: Humans exposed to radiation have a higher risk of cancer than humans not exposed to radiation.

Non-directional hypothesis

A non-directional hypothesis says there will be an effect on the dependent variable, but it does not say which direction.

Associative hypothesis

An associative hypothesis says that when one variable changes, so does the other variable.

Alternative hypothesis

An alternative hypothesis states that the variables have a relationship.

  • The opposite of a null hypothesis

Example: An apple a day keeps the doctor away.

Null hypothesis

A null hypothesis states that there is no relationship between the two variables. It is posed as the opposite of what the alternative hypothesis states.

Researchers set up a null hypothesis so that they can try to reject it. A null hypothesis:

  • Can never be proven
  • Can only be rejected
  • Is the opposite of an alternative hypothesis

Example: An apple a day does not keep the doctor away.
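The point that a null hypothesis can only be rejected, never proven, can be sketched as a small decision rule in Python. This is an illustration only: the function name is hypothetical, and the threshold of 0.05 is a common convention assumed here, not something this article prescribes.

```python
def decide(p_value, alpha=0.05):
    """Return the conclusion a significance test supports.

    alpha is the significance threshold; 0.05 is assumed here only
    because it is a widely used convention.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # A large p-value is not proof that H0 is true; the data are
    # simply insufficient to reject it.
    return "fail to reject H0"


print(decide(0.01))
print(decide(0.40))
```

The function deliberately has no "accept H0" outcome, which reflects the asymmetry described in the list above.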

Logical hypothesis

A logical hypothesis is a suggested explanation while using limited evidence.

Example: Bats can navigate in the dark better than tigers.

In this hypothesis, the researcher knows that tigers cannot see in the dark, and bats mostly live in darkness.

Empirical hypothesis

An empirical hypothesis is also called a “working hypothesis.” It uses the trial and error method and changes around the independent variables.

  • An apple a day keeps the doctor away.
  • Two apples a day keep the doctor away.
  • Three apples a day keep the doctor away.

In this case, the researcher changes the hypothesis as they learn more about the subject of their research.

Statistical hypothesis

A statistical hypothesis is a statement about a population or statistical model that is tested using a sample. This type of hypothesis is especially useful if you are making a statement about a large population. Instead of having to test the entire population of Illinois, you could just use a smaller sample of people who live there.

Example: 70% of people who live in Illinois are iron deficient.
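To show how a sample can be used to evaluate a claim like this one, here is a minimal Python sketch of an exact binomial test of H0: "the true proportion is 70%". The sample size and count are invented for illustration, the function names are hypothetical, and the two-sided rule used (summing all outcomes no more likely than the observed one) is one common convention among several.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_p_value(successes, n, p_null):
    """Two-sided exact p-value for H0: the true proportion equals p_null.

    Sums the probabilities of every outcome that is no more likely
    under H0 than the observed one.
    """
    observed = binomial_pmf(successes, n, p_null)
    tolerance = observed * 1e-9  # guards against floating-point ties
    total = 0.0
    for k in range(n + 1):
        prob = binomial_pmf(k, n, p_null)
        if prob <= observed + tolerance:
            total += prob
    return min(1.0, total)


# Hypothetical sample: 20 residents tested, 9 found to be iron deficient.
p_value = exact_binomial_p_value(successes=9, n=20, p_null=0.70)
print(f"p-value: {p_value:.4f}")
```

A small p-value would suggest that the sample is hard to reconcile with the 70% claim. A real study would also need a sample that is representative of the population, which this sketch does not address.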

Causal hypothesis

A causal hypothesis states that the independent variable will have an effect on the dependent variable.

Example: Using tobacco products causes cancer.

Final thoughts

Make sure your research is error-free before you send it to your preferred journal. Check out our English Editing services to reduce your chances of desk rejection.

Jonny Rhein, BA

How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.

Verywell / Alex Dos Diaz

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question which is then explored through background research. At this point, researchers then begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you expect to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment do not support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of a folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

Falsifiability of a Hypothesis

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse the idea of falsifiability with the idea that it means that something is false, which is not the case. What falsifiability means is that if something were false, then it would be possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.
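One way to picture an operational definition is as a small, explicit rule that turns raw observations into a number. The Python sketch below does this for the two examples above. All specifics are assumptions made for illustration: the function names, the 1-5 rating scale, and the choice of total minutes and mean rating as the measures are hypothetical, not taken from any particular study.

```python
def study_habits_minutes(session_minutes):
    """Operationalize 'study habits' as total minutes studied in the logged period."""
    return sum(session_minutes)


def anxiety_score(item_ratings, scale_min=1, scale_max=5):
    """Operationalize 'test anxiety' as the mean rating on a self-report questionnaire.

    The 1-5 scale is an assumed example; a real instrument defines its own.
    """
    if not item_ratings:
        raise ValueError("at least one questionnaire item is required")
    for rating in item_ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside the scale")
    return sum(item_ratings) / len(item_ratings)


# Made-up data for one hypothetical participant.
print(study_habits_minutes([30, 45, 60]))
print(anxiety_score([4, 5, 3, 4]))
```

Writing the rule down this explicitly is what allows another researcher to measure the same variable in the same way.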

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

Hypothesis Types

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent variables and one dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the  dependent variable  if you change the  independent variable .

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."
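As a rough illustration of this template, the sketch below fills the "If ..., then ..." pattern from two pieces of text. The `Hypothesis` class and its field names are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A simple hypothesis: one independent and one dependent variable."""
    independent_change: str   # the change made to the independent variable
    dependent_outcome: str    # the expected change in the dependent variable

    def as_if_then(self) -> str:
        # Fill the "If {...}, then {...}" template described above.
        return f"If {self.independent_change}, then {self.dependent_outcome}."


breakfast = Hypothesis(
    independent_change="students eat breakfast before a math exam",
    dependent_outcome="they will score higher than students who do not eat breakfast",
)
print(breakfast.as_if_then())
```

Forcing both slots to be filled is a quick check that the statement names an independent and a dependent variable.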

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than children who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 
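To make the null and alternative pairing concrete, here is a minimal sketch of one way a "no difference" null hypothesis could be checked: an exact two-sided permutation test on the difference in group means. The anxiety scores are invented for illustration, and the permutation test is simply one reasonable choice, not a method the examples above prescribe.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    enumerate every way to relabel the pooled data and count how often the
    relabeled difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_b) - mean(group_a))

    extreme = 0
    total = 0
    indices = range(len(pooled))
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        relabeled_a = [pooled[i] for i in indices if i in chosen_set]
        relabeled_b = [pooled[i] for i in indices if i not in chosen_set]
        diff = abs(mean(relabeled_b) - mean(relabeled_a))
        if diff >= observed - 1e-9:   # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical anxiety scores (higher = more anxious); not real data.
supplement = [4, 5, 3, 4, 2]
no_supplement = [7, 8, 6, 9, 7]

p_value = exact_permutation_p_value(supplement, no_supplement)
print(f"p = {p_value:.4f}")
```

With these made-up scores, only 2 of the 252 possible relabelings are as extreme as the observed split, so the p-value is about 0.008. A small p-value counts as evidence against the null hypothesis. It does not prove the alternative hypothesis.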

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a correlational study can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.
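A correlational analysis can be sketched with a Pearson correlation coefficient, computed here by hand so the example has no dependencies. The study-time and exam-score values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical observations: hours studied and exam score per student.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 64, 70, 71]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship, but, as noted above, it cannot show that one variable causes the other. That requires manipulating the independent variable in an experiment.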

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.


By Kendra Cherry, MSEd

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

How to Write a Research Proposal | Examples & Templates

Published on 30 October 2022 by Shona McCombes and Tegan George. Revised on 13 June 2023.

Structure of a research proposal

A research proposal describes what you will investigate, why it’s important, and how you will conduct your research.

The format of a research proposal varies between fields, but most proposals will contain at least these elements:

  • Introduction
  • Literature review
  • Research design
  • Reference list

While the sections may vary, the overall objective is always the same. A research proposal serves as a blueprint and guide for your research plan, helping you get organised and feel confident in the path forward you choose to take.

Academics often have to write research proposals to get funding for their projects. As a student, you might have to write a research proposal as part of a grad school application, or prior to starting your thesis or dissertation.

In addition to helping you figure out what your research can look like, a proposal can also serve to demonstrate why your project is worth pursuing to a funder, educational institution, or supervisor.

Research proposal length

The length of a research proposal can vary quite a bit. A bachelor’s or master’s thesis proposal can be just a few pages, while proposals for PhD dissertations or research funding are usually much longer and more detailed. Your supervisor can help you determine the best length for your work.

One trick to get started is to think of your proposal’s structure as a shorter version of your thesis or dissertation, only without the results, conclusion and discussion sections.

Writing a research proposal can be quite challenging, but a good starting point could be to look at some examples. We’ve included a few for you below.

  • Example research proposal #1: ‘A Conceptual Framework for Scheduling Constraint Management’
  • Example research proposal #2: ‘Medical Students as Mediators of Change in Tobacco Use’

Like your dissertation or thesis, the proposal will usually have a title page that includes:

  • The proposed title of your project
  • Your supervisor’s name
  • Your institution and department

The first part of your proposal is the initial pitch for your project. Make sure it succinctly explains what you want to do and why.

Your introduction should:

  • Introduce your topic
  • Give necessary background and context
  • Outline your  problem statement  and research questions

To guide your introduction, include information about:

  • Who could have an interest in the topic (e.g., scientists, policymakers)
  • How much is already known about the topic
  • What is missing from this current knowledge
  • What new insights your research will contribute
  • Why you believe this research is worth doing

As you get started, it’s important to demonstrate that you’re familiar with the most important research on your topic. A strong literature review  shows your reader that your project has a solid foundation in existing knowledge or theory. It also shows that you’re not simply repeating what other people have already done or said, but rather using existing research as a jumping-off point for your own.

In this section, share exactly how your project will contribute to ongoing conversations in the field by:

  • Comparing and contrasting the main theories, methods, and debates
  • Examining the strengths and weaknesses of different approaches
  • Explaining how you will build on, challenge, or synthesise prior scholarship

Following the literature review, restate your main objectives. This brings the focus back to your own project. Next, your research design or methodology section will describe your overall approach, and the practical steps you will take to answer your research questions.

To finish your proposal on a strong note, explore the potential implications of your research for your field. Emphasise again what you aim to contribute and why it matters.

For example, your results might have implications for:

  • Improving best practices
  • Informing policymaking decisions
  • Strengthening a theory or model
  • Challenging popular or scientific beliefs
  • Creating a basis for future research

Last but not least, your research proposal must include correct citations for every source you have used, compiled in a reference list. To create citations quickly and easily, you can use our free APA citation generator.

Some institutions or funders require a detailed timeline of the project, asking you to forecast what you will do at each stage and how long it may take. While not always required, be sure to check the requirements of your project.

If you are applying for research funding, chances are you will have to include a detailed budget. This shows your estimates of how much each part of your project will cost.

Make sure to check what type of costs the funding body will agree to cover. For each item, include:

  • Cost : exactly how much money do you need?
  • Justification : why is this cost necessary to complete the research?
  • Source : how did you calculate the amount?

To determine your budget, think about:

  • Travel costs : do you need to go somewhere to collect your data? How will you get there, and how much time will you need? What will you do there (e.g., interviews, archival research)?
  • Materials : do you need access to any tools or technologies?
  • Help : do you need to hire any research assistants for the project? What will they do, and how much will you pay them?
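The cost, justification, and source fields described above map naturally onto a small record type. The sketch below is illustrative only: the `BudgetItem` class, the item names, and all amounts are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetItem:
    name: str
    cost: int            # amount requested, in whole currency units
    justification: str   # why the cost is necessary for the research
    source: str          # how the amount was calculated


def total_cost(items):
    """Sum the requested amount over all budget items."""
    return sum(item.cost for item in items)


# Hypothetical line items covering travel, materials, and help.
budget = [
    BudgetItem("Travel to archive", 420,
               "Archival research on site", "Return train fare quote"),
    BudgetItem("Transcription software", 150,
               "Transcribing interviews", "Vendor price list"),
    BudgetItem("Research assistant", 900,
               "Coding survey responses", "60 hours at 15 per hour"),
]

for item in budget:
    print(f"{item.name}: {item.cost} ({item.justification}; {item.source})")
print("Total:", total_cost(budget))
```

Keeping the justification and source next to each amount makes it easier to check every line against what the funding body agrees to cover.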

Once you’ve decided on your research objectives, you need to explain them in your paper, at the end of your problem statement.

Keep your research objectives clear and concise, and use appropriate verbs to accurately convey the work that you will carry out for each one.

I will compare …

A research aim is a broad statement indicating the general purpose of your research project. It should appear in your introduction at the end of your problem statement, before your research objectives.

Research objectives are more specific than your research aim. They indicate the specific ways you’ll address the overarching aim.

A PhD, which is short for philosophiae doctor (doctor of philosophy in Latin), is the highest university degree that can be obtained. In a PhD, students spend 3–5 years writing a dissertation, which aims to make a significant, original contribution to current knowledge.

A PhD is intended to prepare students for a career as a researcher, whether that be in academia, the public sector, or the private sector.

A master’s is a 1- or 2-year graduate degree that can prepare you for a variety of careers.

All master’s involve graduate-level coursework. Some are research-intensive and intend to prepare students for further study in a PhD; these usually require their students to write a master’s thesis. Others focus on professional training for a specific career.

Critical thinking refers to the ability to evaluate information and to be aware of biases or assumptions, including your own.

Like information literacy, it involves evaluating arguments, identifying and solving problems in an objective and systematic way, and clearly communicating your ideas.

Cite this Scribbr article

McCombes, S. & George, T. (2023, June 13). How to Write a Research Proposal | Examples & Templates. Scribbr. Retrieved 26 May 2024, from https://www.scribbr.co.uk/the-research-process/research-proposal-explained/

What is and How to Write a Good Hypothesis in Research?

One of the most important aspects of conducting research is constructing a strong hypothesis. But what makes a hypothesis in research effective? In this article, we’ll look at the difference between a hypothesis and a research question, as well as the elements of a good hypothesis in research. We’ll also include some examples of effective hypotheses, and what pitfalls to avoid.

What is a Hypothesis in Research?

Simply put, a hypothesis is a research question that also includes the predicted or expected result of the research. Without a hypothesis, there can be no basis for a scientific or research experiment. As such, it is critical that you carefully construct your hypothesis by being deliberate and thorough, even before you set pen to paper. Unless your hypothesis is clearly and carefully constructed, any flaw can have an adverse, and even grave, effect on the quality of your experiment and its subsequent results.

Research Question vs Hypothesis

It’s easy to confuse research questions with hypotheses, and vice versa. While they’re both critical to the Scientific Method, they have very specific differences. Primarily, a research question, just like a hypothesis, is focused and concise. But a hypothesis includes a prediction based on the proposed research, and is designed to forecast the relationship of and between two (or more) variables. Research questions are open-ended, and invite debate and discussion, while hypotheses are closed, e.g. “The relationship between A and B will be C.”

A hypothesis is generally used if your research topic is fairly well established, and you are relatively certain about the relationship between the variables that will be presented in your research. Since a hypothesis is ideally suited for experimental studies, it will, by its very existence, affect the design of your experiment. The research question is typically used for new topics that have not yet been researched extensively. Here, the relationship between different variables is less known. There is no prediction made, but there may be variables explored. The research question can be causal in nature (simply trying to understand whether a relationship even exists), descriptive, or comparative.

How to Write Hypothesis in Research

Writing an effective hypothesis starts before you even begin to type. Like any task, preparation is key, so you start first by conducting research yourself, and reading all you can about the topic that you plan to research. From there, you’ll gain the knowledge you need to understand where your focus within the topic will lie.

Remember that a hypothesis is a prediction of the relationship that exists between two or more variables. Your job is to write a hypothesis, and design the research, to “prove” whether or not your prediction is correct. A common pitfall is to use judgments that are subjective and inappropriate for the construction of a hypothesis. It’s important to keep the focus and language of your hypothesis objective.

An effective hypothesis in research is clearly and concisely written, with any terms clarified and defined. Specific language must also be used to avoid any generalities or assumptions.

Use the following points as a checklist to evaluate the effectiveness of your research hypothesis:

  • Predicts the relationship and outcome
  • Simple and concise – avoid wordiness
  • Clear with no ambiguity or assumptions about the readers’ knowledge
  • Observable and testable results
  • Relevant and specific to the research question or problem

Research Hypothesis Example

Perhaps the best way to evaluate whether or not your hypothesis is effective is to compare it to those of your colleagues in the field. There is no need to reinvent the wheel when it comes to writing a powerful research hypothesis. As you’re reading and preparing your hypothesis, you’ll also read other hypotheses. These can help guide you on what works, and what doesn’t, when it comes to writing a strong research hypothesis.

Here are a few generic examples to get you started.

Eating an apple each day, after the age of 60, will result in a reduction of frequency of physician visits.

Budget airlines are likely to receive more customer complaints than traditional full-service airlines. A budget airline is defined as an airline that offers lower fares and fewer amenities than a traditional full-service airline. (Note that the term “budget airline” is included in the hypothesis.)

Workplaces that offer flexible working hours report higher levels of employee job satisfaction than workplaces with fixed hours.

Each of the above examples is specific, observable and measurable, and the statement of prediction can be verified or shown to be false by utilizing standard experimental practices. It should be noted, however, that often your hypothesis will change as your research progresses.

Enago Academy

How to Develop a Good Research Hypothesis

The story of a research study begins by asking a question. Researchers all around the globe are asking curious questions and formulating research hypotheses. However, whether a research study reaches an effective conclusion depends on how well the research hypothesis is developed. Research hypothesis examples can help researchers get an idea of how to write a good research hypothesis.

This blog will help you understand what a research hypothesis is, its characteristics, and how to formulate a research hypothesis.

What is a Hypothesis?

A hypothesis is an assumption or an idea proposed for the sake of argument so that it can be tested. It is a precise, testable statement of what the researchers predict will be the outcome of the study. A hypothesis usually involves proposing a relationship between two variables: the independent variable (what the researchers change) and the dependent variable (what the researchers measure).

What is a Research Hypothesis?

A research hypothesis is a statement that introduces a research question and proposes an expected result. It is an integral part of the scientific method that forms the basis of scientific experiments. Therefore, you need to be careful and thorough when building your research hypothesis. A minor flaw in the construction of your hypothesis could have an adverse effect on your experiment. In research, there is a convention that the hypothesis is written in two forms: the null hypothesis and the alternative hypothesis (called the experimental hypothesis when the method of investigation is an experiment).

Characteristics of a Good Research Hypothesis

Because a hypothesis is specific, it makes a testable prediction about what you expect to happen in a study. You may consider drawing your hypothesis from previously published research and the theory behind it.

A good research hypothesis involves more effort than just a guess. In particular, your hypothesis may begin with a question that could be further explored through background research.

To help you formulate a promising research hypothesis, you should ask yourself the following questions:

  • Is the language clear and focused?
  • What is the relationship between your hypothesis and your research topic?
  • Is your hypothesis testable? If yes, then how?
  • What are the possible explanations that you might want to explore?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate your variables without violating ethical standards?
  • Does your hypothesis predict the relationship and outcome?
  • Is your hypothesis simple and concise (avoids wordiness)?
  • Is it clear, with no ambiguity or assumptions about the readers’ knowledge?
  • Does your hypothesis lead to observable and testable results?
  • Is it relevant and specific to the research question or problem?

The questions listed above can be used as a checklist to make sure your hypothesis is based on a solid foundation. Furthermore, it can help you identify weaknesses in your hypothesis and revise it if necessary.

How to Formulate a Research Hypothesis

A testable hypothesis is not a simple statement. It is rather an intricate statement that needs to offer a clear introduction to a scientific experiment, its intentions, and the possible outcomes. However, there are some important things to consider when building a compelling hypothesis.

1. State the problem that you are trying to solve.

Make sure that the hypothesis clearly defines the topic and the focus of the experiment.

2. Try to write the hypothesis as an if-then statement.

Follow this template: If a specific action is taken, then a certain outcome is expected.

3. Define the variables

Independent variables are the ones that are manipulated, controlled, or changed. Independent variables are isolated from other factors of the study.

Dependent variables, as the name suggests, depend on other factors in the study. They are influenced by changes in the independent variable.

4. Scrutinize the hypothesis

Evaluate assumptions, predictions, and evidence rigorously to refine your understanding.

Types of Research Hypothesis

The types of research hypothesis are stated below:

1. Simple Hypothesis

It predicts the relationship between a single dependent variable and a single independent variable.

2. Complex Hypothesis

It predicts the relationship between two or more independent and dependent variables.

3. Directional Hypothesis

It specifies the expected direction of the relationship between variables and is derived from theory. Furthermore, it implies the researcher’s intellectual commitment to a particular outcome.

4. Non-directional Hypothesis

It does not predict the exact direction or nature of the relationship between the two variables. The non-directional hypothesis is used when there is no theory involved or when findings contradict previous research.

5. Associative and Causal Hypothesis

The associative hypothesis defines interdependency between variables: a change in one variable is accompanied by a change in the other variable. On the other hand, the causal hypothesis proposes an effect on the dependent variable due to manipulation of the independent variable.

6. Null Hypothesis

The null hypothesis states that there is no relationship between two variables: there will be no changes in the dependent variable due to the manipulation of the independent variable. Furthermore, it states that results are due to chance and are not significant in terms of supporting the idea being investigated.

7. Alternative Hypothesis

It states that there is a relationship between the two variables of the study and that the results are significant to the research topic. An experimental hypothesis predicts what changes will take place in the dependent variable when the independent variable is manipulated. Also, it states that the results are not due to chance and that they are significant in terms of supporting the theory being investigated.

Research Hypothesis Examples of Independent and Dependent Variables

Research Hypothesis Example 1: A greater number of coal plants in a region (independent variable) increases water pollution (dependent variable). If you change the independent variable (building more coal plants), it will change the dependent variable (amount of water pollution).

Research Hypothesis Example 2: What is the effect of diet or regular soda (independent variable) on blood sugar levels (dependent variable)? If you change the independent variable (the type of soda you consume), it will change the dependent variable (blood sugar levels).

You should not ignore the importance of the above steps. The validity of your experiment and its results relies on a robust, testable hypothesis. Developing a strong testable hypothesis has a few advantages: it compels us to think intensely and specifically about the outcomes of a study. Consequently, it enables us to understand the implications of the question and the different variables involved in the study. Furthermore, it helps us to make precise predictions based on prior research. Hence, forming a hypothesis is of great value to the research.

More importantly, you need to build a robust, testable research hypothesis for your scientific experiments. A testable hypothesis is a hypothesis that can be proved or disproved as a result of experimentation.

Importance of a Testable Hypothesis

To devise and perform an experiment using the scientific method, you need to make sure that your hypothesis is testable. To be considered testable, some essential criteria must be met:

  • There must be a possibility to prove that the hypothesis is true.
  • There must be a possibility to prove that the hypothesis is false.
  • The results of the hypothesis must be reproducible.

Without these criteria, the hypothesis and the results will be vague. As a result, the experiment will not prove or disprove anything significant.

Frequently Asked Questions

The steps to write a research hypothesis are:

1. Stating the problem: Ensure that the hypothesis defines the research problem.
2. Writing a hypothesis as an ‘if-then’ statement: Include the action and the expected outcome of your study by following an ‘if-then’ structure.
3. Defining the variables: Define the variables as dependent or independent based on their dependency on other factors.
4. Scrutinizing the hypothesis: Identify the type of your hypothesis.

Hypothesis testing is a statistical tool used to make inferences about a population from data in order to draw conclusions about a particular hypothesis.

A hypothesis in statistics is a formal statement about the nature of a population within the structured framework of a statistical model. It is used to test an existing hypothesis by studying a population.

A research hypothesis is a statement that introduces a research question and proposes an expected result. It forms the basis of scientific experiments.

The different types of hypothesis in research are:

  • Null hypothesis: States that there is no relationship between two variables.
  • Alternate hypothesis: Predicts a relationship between the two variables of the study.
  • Directional hypothesis: Specifies the expected direction of the relationship between variables.
  • Non-directional hypothesis: Does not predict the exact direction or nature of the relationship between the two variables.
  • Simple hypothesis: Predicts the relationship between a single dependent variable and a single independent variable.
  • Complex hypothesis: Predicts the relationship between two or more independent and dependent variables.
  • Associative and causal hypothesis: An associative hypothesis states that the variables change together, while a causal hypothesis proposes that manipulating the independent variable produces an effect on the dependent variable.
  • Empirical hypothesis: Can be tested via experiments and observation.
  • Statistical hypothesis: Utilizes statistical models to draw conclusions about broader populations.

' src=

Wow! You really simplified your explanation that even dummies would find it easy to comprehend. Thank you so much.

Thanks a lot for your valuable guidance.

I enjoy reading the post. Hypotheses are actually an intrinsic part in a study. It bridges the research question and the methodology of the study.

Useful piece!

This is awesome.Wow.

It very interesting to read the topic, can you guide me any specific example of hypothesis process establish throw the Demand and supply of the specific product in market

Nicely explained

It is really a useful for me Kindly give some examples of hypothesis

It was a well explained content ,can you please give me an example with the null and alternative hypothesis illustrated

clear and concise. thanks.

So Good so Amazing

Good to learn

Thanks a lot for explaining to my level of understanding

Explained well and in simple terms. Quick read! Thank you

It awesome. It has really positioned me in my research project

Grad Coach

What (Exactly) Is A Research Proposal?

A simple explainer with examples + free template.

By: Derek Jansen (MBA) | Reviewed By: Dr Eunice Rautenbach | June 2020 (Updated April 2023)

Whether you’re nearing the end of your degree and your dissertation is on the horizon, or you’re planning to apply for a PhD program, chances are you’ll need to craft a convincing research proposal. If you’re on this page, you’re probably unsure exactly what the research proposal is all about. Well, you’ve come to the right place.

Overview: Research Proposal Basics

  • What a research proposal is
  • What a research proposal needs to cover
  • How to structure your research proposal
  • Example/sample proposals
  • Proposal writing FAQs
  • Key takeaways & additional resources

What is a research proposal?

Simply put, a research proposal is a structured, formal document that explains what you plan to research (your research topic), why it’s worth researching (your justification), and how you plan to investigate it (your methodology).

The purpose of the research proposal (its job, so to speak) is to convince your research supervisor, committee or university that your research is suitable (for the requirements of the degree program) and manageable (given the time and resource constraints you will face).

The most important word here is “convince” – in other words, your research proposal needs to sell your research idea (to whoever is going to approve it). If it doesn’t convince them (of its suitability and manageability), you’ll need to revise and resubmit. This will cost you valuable time, which will either delay the start of your research or eat into its time allowance (which is bad news).

A research proposal is a formal document that explains what you plan to research, why it's worth researching and how you'll do it.

What goes into a research proposal?

A good dissertation or thesis proposal needs to cover the “what”, “why” and “how” of the proposed study. Let’s look at each of these attributes in a little more detail:

Your proposal needs to clearly articulate your research topic. This needs to be specific and unambiguous. Your research topic should make it clear exactly what you plan to research and in what context. Here’s an example of a well-articulated research topic:

An investigation into the factors which impact female Generation Y consumers’ likelihood to promote a specific makeup brand to their peers: a British context

As you can see, this topic is extremely clear. From this one line we can see exactly:

  • What’s being investigated – factors that make people promote or advocate for a specific makeup brand
  • Who it involves – female Gen-Y consumers
  • In what context – the United Kingdom

So, make sure that your research proposal provides a detailed explanation of your research topic. If possible, also briefly outline your research aims and objectives, and perhaps even your research questions (although in some cases you’ll only develop these at a later stage). Needless to say, don’t start writing your proposal until you have a clear topic in mind, or you’ll end up waffling and your research proposal will suffer as a result.

As we touched on earlier, it’s not good enough to simply propose a research topic – you need to justify why your topic is original. In other words, what makes it unique? What gap in the current literature does it fill? If it’s simply a rehash of the existing research, it’s probably not going to get approval – it needs to be fresh.

But originality alone is not enough. Once you’ve ticked that box, you also need to justify why your proposed topic is important. In other words, what value will it add to the world if you achieve your research aims?

As an example, let’s look at the sample research topic we mentioned earlier (factors impacting brand advocacy). In this case, if the research could uncover relevant factors, these findings would be very useful to marketers in the cosmetics industry, and would, therefore, have commercial value. That is a clear justification for the research.

So, when you’re crafting your research proposal, remember that it’s not enough for a topic to simply be unique. It needs to be useful and value-creating – and you need to convey that value in your proposal. If you’re struggling to find a research topic that makes the cut, watch our video covering how to find a research topic.

It’s all good and well to have a great topic that’s original and valuable, but you’re not going to convince anyone to approve it without discussing the practicalities – in other words:

  • How will you actually undertake your research (i.e., your methodology)?
  • Is your research methodology appropriate given your research aims?
  • Is your approach manageable given your constraints (time, money, etc.)?

While it’s generally not expected that you’ll have a fully fleshed-out methodology at the proposal stage, you’ll likely still need to provide a high-level overview of your research methodology. Here are some important questions you’ll need to address in your research proposal:

  • Will you take a qualitative, quantitative or mixed-method approach?
  • What sampling strategy will you adopt?
  • How will you collect your data (e.g., interviews, surveys, etc.)?
  • How will you analyse your data (e.g., descriptive and inferential statistics, content analysis, discourse analysis, etc.)?
  • What potential limitations will your methodology carry?

So, be sure to give some thought to the practicalities of your research and have at least a basic methodological plan before you start writing up your proposal. If this all sounds rather intimidating, the video below provides a good introduction to research methodology and the key choices you’ll need to make.

How To Structure A Research Proposal

Now that we’ve covered the key points that need to be addressed in a proposal, you may be wondering, “But how is a research proposal structured?”

While the exact structure and format required for a research proposal differs from university to university, there are four “essential ingredients” that commonly make up the structure of a research proposal:

  • A rich introduction and background to the proposed research
  • An initial literature review covering the existing research
  • An overview of the proposed research methodology
  • A discussion regarding the practicalities (project plans, timelines, etc.)

In the video below, we unpack each of these four sections, step by step.

Research Proposal Examples/Samples

In the video below, we provide a detailed walkthrough of two successful research proposals (Master’s and PhD-level), as well as our popular free proposal template.

Proposal Writing FAQs

How long should a research proposal be?

This varies tremendously, depending on the university, the field of study (e.g., social sciences vs natural sciences), and the level of the degree (e.g., undergraduate, Master’s or PhD) – so it’s always best to check with your university what their specific requirements are before you start planning your proposal.

As a rough guide, a formal research proposal at Master’s level often ranges between 2000 and 3000 words, while a PhD-level proposal can be far more detailed, ranging from 5000 to 8000 words. In some cases, a rough outline of the topic is all that’s needed, while in other cases, universities expect a very detailed proposal that essentially forms the first three chapters of the dissertation or thesis.

The takeaway – be sure to check with your institution before you start writing.

How do I choose a topic for my research proposal?

Finding a good research topic is a process that involves multiple steps. We cover the topic ideation process in this video post.

How do I write a literature review for my proposal?

While you typically won’t need a comprehensive literature review at the proposal stage, you still need to demonstrate that you’re familiar with the key literature and are able to synthesise it. We explain the literature review process here.

How do I create a timeline and budget for my proposal?

We explain how to craft a project plan/timeline and budget in Research Proposal Bootcamp.

Which referencing format should I use in my research proposal?

The expectations and requirements regarding formatting and referencing vary from institution to institution. Therefore, you’ll need to check this information with your university.

What common proposal writing mistakes do I need to look out for?

We’ve created a video post about some of the most common mistakes students make when writing a proposal – you can access that here. If you’re short on time, here’s a quick summary:

  • The research topic is too broad (or just poorly articulated).
  • The research aims, objectives and questions don’t align.
  • The research topic is not well justified.
  • The study has a weak theoretical foundation.
  • The research design is not articulated well enough.
  • Poor writing and sloppy presentation.
  • Poor project planning and risk management.
  • Not following the university’s specific criteria.

Key Takeaways & Additional Resources

As you write up your research proposal, remember the all-important core purpose: to convince. Your research proposal needs to sell your study in terms of suitability and viability. So, focus on crafting a convincing narrative to ensure a strong proposal.

At the same time, pay close attention to your university’s requirements. While we’ve covered the essentials here, every institution has its own set of expectations and it’s essential that you follow these to maximise your chances of approval.

By the way, we’ve got plenty more resources to help you fast-track your research proposal. Here are some of our most popular resources to get you started:

  • Proposal Writing 101: An Introductory Webinar
  • Research Proposal Bootcamp: The Ultimate Online Course
  • Template: A basic template to help you craft your proposal

If you’re looking for 1-on-1 support with your research proposal, be sure to check out our private coaching service, where we hold your hand through the proposal development process (and the entire research journey), step by step.

51 Comments

Myrna Pereira

I truly enjoyed this video, as it was eye-opening to what I have to do in the preparation of preparing a Research proposal.

I would be interested in getting some coaching.

BARAKAELI TEREVAELI

I real appreciate on your elaboration on how to develop research proposal,the video explains each steps clearly.

masebo joseph

Thank you for the video. It really assisted me and my niece. I am a PhD candidate and she is an undergraduate student. It is at times, very difficult to guide a family member but with this video, my job is done.

In view of the above, I welcome more coaching.

Zakia Ghafoor

Wonderful guidelines, thanks

Annie Malupande

This is very helpful. Would love to continue even as I prepare for starting my masters next year.

KYARIKUNDA MOREEN

Thanks for the work done, the text was helpful to me

Ahsanullah Mangal

Bundle of thanks to you for the research proposal guide it was really good and useful if it is possible please send me the sample of research proposal

Derek Jansen

You’re most welcome. We don’t have any research proposals that we can share (the students own the intellectual property), but you might find our research proposal template useful: https://gradcoach.com/research-proposal-template/

Cheruiyot Moses Kipyegon

Thanks alot. It was an eye opener that came timely enough before my imminent proposal defense. Thanks, again

agnelius

thank you very much your lesson is very interested may God be with you

Abubakar

I am an undergraduate student (First Degree) preparing to write my project,this video and explanation had shed more light to me thanks for your efforts keep it up.

Synthia Atieno

Very useful. I am grateful.

belina nambeya

this is a very a good guidance on research proposal, for sure i have learnt something

Wonderful guidelines for writing a research proposal, I am a student of m.phil( education), this guideline is suitable for me. Thanks

You’re welcome 🙂

Marjorie

Thank you, this was so helpful.

Amitash Degan

A really great and insightful video. It opened my eyes as to how to write a research paper. I would like to receive more guidance for writing my research paper from your esteemed faculty.

Glaudia Njuguna

Thank you, great insights

Thank you, great insights, thank you so much, feeling edified

Yebirgual

Wow thank you, great insights, thanks a lot

Roseline Soetan

Thank you. This is a great insight. I am a student preparing for a PhD program. I am requested to write my Research Proposal as part of what I am required to submit before my unconditional admission. I am grateful having listened to this video which will go a long way in helping me to actually choose a topic of interest and not just any topic as well as to narrow down the topic and be specific about it. I indeed need more of this especially as am trying to choose a topic suitable for a DBA am about embarking on. Thank you once more. The video is indeed helpful.

Rebecca

Have learnt a lot just at the right time. Thank you so much.

laramato ikayo

thank you very much ,because have learn a lot things concerning research proposal and be blessed u for your time that you providing to help us

Cheruiyot M Kipyegon

Hi. For my MSc medical education research, please evaluate this topic for me: Training Needs Assessment of Faculty in Medical Training Institutions in Kericho and Bomet Counties

Rebecca

I have really learnt a lot based on research proposal and it’s formulation

Arega Berlie

Thank you. I learn much from the proposal since it is applied

Siyanda

Your effort is much appreciated – you have good articulation.

You have good articulation.

Douglas Eliaba

I do applaud your simplified method of explaining the subject matter, which indeed has broaden my understanding of the subject matter. Definitely this would enable me writing a sellable research proposal.

Weluzani

This really helping

Roswitta

Great! I liked your tutoring on how to find a research topic and how to write a research proposal. Precise and concise. Thank you very much. Will certainly share this with my students. Research made simple indeed.

Alice Kuyayama

Thank you very much. I an now assist my students effectively.

Thank you very much. I can now assist my students effectively.

Abdurahman Bayoh

I need any research proposal

Silverline

Thank you for these videos. I will need chapter by chapter assistance in writing my MSc dissertation

Nosi

Very helpfull

faith wugah

the videos are very good and straight forward

Imam

thanks so much for this wonderful presentations, i really enjoyed it to the fullest wish to learn more from you

Bernie E. Balmeo

Thank you very much. I learned a lot from your lecture.

Ishmael kwame Appiah

I really enjoy the in-depth knowledge on research proposal you have given. me. You have indeed broaden my understanding and skills. Thank you

David Mweemba

interesting session this has equipped me with knowledge as i head for exams in an hour’s time, am sure i get A++

Andrea Eccleston

This article was most informative and easy to understand. I now have a good idea of how to write my research proposal.

Thank you very much.

Georgina Ngufan

Wow, this literature is very resourceful and interesting to read. I enjoyed it and I intend reading it every now then.

Charity

Thank you for the clarity

Mondika Solomon

Thank you. Very helpful.

BLY

Thank you very much for this essential piece. I need 1o1 coaching, unfortunately, your service is not available in my country. Anyways, a very important eye-opener. I really enjoyed it. A thumb up to Gradcoach

Md Moneruszzaman Kayes

What is JAM? Please explain.

Gentiana

Thank you so much for these videos. They are extremely helpful! God bless!

azeem kakar

very very wonderful…

Koang Kuany Bol Nyot

thank you for the video but i need a written example

Science & Quantitative Reasoning Education

Yale Undergraduate Research: How to Write a Proposal

The abstract should summarize your proposal. Include one sentence to introduce the problem you are investigating, why this problem is significant, the hypothesis to be tested, a brief summary of experiments that you wish to conduct and a single concluding sentence. (250 word limit)

Introduction

The introduction discusses the background and significance of the problem you are investigating. Lead the reader from the general to the specific. For example, if you want to write about the role that Brca1 mutations play in breast cancer pathogenesis, talk first about the significance of breast cancer as a disease in the US/world population, then about familial breast cancer as a small subset of breast cancers in general, then about discovery of Brca1 mutations in familial breast cancer, then Brca1’s normal functions in DNA repair, then about how Brca1 mutations result in damaged DNA and onset of familial breast cancer, etc. Definitely include figures with properly labeled text boxes (designated as Figure 1, Figure 2, etc) here to better illustrate your points and help your reader wade through unfamiliar science. (3 pages max)

Formulate a hypothesis that will be tested in your grant proposal. Remember, you are doing hypothesis-driven research so there should be a hypothesis to be tested! The hypothesis should be focused, concise and flow logically from the introduction. For example, your hypothesis could be “I hypothesize that overexpressing wild type Brca1 in Brca1 null tumor cells will prevent metastatic spread in a mouse xenograft model.” Your Specific Aims section should then be geared to test this hypothesis. The hypothesis is stated in one sentence in the proposal.

Specific Aims (listed as Specific Aim 1, Specific Aim 2)

This is where you will want to work with your mentor to craft the experimental portion of your proposal. Propose two original specific aims to test your hypothesis. Don’t propose more than two aims; you will NOT have enough time to do more. In the example presented, Specific Aim 1 might be “To determine the oncogenic potential of Brca1 null cell lines expressing wild type Brca1 cDNA”. Specific Aim 2 might be “To determine the metastatic potential of Brca1 null cells that express WT Brca1”. You do not have to go into extensive technical details, just enough for the reader to understand what you propose to do. The best aims yield mechanistic insights; that is, the experiments proposed address some mechanisms of biology. A less desirable aim proposes correlative experiments that do not address mechanistically how BRCA1 mutations generate cancer. It is also very important that the two aims are related but NOT interdependent. What this means is that if Aim 1 doesn’t work, Aim 2 is not automatically dead. For example, say you propose in Aim 1 to generate a BRCA1 knockout mouse model, and in Aim 2 you will take tissues from this mouse to do experiments. If knocking out BRCA1 results in early embryonic death, you will never get a mouse that yields tissues for Aim 2. You can include some of your mentor’s data here as “Preliminary data”. Remember to carefully cite all your sources. (4 pages max; 2 pages per Aim)

Potential pitfalls and alternative strategies

This is a very important part of any proposal. This is where you want to discuss the experiments you propose in Aims 1 and 2. Remember, no experiment is perfect. Are there any reasons why experiments you proposed might not work? Why? What will you do to resolve this? What are other possible strategies you might use if your experiments don’t work? If a reviewer spots these deficiencies and you don’t propose methods to correct them, your proposal will not get funded. You will want to work with your mentor to write this section. (1/2 page per Aim)

Cite all references, including unpublished data from your mentor. Last, First, (year), Title, Journal, volume, pages.

*8 page proposal limit (not including References), 1.5 spacing, 12pt Times New Roman font

  • View an example of a research proposal submitted for the Yale College First-Year Summer Research Fellowship (PDF).
  • View an example of a research proposal submitted for the Yale College Dean’s Research Fellowship and the Rosenfeld Science Scholars Program (PDF).

  • Indian J Anaesth
  • v.60(9); 2016 Sep

How to write a research proposal?

Department of Anaesthesiology, Bangalore Medical College and Research Institute, Bengaluru, Karnataka, India

Devika Rani Duggappa

Writing the proposal of a research work in the present era is a challenging task due to the constantly evolving trends in the qualitative research design and the need to incorporate medical advances into the methodology. The proposal is a detailed plan or ‘blueprint’ for the intended study, and once it is completed, the research project should flow smoothly. Even today, many of the proposals at post-graduate evaluation committees and application proposals for funding are substandard. A search was conducted with keywords such as research proposal, writing proposal and qualitative using search engines, namely, PubMed and Google Scholar, and an attempt has been made to provide broad guidelines for writing a scientifically appropriate research proposal.

INTRODUCTION

A clean, well-thought-out proposal forms the backbone for the research itself and hence becomes the most important step in the process of conducting research.[ 1 ] The objective of preparing a research proposal is to obtain approvals from various committees, including the ethics committee (details under ‘Research methodology II’ section [ Table 1 ] in this issue of IJA), and to request grants. However, there are very few universally accepted guidelines for preparation of a good quality research proposal. A search was performed with keywords such as research proposal, funding, qualitative and writing proposals using search engines, namely, PubMed, Google Scholar and Scopus.

Table 1: Five ‘C’s while writing a literature review

BASIC REQUIREMENTS OF A RESEARCH PROPOSAL

A proposal needs to show how your work fits into what is already known about the topic and what new paradigm it will add to the literature, while specifying the question that the research will answer, establishing its significance, and the implications of the answer.[ 2 ] The proposal must be capable of convincing the evaluation committee about the credibility, achievability, practicality and reproducibility (repeatability) of the research design.[ 3 ] Four categories of audience with different expectations may be present in the evaluation committees that assess the research proposal, namely academic colleagues, policy-makers, practitioners and lay audiences. Tips for preparation of a good research proposal include: ‘be practical, be persuasive, make broader links, aim for crystal clarity and plan before you write’. A researcher must be balanced, with a realistic understanding of what can be achieved. Being persuasive implies that the researcher must be able to convince other researchers, research funding agencies, educational institutions and supervisors that the research is worth approving. The aim of the research should be clearly stated in simple language that describes the research in a way that non-specialists can comprehend, without the use of jargon. The proposal must not only demonstrate that it is based on an intelligent understanding of the existing literature but also show that the writer has thought about the time needed to conduct each stage of the research.[ 4 , 5 ]

CONTENTS OF A RESEARCH PROPOSAL

The contents or formats of a research proposal vary depending on the requirements of evaluation committee and are generally provided by the evaluation committee or the institution.

In general, a cover page should contain the (i) title of the proposal, (ii) name and affiliation of the researcher (principal investigator) and co-investigators, (iii) institutional affiliation (degree of the investigator and the name of the institution where the study will be performed), (iv) contact details such as phone numbers and e-mail IDs and (v) lines for signatures of investigators.

The main contents of the proposal may be presented under the following headings: (i) introduction, (ii) review of literature, (iii) aims and objectives, (iv) research design and methods, (v) ethical considerations, (vi) budget, (vii) appendices and (viii) citations.[ 4 ]

Introduction

The introduction is also sometimes termed ‘need for study’ or ‘abstract’. It is an initial pitch of an idea; it sets the scene and puts the research in context.[ 6 ] The introduction should be designed to create interest in the reader about the topic and proposal. It should convey to the reader what you want to do, what necessitates the study and your passion for the topic.[ 7 ] Some questions that can be used to assess the significance of the study are: (i) Who has an interest in the domain of inquiry? (ii) What do we already know about the topic? (iii) What has not been answered adequately in previous research and practice? (iv) How will this research add to knowledge, practice and policy in this area? Some evaluation committees expect the last two questions to be elaborated under a separate heading of ‘background and significance’.[ 8 ] The introduction should also contain the hypothesis behind the research design. If a hypothesis cannot be constructed, the line of inquiry to be used in the research must be indicated.

Review of literature

The review of literature refers to all sources of scientific evidence pertaining to the topic of interest. In the present era of digitalisation and easy accessibility, there is an enormous amount of relevant data available, making it a challenge for the researcher to include all of it in his/her review.[ 9 ] It is crucial to structure this section intelligently so that the reader can grasp the argument related to your study in relation to that of other researchers, while still demonstrating to your readers that your work is original and innovative. It is preferable to summarise each article in a paragraph, highlighting the details pertinent to the topic of interest. The progression of review can move from the more general to the more focused studies, or a historical progression can be used to develop the story, without making it exhaustive.[ 1 ] Literature should include supporting data, disagreements and controversies. Five ‘C's may be kept in mind while writing a literature review[ 10 ] [ Table 1 ].

Aims and objectives

The research purpose (or goal or aim) gives a broad indication of what the researcher wishes to achieve in the research. The hypothesis to be tested can be the aim of the study. The objectives related to parameters or tools used to achieve the aim are generally categorised as primary and secondary objectives.

Research design and method

The objective here is to convince the reader that the overall research design and methods of analysis will correctly address the research problem and to impress upon the reader that the methodology/sources chosen are appropriate for the specific topic. It should be unmistakably tied to the specific aims of your study.

In this section, the methods and sources used to conduct the research must be discussed, including specific references to sites, databases, key texts or authors that will be indispensable to the project. There should be specific mention of the methodological approaches to be undertaken to gather information, of the techniques to be used to analyse it and of the tests of external validity to which the researcher is committed.[ 10 , 11 ]

The components of this section include the following:[ 4 ]

Population and sample

Population refers to all the elements (individuals, objects or substances) that meet certain criteria for inclusion in a given universe,[ 12 ] and sample refers to a subset of the population which meets the inclusion criteria for enrolment into the study. The inclusion and exclusion criteria should be clearly defined. The details pertaining to sample size are discussed in the article “Sample size calculation: Basic principles” published in this issue of IJA.

Data collection

The researcher is expected to give a detailed account of the methodology adopted for collection of data, including the time frame required for the research. The methodology should be tested for its validity and should ensure that, in pursuit of achieving the results, the participant's life is not jeopardised. The author should anticipate and acknowledge any potential barriers and pitfalls in carrying out the research design and explain plans to address them, thereby avoiding lacunae due to incomplete data collection. If the researcher is planning to acquire data through interviews or questionnaires, a copy of the questions to be used should be attached as an annexure to the proposal.

Rigor (soundness of the research)

This addresses the strength of the research with respect to its neutrality, consistency and applicability. Rigor must be reflected throughout the proposal.

Neutrality

Neutrality refers to the robustness of a research method against bias. The author should describe in detail the measures taken to avoid bias, viz. blinding and randomisation, thus ensuring that the result obtained from the adopted method is not influenced by confounding variables.
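As an illustration of how a randomisation scheme might be documented in a proposal, here is a minimal sketch of block randomisation into two arms. The function name `block_randomise`, the block size, the seed and the participant codes are all assumptions made for this example; they do not come from the article, and a real trial would normally use a validated allocation procedure with concealment.

```python
import random

def block_randomise(participant_ids, block_size=4, seed=2024):
    """Assign participants to 'intervention' or 'control' in balanced blocks."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)  # fixed seed keeps the allocation list reproducible
    allocation = {}
    for start in range(0, len(participant_ids), block_size):
        block = participant_ids[start:start + block_size]
        # Each full block holds equal numbers of both arms, in random order.
        arms = ["intervention", "control"] * (block_size // 2)
        rng.shuffle(arms)
        for pid, arm in zip(block, arms):
            allocation[pid] = arm
    return allocation

# Hypothetical participant codes, for illustration only.
ids = [f"P{n:03d}" for n in range(1, 21)]
allocation = block_randomise(ids)
```

Because every full block is balanced, the two arms stay equal in size throughout recruitment, which is one reason block designs are often preferred over simple coin-toss allocation in small studies.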

Consistency

Consistency considers whether the findings would be consistent if the inquiry were replicated with the same participants and in a similar context. This can be achieved by adopting standard and universally accepted methods and scales.

Applicability

Applicability refers to the degree to which the findings can be applied to different contexts and groups.[ 13 ]

Data analysis

This section deals with the reduction and reconstruction of data and its analysis, including sample size calculation. The researcher is expected to explain the steps adopted for coding and sorting the data obtained. The various tests to be used to analyse the data for robustness and significance should be clearly stated. The author should also name the statistician and the software that will be used for data analysis, and describe their contribution to data analysis and sample size calculation.[ 9 ]
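To make the sample size step concrete, the following sketch applies the standard normal-approximation formula for comparing two group means, n = 2(z_{1-α/2} + z_{power})²σ²/δ² per group. The function name and the example values of `delta` (smallest difference worth detecting) and `sigma` (assumed common standard deviation) are hypothetical; the article itself does not prescribe a formula, and the approach should be confirmed with the study statistician.

```python
import math
from statistics import NormalDist

def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Participants needed per group to detect a mean difference `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up: a fraction of a participant cannot be enrolled

# Illustrative inputs only: detect a 5-unit difference when the SD is 10 units.
n_per_group = sample_size_two_means(delta=5.0, sigma=10.0)
```

Raising the power or shrinking the detectable difference increases the required sample, so the proposal should state and justify each of these inputs.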

Ethical considerations

Medical research introduces special moral and ethical problems that are not usually encountered by other researchers during data collection, and hence, the researcher should take special care in ensuring that ethical standards are met. Ethical considerations refer to the protection of the participants' rights (right to self-determination, right to privacy, right to autonomy and confidentiality, right to fair treatment and right to protection from discomfort and harm), obtaining informed consent and the institutional review process (ethical approval). The researcher needs to provide adequate information on each of these aspects.

Informed consent needs to be obtained from the participants (details discussed in further chapters), as well as the research site and the relevant authorities.

Budget

When the researcher prepares a research budget, he/she should predict and cost all aspects of the research and then add an additional allowance for unpredictable disasters, delays and rising costs. All items in the budget should be justified.
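A rough sketch of this arithmetic: cost each item, then add a contingency allowance on top. The line items, the amounts and the 10% rate are purely hypothetical placeholders; the article does not recommend a specific percentage.

```python
def total_with_contingency(items, contingency_rate=0.10):
    """Return (subtotal, contingency, total) for a dict of costed line items."""
    subtotal = sum(items.values())
    contingency = round(subtotal * contingency_rate, 2)
    return subtotal, contingency, subtotal + contingency

# Hypothetical line items in arbitrary currency units, for illustration only.
budget_items = {
    "data collection": 4000,
    "statistical software": 1500,
    "consumables": 2500,
}
subtotal, contingency, total = total_with_contingency(budget_items)
```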

Appendices

Appendices are documents that support the proposal and application. The appendices will be specific for each proposal, but documents that are usually required include the informed consent form, supporting documents, questionnaires, measurement tools and patient information about the study in layman's language.

References

As with any scholarly research paper, you must cite the sources you used in composing your proposal. Although the terms ‘references’ and ‘bibliography’ have different meanings, they are often used interchangeably. Here, the term covers all references cited in the research proposal.

Successful, qualitative research proposals should communicate the researcher's knowledge of the field and method and convey the emergent nature of the qualitative design. The proposal should follow a discernible logic from the introduction to presentation of the appendices.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.

Research Method

How To Write A Research Proposal – Step-by-Step [Template]

How To Write a Research Proposal

Writing a research proposal involves several steps to ensure a well-structured and comprehensive document. Here is an explanation of each step:

1. Title and Abstract:

  • Choose a concise and descriptive title that reflects the essence of your research.
  • Write an abstract summarizing your research question, objectives, methodology, and expected outcomes. It should provide a brief overview of your proposal.

2. Introduction:

  • Provide an introduction to your research topic, highlighting its significance and relevance.
  • Clearly state the research problem or question you aim to address.
  • Discuss the background and context of the study, including previous research in the field.

3. Research Objectives:

  • Outline the specific objectives or aims of your research. These objectives should be clear, achievable, and aligned with the research problem.

4. Literature Review:

  • Conduct a comprehensive review of relevant literature and studies related to your research topic.
  • Summarize key findings, identify gaps, and highlight how your research will contribute to the existing knowledge.

5. Methodology:

  • Describe the research design and methodology you plan to employ to address your research objectives.
  • Explain the data collection methods, instruments, and analysis techniques you will use.
  • Justify why the chosen methods are appropriate and suitable for your research.

6. Timeline:

  • Create a timeline or schedule that outlines the major milestones and activities of your research project.
  • Break down the research process into smaller tasks and estimate the time required for each task.

7. Resources:

  • Identify the resources needed for your research, such as access to specific databases, equipment, or funding.
  • Explain how you will acquire or utilize these resources to carry out your research effectively.

8. Ethical Considerations:

  • Discuss any ethical issues that may arise during your research and explain how you plan to address them.
  • If your research involves human subjects, explain how you will ensure their informed consent and privacy.

9. Expected Outcomes and Significance:

  • Clearly state the expected outcomes or results of your research.
  • Highlight the potential impact and significance of your research in advancing knowledge or addressing practical issues.

10. References:

  • Provide a list of all the references cited in your proposal, following a consistent citation style (e.g., APA, MLA).

11. Appendices:

  • Include any additional supporting materials, such as survey questionnaires, interview guides, or data analysis plans.

Research Proposal Format

The format of a research proposal may vary depending on the specific requirements of the institution or funding agency. However, the following is a commonly used format for a research proposal:

1. Title Page:

  • Include the title of your research proposal, your name, your affiliation or institution, and the date.

2. Abstract:

  • Provide a brief summary of your research proposal, highlighting the research problem, objectives, methodology, and expected outcomes.

3. Introduction:

  • Introduce the research topic and provide background information.
  • State the research problem or question you aim to address.
  • Explain the significance and relevance of the research.

4. Literature Review:

  • Review relevant literature and studies related to your research topic.
  • Summarize key findings and identify gaps in the existing knowledge.
  • Explain how your research will contribute to filling those gaps.

5. Research Objectives:

  • Clearly state the specific objectives or aims of your research.
  • Ensure that the objectives are clear, focused, and aligned with the research problem.

6. Methodology:

  • Describe the research design and methodology you plan to use.
  • Explain the data collection methods, instruments, and analysis techniques.
  • Justify why the chosen methods are appropriate for your research.

7. Timeline:

8. Resources:

  • Identify the resources needed for your research and explain how you will acquire or utilize them effectively.

9. Ethical Considerations:

  • If applicable, explain how you will ensure informed consent and protect the privacy of research participants.

10. Expected Outcomes and Significance:

11. References:

12. Appendices:

Research Proposal Template

Here’s a template for a research proposal:

1. Introduction:

2. Literature Review:

3. Research Objectives:

4. Methodology:

5. Timeline:

6. Resources:

7. Ethical Considerations:

8. Expected Outcomes and Significance:

9. References:

10. Appendices:

Research Proposal Sample

Title: The Impact of Online Education on Student Learning Outcomes: A Comparative Study

1. Introduction

Online education has gained significant prominence in recent years, especially due to the COVID-19 pandemic. This research proposal aims to investigate the impact of online education on student learning outcomes by comparing them with traditional face-to-face instruction. The study will explore various aspects of online education, such as instructional methods, student engagement, and academic performance, to provide insights into the effectiveness of online learning.

2. Objectives

The main objectives of this research are as follows:

  • To compare student learning outcomes between online and traditional face-to-face education.
  • To examine the factors influencing student engagement in online learning environments.
  • To assess the effectiveness of different instructional methods employed in online education.
  • To identify challenges and opportunities associated with online education and suggest recommendations for improvement.

3. Methodology

3.1 Study Design

This research will utilize a mixed-methods approach to gather both quantitative and qualitative data. The components of the study are described in the subsections that follow.

3.2 Participants

The research will involve undergraduate students from two universities, one offering online education and the other providing face-to-face instruction. A total of 500 students (250 from each university) will be selected randomly to participate in the study.
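The random selection described above could be sketched as follows, drawing 250 students from each university's enrolment list. The enrolment sizes, the ID format, the site names and the function `draw_sample` are invented for illustration; the sample proposal does not specify how the draw would be carried out.

```python
import random

def draw_sample(enrolment_lists, per_site=250, seed=42):
    """Draw a simple random sample of `per_site` students from each site."""
    rng = random.Random(seed)  # fixed seed so the draw can be audited and repeated
    return {
        site: sorted(rng.sample(student_ids, per_site))
        for site, student_ids in enrolment_lists.items()
    }

# Hypothetical enrolment lists; real IDs would come from university records.
enrolment = {
    "online_university": [f"ON{n:05d}" for n in range(1, 3001)],
    "campus_university": [f"CA{n:05d}" for n in range(1, 4201)],
}
sample = draw_sample(enrolment)
```

Sampling each university separately guarantees the 250/250 split stated in the proposal, which a single draw from the pooled lists would not.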

3.3 Data Collection

The research will employ the following data collection methods:

  • Quantitative: Pre- and post-assessments will be conducted to measure students’ learning outcomes. Data on student demographics and academic performance will also be collected from university records.
  • Qualitative: Focus group discussions and individual interviews will be conducted with students to gather their perceptions and experiences regarding online education.

3.4 Data Analysis

Quantitative data will be analyzed using statistical software, employing descriptive statistics, t-tests, and regression analysis. Qualitative data will be transcribed, coded, and analyzed thematically to identify recurring patterns and themes.
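For readers unfamiliar with the t-test mentioned above, this is a minimal from-scratch sketch of Welch's unequal-variance t statistic for comparing two groups' scores. It is an assumption that Welch's variant would be the one chosen; the scores are made-up numbers, and in practice the calculation (including the p-value) would be done in the statistical software the proposal refers to.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Made-up post-assessment scores, for illustration only.
online = [72, 68, 75, 80, 66, 74]
face_to_face = [70, 65, 78, 71, 69, 73]
t_stat, df = welch_t(online, face_to_face)
```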

4. Ethical Considerations

The study will adhere to ethical guidelines, ensuring the privacy and confidentiality of participants. Informed consent will be obtained, and participants will have the right to withdraw from the study at any time.

5. Significance and Expected Outcomes

This research will contribute to the existing literature by providing empirical evidence on the impact of online education on student learning outcomes. The findings will help educational institutions and policymakers make informed decisions about incorporating online learning methods and improving the quality of online education. Moreover, the study will identify potential challenges and opportunities related to online education and offer recommendations for enhancing student engagement and overall learning outcomes.

6. Timeline

The proposed research will be conducted over a period of 12 months, including data collection, analysis, and report writing.

7. Budget

The estimated budget for this research includes expenses related to data collection, software licenses, participant compensation, and research assistance. A detailed budget breakdown will be provided in the final research plan.

8. Conclusion

This research proposal aims to investigate the impact of online education on student learning outcomes through a comparative study with traditional face-to-face instruction. By exploring various dimensions of online education, this research will provide valuable insights into the effectiveness and challenges associated with online learning. The findings will contribute to the ongoing discourse on educational practices and help shape future strategies for maximizing student learning outcomes in online education settings.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

17 Research Proposal Examples

A research proposal systematically and transparently outlines a proposed research project.

The purpose of a research proposal is to demonstrate a project’s viability and the researcher’s preparedness to conduct an academic study. It serves as a roadmap for the researcher.

The process holds both extrinsic value (for accountability purposes and often as a requirement for a grant application) and intrinsic value (for helping the researcher to clarify the mechanics, purpose, and potential significance of the study).

Key sections of a research proposal include: the title, abstract, introduction, literature review, research design and methods, timeline, budget, outcomes and implications, references, and appendix. Each is briefly explained below.

Research Proposal Sample Structure

Title: The title should present a concise and descriptive statement that clearly conveys the core idea of the research project. Make it as specific as possible. The reader should immediately be able to grasp the core idea of the intended research project. Often, the title is left too vague and does not help give an understanding of what exactly the study looks at.

Abstract: Abstracts are usually around 250-300 words and provide an overview of what is to follow – including the research problem, objectives, methods, expected outcomes, and significance of the study. Use it as a roadmap and ensure that, if the abstract is the only thing someone reads, they’ll get a good fly-by of what will be discussed in the piece.

Introduction: Introductions are all about contextualization. They often set the background information with a statement of the problem. At the end of the introduction, the reader should understand what the rationale for the study truly is. I like to see the research questions or hypotheses included in the introduction and I like to get a good understanding of what the significance of the research will be. It’s often easiest to write the introduction last.

Literature Review: The literature review dives deep into the existing literature on the topic, demonstrating your thorough understanding of it, including its themes, strengths, weaknesses, and gaps. It serves both to demonstrate your knowledge of the field and to demonstrate how the proposed study will fit alongside the literature on the topic. A good literature review concludes by clearly demonstrating how your research will contribute something new and innovative to the conversation in the literature.

Research Design and Methods: This section needs to clearly demonstrate how the data will be gathered and analyzed in a systematic and academically sound manner. Here, you need to demonstrate that the conclusions of your research will be both valid and reliable. Common points discussed in the research design and methods section include the research paradigm, methodologies, intended population or sample to be studied, data collection techniques, and data analysis procedures. Toward the end of this section, you are encouraged to also address ethical considerations and limitations of the research process, and to explain why you chose your research design and how you are mitigating the identified risks and limitations.

Timeline: Provide an outline of the anticipated timeline for the study. Break it down into its various stages (including data collection, data analysis, and report writing). The goal of this section is firstly to establish a reasonable breakdown of steps for you to follow and secondly to demonstrate to the assessors that your project is practicable and feasible.
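One way to draft such a breakdown is to list the stages with estimated durations and let a short script compute the start and end dates. The stage names follow the paragraph above, but the durations and the start date are hypothetical placeholders, and `build_timeline` is a name invented for this sketch.

```python
from datetime import date, timedelta

def build_timeline(start, stages):
    """Lay out consecutive stages as (name, start_date, end_date) tuples."""
    schedule = []
    current = start
    for name, weeks in stages:
        end = current + timedelta(weeks=weeks)
        schedule.append((name, current, end))
        current = end  # the next stage begins when this one ends
    return schedule

# Hypothetical durations in weeks, for illustration only.
stages = [("Data collection", 10), ("Data analysis", 6), ("Report writing", 4)]
timeline = build_timeline(date(2025, 1, 6), stages)

for name, begins, ends in timeline:
    print(f"{name}: {begins} to {ends}")
```

Changing one duration shifts every later stage automatically, which makes it easy to show assessors that the plan still fits the available time.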

Budget: Estimate the costs associated with the research project and include evidence for your estimations. Typical costs include staffing costs, equipment, travel, and data collection tools. When applying for a scholarship, the budget should demonstrate that you are being responsible with your expenses and that your funding application is reasonable.

Expected Outcomes and Implications: A discussion of the anticipated findings or results of the research, as well as the potential contributions to the existing knowledge, theory, or practice in the field. This section should also address the potential impact of the research on relevant stakeholders and any broader implications for policy or practice.

References: A complete list of all the sources cited in the research proposal, formatted according to the required citation style. This demonstrates the researcher’s familiarity with the relevant literature and ensures proper attribution of ideas and information.

Appendices (if applicable): Any additional materials, such as questionnaires, interview guides, or consent forms, that provide further information or support for the research proposal. These materials should be included as appendices at the end of the document.

Research Proposal Examples

Research proposals often extend anywhere between 2,000 and 15,000 words in length. The following snippets are samples designed to briefly demonstrate what might be discussed in each section.

1. Education Studies Research Proposals

See some real sample pieces:

  • Assessment of the perceptions of teachers towards a new grading system
  • Does ICT use in secondary classrooms help or hinder student learning?
  • Digital technologies in focus project
  • Urban Middle School Teachers’ Experiences of the Implementation of Restorative Justice Practices
  • Experiences of students of color in service learning

Consider this hypothetical education research proposal:

The Impact of Game-Based Learning on Student Engagement and Academic Performance in Middle School Mathematics

Abstract: The proposed study will explore multiplayer game-based learning techniques in middle school mathematics curricula and their effects on student engagement. The study aims to contribute to the current literature on game-based learning by examining the effects of multiplayer gaming in learning.

Introduction: Digital game-based learning has long been shunned within mathematics education for fear that it may distract students or lower the academic integrity of the classroom. However, there is emerging evidence that digital games in math benefit not only engagement but also academic skill development. Contributing to this discourse, this study seeks to explore the potential benefits of multiplayer digital game-based learning by examining its impact on middle school students’ engagement and academic performance in a mathematics class.

Literature Review: The literature review has identified gaps in the current knowledge, namely, while game-based learning has been extensively explored, the role of multiplayer games in supporting learning has not been studied.

Research Design and Methods: This study will employ a mixed-methods research design based upon action research in the classroom. A quasi-experimental pre-test/post-test control group design will first be used to compare the academic performance and engagement of middle school students exposed to game-based learning techniques with those in a control group receiving instruction without the aid of technology. Students will also be observed and interviewed in regard to the effect of communication and collaboration during gameplay on their learning.
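A pre-test/post-test control group comparison of this kind is often summarised by contrasting the average gain in each group. The sketch below does only that; the scores are invented, the function name `mean_gain` is an assumption, and a real analysis would add a significance test and account for the non-random (quasi-experimental) group assignment.

```python
from statistics import mean

def mean_gain(pre_scores, post_scores):
    """Average improvement from pre-test to post-test for paired scores."""
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(gains)

# Invented test scores for four students per group, for illustration only.
game_pre, game_post = [55, 60, 48, 70], [65, 72, 58, 75]
control_pre, control_post = [54, 61, 50, 68], [58, 66, 53, 72]

game_gain = mean_gain(game_pre, game_post)
control_gain = mean_gain(control_pre, control_post)
# Difference in gains: how much more the game-based group improved.
effect = game_gain - control_gain
```

Comparing gains rather than raw post-test scores helps offset any difference between the groups that already existed at the pre-test.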

Timeline: The study will take place across the second term of the school year with a pre-test taking place on the first day of the term and the post-test taking place on Wednesday in Week 10.

Budget: The key budgetary requirements will be the technologies required, including the subscription cost for the identified games and computers.

Expected Outcomes and Implications: It is expected that the findings will contribute to the current literature on game-based learning and inform educational practices, providing educators and policymakers with insights into how to better support student achievement in mathematics.

2. Psychology Research Proposals

See some real examples:

  • A situational analysis of shared leadership in a self-managing team
  • The effect of musical preference on running performance
  • Relationship between self-esteem and disordered eating amongst adolescent females

Consider this hypothetical psychology research proposal:

The Effects of Mindfulness-Based Interventions on Stress Reduction in College Students

Abstract: This research proposal examines the impact of mindfulness-based interventions on stress reduction among college students, using a pre-test/post-test experimental design with both quantitative and qualitative data collection methods.

Introduction: College students face heightened stress levels during exam weeks. This can affect both mental health and test performance. This study explores the potential benefits of mindfulness-based interventions such as meditation as a way to reduce stress levels in the weeks leading up to exam time.

Literature Review: Existing research on mindfulness-based meditation has shown the ability for mindfulness to increase metacognition, decrease anxiety levels, and decrease stress. Existing literature has looked at workplace, high school and general college-level applications. This study will contribute to the corpus of literature by exploring the effects of mindfulness directly in the context of exam weeks.

Research Design and Methods: Participants (n = 234) will be randomly assigned to either an experimental group, receiving 10-minute mindfulness-based interventions 5 days per week, or a control group, receiving no intervention. Data will be collected through self-report questionnaires measuring stress levels, semi-structured interviews exploring participants’ experiences, and students’ test scores.

Timeline: The study will begin three weeks before the students’ exam week and conclude after each student’s final exam. Data collection will occur at the beginning (pre-test of self-reported stress levels) and end (post-test) of the three weeks.

Expected Outcomes and Implications: The study aims to provide evidence supporting the effectiveness of mindfulness-based interventions in reducing stress among college students in the lead up to exams, with potential implications for mental health support and stress management programs on college campuses.

3. Sociology Research Proposals

  • Understanding emerging social movements: A case study of ‘Jersey in Transition’
  • The interaction of health, education and employment in Western China
  • Can we preserve lower-income affordable neighbourhoods in the face of rising costs?

Consider this hypothetical sociology research proposal:

The Impact of Social Media Usage on Interpersonal Relationships among Young Adults

Abstract: This research proposal investigates the effects of social media usage on interpersonal relationships among young adults, using a longitudinal mixed-methods approach with ongoing semi-structured interviews to collect qualitative data.

Introduction: Social media platforms have become a key medium for the development of interpersonal relationships, particularly for young adults. This study examines the potential positive and negative effects of social media usage on young adults’ relationships and development over time.

Literature Review: A preliminary review of relevant literature has demonstrated that social media usage is central to the development of a personal identity and of relationships with others who share similar subcultural interests. However, it has also been accompanied by data on mental health decline and deteriorating off-screen relationships. The literature to date lacks important longitudinal data on these topics.

Research Design and Methods: Participants (n = 454) will be young adults aged 18-24. Ongoing self-report surveys will assess participants’ social media usage, relationship satisfaction, and communication patterns. A subset of participants will be selected for longitudinal in-depth interviews starting at age 18 and continuing for 5 years.

Timeline: The study will be conducted over a period of five years, including recruitment, data collection, analysis, and report writing.

Expected Outcomes and Implications: This study aims to provide insights into the complex relationship between social media usage and interpersonal relationships among young adults, potentially informing social policies and mental health support related to social media use.

4. Nursing Research Proposals

  • Does Orthopaedic Pre-assessment clinic prepare the patient for admission to hospital?
  • Nurses’ perceptions and experiences of providing psychological care to burns patients
  • Registered psychiatric nurse’s practice with mentally ill parents and their children

Consider this hypothetical nursing research proposal:

The Influence of Nurse-Patient Communication on Patient Satisfaction and Health Outcomes following Emergency Cesareans

Abstract: This research will examine the impact of effective nurse-patient communication on patient satisfaction and health outcomes for women following c-sections, utilizing a mixed-methods approach with patient surveys and semi-structured interviews.

Introduction: It has long been known that effective communication between nurses and patients is crucial for quality care. However, additional complications arise following emergency c-sections due to the interaction between new mothers’ changing roles and recovery from surgery.

Literature Review: A review of the literature demonstrates the importance of nurse-patient communication, its impact on patient satisfaction, and potential links to health outcomes. However, communication between nurses and new mothers is less examined, and the specific experiences of those who have given birth via emergency c-section are to date unexamined.

Research Design and Methods: Participants will be patients in a hospital setting who have recently had an emergency c-section. A self-report survey will assess their satisfaction with nurse-patient communication and perceived health outcomes. A subset of participants will be selected for in-depth interviews to explore their experiences and perceptions of the communication with their nurses.

Timeline: The study will be conducted over a period of six months, including rolling recruitment, data collection, analysis, and report writing within the hospital.

Expected Outcomes and Implications: This study aims to provide evidence for the significance of nurse-patient communication in supporting new mothers who have had an emergency c-section. Recommendations will be presented for supporting nurses and midwives in improving outcomes for new mothers who had complications during birth.

5. Social Work Research Proposals

  • Experiences of negotiating employment and caring responsibilities of fathers post-divorce
  • Exploring kinship care in the north region of British Columbia

Consider this hypothetical social work research proposal:

The Role of a Family-Centered Intervention in Preventing Homelessness Among At-Risk Youth in a Working-Class Town in Northern England

Abstract: This research proposal investigates the effectiveness of a family-centered intervention provided by a local council area in preventing homelessness among at-risk youth. This case study will use a mixed-methods approach with program evaluation data and semi-structured interviews to collect quantitative and qualitative data.

Introduction: Homelessness among youth remains a significant social issue. This study aims to assess the effectiveness of family-centered interventions in addressing this problem and identify factors that contribute to successful prevention strategies.

Literature Review: A review of the literature has demonstrated several key factors contributing to youth homelessness including lack of parental support, lack of social support, and low levels of family involvement. It also demonstrates the important role of family-centered interventions in addressing this issue. Drawing on current evidence, this study explores the effectiveness of one such intervention in preventing homelessness among at-risk youth in a working-class town in Northern England.

Research Design and Methods: The study will evaluate a new family-centered intervention program targeting at-risk youth and their families. Quantitative data on program outcomes, including housing stability and family functioning, will be collected through program records and evaluation reports. Semi-structured interviews with program staff, participants, and relevant stakeholders will provide qualitative insights into the factors contributing to program success or failure.

Timeline: The study will be conducted over a period of six months, including recruitment, data collection, analysis, and report writing.

Budget: Expenses include access to program evaluation data, interview materials, data analysis software, and any related travel costs for in-person interviews.

Expected Outcomes and Implications: This study aims to provide evidence for the effectiveness of family-centered interventions in preventing youth homelessness, potentially informing the expansion of or necessary changes to social work practices in Northern England.

Research Proposal Template

Get your Detailed Template for Writing your Research Proposal Here (With AI Prompts!)

This is a template for a 2500-word research proposal. You may find it difficult to squeeze everything into this wordcount, but it’s a common wordcount for Honors and MA-level dissertations.

Your research proposal is where you really get going with your study. I’d strongly recommend working closely with your teacher in developing a research proposal that’s consistent with the requirements and culture of your institution, as in my experience it varies considerably. The above template is from my own courses that walk students through research proposals in a British School of Education.

Chris Drew (PhD)

Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.


Examples

Market Research Proposal

Setting the direction for any market research effort is an essential and critical step that you have to consider whenever you would like to look into the trends in the marketplace or assess the key factors that affect the purchasing decisions of your target audience. Before doing any program or activity related to the specified matter, you first have to know how to execute an effective proposal writing procedure.

Developing a comprehensive and detailed market research proposal can help you a lot in terms of organizing the market research processes that you would like to conduct as well as the resources that you will be needing.

Market Research Proposal Template

State the objectives, scope of work, research methodology, target market, and other important information for your market research by downloading and using this research proposal example template. The template's content can be edited and customized in various file formats such as MS Word, Pages, Google Docs, and editable PDF. Try it out now!

Market Research Proposal Example

Download and use this market research template to conduct your market research effectively. With it, you can conveniently outline the objectives and goals of your market research, saving time for other research-related tasks. Edit and customize it using MS Word or Pages. You can also look at multimedia project proposal examples.

Marketing Research Proposal Example

It is important not to confuse a market research proposal with a marketing research proposal. A marketing research proposal presents potential promotional and advertising activities that a company can implement to market its products, services, deals, and other offers, while a market research proposal focuses on learning how the market moves based on its trends, activities, and size.

Proposal for Market Research Example

If you want to create your market research proposal, one of the things that you can do to help you have an easier time when developing the document is to look into references like downloadable examples. Simply browse through the market research proposal examples in PDF that are available in this post so you can have an idea of how to properly create the best market research proposal for your business.

Free Market Research Proposal Example

Importance of a Market Research Proposal

A market research proposal helps you properly think of the things that truly matter when it comes to the market research. With the help of this document, you can give priority to the factors and elements that can contribute to the advancement and growth of your business .

Using a market research proposal can also give you time to put together relevant and necessary processes that are most likely helpful in achieving not only the goals of your market research activities but the corporate goals of the business as well. Here are some of the reasons why you need to create and use a market research proposal:

1. A market research proposal is one of the most essential documents that are used by businesses to properly plan the entire process of their market research activities. It presents the outline of the market research’s goals and it also focuses on the action plans that can lead the business to the achievement of its objectives and vision.

2. A market research proposal can give an idea of the funding that the team needs to execute the market research activities. Financial support from the organization must be addressed to make sure that all planned procedures can be implemented accordingly. You may also see business proposal examples.

Developing the market research proposal with the knowledge that funding will be given for its implementation can make the marketing team, as well as the other people involved in the activity, become more proactive and efficient as it is most likely that what they envisioned and planned will be realized.

3. A market research proposal, especially one that contains a marketing SWOT analysis and a market condition overview, can help you look into the external and internal factors that affect your business operations.

The knowledge about the nature of your business, the competition that you need to look out for, the threats and risks that you need to prepare for, the needs and demands of your audience, the movement and shifts in the marketplace, and the opportunities that you should grab can make you become more well-rounded and multifaceted when drafting the market research proposal that you would like to present. You may also check out project proposal examples .

4. A market research proposal can discuss the milestones that the business expects to achieve with the help of market research strategies and general action plans. Hence, this document can persuade its target audience that a proposed market research activity should be approved, especially if the expected results can greatly benefit the business or provide a solution to its current issues, problems, and concerns.

Sample Marketing Research Proposal Example

Proposal for Marketing Research and Market Intelligence Example

Market Research and Analysis Report for Proposal Referencing Example

Market Research Proposal Content

Different market research proposals have different sections, clauses, or areas of discussion. The content of a market research proposal depends on the purpose of its usage, the scope of the activity, the expected returns of the business, the professional goals of the market research, and the relation of the document’s usage to the vision of the business.

Even though the information found in market research proposals differs across industries, some information is common to almost all of them. Some of the details that are essential to include in a market research proposal are as follows:

1. Develop a hypothesis. This is very important because you need to present the potential impacts of the market research proposal when it is implemented. It can also help you identify ways in which you can interlink or align all the elements that are essential for the successful execution of every area of the market research proposal. You may also see short proposal examples.

2. Present an overview of the market research activities that you would like the business to consider. You have to sum up the intent of the market research as well as the output that you expect from it. More so, you have to discuss the feasibility, attainability, and sustainability of your general plans . Being able to showcase these strengths can help your market research proposal become more appealing and relevant.

3. Just as when making a development project proposal, use a timeline that gives an idea of the entire duration of the market research proposal's actual usage. You have to set time frames by which specific deliverables should already be seen or observed. With this, you can assure your target audience that the proposal is time-bound and realistic.

4. Especially if you use technical terms, you should include a proper definition of terms in your market research proposal. This part of the document can make the overall proposal more understandable for any reasonable person.

5. Know your targets so you can easily come up with a methodology that is relevant to your needs. All the practices and activities that you would like to undertake should be thoroughly defined in the document so that the proposal's measures can be analyzed objectively.

6. Discuss the current conditions in the marketplace where your business operates. Aside from the trends that you need to consider, you also have to list the opportunities that the business can take to help it achieve its goals and return on investment.

Marketing Information Management System Research Proposal Example

Research Proposal Usable for Market Study Example

Market Research and Developing a Marketing Plan Proposal Example

Market Research Conduct and Proposal Drafting Example

Discussion Flow for a Simple Market Research Proposal

The format and discussion flow of the market research proposal can contribute to the document’s successes, or the lack thereof. This is the reason why you have to be careful with how you will present the market research proposal to your audience. You have to ensure that the document is visually pleasing and well-organized so that people will not have a hard time reviewing its content. You may also see  freelance proposal examples .

A basic discussion flow that you can use when presenting the details of your market research proposal is listed below:

  • The title of your market research proposal
  • The date when the market research proposal was made and the dates of its updates
  • The name of the company that can benefit from the document
  • The name of the person who prepared the proposal and the department or division to which he or she is assigned
  • The executive summary of the market research proposal
  • The objectives of the market research proposal
  • The current condition of the business and the market as well as other important existing knowledge
  • The expected output of the document’s usage, when approved
  • The demographics targeted by the business with the help of the market research proposal
  • The processes of data gathering, collection, assessment, and presentation
  • The methodology that will be applied for the  research project plan intended for a particular market
  • The dates and periods where particular tasks should already be done
  • The budget proposed by the team or the individual who made the proposal
  • Any ethical considerations that must be looked into before the implementation of the market research

Proposal to Conduct Consumer Experience of Care Surveys or Market Research

Request for Proposal for Solicitation for Contract for Market Research Example

Marketing Research Group Project Proposal Example

Marketing Research Firm Proposal Example

Tips to Develop an Impressive Market Research Proposal

Aside from having an advertising and marketing business plan, you should also have a market research plan. It is not enough to rely on your knowledge of the things that you can control. You also have to think about the elements that are not within your control, like the trends in the marketplace and the reactions of your audience and competition with regard to these trends and any other market changes.

Listed below are a few of the tips that you can use if you want to develop an impressive market research proposal for your business:

1. Since a market research proposal is one of the first documents that you will be needing for your market research, you have to ensure that the content of the document is flexible enough to adapt to possible changes within the development of the market research planning and implementation phases. You have to ensure that there are windows where appropriate changes can be inserted as well as channels, mediums, or platforms where you can incorporate backup plans when necessary or called for.

2. Keep in mind that the language and tone that you use when creating the content of the market research proposal must be carefully considered.

You have to ensure that the document is formal, business-appropriate, and compelling. Aside from the fact that the market research proposal is expected to be complete with all the details about your proposed market research plan, it is also imperative for you to make sure that the document is understandable, well-defined, and clear. You may also see security proposal examples .

3. Know the basics of market research proposal organization. There are different kinds of structures that you can look into so that your market research proposal can look cohesive and well put together.

The structure of the document should depend on the length of your discussion, the details that you will incorporate in your market research undertakings, and the key factors that you need to give focus and highlight on when presenting the complexity of the market research. You may also like budget proposal examples .

It will be more efficient to use references like templates and examples while preparing your market research proposal.

Maximize the help that you can get from the downloadable examples in this post as well as the related discussion that we have presented. Always ensure that there is an organization in the procedures of market research proposal development so you can be well-guided in terms of getting the output that you would like to have for your market research undertaking.

Open Access

Peer-reviewed

Meta-Research Article

Meta-Research Articles feature data-driven examinations of the methods, reporting, verification, and evaluation of scientific research.

Assessing the evolution of research topics in a biological field using plant science as an example

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliations Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America, Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan, United States of America, DOE-Great Lake Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America

Roles Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing

Affiliation Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America

  • Shin-Han Shiu, 
  • Melissa D. Lehti-Shiu

PLOS

  • Published: May 23, 2024
  • https://doi.org/10.1371/journal.pbio.3002612

Scientific advances due to conceptual or technological innovations can be revealed by examining how research topics have evolved. But such topical evolution is difficult to uncover and quantify because of the large body of literature and the need for expert knowledge in a wide range of areas in a field. Using plant biology as an example, we used machine learning and language models to classify plant science citations into topics representing interconnected, evolving subfields. The changes in prevalence of topical records over the last 50 years reflect shifts in major research trends and recent radiation of new topics, as well as turnover of model species and vastly different plant science research trajectories among countries. Our approaches readily summarize the topical diversity and evolution of a scientific field with hundreds of thousands of relevant papers, and they can be applied broadly to other fields.

Citation: Shiu S-H, Lehti-Shiu MD (2024) Assessing the evolution of research topics in a biological field using plant science as an example. PLoS Biol 22(5): e3002612. https://doi.org/10.1371/journal.pbio.3002612

Academic Editor: Ulrich Dirnagl, Charite Universitatsmedizin Berlin, GERMANY

Received: October 16, 2023; Accepted: April 4, 2024; Published: May 23, 2024

Copyright: © 2024 Shiu, Lehti-Shiu. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The plant science corpus data are available through Zenodo ( https://zenodo.org/records/10022686 ). The codes for the entire project are available through GitHub ( https://github.com/ShiuLab/plant_sci_hist ) and Zenodo ( https://doi.org/10.5281/zenodo.10894387 ).

Funding: This work was supported by the National Science Foundation (IOS-2107215 and MCB-2210431 to MDL and SHS; DGE-1828149 and IOS-2218206 to SHS), Department of Energy grant Great Lakes Bioenergy Research Center (DE-SC0018409 to SHS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BERT, Bidirectional Encoder Representations from Transformers; br, brassinosteroid; ccTLD, country code Top Level Domain; c-Tf-Idf, class-based Tf-Idf; ChatGPT, Chat Generative Pretrained Transformer; ga, gibberellic acid; LOWESS, locally weighted scatterplot smoothing; MeSH, Medical Subject Heading; SHAP, SHapley Additive exPlanations; SJR, SCImago Journal Rank; Tf-Idf, Term frequency-Inverse document frequency; UMAP, Uniform Manifold Approximation and Projection

Introduction

The explosive growth of scientific data in recent years has been accompanied by a rapidly increasing volume of literature. These records represent a major component of our scientific knowledge and embody the history of conceptual and technological advances in various fields over time. Our ability to wade through these records is important for identifying relevant literature for specific topics, a crucial practice of any scientific pursuit [ 1 ]. Classifying the large body of literature into topics can provide a useful means to identify relevant literature. In addition, these topics offer an opportunity to assess how scientific fields have evolved and when major shifts took place. However, such classification is challenging because the relevant articles in any topic or domain can number in the tens or hundreds of thousands, and the literature is in the form of natural language, which takes substantial effort and expertise to process [ 2 , 3 ]. In addition, even if one could digest all literature in a field, it would still be difficult to quantify such knowledge.

In the last several years, there has been a quantum leap in natural language processing approaches due to the feasibility of building complex deep learning models with highly flexible architectures [ 4 , 5 ]. The development of large language models such as Bidirectional Encoder Representations from Transformers (BERT; [ 6 ]) and Chat Generative Pretrained Transformer (ChatGPT; [ 7 ]) has enabled the analysis, generation, and modeling of natural language texts in a wide range of applications. The success of these applications is, in large part, due to the feasibility of considering how the same words are used in different contexts when modeling natural language [ 6 ]. One such application is topic modeling, the practice of establishing statistical models of semantic structures underlying a document collection. Topic modeling has been proposed for identifying scientific hot topics over time [ 1 ], for example, in synthetic biology [ 8 ], and it has also been applied to, for example, automatically identify topical scenes in images [ 9 ] and social network topics [ 10 ], discover gene programs highly correlated with cancer prognosis [ 11 ], capture “chromatin topics” that define cell-type differences [ 12 ], and investigate relationships between genetic variants and disease risk [ 13 ]. Here, we use topic modeling to ask how research topics in a scientific field have evolved and what major changes in the research trends have taken place, using plant science as an example.

Plant science corpora allow classification of major research topics

Plant science, broadly defined, is the study of photosynthetic species, their interactions with biotic/abiotic environments, and their applications. For modeling plant science topical evolution, we first identified a collection of plant science documents (i.e., corpus) using a text classification approach. To this end, we first collected over 30 million PubMed records and narrowed down candidate plant science records by searching for those with plant-related terms and taxon names (see Materials and methods ). Because there remained a substantial number of false positives (i.e., biomedical records mentioning plants in passing), a set of positive plant science examples from the 17 plant science journals with the highest numbers of plant science publications covering a wide range of subfields and a set of negative examples from journals with few candidate plant science records were used to train 4 types of text classification models (see Materials and methods ). The best text classification model performed well (F1 = 0.96, F1 of a naïve model = 0.5, perfect model = 1) where the positive and negative examples were clearly separated from each other based on prediction probability of the hold-out testing dataset (false negative rate = 2.6%, false positive rate = 5.2%, S1A and S1B Fig ). The false prediction rate for documents from the 17 plant science journals annotated with the Medical Subject Heading (MeSH) term “Plants” in NCBI was 11.7% (see Materials and methods ). The prediction probability distribution of positive instances with the MeSH term has an expected left-skew to lower values ( S1C Fig ) compared with the distributions of all positive instances ( S1A Fig ). Thus, this subset with the MeSH term is a skewed representation of articles from these 17 major plant science journals. 
To further benchmark the validity of the plant science records, we also conducted manual annotation of 100 records where the false positive and false negative rates were 14.6% and 10.6%, respectively (see Materials and methods ). Using 12 other plant science journals not included as positive examples as benchmarks, the false negative rate was 9.9% (see Materials and methods ). Considering the range of false prediction rate estimates with different benchmarks, we should emphasize that the model built with the top 17 plant science journals represents a substantial fraction of plant science publications but with biases. Applying the model to the candidate plant science records led to 421,658 positive predictions, hereafter referred to as “plant science records” ( S1D Fig and S1 Data ).
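
To make the reported metrics concrete, the following minimal sketch shows how an F1 score, a false negative rate, and a false positive rate are derived from confusion-matrix counts. The function name and the counts are hypothetical; the counts were chosen only so that the resulting rates resemble the hold-out figures quoted above, and they are not the study's data.

```python
def classification_rates(tp, fp, tn, fn):
    """Return F1, false negative rate, and false positive rate from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fnr = fn / (fn + tp)  # share of true positives predicted as negative
    fpr = fp / (fp + tn)  # share of true negatives predicted as positive
    return {"f1": f1, "fnr": fnr, "fpr": fpr}


# Hypothetical counts for an assumed balanced hold-out set of 2,000 records
rates = classification_rates(tp=974, fp=52, tn=948, fn=26)
print({name: round(value, 3) for name, value in rates.items()})
```

Because the false negative and false positive rates are computed over different denominators (true positives versus true negatives), they can differ substantially even when the F1 score is high.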

To better understand how the models classified plant science articles, we identified important terms from a more easily interpretable model (Term frequency-Inverse document frequency (Tf-Idf) model; F1 = 0.934) using Shapley Additive Explanations [ 14 ]; 136 terms contributed to predicting plant science records (e.g., Arabidopsis, xylem, seedling) and 138 terms contributed to non-plant science record predictions (e.g., patients, clinical, mice; Tf-Idf feature sheet, S1 Data ). Plant science records as well as PubMed articles grew exponentially from 1950 to 2020 ( Fig 1A ), highlighting the challenges of digesting the rapidly expanding literature. We used the plant science records to perform topic modeling, which consisted of 4 steps: representing each record as a BERT embedding, reducing dimensionality, clustering, and identifying the top terms by calculating class (i.e., topic)-based Tf-Idf (c-Tf-Idf; [ 15 ]). The c-Tf-Idf represents the frequency of a term in the context of how rare the term is to reduce the influence of common words. SciBERT [ 16 ] was the best model among those tested ( S2 Data ) and was used for building the final topic model, which classified 372,430 (88.3%) records into 90 topics defined by distinct combinations of terms ( S3 Data ). The topics contained 620 to 16,183 records and were named after the top 4 to 5 terms defining the topical areas ( Fig 1B and S3 Data ). For example, the top 5 terms representing the largest topic, topic 61 (16,183 records), are “qtl,” “resistance,” “wheat,” “markers,” and “traits,” which represent crop improvement studies using quantitative genetics.
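
The class-based Tf-Idf step can be illustrated with a small sketch. It assumes one common formulation, in which all documents of a topic are pooled, term counts are normalized within the topic, and each term is weighted by log(1 + A / f), where A is the average number of tokens per topic and f is the term's total count across topics. The function name and toy documents are hypothetical, and the weighting used in the study may differ in detail.

```python
import math
from collections import Counter


def class_tfidf(docs_by_topic):
    """Score terms per topic with a class-based Tf-Idf (assumed formulation).

    docs_by_topic maps a topic label to a list of tokenized documents.
    """
    # Pool all documents of a topic into one bag of terms
    tf = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    # Total count of every term across all topics
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    avg_tokens = sum(total.values()) / len(tf)  # average tokens per topic

    scores = {}
    for topic, counts in tf.items():
        n_tokens = sum(counts.values())
        scores[topic] = {
            term: (count / n_tokens) * math.log(1 + avg_tokens / total[term])
            for term, count in counts.items()
        }
    return scores


# Toy corpus: "plants" occurs in both topics, so it is down-weighted
toy = {
    "qtl": [["wheat", "qtl", "plants"], ["qtl", "markers", "plants"]],
    "light": [["light", "leaves", "plants"], ["light", "photosynthesis", "plants"]],
}
scores = class_tfidf(toy)
top_terms = {topic: max(s, key=s.get) for topic, s in scores.items()}
print(top_terms)
```

In this toy corpus, "plants" appears as often as "qtl" within the first topic, but it scores lower because it is also frequent in the other topic.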

(A) Numbers of PubMed (magenta) and plant science (green) records between 1950 and 2020. (a, b, c) Coefficients of the exponential function, y = ae b . Data for the plot are in S1 Data . (B) Numbers of documents for the top 30 plant science topics. Each topic is designated by an index number (left) and the top 4–6 terms with the highest cTf-Idf values (right). Data for the plot are in S3 Data . (C) Two-dimensional representation of the relationships between plant science records generated by Uniform Manifold Approximation and Projection (UMAP, [ 17 ]) using SciBERT embeddings of plant science records. All topics panel: Different topics are assigned different colors. Outlier panel: UMAP representation of all records (gray) with outlier records in red. Blue dotted circles: areas with relatively high densities indicating topics that are below the threshold for inclusion in a topic. In the 8 UMAP representations on the right, records for example topics are in red and the remaining records in gray. Blue dotted circles indicate the relative position of topic 48.

https://doi.org/10.1371/journal.pbio.3002612.g001

Records with assigned topics clustered into distinct areas in a two-dimensional (2D) space ( Fig 1C , for all topics, see S4 Data ). The remaining 49,228 outlier records not assigned to any topic (11.7%, middle panel, Fig 1C ) have 3 potential sources. First, some outliers likely belong to unique topics but have fewer records than the threshold (>500, blue dotted circles, Fig 1C ). Second, some of the many outliers dispersed within the 2D space ( Fig 1C ) were not assigned to any single topic because they had relatively high prediction scores for multiple topics ( S2 Fig ). These likely represent studies across subdisciplines in plant science. Third, some outliers are likely interdisciplinary studies between plant science and other domains, such as chemistry, mathematics, and physics. Such connections can only be revealed if records from other domains are included in the analyses.

Topical clusters reveal closely related topics but with distinct key term usage

Related topics tend to be located close together in the 2D representation (e.g., topics 48 and 49, Fig 1C ). We further assessed intertopical relationships by determining the cosine similarities between topics using cTf-Idfs ( Figs 2A and S3 ). In this topic network, some topics are closely related and form topic clusters. For example, topics 25, 26, and 27 collectively represent a more general topic related to the field of plant development (cluster a , lower left in Fig 2A ). Other topic clusters represent studies of stress, ion transport, and heavy metals ( b ); photosynthesis, water, and UV-B ( c ); population and community biology (d); genomics, genetic mapping, and phylogenetics ( e , upper right); and enzyme biochemistry ( f , upper left in Fig 2A ).
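
As a rough sketch of how such a topic network could be assembled, the code below computes pairwise cosine similarities between topic term-weight vectors and keeps only the pairs at or above a cutoff. The vectors are made-up values for three topics, the function names are hypothetical, and treating the cutoff as inclusive (>= 0.6) is an assumption.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_edges(topic_vectors, threshold=0.6):
    """Return (topic_i, topic_j, similarity) for pairs at or above the threshold."""
    edges = []
    for (i, u), (j, v) in combinations(sorted(topic_vectors.items()), 2):
        sim = cosine_similarity(u, v)
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


# Hypothetical term-weight vectors over a shared three-term vocabulary
vectors = {
    25: [0.9, 0.1, 0.0],
    26: [0.8, 0.3, 0.0],
    61: [0.0, 0.1, 0.9],
}
edges = topic_edges(vectors)
print([(i, j, round(sim, 2)) for i, j, sim in edges])
```

Only the two topics that share their dominant term end up connected, which is the behavior that produces visible clusters in a thresholded network.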

(A) Graph depicting the degrees of similarity (edges) between topics (nodes). Between each topic pair, a cosine similarity value was calculated using the cTf-Idf values of all terms. A threshold similarity of 0.6 was applied to illustrate the most related topics. For the full matrix presented as a heatmap, see S4 Fig . The nodes are labeled with topic index numbers and the top 4–6 terms. The colors and width of the edges are defined based on cosine similarity. Example topic clusters are highlighted in yellow and labeled a through f (blue boxes). (B, C) Relationships between the cTf-Idf values (see S3 Data ) of the top terms for topics 26 and 27 (B) and for topics 25 and 27 (C) . Only terms with cTf-Idf ≥ 0.6 are labeled. Terms with cTf-Idf values beyond the x and y axis limit are indicated by pink arrows and cTf-Idf values. (D) The 2D representation in Fig 1C is partitioned into graphs for different years, and example plots for every 5-year period since 1975 are shown. Example topics discussed in the text are indicated. Blue arrows connect the areas occupied by records of example topics across time periods to indicate changes in document frequencies.

https://doi.org/10.1371/journal.pbio.3002612.g002

Topics differed in how well they were connected to each other, reflecting how general the research interests or needs are (see Materials and methods ). For example, topic 24 (stress mechanisms) is the most well connected with median cosine similarity = 0.36, potentially because researchers in many subfields consider aspects of plant stress even though it is not the focus. The least connected topics include topic 21 (clock biology, 0.12), which is surprising because of the importance of clocks in essentially all aspects of plant biology [ 18 ]. This may be attributed, in part, to the relatively recent attention in this area.

Examining topical relationships and the cTf-Idf values of terms also revealed how related topics differ. For example, topic 26 is closely related to topics 27 and 25 (cluster a on the lower left of Fig 2A ). Topics 26 and 27 both contain records of developmental process studies mainly in Arabidopsis ( Fig 2B ); however, topic 26 is focused on the impact of light, photoreceptors, and hormones such as gibberellic acids (ga) and brassinosteroids (br), whereas topic 27 is focused on flowering and floral development. Topic 25 is also focused on plant development but differs from topic 27 because it contains records of studies mainly focusing on signaling and auxin with less emphasis on Arabidopsis ( Fig 2C ). These examples also highlight the importance of using multiple top terms to represent the topics. The similarities in cTf-Idfs between topics were also useful for measuring the editorial scope (i.e., diverse, or narrow) of journals publishing plant science papers using a relative topic diversity measure (see Materials and methods ). For example, Proceedings of the National Academy of Sciences , USA has the highest diversity, while Theoretical and Applied Genetics has the lowest ( S4 Fig ). One surprise is the relatively low diversity of American Journal of Botany , which focuses on plant ecology, systematics, development, and genetics. The low diversity is likely due to the relatively larger number of cellular and molecular science records in PubMed, consistent with the identification of relatively few topical areas relevant to studies at the organismal, population, community, and ecosystem levels.

Investigation of the relative prevalence of topics over time reveals topical succession

We next asked whether relationships between topics reflect chronological progression of certain subfields. To address this, we assessed how prevalent topics were over time using dynamic topic modeling [ 19 ]. As shown in Fig 2D , there is substantial fluctuation in where the records are in the 2D space over time. For example, topic 44 (light, leaves, co, synthesis, photosynthesis) is among the topics that existed in 1975 but has diminished gradually since. In 1985, topic 39 (Agrobacterium-based transformation) became dense enough to be visualized. Additional examples include topics 79 (soil heavy metals), 42 (differential expression), and 82 (bacterial community metagenomics), which became prominent in approximately 2005, 2010, and 2020, respectively ( Fig 2D ). In addition, animating the document occupancy in the 2D space over time revealed a broad change in patterns over time: Some initially dense areas became sparse over time and a large number of topics in areas previously only loosely occupied at the turn of the century increased over time ( S5 Data ).

While the 2D representations reveal substantial details on the evolution of topics, comparison over time is challenging because the number of plant science records has grown exponentially ( Fig 1A ). To address this, the records were divided into 50 chronological bins each with approximately 8,400 records to make cross-bin comparisons feasible ( S6 Data ). We should emphasize that, because of the way the chronological bins were split, the number of records for each topic in each bin should be treated as a normalized value relative to all other topics during the same period. Examining this relative prevalence of topics across bins revealed a clear pattern of topic succession over time (one topic evolved into another) and the presence of 5 topical categories ( Fig 3 ). The topics were categorized based on their locally weighted scatterplot smoothing (LOWESS) fits and ordered according to timing of peak frequency ( S7 and S8 Data , see Materials and methods ). In Fig 3 , the relative decrease in document frequency does not mean that research output in a topic is dwindling. Because each row in the heatmap is normalized based on the minimum and maximum values within each topic, there still can be substantial research output in terms of numbers of publications even when the relative frequency is near zero. Thus, a reduced relative frequency of a topic reflects only a below-average growth rate compared with other topical areas.
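The binning and row-wise normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: records are already sorted by date, bins are cut at equal record counts, and the per-bin topic counts are invented for the example. The helper names are hypothetical.

```python
def equal_count_bins(sorted_records, n_bins):
    """Split chronologically sorted records into bins of near-equal size."""
    n = len(sorted_records)
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        bins.append(sorted_records[start:end])
    return bins


def min_max_normalize(values):
    """Rescale one topic's per-bin values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Publication years standing in for dated records (toy data).
years = sorted([1975, 1980, 1985, 1990, 1995, 2000,
                2005, 2010, 2012, 2014, 2016, 2018])
bins = equal_count_bins(years, 4)

# Hypothetical per-bin values for a single topic.
topic_values_per_bin = [30, 45, 60, 90]
normalized = min_max_normalize(topic_values_per_bin)
```

Because every bin holds the same number of records, later bins span shorter time periods, and a normalized value near zero marks the bin in which the topic was least prevalent relative to its own history rather than a bin with no publications.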


(A-E) A heat map of relative topic frequency over time reveals 5 topical categories: (A) stable, (B) early, (C) transitional, (D) sigmoidal, and (E) rising. The x axis denotes different time bins with each bin containing a similar number of documents to account for the exponential growth of plant science records over time. The sizes of all bins except the first are drawn to scale based on the beginning and end dates. The y axis lists different topics denoted by the label and top 4 to 5 terms. In each cell, the prevalence of a topic in a time bin is colored according to the min-max normalized cTf-Idf values for that topic. Light blue dotted lines delineate different decades. The arrows left of a subset of topic labels indicate example relationships between topics in topic clusters. Blue boxes with labels a–f indicate topic clusters, which are the same as those in Fig 2 . Connecting lines indicate successional trends. Yellow circles/lines 1 – 3: 3 major transition patterns. The original data are in S5 Data .

https://doi.org/10.1371/journal.pbio.3002612.g003

The first topical category is a stable category with 7 topics mostly established before the 1980s that have since remained stable in terms of prevalence in the plant science records (top of Fig 3A ). These topics represent long-standing plant science research foci, including studies of plant physiology (topics 4, 58, and 81), genetics (topic 61), and medicinal plants (topic 53). The second category contains 8 topics established before the 1980s that have mostly decreased in prevalence since (the early category, Fig 3B ). Two examples are physiological and morphological studies of hormone action (topic 45, the second in the early category) and the characterization of protein, DNA, and RNA (topic 18, the second to last). Unlike other early topics, topic 78 (paleobotany and plant evolution studies, the last topic in Fig 3B ) experienced a resurgence in the early 2000s due to the development of new approaches and databases and changes in research foci [ 20 ].

The 33 topics in the third, transitional category became prominent in the 1980s, 1990s, or even 2000s but have clearly decreased in prevalence ( Fig 3C ). In some cases, the early and the transitional topics became less prevalent because of topical succession: refocusing of earlier topics led to newer ones that either show no clear sign of decrease (the sigmoidal category, Fig 3D ) or continue to increase in prevalence (the rising category, Fig 3E ). Consistent with the notion of topical succession, topics within each topic cluster ( Fig 2 ) were found across topic categories and/or were prominent at different time periods (indicated by colored lines linking topics, Fig 3 ). One example is topics in topic cluster b (connected with light green lines and arrows, compare Figs 2 and 3 ); the study of cation transport (topic 47, the third in the transitional category), prominent in the 1980s and early 1990s, is connected to 5 other topics, namely, another transitional topic 29 (cation channels and their expression) peaking in the 2000s and early 2010s, sigmoidal topics 24 and 28 (stress response, tolerance mechanisms) and 30 (heavy metal transport), which rose to prominence in the mid-2000s, and the rising topic 42 (stress transcriptomic studies), which increased in prevalence in the mid-2010s.

The rise and fall of topics can be due to a combination of technological or conceptual breakthroughs, maturity of the field, funding constraints, or publicity. The study of transposable elements (topic 62) illustrates the effect of publicity; the rise in this field coincided with Barbara McClintock’s 1983 Nobel Prize but not with the publication of her studies in the 1950s [ 21 ]. The reduced prevalence in the early 2000s likely occurred in part because analysis of transposons became a central component of genome sequencing and annotation studies, rather than the subject of dedicated studies. In addition, this example indicates that our approaches, while capable of capturing topical trends, cannot be used to directly infer major papers leading to the growth of a topic.

Three major topical transition patterns signify shifts in research trends

Beyond the succession of specific topics, 3 major transitions in the dynamic topic graph should be emphasized: (1) the relative decreasing trend of early topics in the late 1970s and early 1980s; (2) the rise of transitional topics in the late 1980s; and (3) the relative decreasing trend of transitional topics in the late 1990s and early 2000s, which coincided with a radiation of sigmoidal and rising topics (yellow circles, Fig 3 ). The large numbers of topics involved in these transitions suggest major shifts in plant science research. In transition 1, early topics decreased in relative prevalence in the late 1970s to early 1980s, which coincided with the rise of transitional topics over the following decades (circle 1, Fig 3 ). For example, there was a shift from the study of purified proteins such as enzymes (early topic 48, S5A Fig ) to molecular genetic dissection of genes, proteins, and RNA (transitional topic 35, S5B Fig ) enabled by the wider adoption of recombinant DNA and molecular cloning technologies in the late 1970s [ 22 ]. Transition 2 (circle 2, Fig 3 ) can be explained by the following breakthroughs in the late 1980s: better approaches to create transgenic plants and insertional mutants [ 23 ], more efficient creation of mutant plant libraries through chemical mutagenesis (e.g., [ 24 ]), and availability of gene reporter systems such as β-glucuronidase [ 25 ]. Because of these breakthroughs, molecular genetics studies shifted away from understanding the basic machinery to understanding the molecular underpinnings of specific processes, such as molecular mechanisms of flower and meristem development and the action of hormones such as auxin (topic 27, S5C Fig ); this type of research was discussed as a future trend in 1988 [ 26 ] and remains prevalent to this date. Another example is gene silencing (topic 12), which became a focal area of study along with the widespread use of transgenic plants [ 27 ].

Transition 3 is the most drastic: A large number of transitional, sigmoidal, and rising topics became prevalent nearly simultaneously at the turn of the century (circle 3, Fig 3 ). This period also coincides with a rapid increase in plant science citations ( Fig 1A ). The most notable breakthroughs included the availability of the first plant genome in 2000 [ 28 ], increasing ease and reduced cost of high-throughput sequencing [ 29 ], development of new mass spectrometry–based platforms for analyzing proteins [ 30 ], and advancements in microscopic and optical imaging approaches [ 31 ]. Advances in genomics and omics technology also led to an increase in stress transcriptomics studies (42, S5D Fig ) as well as studies in many other topics such as epigenetics (topic 11), noncoding RNA analysis (13), genomics and phylogenetics (80), breeding (41), genome sequencing and assembly (60), gene family analysis (23), and metagenomics (82 and 55).

In addition to the 3 major transitions across all topics, there were also transitions within topics revealed by examining the top terms for different time bins (heatmaps, S5 Fig ). Taken together, these observations demonstrate that knowledge about topical evolution can be readily revealed through topic modeling. Such knowledge is typically only available to experts in specific areas and is difficult to summarize manually, as no researcher has a command of the entire plant science literature.

Analysis of taxa studied reveals changes in research trends

Changes in research trends can also be illustrated by examining changes in the taxa being studied over time ( S9 Data ). There is a strong bias in the taxa studied, with the record dominated by research models and economically important taxa ( S6 Fig ). Flowering plants (Magnoliopsida) are found in 93% of records ( S6A Fig ), and the mustard family Brassicaceae dominates at the family level ( S6B Fig ) because the genus Arabidopsis contributes to 13% of plant science records ( Fig 4A ). When examining the prevalence of taxa being studied over time, clear patterns of turnover emerged similar to topical succession ( Figs 4B , S6C, and S6D ; Materials and methods ). Given that Arabidopsis is mentioned in more publications than other species we analyzed, we further examined the trends for Arabidopsis publications. The increase in the normalized number (i.e., relative to the entire plant science corpus) of Arabidopsis records coincided with advocacy of its use as a model system in the late 1980s [ 32 ]. While it remains a major plant model, there has been a decrease in overall Arabidopsis publications relative to all other plant science publications since 2011 (blue line, normalized total, Fig 4C ). Because the same chronological bins, each with same numbers of records, from the topic-over-time analysis ( Fig 3 ) were used, the decrease here does not mean that there were fewer Arabidopsis publications—in fact, the number of Arabidopsis papers has remained steady since 2011. This decrease means that Arabidopsis-related publications represent a relatively smaller proportion of plant science records. Interestingly, this decrease took place much earlier (approximately 2005) and was steeper in the United States (red line, Fig 4C ) than in all countries combined (blue line, Fig 4C ).


(A) Percentage of records mentioning specific genera. (B) Change in the prevalence of genera in plant science records over time. (C) Changes in the normalized numbers of all records (blue) and records from the US (red) mentioning Arabidopsis over time. The lines are LOWESS fits with fraction parameter = 0.2. (D) Topical over (red) and under (blue) representation among 5 genera with the most plant science records. LLR: log 2 likelihood ratios of each topic in each genus. Gray: topic-species combination not significantly enriched at the 5% level based on enrichment p -values adjusted for multiple testing with the Benjamini–Hochberg method [ 33 ]. The data used for plotting are in S9 Data . The statistics for all topics are in S10 Data .

https://doi.org/10.1371/journal.pbio.3002612.g004

Assuming that the normalized number of publications reflects the relative intensity of research activities, one hypothesis for the relative decrease in focus on Arabidopsis is that advances in, for example, plant transformation, genetic manipulation, and genome research have allowed the adoption of more previously nonmodel taxa. Consistent with this, there was a precipitous increase in the number of genera being published in the mid-90s to early 2000s during which approaches for plant transgenics became established [ 34 ], but the number has remained steady since then ( S7A Fig ). The decrease in the proportion of Arabidopsis papers is also negatively correlated with the timing of an increase in the number of draft genomes ( S7B Fig and S9 Data ). It is plausible that genome availability for other species may have contributed to a shift away from Arabidopsis. Strikingly, when we analyzed US National Science Foundation records, we found that the numbers of funded grants mentioning Arabidopsis ( S7C Fig ) have risen and fallen in near perfect synchrony with the normalized number of Arabidopsis publication records (red line, Fig 4C ). This finding likely illustrates the impact of funding on Arabidopsis research.

By considering both taxa information and research topics, we can identify clear differences in the topical areas preferred by researchers using different plant taxa ( Fig 4D and S10 Data ). For example, studies of auxin/light signaling, the circadian clock, and flowering tend to be carried out in Arabidopsis, while quantitative genetic studies of disease resistance tend to be done in wheat and rice, glyphosate research in soybean, and RNA virus research in tobacco. Taken together, joint analyses of topics and species revealed additional details about changes in preferred models over time, and the preferred topical areas for different taxa.

Countries differ in their contributions to plant science and topical preference

We next investigated whether there were geographical differences in topical preference among countries by inferring country information from 330,187 records (see Materials and methods ). The 10 countries with the most records account for 73% of the total, with China and the US each contributing approximately 18% ( Fig 5A ). The exponential growth in plant science records (green line, Fig 1A ) was in large part due to the rapid rise in annual record numbers in China and India ( Fig 5B ). When we examined the publication growth rates using the top 17 plant science journals, the general patterns remained the same ( S7D Fig ). On the other hand, the US, Japan, Germany, France, and Great Britain had slower rates of growth compared with all non-top 10 countries. The rapid increase in records from China and India was accompanied by a rapid increase in metrics measuring journal impact ( Figs 5C and S8 and S9 Data ). For example, using citation score ( Fig 5C , see Materials and methods ), we found that during a 22-year period China (dark green) and India (light green) rapidly approached the global average (y = 0, yellow), whereas some of the other top 10 countries, particularly the US (red) and Japan (yellow green), showed signs of decrease ( Fig 5C ). It remains to be determined whether these geographical trends reflect changes in priority, investment, and/or interest in plant science research.


(A) Numbers of plant science records for countries with the 10 highest numbers. (B) Percentage of all records from each of the top 10 countries from 1980 to 2020. (C) Difference in citation scores from 1999 to 2020 for the top 10 countries. (D) Shown for each country is the relationship between the citation scores averaged from 1999 to 2020 and the slope of linear fit with year as the predictive variable and citation score as the response variable. The countries with >400 records and with <10% missing impact values are included. Data used for plots (A–D) are in S11 Data . (E) Correlation in topic enrichment scores between the top 10 countries. PCC, Pearson’s correlation coefficient, positive in red, negative in blue. Yellow rectangle: countries with more similar topical preferences. (F) Enrichment scores (LLR, log likelihood ratio) of selected topics among the top 10 countries. Red: overrepresentation, blue: underrepresentation. Gray: topic-country combination that is not significantly enriched at the 5% level based on enrichment p -values adjusted for multiple testing with the Benjamini–Hochberg method (for all topics and plotting data, see S12 Data ).

https://doi.org/10.1371/journal.pbio.3002612.g005

Interestingly, the relative growth/decline in citation scores over time (measured as the slope of linear fit of year versus citation score) was significantly and negatively correlated with average citation score ( Fig 5D ); i.e., countries with lower overall metrics tended to experience the strongest increase in citation scores over time. Thus, countries that did not originally have a strong influence on plant sciences now have increased impact. These patterns were also observed when using H-index or journal rank as metrics ( S8 Fig and S11 Data ) and were not due to increased publication volume, as the metrics were normalized against numbers of records from each country (see Materials and methods ). In addition, the fact that different metrics with different caveats and assumptions yielded consistent conclusions indicates the robustness of our observations. We hypothesize that this may be a consequence of the ease of scientific communication among geographically isolated research groups, of the prevalence of open-access online journals, which makes scientific information more readily accessible, or of increasing international collaboration. In any case, the causes for such regression toward the mean are not immediately clear and should be addressed in future studies.
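The relationship between average citation score and its trend can be sketched as a two-step calculation: fit a least-squares slope of citation score against year for each country, then correlate the slopes with the per-country averages. The example below is a minimal sketch with invented scores for 3 hypothetical countries; it is not the code or the data used for Fig 5D.

```python
import math


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def pearson(xs, ys):
    """Pearson's correlation coefficient between two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = [1999, 2004, 2009, 2014, 2019]

# Invented citation scores relative to a global average of 0.
scores_by_country = {
    "country_A": [-1.0, -0.8, -0.6, -0.4, -0.2],  # low average, rising
    "country_B": [0.5, 0.5, 0.4, 0.4, 0.3],       # high average, declining
    "country_C": [0.0, 0.0, 0.1, 0.0, 0.1],       # near average, flat
}

averages = [sum(s) / len(s) for s in scores_by_country.values()]
slopes = [linear_slope(years, s) for s in scores_by_country.values()]
trend_vs_average = pearson(averages, slopes)
```

A negative `trend_vs_average` corresponds to the pattern described above, in which countries with lower average scores show the steepest increases.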

We also assessed how the plant research foci of countries differ by comparing topical preference (i.e., the degree of enrichment of plant science records in different topics) between countries. For example, Italy and Spain cluster together (yellow rectangle, Fig 5E ) partly because of similar research focusing on allergens (topic 0) and mycotoxins (topic 54) and less emphasis on gene family (topic 23) and stress tolerance (topic 28) studies ( Fig 5F , for the fold enrichment and corrected p -values of all topics, see S12 Data ). There are substantial differences in topical focus between countries ( S9 Fig ). For example, research on new plant compounds associated with herbal medicine (topic 69) is a focus in China but not in the US, but the opposite is true for population genetics and evolution (topic 86) ( Fig 5F ). In addition to revealing how plant science research has evolved over time, topic modeling provides additional insights into differences in research foci among different countries, which are informative for science policy considerations.
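The multiple-testing correction applied to the enrichment p-values can be illustrated with a short implementation of the Benjamini–Hochberg step-up adjustment. This is a generic sketch of the standard procedure, assuming a plain list of raw p-values; the function name and the example values are hypothetical, and the study may have used a library routine instead.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for 4 topic-country combinations.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)

# Combinations that remain significant at the 5% level after adjustment.
significant = [p <= 0.05 for p in adjusted_p]
```

Combinations whose adjusted p-value exceeds 0.05 would correspond to the gray cells in the enrichment heatmaps.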

In this study, topic modeling revealed clear transitions among research topics, which represent shifts in research trends in plant sciences. One limitation of our study is the bias in the PubMed-based corpus. The cellular, molecular, and physiological aspects of plant sciences are well represented, but there are many fewer records related to evolution, ecology, and systematics. Our use of titles/abstracts from the top 17 plant science journals as positive examples allowed us to identify papers we typically see in these journals, but this may have caused us to miss “outlier” articles, which may be the most exciting. Another limitation is the need to assign only one topic to a record when a study is interdisciplinary and straddles multiple topics. Furthermore, a limited number of large, inherently heterogeneous topics were summarized to provide a more concise interpretation, which undoubtedly underrepresents the diversity of plant science research. Despite these limitations, dynamic topic modeling revealed changes in plant science research trends that coincide with major shifts in biological science. While we were interested in identifying conceptual advances, our approach can identify trends, but the underlying causes of such trends, particularly the key records leading to the growth in certain topics, still need to be identified. It also remains to be determined which changes in research trends lead to paradigm shifts as defined by Kuhn [ 35 ].

The key terms defining the topics frequently describe various technologies (e.g., topic 38/39: transformation, 40: genome editing, 59: genetic markers, 65: mass spectrometry, 69: nuclear magnetic resonance) or are indicative of studies enabled through molecular genetics and omics technologies (e.g., topic 8/60: genome, 11: epigenetic modifications, 18: molecular biological studies of macromolecules, 13: small RNAs, 61: quantitative genetics, 82/84: metagenomics). Thus, this analysis highlights how technological innovation, particularly in the realm of omics, has contributed to a substantial number of research topics in the plant sciences, a finding that likely holds for other scientific disciplines. We also found that the pattern of topic evolution is similar to that of succession, where older topics have mostly decreased in relative prevalence but appear to have been superseded by newer ones. One example is the rise of transcriptome-related topics and the correlated, reduced focus on regulation at levels other than transcription. This raises the question of whether research driven by technology negatively impacts other areas of research where high-throughput studies remain challenging.

One observation on the overall trends in plant science research is the approximately 10-year cycle in major shifts. One hypothesis is that these shifts reflect not only scientific advances but also the fashion-driven aspect of science. Nonetheless, given that there were only 3 major shifts and the sample size is small, it is difficult to speculate as to why they happened. By analyzing the country of origin, we found that China and India have been the 2 major contributors to the growth in the plant science records in the last 20 years. Our findings also show an equalizing trend in global plant science where countries without a strong plant science publication presence have had an increased impact over the last 20 years. In addition, we identified significant differences in research topics between countries reflecting potential differences in investment and priorities. Such information is important for discerning differences in research trends across countries and can be considered when making policy decisions about research directions.

Materials and methods

Collection and preprocessing of a candidate plant science corpus

For reproducibility purposes, a random state value of 20220609 was used throughout the study. The PubMed baseline files containing citation information ( ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ ) were downloaded on November 11, 2021. To narrow down the records to plant science-related citations, a candidate citation was identified as having, within the titles and/or abstracts, at least one of the following words: “plant,” “plants,” “botany,” “botanical,” “planta,” and “plantarum” (and their corresponding upper case and plural forms), or plant taxon identifiers from NCBI Taxonomy ( https://www.ncbi.nlm.nih.gov/taxonomy ) or USDA PLANTS Database ( https://plants.sc.egov.usda.gov/home ). Note that the search terms used here are unrelated to the values of the keyword field in PubMed records. The taxon identifiers include all taxon names at and below the taxonomic level of “Viridiplantae” down to the genus level (species names were not used). This led to 51,395 search terms. After searching for these terms, qualified entries were removed if they were duplicated, lacked titles and/or abstracts, or were corrections, errata, or withdrawn articles. This left 1,385,417 citations, which were considered the candidate plant science corpus (i.e., a collection of texts). For further analysis, the title and abstract for each citation were combined into a single entry. Text was preprocessed by lowercasing, removing stop-words (i.e., common words), removing non-alphanumeric and non-white space characters (except Greek letters, dashes, and commas), and applying lemmatization (i.e., grouping inflected forms of a word as a single word) for comparison. Because lemmatization led to truncated scientific terms, it was not included in the final preprocessing pipeline.
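The candidate filter and the final preprocessing steps (without lemmatization) can be sketched with the standard library alone. The sketch makes several assumptions that are not specified above: the stop-word list is a tiny placeholder, taxon names are matched as single lowercase tokens, and Greek letters are kept through the Unicode range U+0370 to U+03FF. All function names are hypothetical.

```python
import re

PLANT_TERMS = {"plant", "plants", "botany", "botanical", "planta", "plantarum"}

# Tiny placeholder list; the actual stop-word list is not given in the text.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "is"}


def is_candidate(title, abstract, taxon_names=frozenset()):
    """True if the title/abstract contains a plant term or a taxon name."""
    text = f"{title} {abstract}".lower()
    tokens = set(re.findall(r"[a-z]+", text))
    return bool(tokens & (PLANT_TERMS | set(taxon_names)))


def preprocess(text):
    """Lowercase, drop disallowed characters, and remove stop-words."""
    text = text.lower()
    # Keep digits, Latin letters, Greek letters, whitespace, commas, dashes.
    text = re.sub(r"[^0-9a-z\u0370-\u03ff\s,\-]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


taxa = {"arabidopsis"}  # stand-in for the full list of taxon names
hit_taxon = is_candidate("Auxin signaling in Arabidopsis", "", taxa)
hit_false_positive = is_candidate("Power plant emissions", "", taxa)
miss = is_candidate("Mouse heart development", "KLF2 knockout", taxa)
cleaned = preprocess("The β-Glucuronidase (GUS) reporter!")
```

The second call shows why a keyword filter alone is insufficient: a record about power plants passes the filter, which is the kind of false positive that the subsequent classification step is meant to remove.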

Definition of positive/negative examples

Upon closer examination, a large number of false positives were identified in the candidate plant science records. To further narrow down citations with a plant science focus, text classification was used to distinguish plant science and non-plant science articles (see next section). For the classification task, a negative set (i.e., non-plant science citations) was defined as entries from 7,360 journals that appeared <20 times in the filtered data (total = 43,329, journal candidate count, S1 Data ). For the positive set (i.e., true plant science citations), 43,329 plant science citations were sampled from 17 established plant science journals each with >2,000 entries in the filtered dataset: “Plant physiology,” “Frontiers in plant science,” “Planta,” “The Plant journal: for cell and molecular biology,” “Journal of experimental botany,” “Plant molecular biology,” “The New phytologist,” “The Plant cell,” “Phytochemistry,” “Plant & cell physiology,” “American journal of botany,” “Annals of botany,” “BMC plant biology,” “Tree physiology,” “Molecular plant-microbe interactions: MPMI,” “Plant biology,” and “Plant biotechnology journal” (journal candidate count, S1 Data ). Plant biotechnology journal was included, but only 1,894 records remained after removal of duplicates, articles with missing info, and/or withdrawn articles. The positive and negative sets were randomly split into training and testing subsets (4:1) while maintaining a 1:1 positive-to-negative ratio.
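A balanced 4:1 split of the kind described above can be sketched as follows. The example assumes that positives and negatives are held in two lists and reuses the random state value reported in the text; the function name and the placeholder record identifiers are hypothetical.

```python
import random

RANDOM_STATE = 20220609  # random state value reported in the text


def balanced_split(positives, negatives, test_fraction=0.2, seed=RANDOM_STATE):
    """Split into train/test sets, each with a 1:1 class ratio."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    # Sample the same number of records from each class, in random order.
    pos = rng.sample(positives, n)
    neg = rng.sample(negatives, n)
    n_test = round(n * test_fraction)
    test = [(x, 1) for x in pos[:n_test]] + [(x, 0) for x in neg[:n_test]]
    train = [(x, 1) for x in pos[n_test:]] + [(x, 0) for x in neg[n_test:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Placeholder record identifiers standing in for citations.
positives = [f"pos_{i}" for i in range(100)]
negatives = [f"neg_{i}" for i in range(100)]
train, test = balanced_split(positives, negatives)
```

Splitting each class separately before recombining is one simple way to guarantee that both subsets keep the 1:1 positive-to-negative ratio.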

Text classification based on Tf and Tf-Idf

Instead of using the preprocessed text as features for building classification models directly, text embeddings (i.e., representations of texts in vectors) were used as features. These embeddings were generated using 4 approaches (model summary, S1 Data ): Term-frequency (Tf), Tf-Idf [ 36 ], Word2Vec [ 37 ], and BERT [ 6 ]. The Tf- and Tf-Idf-based features were generated with CountVectorizer and TfidfVectorizer, respectively, from Scikit-Learn [ 38 ]. Different maximum features (1e4 to 1e5) and n-gram ranges (uni-, bi-, and tri-grams) were tested. The features were selected based on the p- value of chi-squared tests testing whether a feature had a higher-than-expected value among the positive or negative classes. Four different p- value thresholds were tested for feature selection. The selected features were then used to retrain vectorizers with the preprocessed training texts to generate feature values for classification. The classification model used was XGBoost [ 39 ] with 5 combinations of the following hyperparameters tested during 5-fold stratified cross-validation: min_child_weight = (1, 5, 10), gamma = (0.5, 1, 1.5, 2.5), subsample = (0.6, 0.8, 1.0), colsample_bytree = (0.6, 0.8, 1.0), and max_depth = (3, 4, 5). The rest of the hyperparameters were held constant: learning_rate = 0.2, n_estimators = 600, objective = binary:logistic. RandomizedSearchCV from Scikit-Learn was used for hyperparameter tuning and cross-validation with scoring = F1-score.
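To make the size of the hyperparameter search concrete, the sketch below enumerates the full grid listed above and draws 5 combinations at random, which mirrors what a randomized search does. It uses only the standard library rather than RandomizedSearchCV itself, and it assumes that "5 combinations" corresponds to 5 random draws without replacement; the function name is hypothetical.

```python
import itertools
import random

# Hyperparameter values listed in the text.
param_grid = {
    "min_child_weight": [1, 5, 10],
    "gamma": [0.5, 1, 1.5, 2.5],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "max_depth": [3, 4, 5],
}

# Hyperparameters held constant in the text.
fixed_params = {
    "learning_rate": 0.2,
    "n_estimators": 600,
    "objective": "binary:logistic",
}


def sample_param_sets(grid, fixed, n_iter, seed):
    """Draw n_iter distinct combinations from the full grid."""
    keys = list(grid)
    all_combos = list(itertools.product(*(grid[k] for k in keys)))
    rng = random.Random(seed)
    chosen = rng.sample(all_combos, n_iter)
    param_sets = [dict(zip(keys, combo), **fixed) for combo in chosen]
    return param_sets, len(all_combos)


param_sets, grid_size = sample_param_sets(
    param_grid, fixed_params, n_iter=5, seed=20220609
)
```

Under these assumptions, the 5 sampled settings cover only a small fraction of the 324 possible combinations, which keeps the cost of 5-fold cross-validation manageable.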

Because the Tf-Idf model had a relatively high model performance and was relatively easy to interpret (terms are frequency-based, instead of embedding-based like those generated by Word2Vec and BERT), the Tf-Idf model was selected as input to SHapley Additive exPlanations (SHAP; [ 14 ]) to assess the importance of terms. Because the Tf-Idf model was based on XGBoost, a tree-based algorithm, the TreeExplainer module in SHAP was used to determine a SHAP value for each entry in the training dataset for each Tf-Idf feature. The SHAP value indicates the degree to which a feature positively or negatively affects the underlying prediction. The importance of a Tf-Idf feature was calculated as the average SHAP value of that feature among all instances. Because a Tf-Idf feature is generated based on a specific term, the importance of the Tf-Idf feature indicates the importance of the associated term.

Text classification based on Word2Vec

The preprocessed texts were first split into training, validation, and testing subsets (8:1:1). The texts in each subset were converted to 3 n-gram lists: a unigram list obtained by splitting tokens based on the space character, or bi- and tri-gram lists built with Gensim [ 40 ]. Each n-gram list of the training subset was next used to fit a Skip-gram Word2Vec model with vector_size = 300, window = 8, min_count = (5, 10, or 20), sg = 1, and epochs = 30. The Word2Vec model was used to generate word embeddings for the training, validation, and testing subsets. In parallel, a tokenizer was trained with the training subset unigrams using Tensorflow [ 41 ] and used to tokenize texts in each subset and convert each token into an index to use as a feature for training text classification models. To ensure all citations had the same number of features (500), longer texts were truncated, and shorter ones were zero-padded. A deep learning text classifier was trained with an input layer the same size as the feature number, an attention layer incorporating embedding information for each feature, 2 bidirectional long short-term memory layers (15 units each), a dense layer (64 units), and a final output layer with 2 units. During training, adam, accuracy, and sparse_categorical_crossentropy were used as the optimizer, evaluation metric, and loss function, respectively. The training process lasted 30 epochs with early stopping if validation loss did not improve in 5 epochs. An F1 score was calculated for each n-gram list and min_count parameter combination to select the best model (model summary, S1 Data ).
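The fixed-length feature step can be sketched as a small helper that truncates long token-index lists and zero-pads short ones to 500 entries. Whether truncation and padding were applied at the start or at the end of each sequence is not stated above; the sketch assumes both happen at the end, and the function name is hypothetical.

```python
FEATURE_LENGTH = 500  # number of features per citation reported in the text


def pad_or_truncate(token_ids, length=FEATURE_LENGTH, pad_value=0):
    """Return a list of exactly `length` token indices.

    Assumption: truncation and zero-padding both happen at the end.
    """
    if len(token_ids) >= length:
        return token_ids[:length]
    return token_ids + [pad_value] * (length - len(token_ids))


short_text = [12, 7, 301]          # toy token indices for a short abstract
long_text = list(range(1, 801))    # toy token indices for a long abstract

short_features = pad_or_truncate(short_text)
long_features = pad_or_truncate(long_text)
```

Reserving index 0 for padding means that real tokens need indices starting from 1, which is the convention assumed in this toy example.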

Text classification based on BERT models

Two pretrained models were used for BERT-based classification: DistilBERT (Hugging face repository [ 42 ] model name and version: distilbert-base-uncased [ 43 ]) and SciBERT (allenai/scibert-scivocab-uncased [ 16 ]). In both cases, tokenizers were retrained with the training data. BERT-based models had the following architecture: the token indices (512 values for each token) and associated masked values as input layers, pretrained BERT layer (512 × 768) excluding outputs, a 1D pooling layer (768 units), a dense layer (64 units), and an output layer (2 units). The rest of the training parameters were the same as those for Word2Vec-based models, except training lasted for 20 epochs. Cross-validation F1-scores for all models were compared and used to select the best model for each feature extraction method, hyperparameter combination, and modeling algorithm or architecture (model summary, S1 Data ). The best model was the Word2Vec-based model (min_count = 20, window = 8, ngram = 3), which was applied to the candidate plant science corpus to identify a set of plant science citations for further analysis. The candidate plant science records predicted as being in the positive class (421,658) by the model were collectively referred to as the “plant science corpus.”

Plant science record classification

In PubMed, 1,384,718 citations containing “plant” or any plant taxon names (from the phylum to genus level) were considered candidate plant science citations. To further distinguish plant science citations from those in other fields, text classification models were trained using titles and abstracts of positive examples consisting of citations from 17 plant science journals, each with >2,000 entries in PubMed, and negative examples consisting of records from journals with fewer than 20 entries in the candidate set. Among the 4 models tested, the best model (built with Word2Vec embeddings) had a cross-validation F1 of 0.964 (random guess F1 = 0.5, perfect model F1 = 1, S1 Data ). When the model was tested using 17,330 testing set citations independent from the training set, the F1 remained high at 0.961.

We also conducted another analysis attempting to use the MeSH term “Plants” as a benchmark. Records with the MeSH term “Plants” also include pharmaceutical studies of plants and plant metabolites or immunological studies of plants as allergens in journals that are not generally considered plant science journals (e.g., Acta astronautica , International journal for parasitology , Journal of chromatography ) or journals from local scientific societies (e.g., Acta pharmaceutica Hungarica , Huan jing ke xue , Izvestiia Akademii nauk . Seriia biologicheskaia ). Because we explicitly labeled papers from such journals as negative examples, we focused on 4,004 records with the “Plants” MeSH term published in the 17 plant science journals that were used as positive instances and found that 88.3% were predicted as the positive class. Thus, based on the MeSH term, there is an 11.7% false prediction rate.

We also enlisted 5 plant science colleagues (3 advanced graduate students in plant biology and genetic/genome science graduate programs, 1 postdoctoral breeder/quantitative biologist, and 1 postdoctoral biochemist/geneticist) to annotate 100 randomly selected abstracts, as a reviewer suggested. Each record was annotated by 2 colleagues. Among the 85 entries where the annotations were consistent between annotators, 48 were annotated as negative but with 7 predicted as positive (false positive rate = 14.6%) and 37 were annotated as positive but with 4 predicted as negative (false negative rate = 10.8%). To further benchmark the performance of the text classification model, we identified another 12 journals that focus on plant science studies to use as benchmarks: Current opinion in plant biology (number of articles: 1,806), Trends in plant science (1,723), Functional plant biology (1,717), Molecular plant pathology (1,573), Molecular plant (1,141), Journal of integrative plant biology (1,092), Journal of plant research (1,032), Physiology and molecular biology of plants (830), Nature plants (538), The plant pathology journal (443), Annual review of plant biology (417), and The plant genome (321). Among the 12,611 candidate plant science records, 11,386 were predicted as positive. Thus, there is a 9.9% false negative rate.
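
As a worked check of the annotation-based error rates, the short sketch below recomputes them from the counts given above. The function names are hypothetical and the counts are the only inputs taken from the text.

```python
def false_positive_rate(false_pos, annotated_negative):
    """Share of annotated-negative records that the model called positive."""
    return false_pos / annotated_negative


def false_negative_rate(false_neg, annotated_positive):
    """Share of annotated-positive records that the model called negative."""
    return false_neg / annotated_positive


# Counts from the 85 abstracts with consistent annotations.
fp_rate = false_positive_rate(7, 48)
fn_rate = false_negative_rate(4, 37)
print(f"false positive rate = {fp_rate:.1%}")
print(f"false negative rate = {fn_rate:.1%}")
```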

Global topic modeling

BERTopic [ 15 ] was used for preliminary topic modeling with n-grams = (1,2) and with an embedding initially generated by DistilBERT, SciBERT, or BioBERT (dmis-lab/biobert-base-cased-v1.2; [ 44 ]). The embedding models converted preprocessed texts to embeddings. The topics generated based on the 3 embeddings were similar ( S2 Data ). However, SciBERT-, BioBERT-, and DistilBERT-based embedding models had different numbers of outlier records (268,848, 293,790, and 323,876, respectively) with topic index = −1. In addition to generating the fewest outliers, the SciBERT-based model led to the highest number of topics. Therefore, SciBERT was chosen as the embedding model for the final round of topic modeling. Modeling consisted of 3 steps. First, document embeddings were generated with SentenceTransformer [ 45 ]. Second, a clustering model to aggregate documents into clusters using hdbscan [ 46 ] was initialized with min_cluster_size = 500, metric = euclidean, cluster_selection_method = eom, min_samples = 5. Third, the embedding and the initialized hdbscan model were used in BERTopic to model topics with neighbors = 10, nr_topics = 500, ngram_range = (1,2). Using these parameters, 90 topics were identified. The initial topic assignments were conservative, and 241,567 records were considered outliers (i.e., documents not assigned to any of the 90 topics). After assessing the prediction scores of all records generated from the fitted topic models, the 95-percentile score was 0.0155. This score was used as the threshold for assigning outliers to topics: If the maximum prediction score was above the threshold and this maximum score was for topic t , then the outlier was assigned to t . After the reassignment, 49,228 records remained outliers. To assess if some of the outliers were not assigned because they could be assigned to multiple topics, the prediction scores of the records were used to put records into 100 clusters using k-means. Each cluster was then assessed to determine if the outlier records in a cluster tended to have higher prediction scores across multiple topics ( S2 Fig ).
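
The outlier reassignment rule can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function name and the toy score rows are hypothetical, and only the threshold of 0.0155 and the rule itself come from the text.

```python
THRESHOLD = 0.0155  # 95-percentile prediction score reported in the text


def reassign_outliers(topics, scores, threshold=THRESHOLD):
    """Assign outliers (topic -1) to their best-scoring topic.

    topics: list of topic indices, with -1 marking an outlier.
    scores: one list of per-topic prediction scores per record.
    An outlier moves to topic t only if its maximum score belongs to t
    and that score is above the threshold.
    """
    updated = []
    for topic, row in zip(topics, scores):
        if topic == -1:
            best = max(range(len(row)), key=row.__getitem__)
            if row[best] > threshold:
                topic = best
        updated.append(topic)
    return updated


# Toy data: 3 records scored against 3 topics.
toy_topics = [-1, 2, -1]
toy_scores = [
    [0.010, 0.020, 0.001],  # outlier, max above threshold: becomes topic 1
    [0.001, 0.002, 0.900],  # already assigned: unchanged
    [0.001, 0.002, 0.003],  # outlier, max below threshold: stays -1
]
reassigned = reassign_outliers(toy_topics, toy_scores)
print(reassigned)
```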

Topics that are most and least well connected to other topics

The most well-connected topics in the network include topic 24 (stress mechanisms, median cosine similarity = 0.36), topic 42 (genes, stress, and transcriptomes, 0.34), and topic 35 (molecular genetics, 0.32; all t test p-values < 1 × 10⁻²²). The least connected topics include topic 0 (allergen research, median cosine similarity = 0.12), topic 21 (clock biology, 0.12), topic 1 (tissue culture, 0.15), and topic 69 (identification of compounds with spectroscopic methods, 0.15; all t test p-values < 1 × 10⁻²⁴). Topics 0, 1, and 69 are specialized topics; as explained in the main text, it is surprising that topic 21 is not better connected.

Analysis of documents based on the topic model

Topical diversity among top journals with the most plant science records

Using a relative topic diversity measure (ranging from 0 to 10), we found that there was a wide range of topical diversity among 20 journals with the largest numbers of plant science records ( S3 Fig ). The 4 journals with the highest relative topical diversities are Proceedings of the National Academy of Sciences , USA (9.6), Scientific Reports (7.1), Plant Physiology (6.7), and PLOS ONE (6.4). The high diversities are consistent with the broad, editorial scopes of these journals. The 4 journals with the lowest diversities are American Journal of Botany (1.6), Oecologia (0.7), Plant Disease (0.7), and Theoretical and Applied Genetics (0.3), which reflects their discipline-specific focus and audience of classical botanists, ecologists, plant pathologists, and specific groups of geneticists.

Dynamic topic modeling

The code for dynamic modeling was based on _topic_over_time.py in BERTopic and was modified to allow additional outputs for debugging and graphing purposes. The plant science citations were binned into 50 subsets chronologically (for timestamps of bins, see S5 Data ). Because the number of documents increased exponentially over time, dividing them into equal-sized time intervals would leave far fewer records at earlier time points and introduce bias. We therefore divided them into time bins of similar size (approximately 8,400 documents), so the earlier subsets span longer periods than the later ones. Prior to binning the subsets, the publication dates were converted to UNIX time (timestamp) in seconds; the plant science records start on 1917-11-1 (timestamp = −1646247600.0) and end on 2021-1-1 (timestamp = 1609477201). The starting dates and corresponding timestamps for the 50 subsets including the end date are in S6 Data . The input data included the preprocessed texts, topic assignments of records from global topic modeling, and the binned timestamps of records. Three additional parameters were set for topics_over_time, namely, nr_bin = 50 (number of bins), evolution_tuning = True, and global_tuning = False. The evolution_tuning parameter specified that averaged c-Tf-Idf values for a topic be calculated in neighboring time bins to reduce fluctuation in c-Tf-Idf values. The global_tuning parameter was set to False because of the possibility that some nonexisting terms could have a high c-Tf-Idf for a time bin simply because there was a high global c-Tf-Idf value for that term.
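
The timestamp conversion and the similar-size binning can be sketched as follows. The helper names are hypothetical. The UTC−5 offset is an assumption made here because the reported start timestamp matches midnight at that offset; the text does not state which time zone was used.

```python
from datetime import datetime, timedelta, timezone

# Assumed offset: the reported timestamp -1646247600 equals
# midnight on 1917-11-01 at UTC-5.
UTC_MINUS_5 = timezone(timedelta(hours=-5))


def to_unix(year, month, day, tz=timezone.utc):
    """Convert a calendar date (midnight in `tz`) to UNIX seconds."""
    return datetime(year, month, day, tzinfo=tz).timestamp()


def equal_count_bins(timestamps, n_bins):
    """Split sorted timestamps into n_bins groups of near-equal size."""
    ordered = sorted(timestamps)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread the remainder
        bins.append(ordered[start:start + size])
        start += size
    return bins


start_ts = to_unix(1917, 11, 1, UTC_MINUS_5)
toy_bins = equal_count_bins(list(range(10)), 3)
print(start_ts)
print([len(b) for b in toy_bins])
```

With equal-count bins, the time span covered by each bin varies while the number of documents per bin stays roughly constant, which is the trade-off described above.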

The binning strategy based on similar document numbers per bin allowed us to increase signal particularly for publications prior to the 90s. This strategy, however, may introduce more noise for bins with smaller time durations (i.e., more recent bins) because of publication frequencies (there can be seasonal differences in the number of papers published, biased toward, e.g., the beginning of the year or the beginning of a quarter). To address this, we examined the relative frequencies of each topic over time ( S7 Data ), but we found that recent time bins had similar variances in relative frequencies as other time bins. We also moderated the impact of variation using LOWESS (10% to 30% of the data points were used for fitting the trend lines) to determine topical trends for Fig 3 . Thus, the influence of the noise introduced via our binning strategy is expected to be minimal.

Topic categories and ordering

The topics were classified into 5 categories with contrasting trends: stable, early, transitional, sigmoidal, and rising. To define which category a topic belongs to, the frequency of documents over time bins for each topic was analyzed using 3 regression methods. We first tried 2 forecasting methods: recursive autoregressor (the ForecasterAutoreg class in the skforecast package) and autoregressive integrated moving average (ARIMA implemented in the pmdarima package). In both cases, the forecasting results did not clearly follow the expected trend lines, likely due to the low numbers of data points (relative frequency values), which resulted in the need to extensively impute missing data. Thus, as a third approach, we sought to fit the trendlines with the data points using LOWESS (implemented in the statsmodels package) and applied additional criteria for assigning topics to categories. When fitting with LOWESS, 3 fraction parameters (frac, the fraction of the data used when estimating each y-value) were evaluated (0.1, 0.2, 0.3). While frac = 0.3 had the smallest errors for most topics, in situations where there were outliers, frac = 0.2 or 0.1 was chosen to minimize mean squared errors ( S7 Data ).

The topics were classified into 5 categories based on the slopes of the fitted line over time: (1) stable: topics with near 0 slopes over time; (2) early: topics with negative (<−0.5) slopes throughout (with the exception of topic 78, which declined early on but bounced back by the late 1990s); (3) transitional: early positive (>0.5) slopes followed by negative slopes at later time points; (4) sigmoidal: early positive slopes followed by zero slopes at later time points; and (5) rising: continuously positive slopes. For each topic, the LOWESS fits were also used to determine when the relative document frequency reached its peak, first reaching a threshold of 0.6 (chosen after trial and error for a range of 0.3 to 0.9), and the overall trend. The topics were then ordered based on (1) whether they belonged to the stable category or not; (2) whether the trends were decreasing, stable, or increasing; (3) the time the relative document frequency first reached 0.6; and (4) the time that the overall peak was reached ( S8 Data ).
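
Two of the ordering criteria, the time of the overall peak and the time the fit first reaches 60% of that peak, can be computed from a fitted curve as in the sketch below. The function name and the toy values are hypothetical; the LOWESS fitting itself is not reproduced here.

```python
def peak_and_first_reach(times, fitted, fraction=0.6):
    """Return (peak_time, first_reach_time) for one topic.

    times: time-bin labels in chronological order.
    fitted: LOWESS-fitted frequencies for the same bins.
    first_reach_time is the first bin whose fitted value is at least
    `fraction` of the peak value.
    """
    peak_value = max(fitted)
    peak_time = times[fitted.index(peak_value)]
    cutoff = fraction * peak_value
    first_reach = next(t for t, y in zip(times, fitted) if y >= cutoff)
    return peak_time, first_reach


# Toy fitted curve over 5 time bins.
toy_times = [0, 1, 2, 3, 4]
toy_fit = [0.1, 0.3, 0.7, 1.0, 0.8]
peak_time, first_reach = peak_and_first_reach(toy_times, toy_fit)
print(peak_time, first_reach)
```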

Taxa information

To identify a taxon or taxa in all plant science records, NCBI Taxonomy taxdump datasets were downloaded from the NCBI FTP site ( https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/ ) on September 20, 2022. The highest-level taxon was Viridiplantae, and all its child taxa were parsed and used as queries in searches against the plant science corpus. In addition, a species-over-time analysis was conducted using the same time bins as used for dynamic topic models. The number of records in different time bins for top taxa are in the genus, family, order, and additional species level sheet in S9 Data . The degree of over-/underrepresentation of a taxon X in a research topic T was assessed using the p -value of a Fisher’s exact test for a 2 × 2 table consisting of the numbers of records in both X and T, in X but not T, in T but not X, and in neither ( S10 Data ).
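
For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2 × 2 table. It assumes a two-sided alternative, which the text does not specify, and it is meant for small illustrative counts; for corpus-scale tables a library routine such as scipy.stats.fisher_exact would be the practical choice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Here a = records in both X and T, b = in X but not T,
    c = in T but not X, d = in neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


p_toy = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_toy, 4))
```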

For analysis of plant taxa with genome information, genome data of taxa in Viridiplantae were obtained from the NCBI Genome data-hub ( https://www.ncbi.nlm.nih.gov/data-hub/genome ) on October 28, 2022. There were 2,384 plant genome assemblies belonging to 1,231 species in 559 genera (genome assembly sheet, S9 Data ). The date of the assembly was used as a proxy for the time when a genome was sequenced. However, some species have updated assemblies and have more recent data than when the genome first became available.

Taxa being studied in the plant science records

Flowering plants (Magnoliopsida) are found in 93% of records, while most other lineages are discussed in <1% of records, with conifers and related species being exceptions (Acrogymnospermae, 3.5%, S6A Fig ). At the family level, the mustard (Brassicaceae), grass (Poaceae), pea (Fabaceae), and nightshade (Solanaceae) families are in 51% of records ( S6B Fig ). The prominence of the mustard family in plant science research is due to the Brassica and Arabidopsis genera ( Fig 4A ). When examining the prevalence of taxa being studied over time, clear patterns of turnovers emerged ( Figs 4B , S6C, and S6D ). While the study of monocot species (Liliopsida) has remained steady, there was a significant uptick in the prevalence of eudicot (eudicotyledon) records in the late 1990s ( S6C Fig ), which can be attributed to the increased number of studies in the mustard, myrtle (Myrtaceae), and mint (Lamiaceae) families among others ( S6D Fig ). At the genus level, records mentioning Gossypium (cotton), Phaseolus (bean), Hordeum (barley), and Zea (corn), similar to the topics in the early category, were prevalent until the 1980s or 1990s but have mostly decreased in number since ( Fig 4B ). In contrast, Capsicum , Arabidopsis , Oryza , Vitis , and Solanum research has become more prevalent over the last 20 years.

Geographical information for the plant science corpus

The geographical information (country) of authors in the plant science corpus was obtained from the address (AD) fields of first authors in Medline XML records accessible through the NCBI EUtility API ( https://www.ncbi.nlm.nih.gov/books/NBK25501/ ). Because only first author affiliations are available for records published before December 2014, only the first author’s location was considered to ensure consistency between records before and after that date. Among the 421,658 records in the plant science corpus, 421,585 had Medline records and 421,276 had unique PMIDs. Among the records with unique PMIDs, 401,807 contained address fields. For each of the remaining records, the AD field content was split into tokens with a “,” delimiter, and the token likely containing geographical info (referred to as location tokens) was selected as either the last token or the second to last token if the last token contained “@” indicating the presence of an email address. Because of the inconsistency in how geographical information was described in the location tokens (e.g., country, state, city, zip code, name of institution, and different combinations of the above), the following 4 approaches were used to convert location tokens into countries.

The first approach was a brute force search where full names and alpha-3 codes of current countries (ISO 3166–1), current country subregions (ISO 3166–2), and historical country (i.e., country that no longer exists, ISO 3166–3) were used to search the address fields. To reduce false positives using alpha-3 codes, a space prior to each code was required for the match. The first approach allowed the identification of 361,242, 16,573, and 279,839 records with current country, historical country, and subregion information, respectively. The second method was the use of a heuristic based on common address field structures to identify “location strings” toward the end of address fields that likely represent countries, then the use of the Python pycountry module to confirm the presence of country information. This approach led to 329,025 records with country information. The third approach was to parse first author email addresses (90,799 records), recover top-level domain information, and use country code Top Level Domain (ccTLD) data from the ISO 3166 Wikipedia page to define countries (72,640 records). Only a subset of email addresses contains country information because some are from companies (.com), nonprofit organizations (.org), and others. Because a large number of records with address fields still did not have country information after taking the above 3 approaches, another approach was implemented to query address fields against a locally installed Nominatim server (v.4.2.3, https://github.com/mediagis/nominatim-docker ) using OpenStreetMap data from GEOFABRIK ( https://www.geofabrik.de/ ) to find locations. Initial testing indicated that the use of full address strings led to false positives, and the computing resource requirement for running the server was high. Thus, only location strings from the second approach that did not lead to country information were used as queries. 
Because multiple potential matches were returned for each query, the results were sorted based on their location importance values. The above steps led to an additional 72,401 records with country information.
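
The location-token rule and the email-based approach can be illustrated with a small sketch. The function names and example addresses are hypothetical, and stripping a trailing period is an added assumption about how address fields end; the mapping from top-level domain to country is left out.

```python
def location_token(ad_field):
    """Pick the comma-separated token most likely to hold the location.

    The last token is used unless it contains "@" (an email address),
    in which case the second-to-last token is used instead.
    """
    tokens = [t.strip(" .") for t in ad_field.split(",")]
    if "@" in tokens[-1] and len(tokens) > 1:
        return tokens[-2]
    return tokens[-1]


def email_tld(text):
    """Return the lower-cased top-level domain of the first email found."""
    for word in text.split():
        if "@" in word:
            return word.rstrip(".").rsplit(".", 1)[-1].lower()
    return None


plain = "Department of Plant Biology, Example University, East Lansing, MI, USA."
with_email = "Dept. of Botany, Example University, Kyoto, Japan, name@example.ac.jp."
print(location_token(plain))
print(location_token(with_email))
print(email_tld(with_email))
```

A country-code top-level domain such as "jp" can then be looked up in a ccTLD table, whereas generic domains such as "com" or "org" carry no country information.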

Examining the overlap in country information between approaches revealed that brute force current country and pycountry searches were consistent 97.1% of the time. In addition, both approaches had high consistency with the email-based approach (92.4% and 93.9%). However, brute force subregion and Nominatim-based predictions had the lowest consistencies with the above 3 approaches (39.8% to 47.9%) and each other. Thus, a record’s country information was finalized if the information was consistent between any 2 approaches, except between the brute force subregion and Nominatim searches. This led to 330,328 records with country information.
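
The finalization rule can be expressed compactly. In this sketch the approach labels are hypothetical names for the methods described above, and returning the first agreeing pair is an assumption, since the text does not say how conflicting agreements were resolved.

```python
from itertools import combinations

# The one pair of approaches whose agreement alone is not trusted.
EXCLUDED_PAIR = frozenset({"brute_subregion", "nominatim"})


def finalize_country(predictions):
    """Return a country if any 2 approaches agree, else None.

    predictions: dict mapping approach name to a country code or None.
    Agreement between only the brute force subregion and Nominatim
    searches does not count.
    """
    for (m1, c1), (m2, c2) in combinations(predictions.items(), 2):
        if c1 is None or c1 != c2:
            continue
        if frozenset({m1, m2}) == EXCLUDED_PAIR:
            continue
        return c1
    return None


agreed = finalize_country(
    {"brute_current": "USA", "pycountry": "USA", "email": None}
)
weak_only = finalize_country({"brute_subregion": "CHN", "nominatim": "CHN"})
print(agreed, weak_only)
```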

Topical and country impact metrics

To determine annual country impact, impact scores were determined in the same way as that for annual topical impact, except that values for different countries were calculated instead of topics ( S8 Data ).

Topical preferences by country

To determine topical preference for a country C , a 2 × 2 table was established with the number of records in topic T from C , the number of records in T but not from C , the number of non- T records from C , and the number of non- T records not from C . A Fisher’s exact test was performed for each T and C combination, and the resulting p -values were corrected for multiple testing with the Benjamini–Hochberg method (see S12 Data ). The preference of T in C was defined as the degree of enrichment calculated as the log likelihood ratio of values in the 2 × 2 table. Topic 5 was excluded because >50% of the countries did not have records for this topic.
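
The enrichment score and the multiple-testing correction can be sketched as below. The ratio follows the definition given for S12 Data, log((a/b)/(c/d)); the natural logarithm is an assumption because the base is not stated, and the function names and toy numbers are hypothetical.

```python
from math import log


def log_likelihood_ratio(a, b, c, d):
    """Enrichment of topic T in country C.

    a: papers from C in T; b: from C, not in T;
    c: not from C, in T; d: not from C, not in T.
    Natural log is assumed; zero counts are not handled in this sketch.
    """
    return log((a / b) / (c / d))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


llr_toy = log_likelihood_ratio(20, 80, 10, 90)
adjusted_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(round(llr_toy, 3))
print([round(p, 3) for p in adjusted_toy])
```

A positive score indicates overrepresentation of the topic in the country and a negative score indicates underrepresentation.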

The top 10 countries could be classified into a China–India cluster, an Italy–Spain cluster, and remaining countries (yellow rectangles, Fig 5E ). The clustering of Italy and Spain is partly due to similar research focusing on allergens (topic 0) and mycotoxins (topic 54) and less emphasis on gene family (topic 23) and stress tolerance (topic 28) studies ( Figs 5F and S9 ). There are also substantial differences in topical focus between countries. For example, plant science records from China tend to be enriched in hyperspectral imaging and modeling (topic 9), gene family studies (topic 23), stress biology (topic 28), and research on new plant compounds associated with herbal medicine (topic 69) but place less emphasis on population genetics and evolution (topic 86, Fig 5F ). In the US, there is a strong focus on insect pest resistance (topic 75), climate, community, and diversity (topic 83), and population genetics and evolution, but less focus on new plant compounds. In summary, in addition to revealing how plant science research has evolved over time, topic modeling provides additional insights into differences in research foci among different countries.

Supporting information

S1 Fig. Plant science record classification model performance.

(A–C) Distributions of prediction probabilities (y_prob) of (A) positive instances (plant science records), (B) negative instances (non-plant science records), and (C) positive instances with the Medical Subject Heading “Plants” (ID = D010944). The data are color coded in blue and orange if they are correctly and incorrectly predicted, respectively. The lower subfigures contain log10-transformed x axes for the same distributions as the top subfigure for better visualization of incorrect predictions. (D) Prediction probability distribution for candidate plant science records. Prediction probabilities plotted here are available in S13 Data .

https://doi.org/10.1371/journal.pbio.3002612.s001

S2 Fig. Relationships between outlier clusters and the 90 topics.

(A) Heatmap demonstrating that some outlier clusters tend to have high prediction scores for multiple topics. Each cell shows the average prediction score of a topic for records in an outlier cluster. (B) Size of outlier clusters.

https://doi.org/10.1371/journal.pbio.3002612.s002

S3 Fig. Cosine similarities between topics.

(A) Heatmap showing cosine similarities between topic pairs. Top-left: hierarchical clustering of the cosine similarity matrix using the Ward algorithm. The branches are colored to indicate groups of related topics. (B) Topic labels and names. The topic ordering was based on hierarchical clustering of topics. Colored rectangles: neighboring topics with >0.5 cosine similarities.

https://doi.org/10.1371/journal.pbio.3002612.s003

S4 Fig. Relative topical diversity for 20 journals.

The 20 journals with the most plant science records are shown. The journal names were taken from the journal list in PubMed ( https://www.nlm.nih.gov/bsd/serfile_addedinfo.html ).

https://doi.org/10.1371/journal.pbio.3002612.s004

S5 Fig. Topical frequency and top terms during different time periods.

(A-D) Different patterns of topical frequency distributions for example topics (A) 48, (B) 35, (C) 27, and (D) 42. For each topic, the top graph shows the frequency of topical records in each time bin, which are the same as those in Fig 3 (green line), and the end date for each bin is indicated. The heatmap below each line plot depicts whether a term is among the top terms in a time bin (yellow) or not (blue). Blue dotted lines delineate different decades (see S5 Data for the original frequencies, S6 Data for the LOWESS fitted frequencies and the top terms for different topics/time bins).

https://doi.org/10.1371/journal.pbio.3002612.s005

S6 Fig. Prevalence of records mentioning different taxonomic groups in Viridiplantae.

(A, B) Percentage of records mentioning specific taxa at the (A) major lineage and (B) family levels. (C, D) The prevalence of taxon mentions over time at the (C) major lineage and (D) family levels. The data used for plotting are available in S9 Data .

https://doi.org/10.1371/journal.pbio.3002612.s006

S7 Fig. Changes over time.

(A) Number of genera being mentioned in plant science records during different time bins (the date indicates the end date of that bin, exclusive). (B) Numbers of genera (blue) and organisms (salmon) with draft genomes available from National Center of Biotechnology Information in different years. (C) Percentage of US National Science Foundation (NSF) grants mentioning the genus Arabidopsis over time with peak percentage and year indicated. The data for (A–C) are in S9 Data . (D) Number of plant science records in the top 17 plant science journals from the USA (red), Great Britain (GBR) (orange), India (IND) (light green), and China (CHN) (dark green) normalized against the total numbers of publications of each country over time in these 17 journals. The data used for plotting can be found in S11 Data .

https://doi.org/10.1371/journal.pbio.3002612.s007

S8 Fig. Change in country impact on plant science over time.

(A, B) Difference in 2 impact metrics from 1999 to 2020 for the 10 countries with the highest number of plant science records. (A) H-index. (B) SCImago Journal Rank (SJR). (C, D) Plots show the relationships between the impact metrics (H-index in (C) , SJR in (D) ) averaged from 1999 to 2020 and the slopes of linear fits with years as the predictive variable and impact metric as the response variable for different countries (A3 country codes shown). The countries with >400 records and with <10% missing impact values are included. The data used for plotting can be found in S11 Data .

https://doi.org/10.1371/journal.pbio.3002612.s008

S9 Fig. Country topical preference.

Enrichment scores (LLR, log likelihood ratio) of topics for each of the top 10 countries. Red: overrepresentation, blue: underrepresentation. The data for plotting can be found in S12 Data .

https://doi.org/10.1371/journal.pbio.3002612.s009

S1 Data. Summary of source journals for plant science records, prediction models, and top Tf-Idf features.

Sheet–Candidate plant sci record j counts: Number of records from each journal in the candidate plant science corpus (before classification). Sheet—Plant sci record j count: Number of records from each journal in the plant science corpus (after classification). Sheet–Model summary: Model type, text used (txt_flag), and model parameters used. Sheet—Model performance: Performance of different model and parameter combinations on the validation data set. Sheet–Tf-Idf features: The average SHAP values of Tf-Idf (Term frequency-Inverse document frequency) features associated with different terms. Sheet–PubMed number per year: The data for PubMed records in Fig 1A . Sheet–Plant sci record num per yr: The data for the plant science records in Fig 1A .

https://doi.org/10.1371/journal.pbio.3002612.s010

S2 Data. Numbers of records in topics identified from preliminary topic models.

Sheet–Topics generated with a model based on BioBERT embeddings. Sheet–Topics generated with a model based on distilBERT embeddings. Sheet–Topics generated with a model based on SciBERT embeddings.

https://doi.org/10.1371/journal.pbio.3002612.s011

S3 Data. Final topic model labels and top terms for topics.

Sheet–Topic label: The topic index and top 10 terms with the highest c-Tf-Idf values. Sheets–0 to 89: The top 50 terms and their c-Tf-Idf values for topics 0 to 89.

https://doi.org/10.1371/journal.pbio.3002612.s012

S4 Data. UMAP representations of different topics.

For a topic T , records in the UMAP graph are colored red and records not in T are colored gray.

https://doi.org/10.1371/journal.pbio.3002612.s013

S5 Data. Temporal relationships between published documents projected onto 2D space.

The 2D embedding generated with UMAP was used to plot document relationships for each year. The plots from 1975 to 2020 were compiled into an animation.

https://doi.org/10.1371/journal.pbio.3002612.s014

S6 Data. Timestamps and dates for dynamic topic modeling.

Sheet–bin_timestamp: Columns are: (1) order index; (2) bin_idx–relative positions of bin labels; (3) bin_timestamp–UNIX time in seconds; and (4) bin_date–month/day/year. Sheet–Topic frequency per timestamp: The number of documents in each time bin for each topic. Sheets–LOWESS fit 0.1/0.2/0.3: Topic frequency per timestamp fitted with the fraction parameter of 0.1, 0.2, or 0.3. Sheet—Topic top terms: The top 5 terms for each topic in each time bin.

https://doi.org/10.1371/journal.pbio.3002612.s015

S7 Data. Locally weighted scatterplot smoothing (LOWESS) of topical document frequencies over time.

There are 90 scatter plots, one for each topic, where the x axis is time, and the y axis is the document frequency (blue dots). The LOWESS fit is shown as orange points connected with a green line. The category a topic belongs to and its order in Fig 3 are labeled on the top left corner. The data used for plotting are in S6 Data .

https://doi.org/10.1371/journal.pbio.3002612.s016

S8 Data. The 4 criteria used for sorting topics.

Peak: the time when the LOWESS fit of the frequencies of a topic reaches maximum. 1st_reach_thr: the time when the LOWESS fit first reaches a threshold of 60% maximal frequency (peak value). Trend: upward (1), no change (0), or downward (−1). Stable: whether a topic belongs to the stable category (1) or not (0).

https://doi.org/10.1371/journal.pbio.3002612.s017

S9 Data. Change in taxon record numbers and genome assemblies available over time.

Sheet–Genus: Number of records mentioning a genus during different time periods (in Unix timestamp) for the top 100 genera. Sheet–Family: Number of records mentioning a family during different time periods (in Unix timestamp) for the top 100 families. Sheet–Order: Number of records mentioning an order during different time periods (in Unix timestamp) for the top 20 orders. Sheet–Species levels: Number of records mentioning 12 selected taxonomic levels higher than the order level during different time periods (in Unix timestamp). Sheet–Genome assembly: Plant genome assemblies available from NCBI as of October 28, 2022. Sheet–Arabidopsis NSF: Absolute and normalized numbers of US National Science Foundation funded proposals mentioning Arabidopsis in proposal titles and/or abstracts.

https://doi.org/10.1371/journal.pbio.3002612.s018

S10 Data. Taxon topical preference.

Sheet– 5 genera LLR: The log likelihood ratio of each topic in each of the top 5 genera with the highest numbers of plant science records. Sheets– 5 genera: For each genus, the columns are: (1) topic; (2) the Fisher’s exact test p -value (Pvalue); (3–6) numbers of records in topic T and in genus X (n_inT_inX), in T but not in X (n_inT_niX), not in T but in X (n_niT_inX), and not in T and X (n_niT_niX) that were used to construct 2 × 2 tables for the tests; and (7) the log likelihood ratio generated with the 2 × 2 tables. Sheet–corrected p -value: The 4 values for generating LLRs were used to conduct Fisher’s exact test. The p -values obtained for each country were corrected for multiple testing.

https://doi.org/10.1371/journal.pbio.3002612.s019

S11 Data. Impact metrics of countries in different years.

Sheet–country_top25_year_count: number of total publications and publications per year from the top 25 countries with the most plant science records. Sheet—country_top25_year_top17j: number of total publications and publications per year from the top 25 countries with the highest numbers of plant science records in the 17 plant science journals used as positive examples. Sheet–prank: Journal percentile rank scores for countries (3-letter country codes following https://www.iban.com/country-codes ) in different years from 1999 to 2020. Sheet–sjr: Scimago Journal rank scores. Sheet–hidx: H-Index scores. Sheet–cite: Citation scores.

https://doi.org/10.1371/journal.pbio.3002612.s020

S12 Data. Topical enrichment for the top 10 countries with the highest numbers of plant science publications.

Sheet–Log likelihood ratio: For each country C and topic T, it is defined as log((a/b)/(c/d)), where a is the number of papers from C in T, b is the number from C but not in T, c is the number not from C but in T, and d is the number not from C and not in T. Sheet–Corrected p -value: The 4 values, a, b, c, and d, were used to conduct Fisher’s exact test. The p -values obtained for each country were corrected for multiple testing.

https://doi.org/10.1371/journal.pbio.3002612.s021

S13 Data. Text classification prediction probabilities.

This compressed file contains the PubMed ID (PMID) and the prediction probabilities (y_pred) of testing data with both positive and negative examples (pred_prob_testing), plant science candidate records with the MeSH term “Plants” (pred_prob_candidates_with_mesh), and all plant science candidate records (pred_prob_candidates_all). The prediction probability was generated using the Word2Vec text classification models for distinguishing positive (plant science) and negative (non-plant science) records.

https://doi.org/10.1371/journal.pbio.3002612.s022

Acknowledgments

We thank Maarten Grootendorst for discussions on topic modeling. We also thank Stacey Harmer, Eva Farre, Ning Jiang, and Robert Last for discussion on their respective research fields and input on how to improve this study and Rudiger Simon for the suggestion to examine differences between countries. We also thank Mae Milton, Christina King, Edmond Anderson, Jingyao Tang, Brianna Brown, Kenia Segura Abá, Eleanor Siler, Thilanka Ranaweera, Huan Chen, Rajneesh Singhal, Paulo Izquierdo, Jyothi Kumar, Daniel Shiu, Elliott Shiu, and Wiggler Catt for their good ideas, personal and professional support, collegiality, fun at parties, as well as the trouble they have caused, which helped us improve as researchers, teachers, mentors, and parents.

Undark Magazine

Bad Blood? The Uncertainty Around Microclots and Long Covid

Scientists are debating whether microscopic blood clots are responsible for the wide range of symptoms in long Covid.

Top: A scanning electron microscope image taken by physiologist Resia Pretorius showing red blood cells covered with what she says are microclots, from a sample prepared from whole blood smears of people with long Covid. Visual: Courtesy of Resia Pretorius

Asad Khan first developed Covid-19 in late 2020. A pulmonologist in Manchester, England, Khan had spent most of the year working on packed hospital wards full of acutely ill Covid patients. After falling ill with the disease himself, Khan spent what he described as a dreadful month at home waiting for the symptoms to subside. But they never did. Instead, Khan fell into the grips of long Covid, a post-viral syndrome that can last months to years after a SARS-CoV-2 infection. By the following September, he was sequestered in a darkened room, wearing earplugs and a blindfold. Khan suffered from relentless nausea and couldn’t stand the presence of other people, even his own children. The symptoms were so intolerable, he said, that he would have taken drastic measures to end them.

In December 2021, Tiffany Patino, who at that time had struggled with long Covid symptoms for a year, rests in bed one afternoon in Rockville, Maryland. There are still no widely accepted treatments approved by the FDA to ease long Covid symptoms.

Then Khan heard about an experimental treatment offered by Beate Jaeger, an internist in Germany. Jaeger’s approach drew from preliminary evidence connecting long Covid with microscopic blood clots that can potentially deprive tissues and organs of sufficient oxygen. To remove these so-called microclots, Jaeger was using a procedure called apheresis, whereby a patient’s blood is removed, filtered, and then returned to the body. Apheresis is typically used to treat certain blood disorders or cancers such as leukemia or lymphoma. To Khan, the blood-washing strategy made “physiological sense.” So, after taking a one-hour flight to Germany, Khan met with Jaeger at her clinic. Apheresis treatments can run in the thousands of dollars, and Khan’s first session was scheduled for the following day. But Jaeger “took one look at me,” Khan recalled, and said, “You’re having it today.”

A study supported by the National Institutes of Health in the United States found that nearly one in 10 people will experience long-lasting symptoms after a SARS-CoV-2 infection, which include loss of sense of smell, muscle pain, dizziness, brain fog, shortness of breath, and fatigue. For some, the symptoms are so severe that they are unable to work, attend school, or go about their daily lives. Long Covid is more likely among people who were hospitalized during the acute phase of their illness. But it also occurs among those who were never severely sick with Covid. Women have a higher risk than men, as do people with asthma and other underlying illnesses.

Research agencies have spent vast sums and effort investigating long Covid, but there’s still no consensus regarding its definition, its underlying mechanisms, or how best to manage it. Long Covid has hundreds of potential symptoms. The syndrome “likely has no single cause,” said Roy Silverstein, a hematologist who studies the effects of inflammation on blood clotting at the Medical College of Wisconsin. Some scientists attribute the condition to a mysterious immune system activation, possibly triggered by viral fragments lingering in the body. In this theory of long Covid’s origin, those viral bits and pieces inflame blood vessels, potentially leading to microclots that clog veins and capillaries and deprive tissues of oxygen. For patients frustrated by the lack of therapeutic alternatives, clot-busting approaches have become “a beacon of hope,” Silverstein said. That’s because unlike other purported mechanisms in long Covid, clotting is readily treatable with existing therapies.

But the link between microclots and long Covid has also proven controversial. While clotting is clearly documented in patients who become severely ill with a Covid infection, the notion that similar mechanisms drive long Covid symptoms hasn’t been fully accepted by the scientific community. Some experts say the hypothesis rests on questionable data generated using lab methods that are difficult to interpret and reproduce, and they note that researchers have yet to show evidence that microclots circulate in long Covid patients and trigger persistent effects. Microclots could plausibly explain long-lasting symptoms in some patients, Silverstein said, “but the whole story needs to go through rigorous peer review before we jump to major conclusions about it.”

Beverley Hunt, a professor of thrombosis and hemostasis at King’s College, London, co-authored a 2023 Cochrane Review of five available studies on a type of apheresis called plasmapheresis in long Covid, which concluded that evidence in support of the hypothesis is insufficient. (Cochrane Reviews analyze and synthesize available evidence to inform health decision making on particular topics.) Hunt agreed that there is “a lot of conjecture” surrounding microclots and strongly advises long Covid patients against spending thousands of dollars out of pocket on what she described as unproven microclot-directed therapies with potential harms. Complications from apheresis are rare and can include bleeding, infections, and allergic reactions.

Some doctors also treat long Covid with “triple therapy,” which is a potent cocktail of three blood-thinners that’s typically reserved for heart attack patients, people in need of coronary stents, and people at high risk of stroke. Triple therapy poses a significant risk of dangerous bleeding, so people who get it need careful monitoring, cautioned Amitava Banerjee, a cardiologist and professor of clinical data science at University College London. Banerjee leads the only active clinical trial investigating blood thinning in long Covid: It’s part of a broader U.K.-based effort called Stimulate-ICP that involves more than 30 organizations evaluating current care and working to study drugs that show some evidence of benefitting patients with the condition. Researchers should publish evidence that triple therapy can actually work in long Covid patients, otherwise “we’re just giving them bleeding risks,” Banerjee said. “And with triple anti-coagulation therapy,” he noted, that’s significant.

But some clinicians who give these therapies disagree. They say that the blood of their long Covid patients is so sticky and hypercoagulated that triple therapy merely returns it to normal without the bleeds one might expect in other patients. Jaeger has not seen any major bleeding complications among the thousand or more patients she has treated so far, she told Undark in an email.

Meanwhile, severe long Covid patients like Khan say they have little choice but to seek treatment from doctors willing to push clinical boundaries. “In response to specialists suggesting that patients should wait for trials, I say this: you have not walked in our shoes,” wrote Khan in a 2022 editorial. “There are so many people suffering,” he said during an interview with Undark. “We don’t have the luxury of waiting for long, longitudinal randomized control trials. So we need to use the evidence that we have available.”

The microclot hypothesis starts with the endothelium, which is a single-cell layer lining the interior surfaces of blood vessels. The endothelium normally releases anti-coagulating substances that keep blood liquid and flowing. But when inflamed by wounds or invading pathogens, it tilts towards a defensive posture: Endothelial cells activate small cell fragments called platelets, which combine with a protein called fibrinogen to initiate clotting. During that process, fibrinogen is converted into a tough fibrous mesh called fibrin that holds a clot in place. The body relies on blood clots to staunch bleeding and defend itself against infections. But during severe Covid, endothelial inflammation and clotting can go into overdrive. Autopsies of people who died from Covid reveal blood clots plugging blood vessels, along with evidence of multi-organ failure. That research gave rise to a view that Covid is, in many ways, an endothelial disease.

Evidence that microclots might similarly drive persistent symptoms in long Covid emerged from a collaboration between two scientists: Resia Pretorius, a physiologist at Stellenbosch University, in South Africa, and Douglas Kell, a biochemist at the University of Liverpool, in the U.K. The pair have long been studying how microbes can cause inflammation, which in turn, they say, pushes fibrinogen and other plasma proteins to adopt abnormal shapes they call amyloids. Scientists ordinarily use the term amyloid in reference to misshapen proteins that accumulate in organs and tissues of patients with certain diseases, including Alzheimer’s. Pretorius and Kell, however, claimed to have found them in blood samples from Covid patients obtained with the help of a local doctor in Stellenbosch, Jaco Laubscher.

Pretorius was studying blood samples from people who were severely ill with Covid around May 2020 when, she said, she first discovered the odd structures. She had stained the samples with a dye called thioflavin T that glows brightly when it binds to amyloid structures. Pretorius then looked at the samples through a fluorescence microscope. “The whole screen just lit up like Christmas trees,” she said. By the end of 2020, Pretorius was detecting amyloid fibrin deposits — or microclots, as she and Kell began calling them — in blood samples from patients with long Covid symptoms as well.

In the body, blood clots ordinarily dissolve over time. But in Pretorius’ test tubes, the amyloid clots resisted this normal degradative process. Pretorius and her research team made this discovery while comparing plasma samples from pre-treatment Covid patients and acute or long Covid patients with those obtained from healthy controls. First, they spun the samples down in a centrifuge. Then, they exposed all the samples to trypsin, a digestive enzyme that breaks plasma proteins into chains of amino acids. In the test tubes containing normal plasma, the proteins were fully digested. By contrast, proteins in the Covid samples had clumped into amyloid pellets made visible when stained with dye.

In another study, Pretorius, Kell, and other researchers reported that adding part of the spike protein from SARS-CoV-2 triggers amyloid clotting in plasma samples from healthy people. Pretorius and her collaborators later performed studies showing that clots in blood plasma samples from people with long Covid contain a variety of pro-inflammatory molecules. Based on these test tube studies, Pretorius believes that microclots function as “little garbage bags” that roll through blood vessels, entrapping various proteins and bits of molecular debris.

In patients who recover normally from Covid, “microclots will be dissolved by the normal physiological processes, and then the endothelial layers will be fine,” she said. But in a subset of the population, she speculated that spike proteins studding the SARS-CoV-2 surface will linger in circulation such that patients experience an “onslaught of permanent, ongoing microclot formation that doesn’t break down; platelet hyper-activation; and widespread vascular damage.”

Kell agreed and took that speculation further: “When you’ve got these essentially insoluble bits of gunk sloshing around, they can do bad stuff like block blood flow,” he said. “That can cause almost all the consequences that you see, including fatigue, post-exertional malaise, brain fog, auto-antibodies.” In Kell’s view, microclots are at the core of what’s driving long Covid symptoms in millions of people around the world.

But other scientists who spoke with Undark said that conclusion is premature given that Kell and Pretorius’ results haven’t been replicated or supported with follow-up studies. Hunt also questioned whether long Covid patients have endothelial cell activation — the inflammatory reaction that triggers clotting. “I’m very happy to believe there’s endothelial cell activation going on,” she said. “I’d just like more substantial evidence to support it.” Also, Hunt said that researchers have yet to actually find blood clots in the small vessels of patients who have long Covid, explaining that they could be easily found through imaging or biopsies. “I’m not saying it doesn’t happen,” she said. But “where is the evidence?”

Indeed, a recent study examining post-exertional malaise, the symptom of extreme fatigue experienced after a bout of exercise, used thioflavin T to look for amyloids in muscle biopsies of long Covid patients. The researchers found amyloid-containing deposits in muscle tissue but not in capillaries, leading them to conclude that microclots weren’t blocking blood vessels as hypothesized. What the clots are doing in the muscle and whether they lead to any symptoms, though, was unclear, said study co-author Rob C. I. Wüst of Vrije Universiteit Amsterdam, in an email.

At the same time that Pretorius and Kell were delving into amyloid clots in the lab, Laubscher was independently treating his severe Covid patients with triple therapy. He was using a tool called thromboelastography to measure their blood’s ability to clot, and then prescribing blood thinners based on the diagnostic results. Laubscher never published any data with his acute Covid patients in scientific journals. In videos posted to YouTube, he has described his use of triple therapy and claimed that most of his patients got better and never needed to be placed on ventilators. Clinical trials, though, have been conducted. One showed that anticoagulants improve outcomes for those with severe Covid, specifically in hospitalized patients who don’t yet require intensive care.

Toward the end of 2020, Pretorius started referring people with long Covid to Laubscher. Following treatment with anticoagulants, the long Covid patients “started to recover,” Pretorius said. Laubscher has since posted one of the few available articles describing triple therapy outcomes in long Covid patients. It appears on Research Square, an online server for papers that haven’t been peer-reviewed, but has not yet been accepted to a journal, according to Khan, who is one of the co-authors.

During the study, also co-authored by Pretorius and Kell, Laubscher treated 91 people with a triple therapy cocktail of two antiplatelet drugs — clopidogrel and aspirin — combined with apixaban, an anticoagulant. The report claimed that symptoms resolved in most patients, along with the severity of amyloid microclotting in their blood. In a video posted in December 2022, Laubscher claimed that out of the 373 long Covid patients he had treated with triple therapy, two had experienced major bleeds, one requiring hospitalization. Bruising was common among the rest, he said. The treatment typically worked faster and was more effective in patients who had suffered long Covid symptoms for less than six months. But without a control group, it’s impossible to know if the patients improved because of the treatment or because of the passage of time.

Pretorius emphatically said she doesn’t endorse any particular microclot treatment. “I’m not a clinician,” she said. “But I do support the efforts of the clinicians that want to try different things to try to help patients.” Pretorius’ role in that regard has mainly been to share the analytical methods for detecting microclots with scientists who plan to use them for research. Commercial users (doctors who want to use the method for diagnostic purposes and therapy) must first sign a licensing agreement with Pretorius’ startup company, Biocode, which assumes responsibility for quality control.

David Putrino, a professor at the Icahn School of Medicine at Mount Sinai who has a doctorate in neuroscience, is now trying to improve the specificity of the diagnostic approach. Putrino directs a center at Mount Sinai that offers specialized rehabilitative treatment for people with complex chronic illnesses such as Lyme, long Covid, and myalgic encephalomyelitis/chronic fatigue syndrome. To Putrino, the microclots hypothesis sounded plausible given that SARS-CoV-2 enters cells after binding with their ACE-2 receptors, which he described as being over-represented on blood vessels. If the virus circulates in the bloodstream, he reasoned, then that might go a long way towards explaining long Covid’s systemic symptoms. So Putrino reached out to Pretorius and asked if she would teach him how to look for amyloid clots in patient blood samples.

Pretorius wound up coming to Mount Sinai and teaching him the method personally. “Sure enough, when we started testing folks, we would see evidence of these microclots,” and hyper-activated platelets, Putrino said. But the method is also limited in that it can only reveal if microclots and activated platelets are present in a given blood sample, and not how their numbers vary based on symptom severity, he said. So, Putrino’s team is now using machine learning and computer vision tools to derive quantitative microclot scores that, he said, might reveal how well particular treatments are working.

Putrino corresponds routinely with Pretorius and Kell, and he said their findings helped to show that long Covid patients actually have biological evidence of disease. Complex chronic conditions that now include long Covid, he said, have experienced years of neglect in part because doctors can’t find anything physically wrong with people who suffer from them. “We have the gall to call these illnesses ‘invisible illnesses,’” Putrino said. But “they’re highly visible when you know where to look.”

“Resia taught us how to look for microclots and platelet hyperactivation,” he added. “In many cases, these things would correlate with symptom burden.”

But the nature of the biological evidence remains contentious. Jeffrey Winters, a pathologist and chair of the division of transfusion medicine at the Mayo Clinic, said the way Pretorius and Kell describe amyloid clots is confusing to scientists who view amyloids in a different context. “I don’t understand how they are using this term,” he said. “I assume that what they are referring to is not what I refer to as amyloids,” which he describes as various proteins that deposit into organs and don’t ordinarily float around in the blood. “I don’t understand what they are testing, and I don’t understand what they are seeing.” The dye used in the test, thioflavin T, can and does stain other things that are not amyloids, Winters said in an email.

For researchers accustomed to working with fluorescence microscopy, the method for detecting microclots is simple, Pretorius said. When asked for a published description of the method, she directed Undark to a 2021 paper with Laubscher as lead author. Hunt’s response was that working off the written account, “I would be unable to reproduce the assay in my lab.” The method needs to be more clearly presented, Hunt said, so that another investigator can perform it “without having to go for a teaching lesson.” Furthermore, the method has yet to be publicly validated in studies with control patients who do not have long Covid, Winters pointed out. If confirmed by other researchers in peer-reviewed publications, then “I might be inclined to seek to offer the test and even license it for clinical use,” Winters wrote in an email. “But without peer-reviewed publications demonstrating utility, I am [loathe] to do so.”

The clinical evidence with microclot treatments is yet another point of contention, given the few available studies with treated patients. In May 2023, Jaeger published a case report describing the results of apheresis in long Covid. Jaeger uses a specialized form of the technology called heparin-mediated extracorporeal LDL precipitation, or HELP, apheresis that selectively removes fibrinogen and lipoproteins from blood plasma. (The equipment is not currently being sold in the U.S.) HELP differs from plasmapheresis, the more widespread alternative that replaces all of a patient’s plasma. Jaeger claims HELP apheresis filters microclots and SARS-CoV-2 spike proteins along with viral debris. Her report claims that 16 of 17 treated patients felt immediate improvement and 12 reached nearly full recovery. However, Winters, who is also editor in chief of the Journal of Clinical Apheresis, said he has yet to see published, peer-reviewed evidence showing that HELP apheresis removes spike protein or amyloid clots.

Hunt’s Cochrane Review, which was published in July 2023, couldn’t identify reliable research showing that fibrinogen particles contribute to long Covid or studies that investigated removing those particles with plasmapheresis. In the absence of a better therapeutic rationale, Hunt and her colleagues urged that plasmapheresis for long Covid should not be used outside of clinical trials. But in an email to Undark, Kell wrote that HELP and plasmapheresis are “completely different,” adding that “no one informed and of whom we are aware has ever suggested that plasmapheresis might be used to remove microclots, and we have never made any comments to that effect.”

According to Winters, European clinics offering HELP apheresis are generally doing “cash on the barrelhead,” meaning that patients pay upfront for the service along with “a lot of other alternatives that you can add on, such as hyperbaric oxygen and vitamin infusions.” The associated costs range from roughly $10,000 to $20,000 for three to five weeks of therapy, including transportation, accommodation, and laboratory expenses. Jaeger is medical adviser to the Apheresis Center in Larnaca, Cyprus, which posts glowing testimonials from its treated patients, many of whom claim improvements to their symptoms, and also markets its services directly to consumers. But Winters emphasized that, short of a clinical trial, “even a large case-series in a peer-reviewed publication would be useful for me to get a sense of what they are seeing,” and comments posted to long Covid forums reveal that some patients do worse after treatment.

Triple therapy for long Covid also attracts skepticism from scientists who say it isn’t adequately supported by published evidence. Michael Putman, an assistant professor and rheumatologist at the Medical College of Wisconsin, said that given the bleeding risk, treating long Covid with triple therapy outside the context of a clinical trial is “dangerous and irresponsible.” Putrino said he believes there’s a rationale for triple therapy in long Covid, but that clinicians at Mount Sinai are holding off on the treatment until supporting data become available. “I would challenge people who are strong proponents of this therapy to run the necessary trial and do the necessary work to validate it,” he said.

Jordan Vaughn, an internist and chief executive officer of MedHelp Clinics, a private practice that employs about 20 physicians at five locations in the Birmingham, Alabama area, claims to have treated roughly 1,600 long Covid patients with blood thinners in various combinations — along with over-the-counter anti-clotting supplements (nattokinase and serrapeptase) and other approaches — but has never published a case series describing their clinical outcomes. “I am kind of the independent guy out there trying to help people,” Vaughn said, adding that he lacks the resources to conduct clinical research and publish his results.

During the pandemic, he noticed that acute Covid patients often had coagulation problems, which he diagnosed in part by measuring blood levels of D-dimer, a clot-associated protein. If the levels were high, then Vaughn treated with antiplatelets and anticoagulants, claiming they were “incredibly effective.” Later, when patients started showing up with long Covid, Vaughn dug into the medical literature and concluded that the “theories that Resia and Doug had, it made a lot of sense.” Vaughn bought a used fluorescence microscope for around $50,000, and then Pretorius taught him remotely how to look for amyloid clots in blood samples. He said his long Covid patients trusted him when he suggested blood thinners. “These patients were miserable,” he said. They were willing to undertake experimental treatment with drugs that were already approved by the U.S. Food and Drug Administration — meaning they weren’t Emergency Use Authorization “like the other crap,” he added, alluding to Covid vaccines and other treatments that received accelerated approval in the early days of the pandemic.

Vaughn said that he might launch a clinical trial with other collaborators, mentioning Rae Duncan, a consultant cardiologist with the U.K.’s National Health Service.

Academic and government-funded institutions, meanwhile, have yet to conduct any trials with either triple therapy or HELP apheresis, according to Banerjee and Winters. Winters said he has “great deal of respect” for some of Jaeger’s findings, even if her 2023 study “smacks a bit of too much enthusiasm for the treatment and not as independently as I would have liked.”

The blood thinner arm of the U.K.’s Stimulate-ICP study is testing a single drug — rivaroxaban — rather than the triple therapy administered by Vaughn and other doctors. “You can’t just jump into any study and go straight to triple therapy,” said Banerjee, the study’s principal investigator. Safety needs to be shown first “with single-agent, randomized evidence.” Banerjee acknowledged that the pace of research isn’t ideal for patients. “But we’d rather have science done properly than shoddily and running out things that might not work,” he said.

Hannah Davis, a data scientist and co-founder of the Patient-Led Research Collaborative, which advocates for patient involvement in long Covid research, expects that clinical trials will yield some results on treatments this year. She added that blood-based treatments might result in short-term improvements for some, or potential cures for others, if given early enough. “But the more likely scenario is that something is happening upstream of microclots that is causing them,” she said. “And that is what would need to be treated to solve the full illness.”

For Khan and other long Covid patients, effective treatments can’t come soon enough. “In the U.K., we’ve got maybe 2 million people affected, according to the most reliable current estimates, and at least a third of them are severely disabled, as in they’re unable to function in society,” he said. “This is a huge issue and it’s not going to go away.”

Khan wound up having 15 sessions of HELP apheresis and six sessions of plasmapheresis, and nearly three years later, he remains on anticoagulants, which he has tried to discontinue multiple times. He credits the treatments with improving his heart palpitations, cognition, sleep, and other problems, adding that no one symptom is cured. The microclots hypothesis “is an important area for research,” he said. A few committed clinicians and researchers “are trying their very best to get this research out there, to get the studies done properly,” he added. But “there’s no will on the part of responsible government bodies to go down that route.”

UPDATE: A previous version of this piece described David Putrino as a physical therapist and professor at the Icahn School of Medicine at Mount Sinai. Contrary to Putrino’s public profile on the Mount Sinai website, he is not licensed as a physical therapist in New York. As state law requires licensure to use the title, it has been removed.

Charles Schmidt is a senior contributor to Undark and has also written for Science, Nature Biotechnology, Scientific American, Discover Magazine, and The Washington Post, among other publications.

The Basics: How Quantum Computers Work and Where the Technology is Heading

The theoretical foundations of quantum computing emerged throughout the twentieth century, including Planck’s Quantum Hypothesis (1900), the Uncertainty Principle (1927), and Bell’s Inequality (1964). Practical applications initially emerged in the 1980s when Richard Feynman proposed using quantum systems to simulate other quantum systems, a task impractical for classical computers. This idea spurred the development of quantum algorithms, like Shor’s Algorithm (1994), which showed that quantum computers could efficiently factorize large numbers, and Grover’s Algorithm (1996), which is also known as the quantum search algorithm. Alongside this, the development of quantum error-correcting codes by Peter Shor and his colleagues marked significant progress in making quantum computing viable. Since 2000, an intense race to build practical quantum computers has ensued with technology behemoths and startups announcing advancements toward quantum supremacy. Similar to integrated circuit capacity, we may witness exponential growth in quantum computing capacity (e.g., the number of qubits on chips doubling about every 18 months per Rose’s Law).

Quantum Computers vs. Classical Computers

Quantum computers and classical computers operate on fundamentally different principles. Classical computers process information using transistors (or any digital circuitry) that store data in binary bits. These bits can only be in one of two states, either 0 or 1, which correspond to the absence or presence of voltage on the transistor gate. This binary state system is simple and robust, ensuring that when the state of a transistor is measured, it will distinctly show either a 0 or a 1.

In contrast, quantum computers utilize quantum bits, known as qubits, which have some probability of being measured in each of two states (designated |0⟩ and |1⟩). Qubits can operate in binary in that they can be set to 0 or 1. However, due to their quantum mechanical nature, qubits can do much more. They can exist in a superposition state, where they embody aspects of both 0 and 1 simultaneously. This phenomenon is commonly depicted with the Bloch Sphere, where unlike a classical bit that can only be at the North or South Pole (representing 0 or 1), a qubit can be anywhere on the sphere’s surface, including the poles.

In another analogy, classical bits can be likened to a thumbs-up or thumbs-down system, where a thumb pointing up represents a 1 and a thumb pointing down represents a 0. A qubit, by contrast, allows the thumb to represent a value even when it is not completely up or down. A thumb held at any angle in any direction (for example, 35 degrees or 90 degrees from vertical) corresponds to a qubit state |ψ⟩ and can also encode information. A thumb positioned horizontally represents an equal superposition of |0⟩ and |1⟩.

This paradigm allows the qubit to represent multiple states at once, leading to probabilistic measurement outcomes where the likelihood of measuring a 0 or 1 can vary based on the qubit’s state.
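The link between a qubit's position on the Bloch Sphere and its measurement probabilities can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and it assumes the standard textbook parameterization in which a polar angle theta (measured from the North Pole) gives the state cos(theta/2)|0⟩ + e^(i·phi)·sin(theta/2)|1⟩.

```python
import cmath
import math


def qubit_from_bloch(theta, phi=0.0):
    """Return amplitudes (alpha, beta) for cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>.

    theta is the polar angle from the North Pole in radians; phi is the azimuth.
    (Hypothetical helper used only for this illustration.)
    """
    alpha = complex(math.cos(theta / 2.0), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return alpha, beta


def measurement_probabilities(state):
    """Return (P(0), P(1)) as the squared magnitudes of the two amplitudes."""
    alpha, beta = state
    return abs(alpha) ** 2, abs(beta) ** 2


# North Pole, two tilted "thumbs", and South Pole.
for degrees in (0, 35, 90, 180):
    p0, p1 = measurement_probabilities(qubit_from_bloch(math.radians(degrees)))
    print(f"polar angle {degrees:3d} deg -> P(0) = {p0:.3f}, P(1) = {p1:.3f}")
```

Under this parameterization, the horizontal thumb (90 degrees) gives equal chances of measuring 0 or 1, while the poles behave like ordinary classical bits.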

What’s The Benefit?

The ability to exist in multiple states simultaneously enables quantum computers to encode and process information in ways that classical computers cannot. For instance, while a classical computer with three bits can represent one of eight possible states at a time, a quantum computer can represent all eight possible states simultaneously in a superposition state. This concept (i.e., quantum parallelism), along with quantum interference (i.e., the interaction between the states within a superposition), allows quantum computers to perform certain computations much faster and with less hardware than classical computers. This stark difference in data processing is what sets quantum computers apart, and it has significant implications for the kinds of tasks and calculations they can perform efficiently.
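To make the three-bit comparison concrete, the sketch below simulates a three-qubit register on a classical machine as a list of eight amplitudes. It is a minimal illustration, not how real hardware works: the helper name is hypothetical, and the Hadamard gate (a standard gate, not named in the text above) is assumed as the way to create the superposition.

```python
import math

S = 1.0 / math.sqrt(2.0)
HADAMARD = [[S, S], [S, -S]]  # standard single-qubit Hadamard gate


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` (0 = least significant bit) of a state vector."""
    new_state = [0j] * len(state)
    mask = 1 << target
    for index, amplitude in enumerate(state):
        if amplitude == 0:
            continue
        bit = (index >> target) & 1
        for out_bit in (0, 1):
            # Same basis state, but with the target bit set to out_bit.
            out_index = (index & ~mask) | (out_bit << target)
            new_state[out_index] += gate[out_bit][bit] * amplitude
    return new_state


n_qubits = 3
state = [0j] * (2 ** n_qubits)
state[0] = 1 + 0j  # start in |000>

for qubit in range(n_qubits):
    state = apply_single_qubit_gate(state, HADAMARD, qubit)

for index, amplitude in enumerate(state):
    print(f"|{index:03b}>  probability = {abs(amplitude) ** 2:.3f}")
```

After the three gates, every one of the eight basis states carries the same probability, which is the "all eight states at once" situation described above. A measurement still returns only one of them.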

Additionally, quantum computers benefit from another important concept, “quantum entanglement.” Entanglement in quantum computing allows qubits to be interlinked, enabling them to process and store information in ways that surpass classical computers’ capabilities. Quantum entanglement occurs when a group of qubits (referred to as “entangled qubits”) share a quantum state so that their properties become correlated. Suppose there are two entangled qubits. When a quantum computer measures a property of one qubit (e.g., spin, position, or polarization), the result is instantaneously correlated with the corresponding property of the other qubit, because their properties and states are entangled. Quantum computers can utilize this correlation to improve their processing power. For instance, this interconnectedness facilitates parallelism, enabling quantum computers to solve certain complex problems more efficiently. Additionally, entanglement contributes to the performance of quantum algorithms in fields like cryptography, optimization, and material science.
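The correlation between two entangled qubits can be illustrated by sampling measurement outcomes from a simulated two-qubit state. The sketch below assumes one specific entangled state, the Bell state (|00⟩ + |11⟩)/√2, which the text above does not name. The function name and the number of shots are arbitrary choices for illustration.

```python
import math
import random

# Amplitudes for |00>, |01>, |10>, |11> in the Bell state (|00> + |11>)/sqrt(2).
BELL_STATE = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]


def sample_two_qubits(state, shots, rng):
    """Sample joint measurement outcomes (first_bit, second_bit) from a two-qubit state."""
    probabilities = [abs(amplitude) ** 2 for amplitude in state]
    outcomes = rng.choices(range(4), weights=probabilities, k=shots)
    return [(outcome >> 1, outcome & 1) for outcome in outcomes]


rng = random.Random(0)  # fixed seed so the run is repeatable
results = sample_two_qubits(BELL_STATE, shots=1000, rng=rng)

matching = sum(1 for first, second in results if first == second)
ones = sum(first for first, _ in results)
print(f"outcomes where both qubits agree: {matching} of {len(results)}")
print(f"times the first qubit read 1: {ones}")
```

Each individual qubit looks random, reading 0 about half the time and 1 the other half, yet the two readings agree in every shot. That agreement is the correlation entanglement provides.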

To illustrate how entanglement can improve the computing power, consider the following example:

In a classical computer, doubling the number of bits can merely double the processing power. That is, the computing power grows linearly in relation to the number of bits. In quantum computing, however, this relationship is exponential. A 60-qubit computer can represent 2^60 basis states concurrently, and adding a single qubit doubles that number to 2^61.
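The exponential relationship is easy to check with a little arithmetic. The sketch below counts the basis states an n-qubit register spans and, as a rough illustration, the memory a classical simulator would need to store one amplitude per state. The 16 bytes per amplitude is an assumption (one double-precision complex number), and the function names are hypothetical.

```python
def basis_state_count(n_qubits):
    """Number of basis states spanned by an n-qubit register: 2 to the power n."""
    return 2 ** n_qubits


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store one amplitude per basis state on a classical machine.

    bytes_per_amplitude=16 assumes one double-precision complex number per amplitude.
    """
    return basis_state_count(n_qubits) * bytes_per_amplitude


for n in (3, 10, 30, 60, 61):
    print(f"{n:2d} qubits -> {basis_state_count(n):,} basis states, "
          f"{state_vector_bytes(n):,} bytes to store classically")
```

Each added qubit doubles both figures, which is why simulating even a few dozen qubits exactly becomes impractical on classical hardware.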

Just as classical gates manipulate bits in well-defined ways according to Boolean logic, quantum gates operate on qubits according to the rules of quantum mechanics, enabling the performance of quantum algorithms. Quantum gates are therefore the fundamental building blocks of quantum computing and can be thought of as a quantum version of the “logic gates” of classical computing. In contrast to classical logic gates, quantum gates can allow for more complex and nuanced operations. For instance, while classical gates apply deterministic transformations to their inputs, quantum gates can create entanglement and superposition, enhancing computational potential through non-classical behaviors.
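As a rough illustration of gates acting as building blocks, the sketch below writes two standard gates as matrices and applies them in sequence to a two-qubit state. The choice of gates (a Hadamard on the first qubit followed by a CNOT) is an assumption made for the example. It is the usual textbook recipe for producing an entangled pair, not something prescribed by the text above.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Basis order: |00>, |01>, |10>, |11> (the first qubit is the left-hand bit).
# Hadamard on the first qubit, identity on the second.
H_ON_FIRST = [
    [S, 0, S, 0],
    [0, S, 0, S],
    [S, 0, -S, 0],
    [0, S, 0, -S],
]

# CNOT: flips the second qubit when the first qubit is 1.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]


def apply_gate(matrix, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(m * a for m, a in zip(row, state)) for row in matrix]


state = [1.0, 0.0, 0.0, 0.0]           # start in |00>
state = apply_gate(H_ON_FIRST, state)  # superposition of |00> and |10>
state = apply_gate(CNOT, state)        # entangled: |00> and |11>

for label, amplitude in zip(("00", "01", "10", "11"), state):
    print(f"|{label}>  amplitude = {amplitude:+.3f}")
```

The first gate introduces superposition and the second introduces entanglement, the two non-classical behaviors mentioned above.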

Quantum gates can be utilized in quantum algorithms to orchestrate and perform complex computations using qubits. Understanding how a quantum algorithm operates offers an intriguing glimpse into the power of quantum computing. Initially, the input to a quantum computer typically consists of a massive superposition state, which means the system simultaneously represents multiple potential outcomes. Various quantum gates can then interact with all these potential states at once due to the property of quantum parallelism. This simultaneous operation is complemented by quantum interference, which adjusts the coefficients of these states, further shaping the computational process.
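Interference, the adjustment of coefficients described above, can be seen in the smallest possible case: one qubit passed through the same gate twice. The sketch assumes the standard Hadamard gate, which the text does not name, and the helper name is hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)
HADAMARD = [[S, S], [S, -S]]  # standard single-qubit Hadamard gate


def apply_gate(gate, state):
    """Multiply a 2x2 gate by a single-qubit state [amplitude of |0>, amplitude of |1>]."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


zero = [1.0, 0.0]
superposed = apply_gate(HADAMARD, zero)        # equal superposition of |0> and |1>
recombined = apply_gate(HADAMARD, superposed)  # the |1> contributions cancel

print("after one gate: ", [round(a, 3) for a in superposed])
print("after two gates:", [round(a, 3) for a in recombined])
```

After the second gate, the two paths leading to |1⟩ carry opposite signs and cancel, while the paths leading to |0⟩ reinforce each other. Quantum algorithms are designed so that this kind of cancellation suppresses wrong answers and amplifies correct ones.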

What’s Next?

Quantum computing technology is nearing a pivotal moment where it could transition from research laboratories to public use. Progress so far has been rapid but relatively small with regard to public consumption. A few companies and research institutions have developed incremental quantum processors and integrated them into cloud-based platforms that are accessible to developers worldwide. This accessibility allows for experimentation with quantum algorithms, laying the groundwork for future applications. Moreover, as these processors grow in qubit count and stability, and as error correction improves, we are approaching a threshold where quantum computing could start impacting areas such as cryptography, complex molecular modeling, and optimization problems.

While quantum computing shows tremendous potential, predicting its availability for mass use remains challenging. The field is still in its early stages, grappling with significant technical hurdles such as qubit coherence, error rates and correction, and scalable system design. The timeline for widespread commercial availability is uncertain because these foundational challenges must be overcome before it can be reliably and cost-effectively integrated into everyday technology. This uncertainty underscores the experimental and evolutionary nature of quantum computing technology as it seeks to transition from experimental setups to practical, mass-market applications.

In upcoming articles we will explore how quantum computing technologies will iteratively evolve and gradually pervade everyday life.

Beyond The Binary Series

Click here to view Foley’s multi-part Beyond The Binary series of articles describing various aspects of quantum computing technology, its principles, and the legal landscape surrounding its development and implementations.

R. Whitney Johnson

Kamyar Maserrat

Senior Counsel

Jihwang Yeo

IMAGES

  1. #1 Ultimate Guide On Writing A Hypothesis For Research Proposals

  2. 💌 Hypothesis in a research paper. How to write a hypothesis in a

  3. How to Write a Hypothesis: The Ultimate Guide with Examples

  4. 😎 Research proposal hypothesis example. Hypothesis example. 2019-01-20

  5. hypothesis in research different types

  6. How to Write a Strong Hypothesis in 6 Simple Steps

VIDEO

  1. Two-Sample Hypothesis Testing: Dependent Sample

  2. details discussion on Hypothesis/types,& characteristics /#mostimportanttopic/#hypothesis/#research

  3. Proportion Hypothesis Testing, example 2

  4. research problem hypothesis (research methodology part 4) #researchmethodology #biotechnology

  5. How to Write Research Proposal?

  6. Differences Between Hypothesis Formulation and Hypothesis Development

COMMENTS

  1. How to Write a Strong Hypothesis

    Developing a hypothesis (with example) Step 1. Ask a question. Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project. Example: Research question.

  2. What is a Research Hypothesis: How to Write it, Types, and Examples

    A research hypothesis is a statement that proposes a possible explanation for an observable phenomenon or pattern. It guides the direction of a study and predicts the outcome of the investigation. A research hypothesis is testable, i.e., it can be supported or disproven through experimentation or observation. Characteristics of a good hypothesis

  3. Research Hypothesis: Definition, Types, Examples and Quick Tips

    3. Simple hypothesis. A simple hypothesis is a statement made to reflect the relation between exactly two variables. One independent and one dependent. Consider the example, "Smoking is a prominent cause of lung cancer." The dependent variable, lung cancer, is dependent on the independent variable, smoking. 4.

  4. How to Write a Strong Hypothesis

    Step 5: Phrase your hypothesis in three ways. To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable. If a first-year student starts attending more lectures, then their exam scores will improve.

  5. PDF RESEARCH HYPOTHESIS

    your hypothesis, before proceeding with any work on the topic. You will be expressing your hypothesis in 3 ways: • As a one-sentence hypothesis • As a research question • As a title for your paper Your hypothesis will become part of your research proposal. Sample Student Hypotheses 2008-2009 Senior Seminar

  6. How to Write a Research Hypothesis: Good & Bad Examples

    Another example for a directional one-tailed alternative hypothesis would be that. H1: Attending private classes before important exams has a positive effect on performance. Your null hypothesis would then be that. H0: Attending private classes before important exams has no/a negative effect on performance.

  7. How to Write a Research Hypothesis

    Research hypothesis checklist. Once you've written a possible hypothesis, make sure it checks the following boxes: It must be testable: You need a means to prove your hypothesis. If you can't test it, it's not a hypothesis. It must include a dependent and independent variable: At least one independent variable ( cause) and one dependent ...

  8. What Is A Research Hypothesis? A Simple Definition

    A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes - specificity, clarity and testability. Let's take a look at these more closely.

  9. Hypothesis: Definition, Examples, and Types

    A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process. Consider a study designed to examine the relationship between sleep deprivation and test ...

  10. How to Write a Research Proposal

    Research proposal examples. Writing a research proposal can be quite challenging, but a good starting point could be to look at some examples. We've included a few for you below. Example research proposal #1: 'A Conceptual Framework for Scheduling Constraint Management'.

  11. What is and How to Write a Good Hypothesis in Research?

    An effective hypothesis in research is clearly and concisely written, and any terms or definitions clarified and defined. Specific language must also be used to avoid any generalities or assumptions. Use the following points as a checklist to evaluate the effectiveness of your research hypothesis: Predicts the relationship and outcome.

  12. What is a Research Hypothesis and How to Write a Hypothesis

    The steps to write a research hypothesis are: 1. Stating the problem: Ensure that the hypothesis defines the research problem. 2. Writing a hypothesis as an 'if-then' statement: Include the action and the expected outcome of your study by following an 'if-then' structure. 3.

  13. How To Write A Research Proposal (With Examples)

    Make sure you can ask the critical what, who, and how questions of your research before you put pen to paper. Your research proposal should include (at least) 5 essential components : Title - provides the first taste of your research, in broad terms. Introduction - explains what you'll be researching in more detail.

  14. What Is A Research Proposal? Examples + Template

    The purpose of the research proposal (its job, so to speak) is to convince your research supervisor, committee or university that your research is suitable (for the requirements of the degree program) and manageable (given the time and resource constraints you will face). The most important word here is "convince" - in other words, your ...

  15. How To Write a Proposal

    This is where you will want to work with your mentor to craft the experimental portion of your proposal. Propose two original specific aims to test your hypothesis. Don't propose more than two aims-you will NOT have enough time to do more. In the example presented, Specific Aim 1 might be "To determine the oncogenic potential of Brca1 null ...

  16. How to prepare a Research Proposal

    Sample size: The proposal should provide information and justification (basis on which the sample size is calculated) about sample size in the methodology section. 3 A larger sample size than needed to test the research hypothesis increases the cost and duration of the study and will be unethical if it exposes human subjects to any potential unnecessary risk without additional benefit.

  17. PDF How to Write a Good Postgraduate RESEARCH PROPOSAL

    Suggested structure for a research proposal: • Title and abstract • Background information/brief summary of existing literature • The hypothesis and the objectives • Methodology • How the research will be communicated to the wider community • The supervisory provision as well as specialist and transferable skills training

  18. PDF Research Proposal Format Example

    1. Research Proposal Format Example. Following is a general outline of the material that should be included in your project proposal. I. Title Page II. Introduction and Literature Review (Chapters 2 and 3) A. Identification of specific problem area (e.g., what is it, why it is important). B. Prevalence, scope of problem.

  19. How to write a research proposal?

    A proposal needs to show how your work fits into what is already known about the topic and what new paradigm will it add to the literature, while specifying the question that the research will answer, establishing its significance, and the implications of the answer. [ 2] The proposal must be capable of convincing the evaluation committee about ...

  20. How To Write A Research Proposal

    Here is an explanation of each step: 1. Title and Abstract. Choose a concise and descriptive title that reflects the essence of your research. Write an abstract summarizing your research question, objectives, methodology, and expected outcomes. It should provide a brief overview of your proposal. 2.

  21. 17 Research Proposal Examples (2024)

    17 Research Proposal Examples. By Chris Drew (PhD) / January 12, 2024. A research proposal systematically and transparently outlines a proposed research project. The purpose of a research proposal is to demonstrate a project's viability and the researcher's preparedness to conduct an academic study. It serves as a roadmap for the researcher.

  22. How to Write a Project Proposal (Examples & Templates)

    Step 4: Define the Project Deliverables. Defining your project deliverables is a crucial step during the project proposal process. Stakeholders want to know just what it is you're going to be delivering to them at the end of the project. This could be a product, a program, an upgrade in technology or something similar.

  23. Market Research Proposal

    Download. State the objectives, scope of work, research methodology, target market, and other such important information of your market research by downloading and using this above-shown research proposal example template. This ready-made template's content can be edited and customized in various file formats such as MS Word, Pages, Google ...

  24. Assessing the evolution of research topics in a biological field using

    Our ability to understand the progress of science through the evolution of research topics is limited by the need for specialist knowledge and the exponential growth of the literature. This study uses artificial intelligence and machine learning approaches to demonstrate how a biological field (plant science) has evolved, how the model systems have changed, and how countries differ in terms of ...

  25. Bad Blood? The Uncertainty Around Microclots and Long Covid

    The microclots hypothesis "is an important area for research," he said. A few committed clinicians and researchers "are trying their very best to get this research out there, to get the studies done properly," he added. But "there's no will on the part of responsible government bodies to go down that route."

  26. Question: How do you write a hypothesis For a research proposal

    How do you write a hypothesis for a research proposal?

  27. The Basics: How Quantum Computers Work and Where the Technology is

    Example. To illustrate how entanglement can improve the computing power, consider the following example: In a classical computer, doubling the number of bits can merely double the processing power. That is, the computing power grows linearly in relation to the number of bits. In quantum computing, however, this relationship is exponential.