
Power Analysis and Sample Size, When and Why?

Clinical and experimental study results must be analyzed rigorously if they are to lead to advances in medicine. At this stage, biostatistics plays an important role in collecting reliable data, making unbiased comparisons, and interpreting the findings correctly. To interpret findings correctly and translate them into the diagnosis or treatment of patients, it is very important to conduct a power analysis in scientific research. By using power analysis to determine the number of subjects to be included in the study, one can demonstrate whether the results obtained are truly significant or not (1, 2).

Rosenfeld and Rockette (1) showed that only 1% of 541 original research articles published in 1989 in four prestigious otolaryngology journals reported a sample size calculation or power analysis.

Today, the first step of a clinical or experimental study is its design. Before beginning the study, one should define the study population and then find a sample that can be considered representative of that population; the most important part of this design is the sample size (2). A small sample may cause the study to fail because the statistical analysis will lack power; on the other hand, a large sample may produce statistically significant results at the cost of an unnecessary number of subjects and unnecessary expense (2, 3). Including more participants than needed is also an ethical problem (2). For both statistical adequacy and the avoidance of unnecessary cost, we have to find the right number of patients, subjects, or laboratory animals (3, 4).

Power analysis is performed with specific tests that aim to find the required sample size for a clinical or experimental study (5).

In fact, there are two competing statements when testing a hypothesis in a clinical trial: the null hypothesis (H0) and the alternative hypothesis (H1). The null hypothesis always states that there is no difference between groups; its opposite is called the alternative hypothesis. Beyond the hypothesis types, there are two types of error in biostatistics. A Type I error is the incorrect rejection of the null hypothesis when it is in fact true. A Type II error is the failure to reject the null hypothesis when it is in fact false (Table 1) (6).

[Table 1. Type I and Type II errors and their relationship]

The Type I error value is predetermined by the researchers and is usually set at 0.05 or 0.01. If the authors set the Type I error at 0.05, they accept a 5% risk of declaring a difference where none exists (1). The Type II error determines the power of the study (power = 1 − β). It is usually set at 0.20, sometimes 0.10. If it is set at 0.20, the power of the study is 80%; in other words, the probability of failing to detect a true difference between the two groups is 20% (5–7).

The other two parameters that affect the sample size are the minimal clinically relevant difference and the variance. The minimal clinically relevant difference is the smallest difference in outcome between the study groups that is scientifically meaningful to the investigator, so this difference should be determined by the author. For example, in a study of a treatment for sudden hearing loss, the authors might set this level at 20 or 30 dB.

The last important parameter is the variance of the outcome. This value is usually obtained from clinical knowledge or previous data (6).

After these parameters have been determined, the calculations can easily be done by a biostatistician using various software packages.
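To make this concrete, here is a minimal sketch of the kind of calculation such software performs, written in Python with statsmodels. The 20 dB difference is taken from the sudden hearing loss example above; the 30 dB standard deviation is a hypothetical value standing in for the variance that would come from previous data.

```python
# A minimal sketch, assuming a two-group comparison of mean hearing gain.
from statsmodels.stats.power import TTestIndPower

alpha = 0.05   # Type I error
power = 0.80   # 1 - Type II error (beta = 0.20)
mcid = 20.0    # minimal clinically relevant difference, in dB (from the text)
sd = 30.0      # hypothetical standard deviation of the outcome

effect_size = mcid / sd  # standardized difference (Cohen's d)
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha, power=power)
print(f"Required sample size per group: {n_per_group:.1f}")  # about 36 per group
```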

In conclusion, at the beginning of a clinical or experimental study, researchers should determine the Type I and Type II error values, the minimal clinically relevant difference, and the variance for their own study. Then, using these parameters, biostatisticians can help find the most appropriate sample size, so that the study yields valid and useful scientific results.


S.5 Power Analysis: Why is Power Analysis Important?

Consider a research experiment where the p-value computed from the data was 0.12. As a result, one would fail to reject the null hypothesis because this p-value is larger than \(\alpha\) = 0.05. However, there still exist two possible cases for which we failed to reject the null hypothesis:

  • the null hypothesis is a reasonable conclusion,
  • the sample size is not large enough to either accept or reject the null hypothesis, i.e., additional samples might provide additional evidence.

Power analysis is the procedure that researchers can use to determine if the test contains enough power to make a reasonable conclusion. From another perspective, power analysis can also be used to calculate the number of samples required to achieve a specified level of power.

Example S.5.1

Let's take a look at an example that illustrates how to compute the power of the test.

Let X denote the height of randomly selected Penn State students. Assume that X is normally distributed with unknown mean \(\mu\) and a standard deviation of 9. Take a random sample of n = 25 students, so that, after setting the probability of committing a Type I error at \(\alpha = 0.05\), we can test the null hypothesis \(H_0: \mu = 170\) against the alternative hypothesis that \(H_A: \mu > 170\).

What is the power of the hypothesis test if the true population mean were \(\mu = 175\)?

\[\begin{align}z&=\frac{\bar{x}-\mu}{\sigma / \sqrt{n}} \\ \bar{x}&= \mu + z \left(\frac{\sigma}{\sqrt{n}}\right) \\ \bar{x}&=170+1.645\left(\frac{9}{\sqrt{25}}\right) \\ &=172.961\\ \end{align}\]

So we should reject the null hypothesis when the observed sample mean is 172.961 or greater:

\[\begin{align}\text{Power}&=P(\bar{x} \ge 172.961 \text{ when } \mu =175)\\ &=P\left(z \ge \frac{172.961-175}{9/\sqrt{25}} \right)\\ &=P(z \ge -1.133)\\ &= 0.8713\\ \end{align}\]

and illustrated below:

[Figure: two overlapping normal distributions with means of 170 and 175; the power of 0.871 is shown on the right curve.]

In summary, we have determined that we have an 87.13% chance of rejecting the null hypothesis \(H_0: \mu = 170\) in favor of the alternative hypothesis \(H_A: \mu > 170\) if the true unknown population mean is, in reality, \(\mu = 175\).
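For readers who prefer to check the arithmetic in software, here is a short Python/scipy sketch of Example S.5.1; it reproduces the hand calculation above.

```python
# Find the rejection cutoff for the one-sided Z-test under H0: mu = 170,
# then the probability of exceeding it when mu is really 175.
from scipy.stats import norm

mu0, mu_a, sigma, n, alpha = 170, 175, 9, 25, 0.05
se = sigma / n ** 0.5                    # standard error = 9/5 = 1.8

cutoff = mu0 + norm.ppf(1 - alpha) * se  # 170 + 1.645 * 1.8 = 172.961
power = norm.sf((cutoff - mu_a) / se)    # P(Z >= -1.133), about 0.8713
print(f"cutoff = {cutoff:.3f}, power = {power:.4f}")
```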

Calculating Sample Size

If the sample size is fixed, then decreasing Type I error \(\alpha\) will increase Type II error \(\beta\). If one wants both to decrease, then one has to increase the sample size.

To calculate the smallest sample size needed for specified \(\alpha\), \(\beta\), and \(\mu_a\) (where \(\mu_a\) is the likely value of \(\mu\) at which you want to evaluate the power), use

\[n=\dfrac{\sigma^2(z_{\alpha}+z_{\beta})^2}{(\mu_0-\mu_a)^2}\]

Let's investigate by returning to our previous example.

Example S.5.2

Let X denote the height of randomly selected Penn State students. Assume that X is normally distributed with unknown mean \(\mu\) and standard deviation 9. We are interested in testing, at the \(\alpha = 0.05\) level, the null hypothesis \(H_0: \mu = 170\) against the alternative hypothesis \(H_A: \mu > 170\).

Find the sample size n that is necessary to achieve 0.90 power at the alternative μ = 175.

\[\begin{align}n&= \dfrac{\sigma^2(z_{\alpha}+z_{\beta})^2}{(\mu_0-\mu_a)^2}\\ &=\dfrac{9^2 (1.645 + 1.28)^2}{(170-175)^2}\\ &=27.72\\ n&=28\\ \end{align}\]
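The same formula can be evaluated numerically; a small Python sketch using scipy for the critical z-values:

```python
# Evaluate the one-sided sample-size formula to confirm n rounds up to 28.
from math import ceil
from scipy.stats import norm

mu0, mu_a, sigma = 170, 175, 9
alpha, beta = 0.05, 0.10             # target power = 1 - beta = 0.90

z_alpha = norm.ppf(1 - alpha)        # 1.645 (one-sided test)
z_beta = norm.ppf(1 - beta)          # 1.282
n = sigma**2 * (z_alpha + z_beta)**2 / (mu0 - mu_a)**2
print(n, ceil(n))                    # about 27.7 -> round up to 28
```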

In summary, you should see how power analysis is very important for making the correct decision when the data indicate that one cannot reject the null hypothesis. You should also see how power analysis can be used to calculate the minimum sample size required to detect a difference that meets the needs of your research.


Institute for Digital Research and Education

Introduction to Power Analysis

This seminar treats power and the various factors that affect power on both a conceptual and a mechanical level. While we will not cover the formulas needed to actually run a power analysis, later on we will discuss some of the software packages that can be used to conduct power analyses.

OK, let’s start off with a basic definition of power. Power is the probability of detecting an effect, given that the effect is really there. In other words, it is the probability of rejecting the null hypothesis when it is in fact false. For example, let’s say that we have a simple study with a drug A group and a placebo group, and that the drug truly is effective; the power is the probability of finding a difference between the two groups. So, imagine that we had a power of .8 and that this simple study was conducted many times. Having power of .8 means that 80% of the time, we would get a statistically significant difference between the drug A and placebo groups. This also means that 20% of the times that we run this experiment, we will not obtain a statistically significant effect between the two groups, even though there really is an effect in reality.

There are several reasons why one might do a power analysis. Perhaps the most common use is to determine the necessary number of subjects needed to detect an effect of a given size. Note that trying to find the absolute, bare minimum number of subjects needed in the study is often not a good idea. Additionally, power analysis can be used to determine power, given an effect size and the number of subjects available. You might do this when you know, for example, that only 75 subjects are available (or that you only have the budget for 75 subjects), and you want to know if you will have enough power to justify actually doing the study. In most cases, there is really no point to conducting a study that is seriously underpowered. Besides the issue of the number of necessary subjects, there are other good reasons for doing a power analysis. For example, a power analysis is often required as part of a grant proposal. And finally, doing a power analysis is often just part of doing good research. A power analysis is a good way of making sure that you have thought through every aspect of the study and the statistical analysis before you start collecting data.

Despite these advantages of power analyses, there are some limitations. One limitation is that power analyses do not typically generalize very well. If you change the methodology used to collect the data or change the statistical procedure used to analyze the data, you will most likely have to redo the power analysis. In some cases, a power analysis might suggest a number of subjects that is inadequate for the statistical procedure. For example, a power analysis might suggest that you need 30 subjects for your logistic regression, but logistic regression, like all maximum likelihood procedures, requires much larger sample sizes. Perhaps the most important limitation is that a standard power analysis gives you a “best case scenario” estimate of the necessary number of subjects needed to detect the effect. In most cases, this “best case scenario” is based on assumptions and educated guesses. If any of these assumptions or guesses are incorrect, you may have less power than you need to detect the effect. Finally, because power analyses are based on assumptions and educated guesses, you often get a range of the number of subjects needed, not a precise number. For example, if you do not know what the standard deviation of your outcome measure will be, you guess at this value, run the power analysis and get X number of subjects. Then you guess a slightly larger value, rerun the power analysis and get a slightly larger number of necessary subjects. You repeat this process over the plausible range of values of the standard deviation, which gives you a range of the number of subjects that you will need.

After all of this discussion of power analyses and the necessary number of subjects, we need to stress that power is not the only consideration when determining the necessary sample size.  For example, different researchers might have different reasons for conducting a regression analysis.  One might want to see if the regression coefficient is different from zero, while the other wants to get a very precise estimate of the regression coefficient with a very small confidence interval around it.  This second purpose requires a larger sample size than does merely seeing if the regression coefficient is different from zero.  Another consideration when determining the necessary sample size is the assumptions of the statistical procedure that is going to be used.  The number of statistical tests that you intend to conduct will also influence your necessary sample size:  the more tests that you want to run, the more subjects that you will need.  You will also want to consider the representativeness of the sample, which, of course, influences the generalizability of the results.  Unless you have a really sophisticated sampling plan, the greater the desired generalizability, the larger the necessary sample size.  Finally, please note that most of what is in this presentation does not readily apply to people who are developing a sampling plan for a survey or psychometric analyses.

Definitions

Before we move on, let’s make sure we are all using the same definitions. We have already defined power as the probability of detecting a “true” effect, when the effect exists. Most recommendations for power fall between .8 and .9. We have also been using the term “effect size”, and while intuitively it is an easy concept, there are lots of definitions and lots of formulas for calculating effect sizes. For example, the current APA manual has a list of more than 15 effect sizes, and there are more than a few books mostly dedicated to the calculation of effect sizes in various situations. For now, let’s stick with one of the simplest definitions, which is that an effect size is the difference of two group means divided by the pooled standard deviation. Going back to our previous example, suppose the mean of the outcome variable for the drug A group was 10 and it was 5 for the placebo group. If the pooled standard deviation was 2.5, we would have an effect size equal to (10 − 5)/2.5 = 2 (which is a very large effect size).
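As a small illustration of this definition, here is a hypothetical Python helper that computes the effect size exactly as just described; the numbers are those from the drug A and placebo example.

```python
# Effect size as defined above: difference of group means / pooled SD.
def cohens_d(mean1, mean2, sd_pooled):
    """Standardized mean difference (Cohen's d)."""
    return (mean1 - mean2) / sd_pooled

print(cohens_d(10, 5, 2.5))  # (10 - 5) / 2.5 = 2.0, a very large effect
```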

We also need to think about “statistical significance” versus “clinical relevance”. This issue comes up often when considering effect sizes. For example, for a given number of subjects, you might only need a small effect size to have a power of .9. But that effect size might correspond to a difference between the drug and placebo groups that isn’t clinically meaningful, say reducing blood pressure by two points. So even though you would have enough power, it still might not be worth doing the study, because the results would not be useful for clinicians.

There are a few other definitions that we will need later in this seminar.  A Type I error occurs when the null hypothesis is true (in other words, there really is no effect), but you reject the null hypothesis.  A Type II error occurs when the alternative hypothesis is correct, but you fail to reject the null hypothesis (in other words, there really is an effect, but you failed to detect it).  Alpha inflation refers to the increase in the nominal alpha level when the number of statistical tests conducted on a given data set is increased.

When discussing statistical power, we have four inter-related concepts: power, effect size, sample size and alpha.  These four things are related such that each is a function of the other three.  In other words, if three of these values are fixed, the fourth is completely determined (Cohen, 1988, page 14).  We mention this because, by increasing one, you can decrease (or increase) another.  For example, if you can increase your effect size, you will need fewer subjects, given the same power and alpha level.  Specifically, increasing the effect size, the sample size and/or alpha will increase your power.
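As an illustration of this “fix three, solve for the fourth” relationship, here is a sketch using Python’s statsmodels package (one of several tools that can do this; the effect sizes and targets below are arbitrary examples, not values from the seminar):

```python
# Pass exactly three of effect_size, nobs1, alpha, power; leave the one you
# want solved as None (solve_power's default).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Fix effect size, alpha, power -> solve for sample size per group.
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group: {n:.1f}")  # about 64

# Fix effect size, n, alpha -> solve for power. A larger effect size
# (0.8 instead of 0.5) buys extra power at the same n, as the text notes.
print(analysis.power(effect_size=0.8, nobs1=64, alpha=0.05))  # about 0.99
```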

While we are thinking about these related concepts and the effect of increasing things, let’s take a quick look at a standard power graph. (This graph was made in SPSS Sample Power, and for this example, we’ve used .61 and .4 for our two proportion positive values.)

We like these kinds of graphs because they make clear the diminishing returns you get for adding more and more subjects. For example, let’s say that we have only 10 subjects per group. We can see that we have a power of about .15, which is really, really low. If we add 50 subjects per group, we now have a power of about .6, an increase of .45. However, if we started with 100 subjects per group (power of about .8) and added 50 per group, we would have a power of .95, an increase of only .15. So each additional subject gives you less additional power. This curve also illustrates the “cost” of increasing your desired power from .8 to .9.
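The diminishing returns can also be seen numerically. The sketch below assumes a two-sample t-test with an effect size of d = 0.4, an assumption chosen only so that the resulting powers land near the values quoted above; it is not the exact setup behind the graph.

```python
# Power at several per-group sample sizes for an assumed effect size d = 0.4.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (10, 60, 100, 150):
    p = analysis.power(effect_size=0.4, nobs1=n, alpha=0.05)
    print(f"n per group = {n:3d}  power = {p:.2f}")
# Each extra block of 50 subjects adds less power than the one before it.
```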

Knowing your research project

As we mentioned before, one of the big benefits of doing a power analysis is making sure that you have thought through every detail of your research project.

Now most researchers have thought through most, if not all, of the substantive issues involved in their research. While this is absolutely necessary, it often is not sufficient. Researchers also need to carefully consider all aspects of the experimental design, the variables involved, and the statistical analysis technique that will be used. As you will see in the next sections of this presentation, a power analysis is the union of substantive knowledge (i.e., knowledge about the subject matter), experimental or quasi-experimental design issues, and statistical analysis. Almost every aspect of the experimental design can affect power. For example, the type of control group that is used or the number of time points that are collected will affect how much power you have. So knowing about these issues and carefully considering your options is important. There are plenty of excellent books that cover these issues in detail, including Shadish, Cook and Campbell (2002); Cook and Campbell (1979); Campbell and Stanley (1963); Bickman (2000a, 2000b); Campbell and Russo (2001); Webb, Campbell, Schwartz and Sechrest (2000); and Anderson (2001).

Also, you want to know as much as possible about the statistical technique that you are going to use. If you learn that you need to use a binary logistic regression because your outcome variable is 0/1, don’t stop there; rather, get a sample data set (there are plenty of sample data sets on our web site) and try it out. You may discover that the statistical package that you use doesn’t do the type of analysis that you need to do. For example, if you are an SPSS user and you need to do a weighted multilevel logistic regression, you will quickly discover that SPSS doesn’t do that (as of version 25), and you will have to find (and probably learn) another statistical package that will do that analysis. Maybe you want to learn another statistical package, or maybe that is beyond what you want to do for this project. If you are writing a grant proposal, maybe you will want to include funds for purchasing the new software. You will also want to learn what the assumptions are and what the “quirks” are with this particular type of analysis. Remember that the number of necessary subjects given to you by a power analysis assumes that all of the assumptions of the analysis have been met, so knowing what those assumptions are is important in deciding whether they are likely to be met or not.

The point of this section is to make clear that knowing your research project involves many things, and you may find that you need to do some research about experimental design or statistical techniques before you do your power analysis.

We want to emphasize that this is time and effort well spent.  We also want to remind you that for almost all researchers, this is a normal part of doing good research.  UCLA researchers are welcome and encouraged to come by walk-in consulting at this stage of the research process to discuss issues and ideas, check out books and try out software.

What you need to know to do a power analysis

In the previous section, we discussed in general terms what you need to know to do a power analysis.  In this section we will discuss some of the actual quantities that you need to know to do a power analysis for some simple statistics.  Although we understand very few researchers test their main hypothesis with a t-test or a chi-square test, our point here is only to give you a flavor of the types of things that you will need to know (or guess at) in order to be ready for a power analysis.

– For an independent samples t-test, you will need to know the population means of the two groups (or the difference between the means), and the population standard deviations of the two groups.  So, using our example of drug A and placebo, we would need to know the difference in the means of the two groups, as well as the standard deviation for each group (because the group means and standard deviations are the best estimate that we have of those population values).  Clearly, if we knew all of this, we wouldn’t need to conduct the study.  In reality, researchers make educated guesses at these values.  We always recommend that you use several different values, such as decreasing the difference in the means and increasing the standard deviations, so that you get a range of values for the number of necessary subjects.

In SPSS Sample Power, we would have a screen that looks like the one below, and we would fill in the necessary values.  As we can see, we would need a total of 70 subjects (35 per group) to have a power of .91 if we had a mean of 5 and a standard deviation of 2.5 in the drug A group, and a mean of 3 and a standard deviation of 2.5 in the placebo group.  If we decreased the difference in the means and increased the standard deviations such that for the drug A group, we had a mean of 4.5 and a standard deviation of 3, and for the placebo group a mean of 3.5 and a standard deviation of 3, we would need 190 subjects per group, or a total of 380 subjects, to have a power of .90.  In other words, seemingly small differences in means and standard deviations can have a huge effect on the number of subjects required.

[Image: SPSS Sample Power setup screen for the independent samples t-test]
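Since the screenshot is not reproduced here, the two scenarios can be replicated approximately with statsmodels (results may differ slightly from SPSS Sample Power’s rounding):

```python
# Replicate the two t-test scenarios described above.
from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Scenario 1: means 5 vs. 3, common SD 2.5 -> d = 0.8; target power .91.
n1 = analysis.solve_power(effect_size=(5 - 3) / 2.5, alpha=0.05, power=0.91)
print(ceil(n1), "per group")  # about 35 per group (70 total)

# Scenario 2: means 4.5 vs. 3.5, common SD 3 -> d = 0.33; target power .90.
n2 = analysis.solve_power(effect_size=(4.5 - 3.5) / 3, alpha=0.05, power=0.90)
print(f"{n2:.0f} per group")  # about 190 per group (380 total)
```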

– For a correlation, you need to know/guess at the correlation in the population.  This is a good time to remember back to an early stats class where they emphasized that correlation is a large N procedure (Chen and Popovich, 2002).  If you guess that the population correlation is .6, a power analysis would suggest (with an alpha of .05 and for a power of .8) that you would need only 16 subjects.  There are several points to be made here.  First, common sense suggests that N = 16 is pretty low.  Second, a population correlation of .6 is pretty high, especially in the social sciences.  Third, the power analysis assumes that all of the assumptions of the correlation have been met.  For example, we are assuming that there is no restriction of range issue, which is common with Likert scales; the sample data for both variables are normally distributed; the relationship between the two variables is linear; and there are no serious outliers.  Also, whereas you might be able to say that the sample correlation does not equal zero, you likely will not have a very precise estimate of the population correlation coefficient.

[Image: SPSS Sample Power setup screen for correlation]
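One way to reproduce the n = 16 figure is the Fisher z-transformation approximation sketched below; note that it assumes a one-sided test, which is what makes the arithmetic come out near 16.

```python
# Approximate sample size to detect a population correlation r (one-sided test),
# via the Fisher z-transformation.
from math import atanh, ceil
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Fisher-z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3."""
    z_alpha, z_beta = norm.ppf(1 - alpha), norm.ppf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.6))  # 16
```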

– For a chi-square test, you will need to know the proportion positive for both populations (i.e., rows and columns).  Let’s assume that we will have a 2 x 2 chi-square, and let’s think of both variables as 0/1.  Let’s say that we wanted to know if there was a relationship between drug group (drug A/placebo) and improved health.  In SPSS Sample Power, you would see a screen like this.

[Image: SPSS Sample Power setup screen for the 2 x 2 chi-square]

In order to get the .60 and the .30, we would need to know (or guess at) the number of people whose health improved in both the drug A and placebo groups.

We would also need to know (or guess at) either the number of people whose health did not improve in those two groups, or the total number of people in each group.
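Here is a hedged sketch of this two-proportion scenario in statsmodels, using Cohen’s h as the effect size; the .80 power and .05 alpha are assumptions, since the original screenshot’s settings are not shown.

```python
# Suppose .60 of the drug A group and .30 of the placebo group improve.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

h = proportion_effectsize(0.60, 0.30)         # Cohen's h, about 0.61
n = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.80)
print(f"h = {h:.2f}, n per group = {n:.0f}")  # roughly 42 per group
```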

– For an ordinary least squares regression, you would need to know things like the R² for the full and reduced model. For a simple logistic regression analysis with only one continuous predictor variable, you would need to know the probability of a positive outcome (i.e., the probability that the outcome equals 1) at the mean of the predictor variable and the probability of a positive outcome at one standard deviation above the mean of the predictor variable. Especially for the various types of logistic models (e.g., binary, ordinal and multinomial), you will need to think very carefully about your sample size, and information from a power analysis will only be part of your considerations. For example, according to Long (1997, pages 53-54), 100 is a minimum sample size for logistic regression, and you want *at least* 10 observations per predictor. This does not mean that if you have only one predictor you need only 10 observations.

Also, if you have categorical predictors, you may need to have more observations to avoid computational difficulties caused by empty cells or cells with few observations.  More observations are needed when the outcome variable is very lopsided; in other words, when there are very few 1s and lots of 0s, or vice versa.  These cautions emphasize the need to know your data set well, so that you know if your outcome variable is lopsided or if you are likely to have a problem with empty cells.

The point of this section is to give you a sense of the level of detail about your variables that you need to be able to estimate in order to do a power analysis. Also, when doing power analyses for regression models, power programs will start to ask for values that most researchers are not accustomed to providing. Guessing at the mean and standard deviation of your response variable is one thing, but an increment to R² is a metric in which few researchers are used to thinking. In our next section we will discuss how you can guestimate these numbers.

Obtaining the necessary numbers to do a power analysis

There are at least three ways to guestimate the values that are needed to do a power analysis: a literature review, a pilot study and using Cohen’s recommendations.  We will review the pros and cons of each of these methods.  For this discussion, we will focus on finding the effect size, as that is often the most difficult number to obtain and often has the strongest impact on power.

Literature review: Sometimes you can find one or more published studies that are similar enough to yours that you can get an idea of the effect size. If you can find several such studies, you might be able to use meta-analysis techniques to get a robust estimate of the effect size. However, oftentimes there are no studies similar enough to your study to get a good estimate of the effect size. Even if you can find such a study, the necessary effect sizes or other values are often not clearly stated in the article and need to be calculated (if they can be) based on the information provided.

Pilot studies: There are lots of good reasons to do a pilot study prior to conducting the actual study. From a power analysis perspective, a pilot study can give you a rough estimate of the effect size, as well as a rough estimate of the variability in your measures. You can also get some idea about where missing data might occur, and as we will discuss later, how you handle missing data can greatly affect your power. Other benefits of a pilot study include allowing you to identify coding problems, set up the database, and input the data for a practice analysis. This will allow you to determine if the data are input in the correct shape, etc.

Of course, there are some limitations to the information that you can get from a pilot study.  (Many of these limitations apply to small samples in general.)  First of all, when estimating effect sizes based on nonsignificant results, the effect size estimate will necessarily have an increased error; in other words, the standard error of the effect size estimate will be larger than when the result is significant. The effect size estimate that you obtain may be unduly influenced by some peculiarity of the small sample.  Also, you often cannot get a good idea of the degree of missingness and attrition that will be seen in the real study.  Despite these limitations, we strongly encourage researchers to conduct a pilot study.  The opportunity to identify and correct “bugs” before collecting the real data is often invaluable.  Also, because of the number of values that need to be guestimated in a power analysis, the precision of any one of these values is not that important.  If you can estimate the effect size to within 10% or 20% of the true value, that is probably sufficient for you to conduct a meaningful power analysis, and such fluctuations can be taken into account during the power analysis.

Cohen’s recommendations:  Jacob Cohen has many well-known publications regarding issues of power and power analyses, including some recommendations about effect sizes that you can use when doing your power analysis.  Many researchers (including Cohen) consider the use of such recommendations as a last resort, when a thorough literature review has failed to reveal any useful numbers and a pilot study is either not possible or not feasible.  From Cohen (1988, pages 24-27):

– Small effect:  1% of the variance; d = 0.25 (too small to detect other than statistically; lower limit of what is clinically relevant)

– Medium effect:  6% of the variance; d = 0.5 (apparent with careful observation)

– Large effect: at least 15% of the variance; d = 0.8 (apparent with a superficial glance; unlikely to be the focus of research because it is too obvious)

Lipsey and Wilson (1993) did a meta-analysis of 302 meta-analyses of over 10,000 studies and found that the average effect size was .5, adding support to Cohen’s recommendation that, as a last resort, one can guess that the effect size is .5 (cited in Bausell and Li, 2002). Sedlmeier and Gigerenzer (1989) found that the average effect size for articles in The Journal of Abnormal Psychology was a medium effect. According to Keppel and Wickens (2004), when you really have no idea what the effect size is, go with the smallest effect size of practical value. In other words, you need to know how small of a difference is meaningful to you. Keep in mind that research suggests that most researchers are overly optimistic about the effect sizes in their research, and that most research studies are underpowered (Keppel and Wickens, 2004; Tversky and Kahneman, 1971). This is part of the reason why we stress that a power analysis gives you a lower limit to the number of necessary subjects.

Factors that affect power

From the preceding discussion, you might be starting to think that the number of subjects and the effect size are the most important factors, or even the only factors, that affect power.  Although effect size is often the largest contributor to power, saying it is the only important issue is far from the truth.  There are at least a dozen other factors that can influence the power of a study, and many of these factors should be considered not only from the perspective of doing a power analysis, but also as part of doing good research.  The first couple of factors that we will discuss are more “mechanical” ways of increasing power (e.g., alpha level, sample size and effect size). After that, the discussion will turn to more methodological issues that affect power.

1.  Alpha level:  One obvious way to increase your power is to increase your alpha (from .05 to say, .1).  Whereas this might be an advisable strategy when doing a pilot study, increasing your alpha usually is not a viable option.  We should point out here that many researchers are starting to prefer to use .01 as an alpha level instead of .05 as a crude attempt to assure results are clinically relevant; this alpha reduction reduces power.

1a.  One- versus two-tailed tests:  In some cases, you can test your hypothesis with a one-tailed test.  For example, if your hypothesis was that drug A is better than the placebo, then you could use a one-tailed test.  However, you would fail to detect a difference, even if it was a large difference, if the placebo was better than drug A.  The advantage of one-tailed tests is that they put all of your power “on one side” to test your hypothesis.  The disadvantage is that you cannot detect differences that are in the opposite direction of your hypothesis.  Moreover, many grant and journal reviewers frown on the use of one-tailed tests, believing it is a way to feign significance (Stratton and Neil, 2004).

2.  Sample size:  A second obvious way to increase power is simply to collect data on more subjects.  In some situations, though, the subjects are difficult to get or extremely costly to run.  For example, you may have access to only 20 autistic children or only have enough funding to interview 30 cancer survivors.  If possible, you might try increasing the number of subjects in groups that do not have these restrictions, for example, if you are comparing to a group of normal controls.  While it is true that, in general, it is often desirable to have roughly the same number of subjects in each group, this is not absolutely necessary.  However, you get diminishing returns for additional subjects in the control group:  adding an extra 100 subjects to the control group might not be much more helpful than adding 10 extra subjects to the control group.

3.  Effect size:  Another obvious way to increase your power is to increase the effect size.  Of course, this is often easier said than done. A common way of increasing the effect size is to increase the experimental manipulation.  Going back to our example of drug A and placebo, increasing the experimental manipulation might mean increasing the dose of the drug. While this might be a realistic option more often than increasing your alpha level, there are still plenty of times when you cannot do this.  Perhaps the human subjects committee will not allow it, it does not make sense clinically, or it doesn’t allow you to generalize your results the way you want to.  Many of the other issues discussed below indirectly increase effect size by providing a stronger research design or a more powerful statistical analysis.

4.  Experimental task:  Well, maybe you cannot increase the experimental manipulation, but perhaps you can change the experimental task, if there is one.  If a variety of tasks have been used in your research area, consider which of these tasks provides the most power (compared to other important issues, such as relevancy, participant discomfort, and the like).  However, if various tasks have not been reviewed in your field, designing a more sensitive task might be beyond the scope of your research project.

5.  Response variable:  How you measure your response variable(s) is just as important as what task you have the subject perform.  When thinking about power, you want to use a measure that is as high in sensitivity and low in measurement error as is possible.  Researchers in the social sciences often have a variety of measures from which they can choose, while researchers in other fields may not.  For example, there are numerous established measures of anxiety, IQ, attitudes, etc.  Even if there are not established measures, you still have some choice.  Do you want to use a Likert scale, and if so, how many points should it have?  Modifications to procedures can also help reduce measurement error.  For example, you want to make sure that each subject knows exactly what he or she is supposed to be rating.  Oral instructions need to be clear, and items on questionnaires need to be unambiguous to all respondents.  When possible, use direct instead of indirect measures.  For example, asking people what tax bracket they are in is a more direct way of determining their annual income than asking them about the square footage of their house.  Again, this point may be more applicable to those in the social sciences than those in other areas of research.  We should also note that minimizing the measurement error in your predictor variables will also help increase your power.

Just as an aside, most texts on experimental design strongly suggest collecting more than one measure of the response in which you are interested. While this is very good methodologically and provides marked benefits for certain analyses and missing data, it does complicate the power analysis.

6.  Experimental design:  Another thing to consider is that some types of experimental designs are more powerful than others.  For example, repeated measures designs are virtually always more powerful than designs in which you only get measurements at one time.  If you are already using a repeated measures design, increasing the number of time points a response variable is collected to at least four or five will also provide increased power over fewer data collections.  There is a point of diminishing return when a researcher collects too many time points, though this depends on many factors such as the response variable, statistical design, age of participants, etc.

7.  Groups:  Another point to consider is the number and types of groups that you are using.  Reducing the number of experimental conditions will reduce the number of subjects that is needed, or you can keep the same number of subjects and just have more per group.  When thinking about which groups to exclude from the design, you might want to leave out those in the middle and keep the groups with the more extreme manipulations.  Going back to our drug A example, let’s say that we were originally thinking about having a total of four groups: the first group will be our placebo group, the second group would get a small dose of drug A, the third group a medium dose, and the fourth group a large dose.  Clearly, much more power is needed to detect an effect between the medium and large dose groups than to detect an effect between the large dose group and the placebo group.  If we found that we were unable to increase the power enough such that we were likely to find an effect between small and medium dose groups or between the medium and the large dose groups, then it would probably make more sense to run the study without these groups.  In some cases, you may even be able to change your comparison group to something more extreme.  For example, we once had a client who was designing a study to compare people with clinical levels of anxiety to a group that had subclinical levels of anxiety.  However, while doing the power analysis and realizing how many subjects she would need to detect the effect, she found that she needed far fewer subjects if she compared the group with the clinical levels of anxiety to a group of “normal” people (a number of subjects she could reasonably obtain).

8.  Statistical procedure:  Changing the type of statistical analysis may also help increase power, especially when some of the assumptions of the test are violated.  For example, as Maxwell and Delaney (2004) noted, “Even when ANOVA is robust, it may not provide the most powerful test available when its assumptions have been violated.”  In particular, violations of assumptions regarding independence, normality and heterogeneity can reduce power. In such cases, nonparametric alternatives may be more powerful.

9.  Statistical model:  You can also modify the statistical model.  For example, interactions often require more power than main effects.  Hence, you might find that you have reasonable power for a main effects model, but not enough power when the model includes interactions.  Many (perhaps most?) power analysis programs do not have an option to include interaction terms when describing the proposed analysis, so you need to keep this in mind when using these programs to help you determine how many subjects will be needed.  When thinking about the statistical model, you might want to consider using covariates or blocking variables.  Ideally, both covariates and blocking variables reduce the variability in the response variable.  However, it can be challenging to find such variables.  Moreover, your statistical model should use as many of the response variable time points as possible when examining longitudinal data.  Using a change-score analysis when one has collected five time points makes little sense and ignores the added power from these additional time points.  The more the statistical model “knows” about how a person changes over time, the more variance that can be pulled out of the error term and ascribed to an effect.

9a. Correlation between time points:  Understanding the expected correlation between a response variable measured at one time in your study with the same response variable measured at another time can provide important and power-saving information.  As noted previously, when the statistical model has a certain amount of information regarding the manner by which people change over time, it can enhance the effect size estimate.  This is largely dependent on the correlation of the response measure over time.  For example, in a before-after data collection scenario, response variables with a .00 correlation from before the treatment to after the treatment would provide no extra benefit to the statistical model, as we can’t better understand a subject’s score by knowing how he or she changes over time.  Rarely, however, do variables have a .00 correlation on the same outcomes measured at different times.  It is important to know that outcome variables with larger correlations over time provide enhanced power when used in a complementary statistical model.

10.  Modify response variable:  Besides modifying your statistical model, you might also try modifying your response variable.  Possible benefits of this strategy include reducing extreme scores and/or meeting the assumptions of the statistical procedure.  For example, some response variables might need to be log transformed.  However, you need to be careful here.  Transforming variables often makes the results more difficult to interpret, because now you are working in, say, a logarithm metric instead of the metric in which the variable was originally measured. Moreover, if you use a transformation that adjusts the model too much, you can lose more power than is necessary.  Categorizing continuous response variables (sometimes used as a way of handling extreme scores) can also be problematic, because logistic or ordinal logistic regression often requires many more subjects than does OLS regression.  It makes sense that categorizing a response variable will lead to a loss of power, as information is being “thrown away.”

11.  Purpose of the study:  Different researchers have different reasons for conducting research.  Some are trying to determine if a coefficient (such as a regression coefficient) is different from zero.  Others are trying to get a precise estimate of a coefficient.  Still others are replicating research that has already been done.  The purpose of the research can affect the necessary sample size.  Going back to our drug A and placebo study, let’s suppose our purpose is to test the difference in means to see if it equals zero.   In this case, we need a relatively small sample size.  If our purpose is to get a precise estimate of the means (i.e., minimizing the standard errors), then we will need a larger sample size.  If our purpose is to replicate previous research, then again we will need a relatively large sample size.  Tversky and Kahneman (1971) pointed out that we often need more subjects in a replication study than were in the original study.  They also noted that researchers are often too optimistic about how much power they really have.  They claim that researchers too readily assign “causal” reasons to explain differences between studies, instead of sampling error. They also mentioned that researchers tend to underestimate the impact of sampling and think that results will replicate more often than is the case.

12.  Missing data:  A final point that we would like to make here regards missing data.  Almost all researchers have issues with missing data.  When designing your study and selecting your measures, you want to do everything possible to minimize missing data.  Handling missing data via imputation methods can be very tricky and very time-consuming.  If the data set is small, the situation can be even more difficult.  In general, missing data reduces power; poor imputation methods can greatly reduce power.  If you have to impute, you want to have as few missing data points on as few variables as possible.  When designing the study, you might want to collect data specifically for use in an imputation model (which usually involves a different set of variables than the model used to test your hypothesis).  It is also important to note that the default technique for handling missing data by virtually every statistical program is to remove the entire case from an analysis (i.e., listwise deletion).  This process is undertaken even if the analysis involves 20 variables and a subject is missing only one datum of the 20.  Listwise deletion is one of the biggest contributors to loss of power, both because of the omnipresence of missing data and because of the omnipresence of this default setting in statistical programs (Graham et al., 2003).

This ends the section on the various factors that can influence power.  We know that was a lot, and we understand that much of this can be frustrating because there is very little that is “black and white”.  We hope that this section made clear the close relationship between the experimental design, the statistical analysis and power.

Cautions about small sample sizes and sampling variation

We want to take a moment here to mention some issues that frequently arise when using small samples.  (We aren’t going to put a lower limit on what we mean by “small sample size.”)  While there are situations in which a researcher can either only get or afford a small number of subjects, in most cases, the researcher has some choice in how many subjects to include.  Considerations of time and effort argue for running as few subjects as possible, but there are some difficulties associated with small sample sizes, and these may outweigh any gains from the saving of time, effort or both.  One obvious problem with small sample sizes is that they have low power.  This means that you need to have a large effect size to detect anything.  You will also have fewer options with respect to appropriate statistical procedures, as many common procedures, such as correlations, logistic regression and multilevel modeling, are not appropriate for small sample sizes.  It may also be more difficult to evaluate the assumptions of the statistical procedure that is used (especially assumptions like normality).  In most cases, the statistical model must be smaller when the data set is small. Interaction terms, which often test interesting hypotheses, are frequently the first casualties.  Generalizability of the results may also be compromised, and it can be difficult to argue that a small sample is representative of a large and varied population. Missing data are also more problematic: fewer imputation methods are available to you, and those that remain (such as mean imputation) are not considered desirable.  Finally, with a small sample size, alpha inflation issues can be more difficult to address, and you are more likely to run as many tests as you have subjects.

While the issue of sampling variability is relevant to all research, it is especially relevant to studies with small sample sizes.  To quote Murphy and Myors (2004, page 59), “The lack of attention to power analysis (and the deplorable habit of placing too much weight on the results of small sample studies) are well documented in the literature, and there is no good excuse to ignore power in designing studies.”  In an early article entitled The Law of Small Numbers , Tversky and Kahneman (1971) stated that many researchers act like the Law of Large Numbers applies to small numbers.  People often believe that small samples are more representative of the population than they really are.

The last two points to be made here are that there is usually no point to conducting an underpowered study, and that underpowered studies can cause chaos in the literature because studies that are similar methodologically may report conflicting results.

We will briefly discuss some of the programs that you can use to assist you with your power analysis.  Most programs are fairly easy to use, but you still need to know effect sizes, means, standard deviations, etc.

Among the programs specifically designed for power analysis, we use SPSS Sample Power, PASS and GPower.  These programs have a friendly point-and-click interface and will do power analyses for things like correlations, OLS regression and logistic regression.  We have also started using Optimal Design for repeated measures, longitudinal and multilevel designs. We should note that Sample Power is a stand-alone program that is sold by SPSS; it is not part of SPSS Base or an add-on module.  PASS can be purchased directly from NCSS at http://www.ncss.com/index.htm . GPower (please see GPower for details) and Optimal Design (please see http://sitemaker.umich.edu/group-based/home for details) are free.

Several general use stat packages also have procedures for calculating power.  SAS has proc power , which has a lot of features and is pretty nice.  Stata has the sampsi command, as well as many user-written commands, including fpower , powerreg and aipe (written by our IDRE statistical consultants).  Statistica has an add-on module for power analysis.  There are also many programs online that are free.

For more advanced/complicated analyses, Mplus is a good choice.  It will allow you to do Monte Carlo simulations, and there are some examples at http://www.statmodel.com/power.shtml and http://www.statmodel.com/ugexcerpts.shtml .

Most of the programs that we have mentioned do roughly the same things, so when selecting a power analysis program, the real issue is your comfort; all of the programs require you to provide the same kind of information.

Multiplicity

This issue of multiplicity arises when a researcher has more than one outcome of interest in a given study.  While it is often good methodological practice to have more than one measure of the response variable of interest, additional response variables mean more statistical tests need to be conducted on the data set, and this leads to the question of experimentwise alpha control. Returning to our example of drug A and placebo, if we have only one response variable, then only one t test is needed to test our hypothesis.  However, if we have three measures of our response variable, we would want to do three t tests, hoping that each would show results in the same direction.  The question is how to control the Type I error (AKA false alarm) rate.  Most researchers are familiar with the Bonferroni correction, which calls for dividing the prespecified alpha level (usually .05) by the number of tests to be conducted.  In our example, we would have .05/3 = .0167.  Hence, .0167 would be our new critical alpha level, and statistics with a p-value greater than .0167 would be classified as not statistically significant.  It is well known that the Bonferroni correction is very conservative; there are other ways of adjusting the alpha level.
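The same Bonferroni arithmetic can be done programmatically; in the Python sketch below, the three p-values are made-up examples standing in for the three response-measure t-tests.

```python
# Bonferroni correction: divide alpha by the number of tests.
from statsmodels.stats.multitest import multipletests

pvals = [0.004, 0.030, 0.120]  # hypothetical p-values, one per t-test
reject, p_adj, _, alpha_bonf = multipletests(pvals, alpha=0.05,
                                             method='bonferroni')

print(alpha_bonf)  # 0.05 / 3 = 0.0167, the corrected critical alpha
print(reject)      # only p = .004 clears the corrected threshold
```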

Afterthoughts:  A post-hoc power analysis

In general, just say “No!” to post-hoc analyses.  There are many reasons, both mechanical and theoretical, why most researchers should not do post-hoc power analyses.  Excellent summaries can be found in Hoenig and Heisey (2001) The Abuse of Power:  The Pervasive Fallacy of Power Calculations for Data Analysis and Levine and Ensom (2001) Post Hoc Power Analysis:  An Idea Whose Time Has Passed? .  As Hoenig and Heisey show, power is mathematically directly related to the p-value; hence, calculating power once you know the p-value associated with a statistic adds no new information.  Furthermore, as Levine and Ensom clearly explain, the logic underlying post-hoc power analysis is fundamentally flawed.

However, there are some things that you should look at after your study is completed.  Have a look at the means and standard deviations of your variables and see how close they are (or are not) to the values that you used in the power analysis.  Many researchers do a series of related studies, and this information can aid in making decisions in future research.  For example, if you find that your outcome variable had a standard deviation of 7, and in your power analysis you were guessing it would have a standard deviation of 2, you may want to consider using a different measure that has less variance in your next study.

The point here is that in addition to answering your research question(s), your current research project can also assist with your next power analysis.

Conclusions

Conducting research is kind of like buying a car.  While buying a car isn’t the biggest purchase that you will make in your life, few of us enter into the process lightly.  Rather, we consider a variety of things, such as need and cost, before making a purchase.  You would do your research before you went and bought a car, because once you drove the car off the dealer’s lot, there is nothing you can do about it if you realize this isn’t the car that you need.  Choosing the type of analysis is like choosing which kind of car to buy.  The number of subjects is like your budget, and the model is like your expenses.  You would never go buy a car without first having some idea about what the payments will be.  This is like doing a power analysis to determine approximately how many subjects will be needed.  Imagine signing the papers for your new Maserati only to find that the payments will be twice your monthly take-home pay.  This is like wanting to do a multilevel model with a binary outcome, 10 predictors and lots of cross-level interactions and realizing that you can’t do this with only 50 subjects.  You don’t have enough “currency” to run that kind of model.  You need to find a model that is “more in your price range.”  If you had $530 a month budgeted for your new car, you probably wouldn’t want exactly $530 in monthly payments. Rather you would want some “wiggle-room” in case something cost a little more than anticipated or you were running a little short on money that month. Likewise, if your power analysis says you need about 300 subjects, you wouldn’t want to collect data on exactly 300 subjects.  You would want to collect data on 300 subjects plus a few, just to give yourself some “wiggle-room” just in case.

Don’t be afraid of what you don’t know.  Get in there and try it BEFORE you collect your data.  Correcting things is easy at this stage; after you collect your data, all you can do is damage control.  If you are in a hurry to get a project done, perhaps the worst thing that you can do is start collecting data now and worry about the rest later.  The project will take much longer if you do this than if you do what we are suggesting and do the power analysis and other planning steps.  If you have everything all planned out, things will go much more smoothly and you will have fewer and/or less intense panic attacks.  Of course, something unexpected will always happen, but it is unlikely to be as big of a problem.  UCLA researchers are always welcome and strongly encouraged to come into our walk-in consulting and discuss their research before they begin the project.

Power analysis = planning. You will want to plan not only for the test of your main hypothesis, but also for follow-up tests and tests of secondary hypotheses. You will want to make sure that “confirmation” checks will run as planned (for example, checking to see that interrater reliability was acceptable). If you intend to use imputation methods to address missing data issues, you will need to become familiar with the issues surrounding the particular procedure, as well as include any additional variables in your data collection procedures. Part of your planning should also include a list of the statistical tests that you intend to run and consideration of any procedure to address alpha inflation issues that might be necessary.

The number output by any power analysis program is often more a starting point for thought than a final answer to the question of how many subjects will be needed. As we have seen, you also need to consider the purpose of the study (coefficient different from 0, precise point estimate, replication), the type of statistical test that will be used (t-test versus maximum likelihood technique), the total number of statistical tests that will be performed on the data set, generalizability from the sample to the population, and probably several other things as well.

The take-home message from this seminar is “do your research before you do your research.”

References

Anderson, N. H. (2001). Empirical Direction in Design and Analysis. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Bausell, R. B. and Li, Y. (2002). Power Analysis for Experimental Research: A Practical Guide for the Biological, Medical and Social Sciences. New York: Cambridge University Press.

Bickman, L., Editor. (2000). Research Design: Donald Campbell’s Legacy, Volume 2. Thousand Oaks, CA: Sage Publications.

Bickman, L., Editor. (2000). Validity and Social Experimentation. Thousand Oaks, CA: Sage Publications.

Campbell, D. T. and Russo, M. J. (2001). Social Measurement. Thousand Oaks, CA: Sage Publications.

Campbell, D. T. and Stanley, J. C. (1963). Experimental and Quasi-experimental Designs for Research. Reprinted from Handbook of Research on Teaching. Palo Alto, CA: Houghton Mifflin Co.

Chen, P. and Popovich, P. M. (2002). Correlation: Parametric and Nonparametric Measures. Thousand Oaks, CA: Sage Publications.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Cook, T. D. and Campbell, D. T. (1979). Quasi-experimentation: Design and Analysis Issues for Field Settings. Palo Alto, CA: Houghton Mifflin Co.

Graham, J. W., Cumsille, P. E., and Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka and W. F. Velicer (Eds.), Handbook of Psychology (Vol. 2, pp. 87-114). New York: Wiley.

Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26(3), 499-510.

Hoenig, J. M. and Heisey, D. M. (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55(1), 19-24.

Kelley, K. and Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8(3), 305-321.

Keppel, G. and Wickens, T. D. (2004). Design and Analysis: A Researcher’s Handbook, Fourth Edition. Upper Saddle River, New Jersey: Pearson Prentice Hall.

Kline, R. B. (2004). Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research. Washington, D.C.: American Psychological Association.

Levine, M. and Ensom, M. H. H. (2001). Post Hoc Power Analysis: An Idea Whose Time Has Passed? Pharmacotherapy, 21(4), 405-409.

Lipsey, M. W. and Wilson, D. B. (1993). The Efficacy of Psychological, Educational, and Behavioral Treatment: Confirmation from Meta-analysis. American Psychologist, 48(12), 1181-1209.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

Maxwell, S. E. (2000). Sample size and multiple regression analysis. Psychological Methods, 5(4), 434-458.

Maxwell, S. E. and Delaney, H. D. (2004). Designing Experiments and Analyzing Data: A Model Comparison Perspective, Second Edition. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Murphy, K. R. and Myors, B. (2004). Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Publication Manual of the American Psychological Association, Fifth Edition. (2001). Washington, D.C.: American Psychological Association.

Sedlmeier, P. and Gigerenzer, G. (1989). Do Studies of Statistical Power Have an Effect on the Power of Studies? Psychological Bulletin, 105(2), 309-316.

Shadish, W. R., Cook, T. D. and Campbell, D. T. (2002). Experimental and Quasi-experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin Co.

Stratton, I. M. and Neil, A. (2004). How to ensure your paper is rejected by the statistical reviewer. Diabetic Medicine, 22, 371-373.

Tversky, A. and Kahneman, D. (1971). Belief in the Law of Small Numbers. Psychological Bulletin, 76(2), 105-110.

Webb, E., Campbell, D. T., Schwartz, R. D., and Sechrest, L. (2000). Unobtrusive Measures, Revised Edition. Thousand Oaks, CA: Sage Publications.


An introduction to power and sample size estimation

Emergency Medicine Journal, Volume 20, Issue 5

S R Jones,1 S Carley,2 M Harrison3

1 North Manchester Hospital, Manchester, UK; 2 Royal Bolton Hospital, Bolton, UK; 3 North Staffordshire Hospital, UK

Correspondence to: Dr S R Jones, Emergency Department, Manchester Royal Infirmary, Oxford Road, Manchester M13 9WL, UK; steve.r.jones{at}bigfoot.com

The importance of power and sample size estimation for study design and analysis.

Keywords: research design; sample size

https://doi.org/10.1136/emj.20.5.453


Objectives

  • Understand power and sample size estimation.
  • Understand why power is an important part of both study design and analysis.
  • Understand the differences between sample size calculations in comparative and diagnostic studies.
  • Learn how to perform a sample size calculation: (a) for continuous data; (b) for non-continuous data; (c) for diagnostic tests.

POWER AND SAMPLE SIZE ESTIMATION

Power and sample size estimations are measures of how many patients are needed in a study. Nearly all clinical studies entail studying a sample of patients with a particular characteristic rather than the whole population. We then use this sample to draw inferences about the whole population.

In previous articles in the series on statistics published in this journal, statistical inference has been used to determine if the results found are true or possibly due to chance alone. Clearly we can reduce the possibility of our results coming from chance by eliminating bias in the study design, using techniques such as randomisation, blinding, etc. However, another factor influences the possibility that our results may be incorrect: the number of patients studied. Intuitively we assume that the greater the proportion of the whole population studied, the closer we will get to the true answer for that population. But how many do we need to study in order to get as close as we need to the right answer?

WHAT IS POWER AND WHY DOES IT MATTER?

Power and sample size estimations are used by researchers to determine how many subjects are needed to answer the research question (or null hypothesis).

An example is the case of thrombolysis in acute myocardial infarction (AMI). For many years clinicians felt that this treatment would be of benefit given the proposed aetiology of AMI; however, successive studies failed to prove the case. It was not until the completion of adequately powered “mega-trials” that the small but important benefit of thrombolysis was proved.

Generally these trials compared thrombolysis with placebo and often had a primary outcome measure of mortality at a certain number of days. The basic hypothesis for the studies may have compared, for example, the day 21 mortality of thrombolysis compared with placebo. There are two hypotheses then that we need to consider:

The null hypothesis is that there is no difference between the treatments in terms of mortality.

The alternative hypothesis is that there is a difference between the treatments in terms of mortality.

In trying to determine whether the two groups are the same (accepting the null hypothesis) or they are different (accepting the alternative hypothesis) we can potentially make two kinds of error. These are called a type I error and a type II error.

A type I error is said to have occurred when we reject the null hypothesis incorrectly (that is, it is true and there is no difference between the two groups) and report a difference between the two groups being studied.

A type II error is said to occur when we accept the null hypothesis incorrectly (that is, it is false and there is a difference between the two groups which is the alternative hypothesis) and report that there is no difference between the two groups.

They can be expressed as a two by two table (table 1).

Table 1 Two by two table

Power calculations tell us how many patients are required in order to avoid a type I or a type II error.

The term power is commonly used with reference to all sample size estimations in research. Strictly speaking “power” refers to the probability of avoiding a type II error in a comparative study. Sample size estimation is a more encompassing term that looks at more than just the type II error and is applicable to all types of studies. In common parlance the terms are used interchangeably.

WHAT AFFECTS THE POWER OF A STUDY?

There are several factors that can affect the power of a study. These should be considered early on in the development of a study. Some of the factors we have control over, others we do not.

The precision and variance of measurements within any sample

Why might a study not find a difference if there truly is one? For any given result from a sample of patients we can only determine a probability distribution around that value that will suggest where the true population value lies. The best known example of this would be 95% confidence intervals. The width of the confidence interval is inversely related to the square root of the number of subjects studied, so the more people we study the more precise we can be about where the true population value lies.

Figure 1 shows that for a single measurement, the more subjects studied, the narrower the probability distribution becomes. In group 1 the mean is 5 with wide confidence intervals (3–7). By doubling the number of patients studied (but in our example keeping the values the same) the confidence intervals have narrowed (3.5–6.5), giving a more precise estimate of the true population mean.
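A minimal sketch of this narrowing, with the standard deviation and sample sizes chosen only to roughly reproduce the intervals quoted above (they are assumptions, not data from the article's figure):

```python
# Sketch: a 95% CI's half-width shrinks by a factor of sqrt(2) when n doubles.
import numpy as np
from scipy.stats import norm

mean, sd = 5.0, 5.1  # assumed values, picked to give a CI of about (3, 7) at n = 25
for n in (25, 50):
    half_width = norm.ppf(0.975) * sd / np.sqrt(n)
    print(f"n = {n}: 95% CI = ({mean - half_width:.1f}, {mean + half_width:.1f})")

# n = 25 -> about (3.0, 7.0); n = 50 -> about (3.6, 6.4), echoing figure 1.
```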

Figure 1 Change in confidence interval width with increasing numbers of subjects.

The probability distribution of where the true value lies is an integral part of most statistical tests for comparisons between groups (for example, t tests). A study with a small sample size will have large confidence intervals and will only show a statistically significant difference if there is a large difference between the two groups. Figure 2 demonstrates how increasing the number of subjects can give a more precise estimate of differences.

Figure 2 Effect of confidence interval reduction to demonstrate a true difference in means. This example shows that the initial comparison between groups 1 and 3 showed no statistical difference as the confidence intervals overlapped. In groups 3 and 4 the number of patients is doubled (although the mean remains the same). We see that the confidence intervals no longer overlap, indicating that the difference in means is unlikely to have occurred by chance.

The magnitude of a clinically significant difference

If we are trying to detect very small differences between treatments, very precise estimates of the true population value are required. This is because we need to find the true population value very precisely for each treatment group. Conversely, if we find, or are looking for, a large difference a fairly wide probability distribution may be acceptable.

In other words if we are looking for a big difference between treatments we might be able to accept a wide probability distribution, if we want to detect a small difference we will need great precision and small probability distributions. As the width of probability distributions is largely determined by how many subjects we study it is clear that the difference sought affects sample size calculations.

Factors affecting a power calculation:

  • Magnitude of a clinically significant difference
  • How certain we want to be to avoid type I error
  • The type of statistical test we are performing

When comparing two or more samples we usually have little control over the size of the effect. However, we need to make sure that the difference is worth detecting. For example, it may be possible to design a study that would demonstrate a reduction in the onset time of local anaesthesia from 60 seconds to 59 seconds, but such a small difference would be of no clinical importance. Conversely, a study demonstrating a difference of 60 seconds to 10 minutes clearly would. Stating the “clinically important difference” is a key component of a sample size calculation.

How important is a type I or type II error for the study in question?

We can specify how concerned we would be to avoid a type I or type II error. A type I error is said to have occurred when we reject the null hypothesis incorrectly. Conventionally we choose a probability of <0.05 for a type I error. This means that if we find a positive result the chances of finding this (or a greater difference) would occur on less than 5% of occasions. This figure, or significance level, is designated as pα and is usually pre-set by us early in the planning of a study, when performing a sample size calculation. By convention, rather than design, we more often than not choose 0.05. The lower the significance level the lower the power, so using 0.01 will reduce our power accordingly.

(To avoid a type I error—that is, if we find a positive result, the chance of finding this, or a greater difference, would be less than α.)

A type II error is said to occur when we accept the null hypothesis incorrectly and report that there is no difference between the two groups. If there truly is a difference between the interventions, we can express how likely we are to find it. This figure is referred to as pβ. There is less convention as to the accepted level of pβ, but figures of 0.8–0.9 are common (that is, if a difference truly exists between interventions then we will find it on 80%–90% of occasions).

The avoidance of a type II error is the essence of power calculations. The power of a study, pβ, is the probability that the study will detect a predetermined difference in measurement between the two groups, if it truly exists, given a pre-set value of pα and a sample size, N.

Sample size calculations indicate how the statistical tests used in the study are likely to perform. Therefore, it is no surprise that the type of test used affects how the sample size is calculated. For example, parametric tests are better at finding differences between groups than non-parametric tests (which is why we often try to convert basic data to normal distributions). Consequently, an analysis reliant upon a non-parametric test (for example, Mann-Whitney U) will need more patients than one based on a parametric test (for example, Student’s t test).
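One way to see this cost is to estimate power by simulation, analysing the same simulated normal data once with a t test and once with a Mann-Whitney U test. The group size, shift and simulation count below are illustrative assumptions:

```python
# Sketch: power by simulation for a parametric vs a non-parametric test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, shift, n_sims, alpha = 64, 0.5, 5000, 0.05  # 0.5 SD difference, 64 per group
t_hits = mw_hits = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)    # control group
    b = rng.normal(shift, 1.0, n)  # treatment group, shifted by 0.5 SD
    t_hits += stats.ttest_ind(a, b).pvalue < alpha
    mw_hits += stats.mannwhitneyu(a, b, alternative='two-sided').pvalue < alpha

print(f"t test power:       {t_hits / n_sims:.2f}")   # close to 0.80
print(f"Mann-Whitney power: {mw_hits / n_sims:.2f}")  # slightly lower
```

Under normality the Mann-Whitney test's asymptotic relative efficiency is about 0.955, so it needs roughly 5% more patients for the same power; with heavy-tailed data the ordering can reverse.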

SHOULD SAMPLE SIZE CALCULATIONS BE PERFORMED BEFORE OR AFTER THE STUDY?

The answer is definitely before, occasionally during, and sometimes after.

In designing a study we want to make sure that the work that we do is worthwhile, so that we get the correct answer and we get it in the most efficient way. We want to recruit enough patients to give our results adequate power, but not so many that we waste time getting more data than we need. Unfortunately, when designing the study we may have to make assumptions about the desired effect size and the variance within the data.

Interim power calculations are occasionally used when the data used in the original calculation are known to be suspect. They must be used with caution as repeated analysis may lead to a researcher stopping a study as soon as statistical significance is obtained (which may occur by chance at several times during subject recruitment). Once the study is underway analysis of the interim results may be used to perform further power calculations and adjustments made to the sample size accordingly. This may be done to avoid the premature ending of a study, or in the case of life saving, or hazardous therapies, to avoid the prolongation of a study. Interim sample size calculations should only be used when stated in the a priori research method.

When we are assessing results from trials with negative results it is particularly important to question the sample size of the study. It may well be that the study was underpowered and that we have incorrectly accepted the null hypothesis, a type II error. If the study had had more subjects, then a difference may well have been detected. In an ideal world this should never happen, because a sample size calculation should appear in the methods section of all papers; reality shows us that this is not the case. As consumers of research we should be able to estimate the power of a study from the given results.

Retrospective sample size calculations are not covered in this article. Several calculators for retrospective sample size are available on the internet (UCLA power calculators, http://calculators.stat.ucla.edu/powercalc/ ; Interactive statistical pages, http://www.statistics.com/content/javastat.html ).

WHAT TYPE OF STUDY SHOULD HAVE A POWER CALCULATION PERFORMED?

Nearly all quantitative studies can be subjected to a sample size calculation. However, they may be of little value in early exploratory studies where scarce data are available on which to base the calculations (though this may be addressed by performing a pilot study first and using the data from that).

Clearly sample size calculations are a key component of clinical trials as the emphasis in most of these studies is in finding the magnitude of difference between therapies. All clinical trials should have an assessment of sample size.

In other study types sample size estimation should be performed to improve the precision of our final results. For example, the principal outcome measures for many diagnostic studies will be the sensitivity and specificity for a particular test, typically reported with confidence intervals for these values. As with comparative studies, the greater the number of patients studied, the more likely the sample finding is to reflect the true population value. By performing a sample size calculation for a diagnostic study we can specify the precision with which we would like to report the confidence intervals for the sensitivity and specificity.

As clinical trials and diagnostic studies are likely to form the core of research work in emergency medicine we have concentrated on these in this article.

POWER IN COMPARATIVE TRIALS

Studies reporting continuous normally distributed data.

Suppose that Egbert Everard had become involved in a clinical trial involving hypertensive patients. A new antihypertensive drug, Jabba Juice, was being compared with bendrofluazide as a new first line treatment for hypertension (table 2).

Table 2 Egbert writes down some things that he thinks are important for the calculation

As you can see the figures for pα and pβ are somewhat typical. These are usually set by convention, rather than changing between one study and another, although as we see below they can change.

A key requirement is the “clinically important difference” we want to detect between the treatment groups. As discussed above this needs to be a difference that is clinically important as, if it is very small, it may not be worth knowing about.

Another figure that we need to know is the standard deviation of the variable within the study population. Blood pressure measurements are a form of normally distributed continuous data and as such will have a standard deviation, which Egbert has found from other studies looking at similar groups of people.

Once we know these last two figures we can work out the standardised difference and then use a table to give us an idea of the number of patients required.

The difference between the means is the clinically important difference—that is, it represents the difference between the mean blood pressure of the bendrofluazide group and the mean blood pressure of the new treatment group.

From Egbert’s scribblings, the standardised difference is the clinically important difference between the means divided by the standard deviation, which here works out at 0.5.

Using table 3 we can see that with a standardised difference of 0.5 and a power level (pβ) of 0.8 the number of patients required is 64. This table is for a one tailed hypothesis, but the null hypothesis requires the study to be powerful enough to detect either treatment being better or worse than the other, so we will need a minimum of 64×2=128 patients. This is so that we make sure we get patients that fall on both sides of the mean difference we have set.

Table 3 How power changes with standardised difference
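This table value can be checked in software. A sketch using statsmodels, treating the comparison as a two-sided test at pα = 0.05 so that a difference in either direction is detectable:

```python
# Sketch: verify the worked example (standardised difference 0.5, power 0.8).
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, alternative='two-sided')
print(math.ceil(n_per_group))  # 64 per group, i.e. 128 patients in total
```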

Another method of setting the sample size is to use the nomogram developed by Gore and Altman 2 as shown in figure 3.

Figure 3 Nomogram for the calculation of sample size.

From this we can use a straight edge to join the standardised difference to the power required for the study. Where the edge crosses the middle variable gives an indication as to the number, N, required.

The nomogram can also be used to calculate power for a two tailed hypothesis comparison of a continuous measurement with the same number of patients in each group.

If the data are not normally distributed the nomogram is unreliable and formal statistical help should be sought.

Studies reporting categorical data

Suppose that Egbert Everard, in his constant quest to improve care for his patients suffering from myocardial infarction, had been persuaded by a pharmaceutical representative to help conduct a study into the new post-thrombolysis drug, Jedi Flow. He knew from previous studies that large numbers would be needed, so he performed a sample size calculation to determine just how daunting the task would be (table 4).

Table 4 Sample size calculation

Once again the figures for pα and pβ are standard, and we have set the level for a clinically important difference.

Unlike continuous data, the sample size calculation for categorical data is based on proportions. However, similar to continuous data we still need to calculate a standardised difference. This enables us to use the nomogram to work out how many patients are needed.

p1 = proportional mortality in thrombolysis group = 12% or 0.12

p2 = proportional mortality in Jedi Flow group = 9% or 0.09 (this is the 3% clinically important difference in mortality we want to show)

P = (p1 + p2)/2 = (0.12 + 0.09)/2 = 0.105

Standardised difference = (p1 − p2)/√(P(1 − P)) = 0.03/√(0.105 × 0.895) ≈ 0.1

The standardised difference is therefore 0.1. If we use the nomogram, and draw a line from 0.1 to the power axis at 0.8, we can see from the intersect with the central axis, at the 0.05 pα level, that we need 3000 patients in the study. This means we need 1500 patients in the Jedi Flow group and 1500 in the thrombolysis group.
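The same calculation can be sketched exactly in statsmodels using Cohen's arcsine effect size for two proportions, which for 12% versus 9% comes out almost identical to the standardised difference of 0.1 computed above:

```python
# Sketch: sample size for 12% vs 9% mortality, power 0.8, two-sided alpha 0.05.
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.12, 0.09)  # Cohen's h, about 0.098
n_per_group = NormalIndPower().solve_power(effect_size=h, alpha=0.05,
                                           power=0.80, alternative='two-sided')
print(round(h, 3), math.ceil(n_per_group))  # about 1630 per group
```

The exact answer of roughly 1630 per group and the nomogram's reading of about 1500 per group are of the same order; the gap reflects the approximations involved in reading a nomogram.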

POWER IN DIAGNOSTIC TESTS

Power calculations are rarely reported in diagnostic studies and in our experience few people are aware of them. They are of particular relevance to emergency medicine practice because of the nature of our work. The methods described here are taken from the work by Buderer. 3

Dr Egbert Everard decides that the diagnosis of ankle fractures may be improved by the use of a new hand held ultrasound device in the emergency department at Death Star General. The DefRay device is used to examine the ankle and gives a read out of whether the ankle is fractured or not. Dr Everard thinks this new device may reduce the need for patients to wait hours in the radiology department, thereby avoiding all the ear ache from patients when they come back. He thinks that the DefRay may be used as a screening tool: only those patients with a positive DefRay test would be sent to the radiology department to demonstrate the exact nature of the injury.

He designs a diagnostic study where all patients with suspected ankle fracture are examined in the emergency department using the DefRay. This result is recorded and then the patients are sent around for a radiograph regardless of the result of the DefRay test. Dr Everard and a colleague will then compare the results of the DefRay against the standard radiograph.

Missed ankle fractures cost Dr Everard’s department a lot of money in the past year, so it is very important that the DefRay performs well if it is to be accepted as a screening test. Egbert wonders how many patients he will need. He writes down some notes (table 5).

Table 5 Everard’s calculations

For a diagnostic study we calculate the power required to achieve either an adequate sensitivity or an adequate specificity. The calculations work around the standard two by two way of reporting diagnostic data, as shown in table 6.

Table 6 Two by two reporting table for diagnostic tests

To calculate the number of patients needed for adequate sensitivity, and separately for adequate specificity, Buderer gives formulas based on the expected test performance, the disease prevalence and the desired width of the confidence interval (see the sketch after the next paragraph).

If Egbert were equally interested in the test’s sensitivity and specificity, we would take the greater of the two figures, but he is not. He is most interested in making sure the test has a high sensitivity to rule out ankle fractures. He therefore takes the figure for sensitivity, 243 patients.
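The sketch below implements sample size formulas in the form commonly attributed to Buderer (1996), sizing the study from the expected sensitivity (SN) or specificity (SP), the acceptable confidence interval half-width W, and the disease prevalence P. The inputs are deliberately generic placeholders, not the DefRay figures from table 5:

```python
# Sketch of Buderer's (1996) sample size formulas for diagnostic accuracy.
import math
from scipy.stats import norm

def n_for_sensitivity(sens, prevalence, w, alpha=0.05):
    # Total study size so the (1 - alpha) CI for sensitivity has half-width w.
    z = norm.ppf(1 - alpha / 2)
    tp_plus_fn = z**2 * sens * (1 - sens) / w**2  # diseased patients needed
    return math.ceil(tp_plus_fn / prevalence)

def n_for_specificity(spec, prevalence, w, alpha=0.05):
    # Total study size so the (1 - alpha) CI for specificity has half-width w.
    z = norm.ppf(1 - alpha / 2)
    fp_plus_tn = z**2 * spec * (1 - spec) / w**2  # disease-free patients needed
    return math.ceil(fp_plus_tn / (1 - prevalence))

# Placeholder inputs, NOT the DefRay study's actual figures:
print(n_for_sensitivity(sens=0.90, prevalence=0.40, w=0.05))
print(n_for_specificity(spec=0.80, prevalence=0.40, w=0.05))
```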

Sample size estimation is key in performing effective comparative studies. An understanding of the concepts of power, sample size, and type I and II errors will help the researcher and the critical reader of the medical literature.

Quiz

What factors affect a power calculation for a trial of therapy?

Dr Egbert Everard wants to test a new blood test (Sithtastic) for the diagnosis of the dark side gene. He wants the test to have a sensitivity of at least 70% and a specificity of 90% with 5% confidence levels. Disease prevalence in this population is 10%.

– (i) How many patients does Egbert need to be 95% sure his test is more than 70% sensitive?

– (ii) How many patients does Egbert need to be 95% sure that his test is more than 90% specific?

Dr Everard now plans to trial a new treatment for light sabre burns that, it is hoped, will reduce mortality from 55% to 45%. He sets pα to 0.05 and pβ to 0.99, but finds that he needs lots of patients, so to make his life easier he changes the power to 0.80.

How many patients in each group did he need with pα at 0.05 and pβ at 0.80?

How many patients did he need with the higher (original) power?

Quiz answers

(i) 2881 patients; (ii) 81 patients

(i) about 400 patients in each group; (ii) about 900 patients in each group

Acknowledgments

We would like to thank Fiona Lecky, honorary senior lecturer in emergency medicine, Hope Hospital, Salford for her help in the preparation of this paper.

  • 1 Driscoll P, Wardrope J. An introduction to statistics. J Accid Emerg Med 2000;17:205.
  • 2 Gore SM, Altman DG. How large a sample. In: Statistics in practice. London: BMJ Publishing, 2001:6–8.
  • 3 Buderer NM. Statistical methodology: I. Incorporating the prevalence of disease into the sample size calculation for sensitivity and specificity. Acad Emerg Med 1996;3:895–900.

Correction notice Following recent feedback from a reader, the authors have corrected this article. The original version of this paper stated that: “Strictly speaking, ‘power’ refers to the number of patients required to avoid a type II error in a comparative study.” However, the formal definition of “power” is that it is the probability of avoiding a type II error (rejecting the alternative hypothesis when it is true), rather than a reference to the number of patients. Power is, however, related to sample size, as power increases as the number of patients in the study increases. This statement has therefore been corrected to: “Strictly speaking, ‘power’ refers to the probability of avoiding a type II error in a comparative study.”


Power Analysis: An Introduction for the Life Sciences (Oxford Science Trove)

Power Analysis starts by asking: what is statistical power and why is low power undesirable? It then moves on to considering ways in which we can improve the power of an experiment. It asks how we can quantify power by simulation. It also examines simple factorial designs and extensions to other designs. Next, it asks how we can deal with multiple hypotheses. Finally, it looks at how to apply the simulation approach presented in this book beyond null hypothesis testing.

Contents:

  • About the Authors  
  • Acknowledgments  
  • Introduction: why should you read this book?  
  • 1. What is statistical power?  
  • 2. Why low power is undesirable  
  • 3. Improving the power of an experiment  
  • 4. How to quantify power by simulation  
  • 5. Simple factorial designs  
  • 6. Extensions to other designs  
  • 7. Dealing with multiple hypotheses  
  • 8. Applying our simulation approach beyond null hypothesis testing: parameter estimation, bayesian, and model-selection contexts  
  • Appendix: some handy hints on simulating data in R  


Power analysis in health policy and systems research: a guide to research conceptualisation

BMJ Global Health, Volume 6, Issue 11

  • http://orcid.org/0000-0002-3448-7983 Stephanie M Topp,1,2
  • http://orcid.org/0000-0002-7616-5966 Marta Schaaf,3
  • Veena Sriram,4
  • http://orcid.org/0000-0003-3597-9637 Kerry Scott,5,6
  • http://orcid.org/0000-0002-7218-5193 Sarah L Dalglish,5,7
  • http://orcid.org/0000-0001-9161-4814 Erica Marie Nelson,8
  • http://orcid.org/0000-0003-3597-4991 Rajasulochana SR,9
  • http://orcid.org/0000-0001-7218-9322 Arima Mishra,10
  • http://orcid.org/0000-0003-4186-2136 Sumegha Asthana,11
  • Rakesh Parashar,12
  • http://orcid.org/0000-0002-2416-2309 Robert Marten,13
  • João Gutemberg Quintas Costa,14
  • http://orcid.org/0000-0003-0743-7208 Emma Sacks,5
  • Rajeev BR,15
  • http://orcid.org/0000-0002-2903-6571 Katherine Ann V Reyes,16
  • Shweta Singh17
  • 1 College of Public Health Medical and Veterinary Sciences, James Cook University, Townsville, Queensland, Australia
  • 2 Nossal Institute for Global Health, University of Melbourne, Melbourne, Victoria, Australia
  • 3 Independent Consultant, Brooklyn, New York, USA
  • 4 School of Public Policy and Global Affairs and School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
  • 5 Department of International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  • 6 Independent Consultant, Toronto, Ontario, Canada
  • 7 Institute for Global Health, University College London, London, UK
  • 8 Health and Nutrition Cluster, Institute of Development Studies, Brighton, UK
  • 9 Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, Tamil Nadu, India
  • 10 Azim Premji University, Bangalore, Karnataka, India
  • 11 Independent Consultant, New Delhi, India
  • 12 Oxford Policy Management, New Delhi, India
  • 13 Alliance for Health Policy and Systems Research, WHO, Geneva, Switzerland
  • 14 Independent Consultant, Geneva, Switzerland
  • 15 Society for Community Health Awareness Research and Action, Bangalore, Karnataka, India
  • 16 Alliance for Improving Health Outcomes Inc, Quezon City, Philippines
  • 17 Independent Consultant, Raipur, India
  • Correspondence to Dr Stephanie M Topp; globalstopp{at}gmail.com

Power is a growing area of study for researchers and practitioners working in the field of health policy and systems research (HPSR). Theoretical development and empirical research on power are crucial for providing deeper, more nuanced understandings of the mechanisms and structures leading to social inequities and health disparities; placing contemporary policy concerns in a wider historical, political and social context; and for contributing to the (re)design or reform of health systems to drive progress towards improved health outcomes. Nonetheless, explicit analyses of power in HPSR remain relatively infrequent, and there are no comprehensive resources that serve as theoretical and methodological starting points. This paper aims to fill this gap by providing a consolidated guide to researchers wishing to consider, design and conduct power analyses of health policies or systems. This practice article presents a synthesis of theoretical and conceptual understandings of power; describes methodologies and approaches for conducting power analyses; discusses how they might be appropriately combined; and throughout reflects on the importance of engaging with positionality through reflexive praxis. Expanding research on power in health policy and systems will generate key insights needed to address underlying drivers of health disparities and strengthen health systems for all.

  • health systems
  • health policies and all other topics
  • health services research

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjgh-2021-007268


Summary box

Analysing how power shapes health policy and systems is critical to identifying underlying factors driving health disparities, health systems challenges and societal inequities.

Power is complex to explore conceptually, theoretically and methodologically, and explicit analyses of power in health policy and systems remain relatively infrequent.

There is no consolidated resource that provides health policy and systems researchers with an empirical, theoretical and methodological starting point on power.

We introduce a new framework for identifying and refining discrete areas of inquiry for power-focused health policy and systems research.

Theoretical and conceptual understandings of power are summarised and linked to a selection of methodologies and methods for conducting analyses.

Illustrative examples of combining theory and methodology to analyse different levels of power in health policy and systems research are provided.

Expanding research on power in health policy and systems in all contexts will generate insights needed to address underlying drivers of health disparities and strengthen health systems for all.

Introduction

Power is defined as the ability or capacity to ‘do something or act in a particular way’ and to ‘direct or influence the behaviour of others or the course of events’. 1 Relationships of power shape societies, and in turn, health policies, services and outcomes. 2 Power dynamics—or the relational power that manifests in the interaction among individuals and organisations—also influence health systems, or ‘the organizations, people and actions whose primary intent is to promote, restore or maintain health’. 3 The universe of power dynamics that are pertinent to the study of health policies and systems includes diverse types and locations of policy, social, implementation and political processes. Power dynamics have also influenced health systems planning and research, by defining what is seen as a health system, and the translation or adaptation of health systems models across distinct geographic contexts over time. 4 5

Studying power is thus a core concern of researchers and practitioners working in the field of health policy and systems research (HPSR), an interdisciplinary, problem-driven field focused on understanding and strengthening of multilevel systems and policies. 6 Accelerating theoretical development and empirical research on power in this domain is crucial for several reasons. First, it provides a deeper, more nuanced understanding of the mechanisms and structures that lead to social inequities and health disparities. 7 Second, it reveals historical patterns entrenched in health and social systems, allowing contemporary policy concerns to be seen in a wider context and lessons to be drawn from these trends. 8 Third, analysing power can contribute to the (re)design or reform of health systems to redress imbalances and progress towards improved health outcomes. 9

Studies incorporating examinations of power in public health and HPSR have gradually increased in number, including, for example, analyses of accountability, political prioritisation, commercial determinants of health, determinants of universal health coverage and state sovereignty in health agenda setting. 10–15 Nonetheless, explicit analyses of power in HPSR remain relatively infrequent. 7 16 Lack of a power-specific lens may reflect the continued dominance of biomedical and behaviouralist approaches in health research and funding, limitations stemming from the political economy of research funding and agendas, and reluctance among institutions and individuals to examine their own role in perpetuating existing power dynamics. 17 18 Power is also complex to examine conceptually, theoretically and methodologically. Seminal publications providing guidance on different aspects of power research include Erasmus and Gilson’s 19 paper on investigating organisational power; the health policy analysis reader edited by Gilson et al , 20 and Loewenson et al ’s 21 methods reader on participatory action research (PAR). Recent resources also provide conceptual overviews of power. 7 9 22 However, there remains no comprehensive resource that can serve as a theoretical and methodological starting point for aspiring power researchers, irrespective of disciplinary orientation or area of HPSR interest. 16

This paper aims to fill this gap, building on the above-mentioned resources but providing a more consolidated guide to researchers wishing to consider, design and conduct power analyses of health policies or systems. Recognising the expansive and interlinked nature of power relations, we focus this article on the different ways to research power as it manifests in health policies and systems. We also engage with literature on the social determinants of health insofar as these determinants impact health policies and systems.

This project emerged from the Social Science Approaches for Research and Engagement in Health Policy and Systems (SHAPES) thematic working group of Health Systems Global. SHAPES members (SMT, VS, MS and KS) with interest and expertise in power analyses reached out to the wider network and invited other interested researchers and practitioners to join the project. Recognising that expertise can take many forms, no criteria were placed on participation other than an interest in the topic and willingness to contribute to the paper’s development. The group ultimately comprised researchers from academic institutions, research organisations and multilateral agencies, in both the Global North (eight) and Global South (six), all of whom have experiential knowledge of assessing and negotiating power in health systems at various levels, and a number of whom have published in this area.

The process to develop this resource began in 2019. Members of the original group (SMT, VS, MS and KS) first prepared an outline of the paper via virtual and email discussions among group members. That outline was then divided into sections on theory, methodology and reflexivity, and section leads were appointed by a process of consensus. Group members volunteered to work on a section or sections based on experience and ability to input. Literature was sourced from database searches combined with expert guidance from group members. Working group leads organised the work of these sections and led drafting. Section drafts were reviewed by each group and then the full group, and two external researchers were invited to provide feedback on specific aspects of the paper. Online supplemental appendix 1 illustrates the iterative process by which the ideas were conceptualised, synthesised and agreed on at different stages of the paper drafting. All authors also read and commented on at least one version of the final paper. As a whole, the project was collaborative and worked from the logic of crowd-sourcing among a diverse set of authors engaged in HPSR.


Doing power analyses in health policy and systems research

This paper outlines key considerations and principles for power analyses in health policy and systems research throughout the research cycle. The paper is divided into three sections. The first section starts by discussing the identification of a research topic and presents three overarching empirical ‘sites’–or discrete areas of inquiry–for power-focused HPSR. The empirical sites offer a starting point for study design by providing researchers with ways to reflect on and refine their research question. This section also highlights researchers’ positionality and its influence on the whole research process. The second section provides an introduction to (and tabular summary of) theories useful for analysing power, demonstrating each theory’s relationship to one or more of the empirical sites. Finally, the third section of the paper introduces a selection of methodologies, considers their usefulness in the context of different types of power analyses and discusses how they, too, must be selected with consideration for the research question, the researcher’s positionality and alignment with theory. The ideas presented in this paper apply to all geographic contexts; however, we draw largely on HPSR literature from low-income and middle-income countries. This paper does not engage extensively with the use of specific data collection tools or methods (eg, interviews, observations and document review) associated with a given methodology, as other resources address these topics in detail. 19 21 23 24

Identifying a topic

Power is imposed, negotiated and contested in diverse ways in the context of health policy formulation and implementation and health systems functioning. Research into power in the field of HPSR generally focuses on how the ‘expression’ of power enables or blocks health system change or policy implementation and what types of power are implicated in the process. 16 20 From these two broad areas of focus, we discern three main sites of empirical work on power in the health policy and systems field, recognising that these three sites overlap significantly. These are: (1) actor relationships and networks; (2) sources of power and (3) societal flows and expressions of power.

In figure 1, we locate each of these empirical sites of power research around an adapted version of Walt and Gilson’s 25 seminal Policy Triangle. This figure highlights that applied research on power cannot be conducted in isolation from the actors, context, content, structures and processes of the policy or system in focus. By demonstrating the link between actors, context and structures and broad areas of power research, the three empirical sites are intended to provide a point of departure for the researcher to consider what the issue or topic of interest is. We expand on each of these empirical sites further below.

Figure 1 Three empirical sites of power research in health policy and systems.

Empirical site 1: actor relationships and networks

The role and manifestations of power in actor relationships and networks comprise an important site of empirical research on power in HPSR. We list this site first because we understand health systems as social systems , 6 fundamentally shaped by the values, intentions and relationships of the human and organisational actors within them. As illustrated in the central green triangle in figure 1 , questions about power relating to actor relationships and networks include foundational enquiries about which individuals and organisations make and influence (health) policy and system decisions, how they relate to one another and why.

Empirical site 2: sources of power

As outlined in Sriram et al 16 and Moon 22 , a substantial body of theory is directed towards understanding how actors draw on power from particular sources. 16 22 Sources of power thus represent a second important grouping of research on power in HPSR. Some methodologies, particularly those based in political science and economic theory, can describe and problematise key sources of power, such as material capital; technical expertise; political and bureaucratic position and influence; and forms of cultural capital and power gained from title, education and knowledge. Resultant research can provide analyses regarding which actors are impacting processes, from where they derive their power and how their actions impact policy and systems. This empirical site focuses our attention on ‘drivers of the drivers’, surfacing the institutions, organisations and attributes that provide a fountainhead of power in HPSR.

Empirical site 3: societal flows and expressions of power

A third empirical site of power research in health policy and systems relates to the societal flows and expressions of power. Research on the exercise of power shows how power is expressed, leveraged and experienced to impact health policy and systems, and ultimately, health inequities. Reflecting the intersection among context, actors and structures, research related to flows and expressions of power can generate insights regarding how formal or informal institutions shape health policy-making and service delivery, or on the impact of prevailing ideologies regarding health policy on service delivery. 26 27 Researchers may focus on the ways that health policies and systems shape inequities 28 or the ways that different groups have accepted, adapted and subverted health systems, such as the dictates of colonial medicine 29 30 or neocolonial or internalised colonial forms of public health practice. 31 32

Addressing power within the research process: positionality and reflexivity

In the process of issue identification and throughout the research process, it is critical to recognise the contested relationships of power that shape research itself. The nature of evidence in the fields of global health and health policy and systems research is contested, 33–35 and the funding of evidence generation is politicised. 18 36 Researchers—whether investigating power or other aspects of health and society—must be willing to consider their own role as actors in a contested process. Health research broadly tends to reward—in professional status, resourcing and publishing—positivist and utilitarian approaches over humanistic and relativistic and/or interpretive ones, 36 Northern voices over Southern ones 37 and biomedical knowledge over other forms of knowledge. 38 Indeed, the positionality of researchers is present in the many forms of power and privilege that can distance them from the issues they are analysing. Researchers’ professional positionality in the political economy of global health, as well as their individual lived experiences and attributes relating to race, caste, gender, class, ability and more, can significantly influence the choice of questions and (as discussed further) theories and methodologies used to enact analysis of those issues.

How should researchers engage with these challenges? There is no straightforward mechanism by which to operationalise critical reflexivity. Instead, building on the work of Sultana, 39 Citrin, 40 Mafuta et al, 41 Abimbola, 37 Keikelame and Swartz 42 and Pratt, 43 we offer a set of questions in table 1 to guide reflection on power as it impacts a given research project. Researchers should consider for whom they are designing and conducting data collection, analysis and the writing up of findings, and how this influences the ‘bad habits’ that pervade global health research. 44 However, discussing power dynamics as they manifest in politics, social norms and elsewhere is not a straightforward endeavour. Those who are brought in to collaborate in research processes, whether they be community members, health services representatives or funders, might be uncomfortable with an explicit focus on power relations. Shining a light on power asymmetries could create risks for collaborators or participants.

Table 1 Questions to guide reflections on power in health policy and systems research

A conscious nurturing of critical reflexivity within all stages of a research process is a necessary component of ethical and rigorous praxis. However, analysing power while simultaneously maintaining awareness of the power relationships that structure the research endeavour itself is no easy feat. These questions and processes demand a more deliberative, bottom-up, time-consuming approach to defining and answering research questions than is often enacted in HPSR. Prospective researchers of power should factor this time into their work. Since the political economy of global health and health policy and systems research can create incentives that undermine reflective, inclusive and transparent approaches to defining and answering research questions, 18 these considerations should be taken into account from this initial step through the dissemination of findings and beyond.

Refining the research question with theory and methodology

The three empirical sites provide a launching pad for considering avenues for power inquiry for health policy and systems. In moving from a topic of interest to a more specific research question on power, and in conjunction with considerations of their own position and power, the researcher must consider their epistemological foundation (ie, what do we consider knowledge and how do we know it), the theories that provide a relevant analytical scaffolding, and concurrently, the methodologies that will enable appropriate collection, collation and analysis of data to that end. 45

Thinking about theory

Theory helps to shape what we ask about power in HPSR. As a field, HPSR aims to generate research to inform policy and action 24 ; this has implications for theory application, with the end goals of equity and justice often informing epistemological and theoretical positions. 16

Some theories are foundational and address the nature of the state, society and human interaction; others are more operational in that they focus on discrete elements of the state, society and human interaction. As part of a process of reflexive research praxis, the entire research team should consider the guiding principles they wish to follow in their research and the implications that these choices have for theory choice and application. For example, researchers with applied interests may consider frameworks designed for this purpose, such as the PowerCube 46 ; conversely, researchers seeking a deeper theoretical understanding of mechanisms driving power imbalances may consider foundational theories, such as Max Weber’s sources of authority. 47

HPSR as a field has developed in dialogue with theories of power from diverse disciplines from the social sciences and humanities, including philosophy, sociology, political science, anthropology, feminist theory, postcolonial and gender studies, history, and international relations, among others. Most of the foundational theories cited in peer-reviewed social science literature (eg, Marx, Gramsci, Bourdieu, Foucault and Haugaard; see ref 9 ) originated in high-income countries, reflecting and perpetuating the discursive and material power held by scholars and academic institutions in these contexts. Many of these theories were developed in the 19th and 20th centuries, and while they describe macro-level processes that are still salient, they were not developed with contemporary phenomena—such as the proliferation of mobile technology and social media—in mind. Some scholars developed critical theories to analyse and critique power structures from the point of view of the oppressed. Theories of domination originating from feminist, postcolonial, Marxist, queer or critical race theory, among others, have been used to describe structural determinants of health, health policy and healthcare, and healthcare-seeking behaviours. 48–50

Many contemporary critical theories focus on the intersectionality of systems of subordination 51–53 ; researchers have begun to suggest ways of applying these theories in health policy analyses. 54 55 Postcolonial literature and subaltern studies have not (yet) been applied extensively in HPSR 29 but have increasingly been cited in discussions about how to decolonise global health 37 42 56 and in recent scholarship on social inequities during the COVID-19 pandemic. 57

Other frameworks used in HPSR, particularly those from public policy studies, draw insights from social science theories to explore power without necessarily invoking power explicitly, such as street-level bureaucracy theory 58 and diffusion theory. 22 In table 2 , we provide an illustrative list and brief explanation of influential theories of power that have informed or been applied to studies assessing health determinants, health policy and health systems. We recognise that the approaches described in this paper do not capture the full breadth and complexity of this topic, and a more detailed version of this table can be found in online supplemental appendix 2 .

Select theorists and theories useful for research on power in health policy and systems

Pairing theory with methodology

Different theories are better suited to analysing the power asymmetries characterising each of the three empirical sites. With regards to empirical site 1, theories with potential for exploring actor relationships and networks may include Weber’s three sources of authority, 47 street-level bureaucracy, 58 feminist standpoint theory, 50 critical race theory 48 and Bourdieu’s fields. 59 Theories particularly relevant to examining the sources of power (empirical site 2) include Barnett and Duvall’s taxonomy of power, 60 Bourdieu’s ‘fields’, 59 Gramsci’s concept of cultural hegemony 61 and feminist approaches. 50 62 Theories relevant to expansive questions regarding how power is expressed and manifest in society at large (empirical site 3) may include Foucault’s concept of knowledge/power, 63 Veneklasen and Miller’s ‘expressions of power’ 64 and Lukes’ three faces of power. 65

While theory helps to shape what we ask about power in HPSR, methodology shapes how we ask it and how we interpret the findings ( figure 2 ). Below we provide an overview of 10 methodologies (broadly defined) that are of use in the context of the three empirical sites. The organisation of the methodologies under the empirical sites is merely illustrative. While some methodologies may be closely associated with a given empirical site (eg, social network analysis is associated with actor relationships and networks), many others are not. In conjunction with ongoing reflexive considerations of positionality, researchers choosing a methodology should consider their theoretical and epistemological position and the context of the research question, since the assumptions underlying the application of methodologies can be different (eg, the difference between an objectivist case study and an ethnography). Selection of methodologies should also consider for whom the research is being conducted, and whether the aim is to generate or further refine a theory or produce more immediately actionable findings. A summary table of these methodologies may be found in online supplemental appendix 3 .

Linking empirical sites, theory and methodologies for research on power in health policy and systems research.

To further make this point, table 3 provides illustrative examples of possible combinations of research question, theory and methodology. The inclusion in the table of two research questions at each of the different levels of health policy and systems function (micro, meso and macro) is intended to demonstrate (although incompletely) the breadth of potential inquiry as well as to showcase the specificity sometimes required to enable effective theoretical and methodological linkage. A key point made clear by the repeat listings of theories and methodologies across the various questions in table 3 is that there are many valid combinations of theories and methodologies.

Illustrative combinations of theory and methodology paired with research questions on power in HPSR

Useful methodologies for empirical site 1: actor relationships and networks

Stakeholder analysis is an actor-oriented methodology useful for examining the power differentials of key policy and health system actors, ranging from frontline healthcare workers to national level policy makers. 20 Stakeholder analysis is most commonly used prospectively, as a tool for researchers and practitioners to understand the feasibility of a given policy and to develop responses to likely challenges in implementing that policy. 66 Stakeholder analysis can also be used retrospectively, as a stand-alone study or in combination with political economy and case study approaches. Stakeholder analysis is also commonly used to consider sources of power, described in further detail below.

Actor interface analysis focuses on understanding individual actors (rather than organisations), examines policy through the lens of power struggles between individuals and explores how this behaviour is embedded in actors’ lived experiences and values, called actor lifeworlds. 67 68 When used to study health policy, actor interface analysis examines how interactions among different actors shape the implementation and outcomes of the policy. Where actors interact, collaboration, contestation or resistance can be identified and analysed. This methodology brings an actor-centric lens to the study of power in policy implementation as compared with other (more institutionally focused) methodologies and helps to examine how policy-related decisions and action are shaped by the actors themselves. 67 69 70

Social network analysis is the quantitative study of relationship patterns among actors, with actors being broadly defined to potentially include people, groups or organisations. 71 72 This methodology draws from sociology and mathematical foundations of graph theory to illuminate how the nature of actors and ties (eg, number, strength and type of tie, such as friendship, supervisory relationship and whether information, resources or beliefs were shared) enable expressions and tools of power (eg, money, pressure, influence and knowledge) to be concentrated, spread or blocked. 73 In the field of HPSR, social network analysis can be used to analyse the health system structure as it functions, including through informal personal relationships, rather than as it is formally defined. 74 This can inform policy makers about how ties among actors can influence the diffusion and implementation of health reforms and programmes; how social networks influence governance and financing structures; as well as informing the public about how policy makers may be using power to include or exclude certain actors. 71 75 76
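To make the mechanics concrete, the following sketch (Python with the networkx library; every actor name and tie below is invented for illustration and not drawn from any cited study) computes two standard centrality measures often used as rough proxies for access and brokerage in a policy network.

```python
# Illustrative sketch only: centrality in a hypothetical health policy network.
import networkx as nx

# Directed ties: (source, target, what flows along the tie)
ties = [
    ("donor_agency", "ministry_of_health", "funding"),
    ("donor_agency", "ngo_coalition", "funding"),
    ("ministry_of_health", "district_office", "funding"),
    ("ngo_coalition", "district_office", "information"),
    ("district_office", "facility_manager", "supervision"),
    ("professional_assoc", "ministry_of_health", "pressure"),
]

G = nx.DiGraph()
G.add_edges_from((s, t, {"kind": k}) for s, t, k in ties)

# Degree centrality: share of direct ties (a rough proxy for access).
# Betweenness centrality: how often an actor lies on paths between others
# (a rough proxy for brokerage, i.e. the capacity to channel or block flows).
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

for actor in G.nodes:
    print(f"{actor:20s} degree={degree[actor]:.2f} "
          f"betweenness={betweenness[actor]:.2f}")
```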

Useful methodologies for empirical site 2: sources of power

Case study design is a form of empirical inquiry characterised by an ‘intense focus on a single phenomenon within its real-life context’ 77 and is particularly useful in situations where boundaries between the phenomenon of interest and the context are blurred. In relation to power in HPSR, case study research has most commonly been used to produce exploratory and explanatory accounts focusing on different actors’ expressions of power (formal and informal, overt and covert) to answer ‘how?’ and ‘why?’ certain health policy or system features exist and to assess efforts to change power dynamics. 20 78 By combining an interpretivist (seeking to understand individual and shared social meanings) and critical (questioning one’s own and others’ assumptions) analytical approach, researchers may use this methodology to consciously account for the ways in which broader social and political environments influence both macropower and micropower dynamics. 79 80 Comparative case studies can be used for theory building or theory testing.

Political economy analysis is a methodology used to identify and describe structures such as government and the law; resources (labour, capital, trade and production) and how they are distributed and contested in different country and sector contexts; and the resulting implications for policy and indicators of well-being. 81 Of relevance to HPSR, political economy can draw on both quantitative and qualitative methods to explore the nature of the political landscape through mapping the power and position of key actors. Political economy can also explore how the distribution of resources influences relationships and, through this, the feasibility and trajectory of policy reform over time. 81 82 Reflecting their roots in the comparatively more positivist paradigms of political science and economics, these methodologies have been used for purposes of explanation and hypothesis testing in HPSR, including in the context of evaluations and policy design. Consistent with HPSR’s multidisciplinary orientation, political economy methodologies can nonetheless be developed and deployed in a way that accommodates—or even centres—interpretive goals.

Big data analytics examines high-volume biological, clinical, environmental and behavioural information collected from single individuals to large cohorts at one or several time points. 83 Big data analytics can uncover patterns in health outcomes and health behaviours 84 ; health policy (eg, resourcing and implementation fidelity) 85 ; and health system function (eg, provider behaviours). 86 87 When applied in conjunction with a power lens, big data analytics can reveal important and often masked trends or patterned experiences, prompting further explanatory work or evaluative action. 88 For example, Yu et al 89 use big data analytics to explore the influence of private medical providers in promoting unnecessary medical interventions. Big data analytics may also help identify systemic issues such as discrimination, information asymmetry and patient-provider dynamics and their influence on care quality. Nonetheless, given its volume as well as its potential interest to profit-seeking entities, big data presents unique challenges for ethics, boundaries and reflexivity. Researchers should carefully consider the potential misuses of the data, the extent to which the data accurately represent the factors of interest (construct validity) and which individuals and groups are overlooked in analyses that focus on the mean (or median). 90
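As a minimal illustration of the final caveat, that analyses centred on the mean can overlook patterned experiences, the sketch below (Python with pandas; the groups and values are invented) compares a hypothetical care-quality measure across groups at the median and an upper quantile as well as at the mean.

```python
# Hypothetical sketch: looking beyond the mean in large-scale care data.
import pandas as pd

df = pd.DataFrame({
    "patient_group": ["a"] * 5 + ["b"] * 5,
    "wait_time_days": [3, 4, 4, 5, 6, 3, 4, 5, 9, 30],
})

# A mean-only comparison can hide disadvantage concentrated in the tail...
print(df.groupby("patient_group")["wait_time_days"].mean())

# ...so also inspect the median and the 90th percentile for each group.
print(df.groupby("patient_group")["wait_time_days"].quantile([0.5, 0.9]).unstack())
```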

Useful methodologies for empirical site 3: societal flows and expressions of power

Discourse analysis entails close examination of the use of language in texts (such as laws, policies, strategy documents or news media articles) and oral communication (such as transcribed interviews, debates or speeches) to describe the ways in which communicative acts construct shared understandings of what is normal 91 92 and what is possible, legitimate, or true. 63 Discourse analysis should include the study of what is present in the text, as well as what is assumed or ignored, shedding light on often unacknowledged material asymmetries and social hierarchies that pervade health policy-making at all levels. 93 94 In this way, discourse analysis can expose and problematise dominant paradigms in global and domestic health policy-making, such as the ways that standard epidemiological risk factors obscure structural inequities, 95 the assumption that the private sector will act in the public interest 96 or that a primary function of government reproductive health programmes is to decrease the fertility rate, rather than enable reproductive autonomy. 97

Ethnographers seek to understand how humans in groups interact, behave and perceive, and how meaning and value are established. Ethnography can build rich and holistic understanding of people’s perspectives, practices and cultural context 98 and focuses on depth over breadth, immersive observation in natural settings (eg, non-experimental conditions), exploratory (rather than hypothesis testing) research and describing the meaning and function of human action in context. 99 100 While ethnography has its origins in colonial conceptions of ‘culture’ and colonial motivations to study them and has thus been frequently used to ‘study down’, 101 ethnography has also been employed to research ‘up, down and sideways’. 102 This includes work focusing on institutions and politics, political legitimacy, moral universes, tacit knowledge and discourses to provide insight into how power is constructed, solidified and wielded within and beyond health systems, 103 104 the development and normalisation of certain forms of knowledge 105 and the implicit or explicit privilege or denigration of individuals or marginalised groups accessing healthcare. 106

Participatory action research (PAR) seeks to build new understandings of power while also changing power relations. PAR seeks to shift control over the construction of knowledge and truth from the historically privileged to the historically marginalised 107 108 and increase participant understandings of injustice (conscientisation) 109 in order to build solidarity 110 and transform systems and institutions. PAR explores and recognises different sources of power (eg, social position, nationality and cultural knowledge) and applications of power (eg, via citizen-led collective action 111 ). This research methodology typically entails the use of tools, such as community meetings, resource mapping, problem identification, visioning and diaries that draw out the priorities and perspectives of the communities participating, rather than reflecting a priori theory. It is apt for exploratory questions, as well as for bringing stakeholders together to cocreate solutions to health systems challenges. 112

Historical research aims to generate or regenerate explanatory narratives relating to past events, places or people. Historical evidence includes visual, audio and text-based materials (archival material, communications, policy documents and project reports) and first-person accounts (oral histories). The study of history can illuminate broad power-related themes that continue to be relevant, such as the interface between individual liberty and domestic governmental health objectives 113 ; medical experimentation, social control and scientific racism 114 115 ; corporate profit making, governmental interference and population health 116 ; and global health as a vehicle for state-craft, diplomacy, population control and Western-centric conceptions of charity. 8 97 117–119 Historical studies also offer broader explanatory value as ‘cases’ for the development of theory related to power 27 28 and as case studies for contemporary policy debates. Insofar as traditional historical approaches can privilege written work, they may omit the perspectives of historically oppressed groups. To combat this tendency, alternative methods such as participatory oral histories or community-based sourcing of visual, audio and text-based records not located in ‘official’ repositories open up new analytical possibilities.

More research on power in health policy and systems is needed. Linking empirical inquiry with theory and methodologies, with attention to positionality, strengthens the rigour of such research and can help improve the depth and breadth of knowledge regarding the root causes of inequities in health. This paper guides readers through the multiple stages involved in developing a study focused on power in health policy and systems, and through a range of theories and methodologies that may be used. It also seeks to push the HPSR field to challenge the political economy of research and destabilise hierarchies of knowledge through greater honesty about how power dynamics influence the research endeavour itself. Through the analysis of power in health policies and systems, we encourage researchers to expand the boundaries of how we may address inequities of health, to surface new insights, theories and approaches pertaining to power and, ultimately, to contribute to a more just world.

Ethics statements

Patient consent for publication.

Not applicable.

Acknowledgments

Walter Flores was part of discussions during which this paper was conceived. We would like to thank members of the Social Science Approaches for Research and Engagement in Health Policy and Systems (SHAPES) Thematic Working Group of Health Systems Global for their feedback on the initial concept note. We would like to thank Prachi Sanghavi (University of Chicago) and Michelle Friedner (University of Chicago) for their review of sections of the paper during development.


Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1
  • Data supplement 2
  • Data supplement 3

SMT, MS and VS are joint first authors.

Handling editor Seye Abimbola

Twitter @globalstopp, @martaschaaf, @veena_sriram, @kerfully, @Sarah_Dlish, @sumeghaasthana, @Ra_Parashar, @martenrobert, @ersacks, @DrKathyReyes

Contributors SMT, MS, VS and KS conceived of the paper. SMT, MS and VS led design of figures 1 and 2 and table 2 and coordinated drafting of different components of the paper. All authors contributed to methodological synthesis and drafting of text and all provided critical input to multiple drafts. SMT, MS and VS act as guarantor to this article.

Funding This collaborative project received no special funding. SMT holds an NHMRC Investigator Award (2020-24) GNT1173004.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.


Explaining research performance: investigating the importance of motivation

  • Original Paper
  • Open access
  • Published: 23 May 2024
  • Volume 4, article number 105 (2024)


  • Silje Marie Svartefoss (ORCID: orcid.org/0000-0001-5072-1293) 1,
  • Jens Jungblut 2,
  • Dag W. Aksnes 1,
  • Kristoffer Kolltveit 2 &
  • Thed van Leeuwen 3


In this article, we study the motivation and performance of researchers. More specifically, we investigate what motivates researchers across different research fields and countries and how this motivation influences their research performance. The basis for our study is a large-N survey of economists, cardiologists, and physicists in Denmark, Norway, Sweden, the Netherlands, and the UK. The analysis shows that researchers are primarily motivated by scientific curiosity and practical application and less so by career considerations. There are limited differences across fields and countries, suggesting that the mix of motivational aspects has a common academic core less influenced by disciplinary standards or different national environments. Linking motivational factors to research performance, through bibliometric data on publication productivity and citation impact, our data show that those driven by practical application aspects of motivation have a higher probability for high productivity. Being driven by career considerations also increases productivity but only to a certain extent before it starts having a detrimental effect.


Introduction

Motivation and abilities are known to be important factors in explaining employees’ job performance (Van Iddekinge et al. 2018 ), and in the vast scientific literature on motivation, it is common to differentiate between intrinsic and extrinsic motivation factors (Ryan and Deci 2000 ). In this context, path-breaking individuals are said to often be intrinsically motivated (Jindal-Snape and Snape 2006 ; Thomas and Nedeva 2012 ; Vallerand et al. 1992 ), and it has been found that the importance of these types of motivation differs across occupations and career stages (Duarte and Lopes 2018 ).

In this article, we address the issue of motivation for one specific occupation, namely researchers working at universities. Specifically, we investigate what motivates researchers across fields and countries (RQ1) and how this motivation is linked to their research performance (RQ2). The question of why people are motivated to do their jobs is interesting to address in an academic context, where work is usually harder to control and individuals tend to have a great deal of freedom in structuring their work. Moreover, there have been indications that academics possess an especially high level of motivation for their tasks that is not driven by a search for external rewards but by an intrinsic satisfaction from academic work (Evans and Meyer 2003 ; Leslie 2002 ). At the same time, elements of researchers’ performance are measurable through indicators of their publication activity: their productivity through the number of outputs they produce and the impact of their research through the number of citations their publications receive (Aksnes and Sivertsen 2019 ; Wilsdon et al. 2015 ).

Elevating research performance is high on the agenda of many research organisations (Hazelkorn 2015 ). How such performance may be linked to individuals’ motivational aspects has received little attention. Thus, a better understanding of this interrelation may be relevant for developing institutional strategies to foster environments that promote high-quality research and research productivity.

Previous qualitative research has shown that scientists are mainly intrinsically motivated (Jindal-Snape and Snape 2006 ). Other survey-based contributions suggest that there can be differences in motivations across disciplines (Atta-Owusu and Fitjar 2021 ; Lam 2011 ). Furthermore, the performance of individual scientists has been shown to be highly skewed in terms of publication productivity and citation rates (Larivière et al. 2010 ; Ruiz-Castillo and Costas 2014 ). There is a large body of literature explaining these differences. Some focus on national and institutional funding schemes (Hammarfelt and de Rijcke 2015 ; Melguizo and Strober 2007 ) and others on the research environment, such as the presence of research groups and international collaboration (Jeong et al. 2014 ), while many studies address the role of academic rank, age, and gender (see e.g. Baccini et al. 2014 ; Rørstad and Aksnes 2015 ). Until recently, less emphasis has been placed on the impact of researchers’ motivation. Some studies have found that different types of motivations drive high levels of research performance (see e.g. Horodnic and Zaiţ 2015 ; Ryan and Berbegal-Mirabent 2016 ). However, researchers are only starting to understand how this internal drive relates to research performance.

While some of the prior research on the impact of motivation depends on self-reported research performance evaluations (Ryan 2014 ), the present article combines survey responses with actual bibliometric data. To investigate variation in research motivation across scientific fields and countries, we draw on a large-N survey of economists, cardiologists, and physicists in Denmark, Norway, Sweden, the Netherlands, and the UK. To investigate how this motivation is linked to their research performance, we map the survey respondents’ publication and citation data from the Web of Science (WoS).

This article is organised as follows. First, we present relevant literature on research performance and motivation. Next, the scientific fields and countries are presented, before we elaborate on our methodology. In the empirical analysis, we investigate variations in motivation across fields, gender, age, and academic position and then relate motivation to publications and citations as our two measures of research performance. In the concluding section, we discuss our findings and implications for national decision-makers and individual researchers.

Motivation and research performance

As noted above, the concepts of intrinsic and extrinsic motivation play an important role in the literature on motivation and performance. Here, intrinsic motivation refers to doing something for its inherent satisfaction rather than for some separable consequence. Extrinsic motivation refers to doing something because it leads to a separable outcome (Ryan and Deci 2000 ).

Some studies have found that scientists are mainly intrinsically motivated (Jindal-Snape and Snape 2006 ; Lounsbury et al. 2012 ). Research interests, curiosity, and a desire to contribute to new knowledge are examples of such motivational factors. Intrinsic motives have also been shown to be crucial when people select research as a career choice (Roach and Sauermann 2010 ). Nevertheless, scientists are also motivated by extrinsic factors. Several European countries have adopted performance-based research funding systems (Zacharewicz et al. 2019 ). In these systems, researchers do not receive direct financial bonuses when they publish, although such practices may occur at local levels (Stephan et al. 2017 ). Therefore, extrinsic motivation for such researchers may include salary increases, peer recognition, promotion, or expanded access to research resources (Lam 2011 ). According to Tien and Blackburn ( 1996 ), both types of motivations operate simultaneously, and their importance varies and may depend on the individual’s circumstances, personal situation, and values.

The extent to which different kinds of motivations play a role in scientists’ performance has been investigated in several studies. In these studies, bibliometric indicators based on the number of publications are typically used as outcome measures. Such indicators play a critical role in various contexts in the research system (Wilsdon et al. 2015 ), although it has also been pointed out that individuals can have different motivations to publish (Hangel and Schmidt-Pfister 2017 ).

Based on a survey of Romanian economics and business administration academics combined with bibliometric data, Horodnic and Zaiţ ( 2015 ) found that intrinsic motivation was positively correlated with research productivity, while extrinsic motivation was negatively correlated. Their interpretation of these results is that researchers motivated by scientific interest are more productive, while researchers motivated by extrinsic forces shift their focus to more financially profitable activities. Similarly, based on the observation that professors continue to publish even after they have been promoted to full professor, Finkelstein ( 1984 ) concluded that intrinsic rather than extrinsic motivational factors play a decisive role in the productivity of academics.

Drawing on a survey of 405 research scientists working in biological, chemical, and biomedical research departments in UK universities, Ryan ( 2014 ) found that (self-reported) variations in research performance can be explained by instrumental motivation based on financial incentives and internal motivation based on the individual’s view of themselves (traits, competencies, and values). In the study, instrumental motivation was found to have a negative impact on research performance: as the desire for financial rewards increases, the level of research performance decreases. In other words, researchers mainly motivated by money will be less productive and effective in their research. Conversely, internal motivation was found to have a positive impact on research performance. This was explained by highlighting that researchers motivated by their self-concept set internal standards that become a reference point that reinforces perceptions of competency in their environments.

Nevertheless, it has also been argued that intrinsic and extrinsic motivations for publishing are intertwined (Ma 2019 ). According to Tien and Blackburn ( 1996 ), research productivity is neither purely intrinsically nor purely extrinsically motivated. Publication activity is often a result of research, which may be intrinsically motivated or motivated by extrinsic factors such as a wish for promotion, where the number of publications is often a part of the assessment (Cruz-Castro and Sanz-Menendez 2021 ; Tien 2000 , 2008 ).

The negative relationship between external/instrumental motivation and performance and the positive relationship between internal/self-concept motivation and performance are underlined by Ryan and Berbegal-Mirabent ( 2016 ). Drawing on a fuzzy set qualitative comparative analysis of a random sampling of 300 of the original respondents from Ryan ( 2014 ), they find that scientists working towards the standards and values they identify with, combined with a lack of concern for instrumental rewards, contribute to higher levels of research performance.

Based on the above, this article will address two research questions concerning different forms of motivation and the relationship between motivation and research performance.

How does the motivation of researchers vary across fields and countries?

How do different types of motivations affect research performance?

In this study, the roles of three different motivational factors are analysed. These are scientific curiosity, practical and societal applications, and career progress. The study aims to assess the role of these specific motivational factors and not the intrinsic-extrinsic distinction more generally. Of the three factors, scientific curiosity most strongly relates to intrinsic motivation; practical and societal applications also entail strong intrinsic aspects. On the other hand, career progress is linked to extrinsic motivation.

In addition to variation in researchers’ motivations by field and country, we consider differences in relation to age, position, and gender. Additionally, when investigating how motivation relates to scientific performance, we control for the influence of age, gender, country, and funding. These are dimensions along which differences in motivational factors might be found, given that scientific performance, particularly publication productivity, has been shown to differ along them (Rørstad and Aksnes 2015 ).

Research context: three fields, five countries

To address the research question about potential differences across fields and countries, the study is based on a sample consisting of researchers in three different fields (cardiology, economics, and physics) and five countries (Denmark, Norway, Sweden, the Netherlands, and the UK). Below, we describe this research context in greater detail.

The fields represent three different domains of science: medicine, social sciences, and the natural sciences, where different motivational factors may be at play. This means that the fields cover three main areas of scientific investigations: the understanding of the world, the functioning of the human body, and societies and their functions. The societal role and mission of the fields also differ. While a primary aim of cardiology research and practice is to reduce the burden of cardiovascular disease, physics research may drive technology advancements, which impacts society. Economics research may contribute to more effective use of limited resources and the management of people, businesses, markets, and governments. In addition, the fields also differ in publication patterns (Piro et al. 2013 ). The average number of publications per researcher is generally higher in cardiology and physics than in economics (Piro et al. 2013 ). Moreover, cardiologists and physicists mainly publish in international scientific journals (Moed 2005 ; Van Leeuwen 2013 ). In economics, researchers also tend to publish books, chapters, and articles in national languages, in addition to international journal articles (Aksnes and Sivertsen 2019 ; van Leeuwen et al. 2016 ).

We sampled the countries with a twofold aim. On the one hand, we wanted countries that are comparable, so that differences in the development of the science systems, working conditions, or funding availability would not be too large. On the other hand, we also wanted to ensure variation among the countries regarding these relevant framework conditions, so that our findings are not driven by a specific contextual condition.

The five countries in the study are all located in the northwestern part of Europe, with science systems that are foremost funded by block grant funding from the national governments (unlike, for example, the US, where research grants by national funding agencies are the most important funding mechanism) (Lepori et al. 2023 ).

In all five countries, the missions of the universities are composed of a blend of education, research, and outreach. Furthermore, the science systems in Norway, Denmark, Sweden, and the Netherlands have a relatively strong orientation towards the Anglo-Saxon world: publishing in the national language still occurs, but publishing in English in internationally oriented journals is the norm (Kulczycki et al. 2018 ). These framework conditions ensure that those working in the five countries have somewhat similar missions to fulfil in their professions while also belonging to a common, mainly Anglophone, science system.

However, in Norway, Denmark, Sweden, and the Netherlands, research findings in some of the social sciences, law, and the humanities are still published in various languages. Hence, we avoided selecting a humanities field for this study due to a potential issue with cross-country comparability (Sivertsen 2019 ; Sivertsen and Van Leeuwen 2014 ; Van Leeuwen 2013 ).

Finally, the chosen countries vary regarding their level of university autonomy. When combining the scores for organisational, financial, staffing, and academic autonomy presented in the latest University Autonomy in Europe Scorecard presented by the European University Association (EUA), the UK, the Netherlands, and Denmark have higher levels of autonomy compared to Norway and Sweden, with Swedish universities having less autonomy than their Norwegian counterparts (Pruvot et al. 2023 ). This variation is relevant for our study, as it ensures that our findings are not driven by response from a higher education system with especially high or low autonomy, which can influence the motivation and satisfaction of academics working in it (Daumiller et al. 2020 ).

Data and methods

The data used in this article are a combination of survey data and bibliometric data retrieved from the WoS. The WoS database was chosen for this study due to its comprehensive coverage of research literature across all disciplines, encompassing the three specific research areas under analysis. Additionally, the WoS database is well-suited for bibliometric analyses, offering citation counts essential for this study.

Two approaches were used to identify the sample for the survey. Initially, a bibliometric analysis of the WoS using journal categories (‘Cardiac & cardiovascular systems’, ‘Economics’, and ‘Physics’) enabled the identification of key institutions with a minimum number of publications within these journal categories. Following this, relevant organisational units and researchers within these units were identified through available information on the units’ webpages. Included were employees in relevant academic positions (tenured academic personnel, post-docs, and researchers, but not PhD students, adjunct positions, guest researchers, or administrative and technical personnel).

Second, based on the WoS data, people were added to this initial sample if they had a minimum number of publications within the field and belonged to any of the selected institutions, regardless of unit affiliation. For economics, the minimum was five publications within the selected period (2011–2016). For cardiology and physics, where the individual publication productivity is higher, the minimum was 10 publications within the same period. The selection of the minimum publication criteria was based on an analysis of publication outputs in these fields between 2011 and 2016. The thresholds were applied to include individuals who are more actively engaged in research while excluding those with more peripheral involvement. The higher thresholds for cardiology and physics reflect the greater frequency of publications (and co-authorship) observed in these fields.

The benefit of this dual-approach strategy to sampling is that we obtain a more comprehensive sample: the full scope of researchers within a unit and the full scope of researchers that publish within the relevant fields. Overall, 59% of the sample were identified through staff lists and 41% through the second step involving WoS data.

The survey data were collected through an online questionnaire first sent out in October 2017 and closed in December 2018. In this period, several reminders were sent to increase the response rate. Overall, the survey had a response rate of 26.1% ( N  = 2,587 replies). There were only minor variations in response rates between scientific fields; the variations were larger between countries. Tables  1 and 2 provide an overview of the response rate by country and field.

Operationalisation of motivation

Motivation was measured by a question in the survey asking respondents what motivates or inspires them to conduct research, of which three dimensions are analysed in the present paper. The first two answer categories were related to intrinsic motivation (‘Curiosity/scientific discovery/understanding the world’ and ‘Application/practical aims/creating a better society’). The third answer category was more related to extrinsic motivation (‘Progress in my career [e.g. tenure/permanent position, higher salary, more interesting/independent work]’). Appendix Table A1 displays the distribution of respondents and the mean value and standard deviation for each item.

These three different aspects of motivation do not measure the same phenomenon but seem to capture different aspects of motivation (see Pearson’s correlation coefficients in Appendix Table A2 ). There is no correlation between curiosity/scientific discovery and either career progress or practical application. However, there is a weak but significant positive correlation between career progress and practical application. These findings indicate that those motivated by career considerations to some degree are also motivated by practical application.

In addition to investigating how researchers’ motivation varies by field and country, we consider the differences in relation to age, position and gender as well. Field of science differentiates between economics, cardiology, physics, and other fields. The country variables differentiate between the five countries. Age is a nine-category variable. The position variable differentiates between full professors, associate professors, and assistant professors. The gender variable has two categories (male or female). For descriptive statistics on these additional variables, see Appendix Table A3 .

Publication productivity and citation impact

To analyse the respondents’ bibliometric performance, the Centre for Science and Technology Studies (CWTS) in-house WoS database was used. We identified the publication output of each respondent during 2011–2017 (limited to regular articles, reviews, and letters). For 16% of the respondents, no publications were identified in the database. These individuals had apparently not published in international journals covered by the database, although in some cases the lack of publications may be due to identification problems (e.g. a change of name). We therefore decided not to include these respondents in the analysis.

Two main performance measures were calculated: publication productivity and citation impact. As an indicator of productivity, we counted the number of publications for each individual (as author or co-author) during the period. To analyse the citation impact, a composite measure using three different indicators was used: the total number of citations (total citation counts for all articles they contributed to during the period, counting citations up to and including 2017), the mean normalised citation score (MNCS), and the proportion of publications among the 10% most cited articles in their fields (Waltman and Schreiber 2013 ). Here, the MNCS is an indicator for which the citation count of each article is normalised by subject, article type, and year, where 1.00 corresponds to the world average (Waltman et al. 2011 ). Based on these data, averages for the total publication output of each respondent were calculated. By using three different indicators, we can avoid the biases or limitations attached to each of them. For example, using the MNCS alone, a respondent with only one publication would appear as a high-impact researcher if this article was highly cited. However, when considering the additional indicator of total citation counts, this individual would usually perform less well.
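In formula form (our notation; the article itself does not state an equation), the MNCS of a respondent with $n$ publications is

$$\mathrm{MNCS} = \frac{1}{n}\sum_{i=1}^{n} \frac{c_i}{e_i},$$

where $c_i$ is the number of citations received by publication $i$ and $e_i$ is the average number of citations of publications of the same subject, article type, and publication year, so that a value of 1.00 equals the world average.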

The bibliometric scores were skewed in their distribution among the respondents. Rather than using the absolute numbers, in this paper we have classified the respondents into three groups according to their scores on the indicators, using percentile rank classes (tertiles). Percentile statistics are increasingly applied in bibliometrics (Bornmann et al. 2013 ; Waltman and Schreiber 2013 ) due to the presence of outliers and long tails, which characterise both productivity and citation distributions.

As the fields analysed have different publication patterns, the respondents within each field were ranked according to their scores on the indicators, and their percentile rank was determined. For the productivity measure, this means that there are three groups that are equal in terms of number of individuals included: 1: Low productivity (the group with the lowest publication numbers, 0–33 percentile), 2: Medium productivity (33–67 percentile), and 3: High productivity (67–100 percentile). For the citation impact measure, we conducted a similar percentile analysis for each of the three composite indicators. Then everyone was assigned to one of the three percentile groups based on their average score: 1: Low citation impact (the group with lowest citation impact, 0–33 percentile), 2: Medium citation impact (33–67 percentile), and 3: High citation impact (67–100 percentile), cf. Table  3 . Although it might be argued that the application of tertile groups rather than absolute numbers leads to a loss of information, the advantage is that the results are not influenced by extreme values and may be easier to interpret.
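A minimal sketch of this grouping step, assuming a pandas workflow (the article does not name its software, and the data values and column names below are invented):

```python
# Hypothetical sketch of the percentile-group construction: rank within field,
# then cut the ranks into tertiles (low / medium / high).
import pandas as pd

df = pd.DataFrame({
    "field": ["economics"] * 6 + ["physics"] * 6,
    "n_pubs": [2, 5, 7, 9, 14, 20, 8, 11, 15, 22, 30, 41],
})

# Percentile rank within each field (0-1), so that fields with different
# publication norms are not compared on raw counts.
df["pct_rank"] = df.groupby("field")["n_pubs"].rank(pct=True)

# Tertile groups: 0-33, 33-67 and 67-100 percentiles.
df["productivity_group"] = pd.cut(
    df["pct_rank"], bins=[0, 1 / 3, 2 / 3, 1.0],
    labels=["low", "medium", "high"], include_lowest=True,
)
print(df)
```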

Via this approach, we can analyse the two important dimensions of the respondents’ performance. However, it should be noted that the WoS database does not cover the publication output of the fields equally. Generally, physics and cardiology are very well covered, while the coverage of economics is somewhat lower due to different publication practices (Aksnes and Sivertsen 2019 ). This problem is accounted for in our study by ranking the respondents in each field separately, as described above. In addition, not all respondents may have been active researchers during the entire 2011–2017 period, which we have not adjusted for. Despite these limitations, the analysis provides interesting information on the bibliometric performance of the respondents at an aggregated level.

Regression analysis

To analyse the relationship between motivation and performance, we apply multinomial logistic regression rather than ordered logistic regression because we assume that the odds for respondents belonging in each category of the dependent variables are not equal (Hilbe 2017 ). The implication of this choice of model is that the model tests the probability of respondents being in one category compared to another (Hilbe 2017 ). This means that a reference or baseline category must be selected for each of the dependent variables (productivity and citation impact). Furthermore, the coefficient estimates show how the probability of being in one of the other categories decreases or increases compared to being in the reference category.

For this analysis, we selected the medium performers as the reference or baseline category for both our dependent variables. This enables us to evaluate how the independent variables affect the probability of being in the low performers group compared to the medium performers and the high performers compared to the medium performers.

To evaluate model fit, we started with a baseline model where only types of motivations were included as independent variables. Subsequently, the additional variables were introduced into the model, and based on measures for model fit (Pseudo R 2 , -2LL, and Akaike Information Criterion (AIC)), we concluded that the model with all additional variables included provides the best fit to the data for both the dependent variables (see Appendix Tables A5 and A6 ). Additional control variables include age, gender, country, and funding. We include these variables as controls to obtain robust effects of motivation and not effects driven by other underlying factors. The type of funding was measured by variables where the respondent answered the following question: ‘How has your research been funded the last five years?’ The funding variable initially consisted of four categories: ‘No source’, ‘Minor source’, ‘Moderate source’, and ‘Major source’. In this analysis, we have combined ‘No source’ and ‘Minor source’ into one category (0) and ‘Moderate source’ and ‘Major source’ into another category (1). Descriptive statistics for the funding variables are available in Appendix Table A4 . We do not control for the influence of field due to how the scientific performance variables are operationalised, the field normalisation implies that there are no variations across fields. We also do not control for position, as this variable is highly correlated with age, and we are therefore unable to include these two variables in the same model.
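A hedged sketch of this modelling step follows, assuming a statsmodels workflow; the article does not name its software, and the synthetic data, variable names, and 1–5 response scales below are assumptions for illustration only.

```python
# Hypothetical sketch of the baseline multinomial logit (medium productivity
# as the reference category); not the authors' code. Synthetic data stand in
# for the combined survey/bibliometric file, and all names are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "curiosity": rng.integers(1, 6, n),     # assumed 1-5 response scales
    "application": rng.integers(1, 6, n),
    "career": rng.integers(1, 6, n),
    "age": rng.integers(1, 10, n),          # nine-category age variable
    "female": rng.integers(0, 2, n),
    # funding recoded: no/minor source -> 0, moderate/major source -> 1
    "ext_grant": rng.integers(0, 2, n),
    "prod_group": rng.choice(["medium", "low", "high"], n),
})

# Listing 'medium' first makes it code 0, which MNLogit uses as the baseline.
df["prod_group"] = pd.Categorical(df["prod_group"],
                                  categories=["medium", "low", "high"])
y = df["prod_group"].cat.codes

X = sm.add_constant(df[["curiosity", "application", "career",
                        "age", "female", "ext_grant"]])
base_model = sm.MNLogit(y, X).fit(disp=0)
print(base_model.summary())
print("AIC:", base_model.aic)   # used to compare candidate models
```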

The motivation of researchers

In the empirical analysis, we first investigate variation in motivation and then relate it to publications and citations as our two measures of research performance.

As Fig.  1 shows, the respondents are mainly driven by curiosity and the wish to make scientific discoveries. This is by far the most important motivation. Practical application is also an important source of motivation, while making career progress is not identified as being very important.

figure 1

Motivation of researchers – percentage

As Table 4 shows, at the level of fields there are no large differences, and the motivational profiles are relatively similar. However, physicists tend to view practical application as somewhat less important than cardiologists and economists do. Moreover, career progress is emphasised most by economists. Furthermore, as Table 5 shows, there are some differences in motivation between countries. For curiosity/scientific discovery and practical application, the variations across countries are minor, but researchers in Denmark tend to view career progress as somewhat more important than researchers in the other countries.

Furthermore, as Table 6 shows, women seem to view practical application and career progress as more important motivations than men do; these differences are also significant. Similar gender disparities have been reported in a previous study (Zhang et al. 2021 ).

There are also some differences in motivation across the additional variables worth mentioning, as Table 7 shows. Unsurprisingly, perhaps, there are significant moderate negative correlations between career progress on the one hand and age and position on the other. This means that the importance of career progress as a motivation seems to decrease with increased age or a move up the position hierarchy.

In the second part of the analysis, we relate motivation to research performance. We first investigate publication productivity using the percentile groups. Here, we present the results using predicted probabilities because they are more easily interpretable than coefficient estimates. For the model with productivity percentile groups as the dependent variable, the estimates for career progress were negative both when comparing the medium productivity group to the high productivity group and when comparing it to the low productivity group. This result indicates that the probability of being in the high and low productivity groups decreases compared to the medium productivity group as the value of career progress increases, which may point towards a curvilinear relationship between the variables. A similar pattern was also found in the model with the citation impact group as the dependent variable, although it was not as apparent.

As a result of this apparent curvilinear relationship, we included quadratic terms for career progress in both models, and these were significant. Likelihood ratio tests also show that the models with quadratic terms included have a significantly better fit to the data. Furthermore, the AIC was also lower for these models compared to the initial models where quadratic terms were not included (see Appendix Tables A5 – A7 ). Consequently, we base our results on these models, which can be found in Appendix Table A7 . Due to a low number of respondents in the low categories of the scientific curiosity/discovery variable, we also combined the first three values into one, resulting in a reduced three-value variable for scientific curiosity/discovery in the regression analysis.
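Continuing the hypothetical sketch from the methods section, the quadratic term, the likelihood ratio test, the AIC comparison, and the predicted probabilities could be computed roughly as follows (again, all names and scales are assumptions, not the authors' code):

```python
# Hypothetical continuation: add a quadratic term for career progress, compare
# model fit, and trace predicted probabilities across the career scale.
from scipy import stats

df["career_sq"] = df["career"] ** 2
X2 = sm.add_constant(df[["curiosity", "application", "career", "career_sq",
                         "age", "female", "ext_grant"]])
quad_model = sm.MNLogit(y, X2).fit(disp=0)

# Likelihood ratio test of the nested models (2 extra parameters: one
# quadratic coefficient per non-reference outcome category).
lr = 2 * (quad_model.llf - base_model.llf)
print("LR test p-value:", stats.chi2.sf(lr, df=2))
print("AIC without / with quadratic term:", base_model.aic, quad_model.aic)

# Predicted probabilities across the career scale, other covariates at means.
grid = pd.DataFrame({"const": 1.0,
                     "curiosity": df["curiosity"].mean(),
                     "application": df["application"].mean(),
                     "career": np.linspace(1, 5, 9),
                     "age": df["age"].mean(),
                     "female": df["female"].mean(),
                     "ext_grant": df["ext_grant"].mean()})
grid["career_sq"] = grid["career"] ** 2
probs = np.asarray(quad_model.predict(grid[X2.columns]))
for c, row in zip(grid["career"], probs):   # columns: 0=medium, 1=low, 2=high
    print(f"career={c:.1f}  P(low)={row[1]:.2f}  P(high)={row[2]:.2f}")
```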

Results – productivity percentile group

Using the productivity percentile group as the dependent variable, we find that the motivational aspects of practical application and career progress have a significant effect on the probability of being in the low, medium, or high productivity group, but curiosity/scientific discovery does not. In Figs. 2 and 3, each line represents the probability of being in each group across the scale of each motivational aspect.

figure 2

Predicted probability for being in each of the productivity groups according to the value on the ‘practical application’ variable

figure 3

Predicted probability of being in the low and high productivity groups according to the value on the ‘progress in my career’ variable

Figure 2 shows that at low values of practical application, there are no significant differences between the probabilities of being in either of the groups. However, from around value 3 of practical application, the differences between the probabilities of being in each group increase, and these are also significant. As a result, we concluded that high scores on practical application are related to an increased probability of being in the high productivity group.

In Fig. 3, we excluded the medium productivity group from the figure because there are no significant differences between this group and the high and low productivity groups. Nevertheless, we found significant differences between the low productivity and the high productivity group. Since we added a quadratic term for career progress, the two lines in Fig. 3 have a curvilinear shape. Figure 3 shows that there are only significant differences between the probability of being in the low or high productivity group at mid and high values of career progress. In addition, the probability of being in the high productivity group is at its highest at mid values of career progress. This indicates that being motivated by career progress increases the probability of being in the high productivity group but only up to a certain point, after which it begins to have a negative effect on the probability of being in this group.

We also included age and gender as variables in the model, and Figs. 4 and 5 show the results. Figure 4 shows that age especially impacts the probability of being in the high productivity and low productivity groups. The lowest age category (< 30–34 years) has the highest probability of being in the low productivity group, while from the mid age categories (50 years and above), the probability is highest for being in the high productivity group. This means that increased age is related to an increased probability of high productivity. The variable controlling for the effect of funding also showed some significant results (see Appendix Table A7 ). The most relevant finding is that receiving competitive grants from external public sources had a very strong and significant positive effect on being in the high productivity group and a medium-sized significant negative effect on being in the low productivity group. This shows that receiving external funding in the form of competitive grants has a strong effect on productivity.

figure 4

Predicted probability of being in each of the productivity groups according to age

Figure 5 shows that there is a difference between male and female respondents. For females, there are no significant differences in the probability of being in any of the groups, while males have a higher probability of being in the high productivity group compared to the medium and low productivity groups.

Figure 5. Predicted probability of being in each of the productivity groups according to gender

Results – citation impact group

For the citation impact group as the dependent variable, we found that career progress has a significant effect on the probability of being in the low or the high citation impact group, whereas curiosity/scientific discovery and practical application do not. Figure 6 shows how the probability of being in the high citation impact group increases as the value on career progress increases and is higher than the probability of being in the low citation impact group, but only up to a certain point. This indicates that career progress increases the probability of being in the high citation impact group to some degree, but that very high values are not beneficial for high citation impact. However, it should also be noted that the effect of career progress is weak and that it is difficult to draw conclusions about how very low or very high values of career progress affect the probability of being in the two groups.

Figure 6. Predicted probability for being in each of the citation impact groups according to the value on the ‘progress in my career’ variable

We also included age and gender as variables in the model and found a pattern similar to that in the model with the productivity percentile group as the dependent variable. However, the relationships between the variables are weaker in the model with the citation impact group as the dependent variable. Figure 7 shows that the probability of being in the high citation impact group increases with age, but there is no significant difference between the probabilities of being in the high and the medium citation impact groups. We only see significant differences when each of these groups is compared to the low citation impact group. In addition, the increase in probability is more moderate in this model.

Figure 7. Predicted probability of being in each of the citation impact groups according to age

Figure 8 shows that there are differences between male and female respondents. Male respondents have a significantly higher probability of being in the medium or high citation impact group compared to the low citation impact group, but there is no significant difference between the probabilities of being in the high and the medium citation impact groups. For female respondents, there are no significant differences. Similarly, the effect of age is more moderate in this model compared to the model with the productivity percentile groups as the dependent variable. In addition, the effect of funding sources on citation impact is more moderate than on productivity (see Appendix Table A7). Competitive grants from external public sources still have the most relevant effect, but the effect size and the level of significance are lower than in the model where the productivity groups are the dependent variable. Respondents who received a large amount of external funding through competitive grants are more likely to be highly cited, but the effect size is much smaller, and the result is only significant at p < 0.1. Those who do not receive much funding from this source are more likely to be in the low impact group. Here, the effect size is large, and the coefficient is highly significant.

Figure 8. Predicted probability for being in each of the citation impact groups according to gender

Concluding discussion

This article aimed to explore researchers’ motivations and investigate the impact of motivation on research performance. By addressing these issues across several fields and countries, we provided new evidence on the motivation and performance of researchers.

Most researchers in our large-N survey found curiosity/scientific discovery to be a crucial motivational factor, with practical application being the second most supported aspect. Only a small number of respondents saw career progress as an important inspiration for conducting their research. This supports the notion that researchers are mainly motivated by core aspects of academic work, such as curiosity, discoveries, and the practical application of their knowledge, and less so by personal gains (see Evans and Meyer 2003). Our results therefore align with earlier research on motivation. In their interview study of scientists working at a government research institute in the UK, Jindal-Snape and Snape (2006) found that the scientists were typically motivated by the ability to conduct high-quality, curiosity-driven research and de-motivated by the lack of feedback from management, difficulty in collaborating with colleagues, and constant review and change. Salaries, incentive schemes, and prospects for promotion were not considered motivators by most scientists. Kivistö and colleagues (2017) observed similar patterns in more recent survey data from Finnish academics.

As noted in the introduction, the issue of motivation has often been analysed in the literature using the intrinsic-extrinsic distinction. In our study, we have not applied these concepts directly. However, it is clear that the curiosity/scientific discovery item should be considered a type of intrinsic motivation, as it involves performing the activity for its inherent satisfaction. Moreover, the practical application item should probably be considered mainly intrinsic, as it involves creating a better society (for others) without primarily focusing on gains for oneself. The career progress item explicitly mentions personal gains such as position and higher salary and is, therefore, a type of extrinsic motivation. This means that our results support the notion that there are very strong elements of intrinsic motivation among researchers (Jindal-Snape and Snape 2006 ).

When analysing the three aspects of motivation, we found some differences. Physicists tend to view practical application as less important than researchers in the two other fields, while career progress was most emphasised by economists. Regarding country differences, our data suggest that career progress is most important for researchers in Denmark. Nevertheless, given the limited effect sizes, the overall picture is that motivational factors seem to be relatively similar regarding disciplinary and country dimensions.

Regarding gender aspects of motivation, our data show that women seem to view practical application and career progress as more important than men do. One explanation for this could be the continued gender differences in academic careers, which tend to disadvantage women, thus creating a greater incentive for female scholars to focus on and be motivated by career progress (Huang et al. 2020; Lerchenmueller and Sorenson 2018). Unsurprisingly, respondents’ age and academic position influenced the importance of different aspects of motivation, especially regarding career progress. Here, increased age and moving up the positional hierarchy are linked to a decrease in importance. This highlights that older academics and those in more senior positions draw more motivation from sources that are not directly linked to their personal career gains. This can probably be explained by the academic career ladder plateauing at a certain point, as there are often no additional titles and very limited recognition beyond becoming a full professor. Finally, the type of funding that scholars received also had an influence on their productivity and, to a certain extent, their citation impact.

Overall, there is little support for the notion that researchers across various fields and countries differ greatly in their motivation for conducting research. Rather, there seems to be a strong common core of academic motivation that varies mainly by gender and age/position. Rather than talking about researchers’ motivation per se, our study therefore suggests that one should talk about motivation across gender, at different career stages, and, to a certain degree, in different fields. Thus, motivation seems to be a multi-faceted construct, and the importance of its different aspects varies between groups.

In the second step of our analysis, we linked motivation to performance. Here, we focused on both scientific productivity and citation impact. Regarding the former, our data show that both practical application and career progress have a significant effect on productivity. The relationship between practical application and productivity is linear, meaning that those who indicate that this aspect of motivation is very important to them have a higher probability of being in the high productivity group. The relationship between the career aspects of motivation and productivity is curvilinear, and we found significant differences between the high and low productivity groups only at mid and high values of the motivation scale. This indicates that being more motivated by career progress increases productivity, but only to a certain extent, before it starts having a detrimental effect. A common assumption has been that intrinsic motivation has a positive effect and extrinsic motivation a negative effect on the performance of scientists (Peng and Gao 2019; Ryan and Berbegal-Mirabent 2016). Our results do not generally support this, as motives related to career progress are positively linked with productivity up to a certain point. Possibly, this can be explained by the fact that the number of publications is often especially important in the context of recruitment and promotion (Langfeldt et al. 2021; Reymert et al. 2021). Thus, it is beneficial from a scientific career perspective to have many publications when trying to get hired or promoted.

Regarding citation impact, our analysis highlights that only the career aspects of motivation have a significant effect. Similar to the results for productivity, being more motivated by career progress increases the probability of being in the high citation impact group, but only up to a certain value, after which the difference stops being significant. It should be pointed out that the effect is weaker than in the analysis of productivity. Thus, these results should be treated with greater caution.

Overall, our results shed light on some important aspects of the motivation of academics and how this translates into research performance. Regarding our first research question, there seems to be not one type of motivation but rather different contextual mixes of motivational aspects that are strongly driven by gender and academic position/age. We found only limited effects of research fields and even less pronounced country effects, suggesting that while situational, the mix of motivational aspects also has a common academic core that is less influenced by national environments or disciplinary standards. Regarding our second research question, our results challenge the common assumption that intrinsic motivation has a positive effect and extrinsic motivation a negative effect on the performance of scientists. Instead, we show that motives related to career are positively linked to productivity, at least up to a certain point. Our analysis of citation patterns yielded similar results. Combined with the finding regarding the importance of current academic position and age for specific patterns of motivation, it could be argued that, because the number of publications is often used as a measurement in recruitment and promotion, academics who are more driven by career aspects publish more, as this is perceived as a necessary condition for success.

Our study has a clear focus on the research side of academic work. However, most academics do both teaching and research, which raises the question of how far our results can also inform our knowledge of the motivation for teaching. On the one hand, previous studies have highlighted that intrinsic motivation is also of high importance for the quality of teaching (see e.g. Wilkesmann and Lauer 2020), which fits well with our findings. At the same time, the literature also highlights persistent goal conflicts among academics (see e.g. Daumiller et al. 2020), given that extra time devoted to teaching often comes at the cost of publications and research. Given that other findings in the literature show that research performance continues to be of higher importance than teaching in academic hiring processes (Reymert et al. 2021), the interplay between research performance, teaching performance, and different types of motivation is most likely more complicated and demands further investigation.

While offering several relevant insights, our study comes with certain limitations that must be considered. First, motivation is a complex construct. There are many ways to operationalise it, and no specific understanding has so far emerged as best practice. Therefore, our approach to operationalisation and measurement should be seen as an addition to this broader field of measurement approaches, and we do not claim that it is the only sensible way of doing it. Second, we rely on self-reported survey data to measure the different aspects of motivation. This means that factors such as social desirability could influence the extent to which academics claim to be motivated by certain aspects. For example, claiming to be mainly motivated by personal career gains may be considered a dubious motive among academics.

With respect to the bibliometric analyses, it is important to realise that we have grouped researchers into categories, thereby ‘smoothing’ individual performances into group performances under the various variables. As a consequence, some extraordinary scores may have become invisible in our study, although they might have been interesting to analyse separately and could have shed further light on the relationships we studied. However, breaking the material down to the level of individual researchers also comes with a limitation: at the level of the individual academic, bibliometric indicators tend to become quite sensitive to the underlying numbers, which in turn are affected by the coverage of the database used, the publishing cultures in various countries and fields, and the age and position of the individuals. Therefore, we have not analysed the data at the level of the individual academic, even though we acknowledge that such a study could yield interesting results.
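To make the grouping logic concrete, the sketch below assigns stand-in publication counts to low/medium/high groups by percentile cut-offs. Both the data and the tertile thresholds are hypothetical, purely to illustrate how individual extremes are smoothed into group membership; the study's own cut-offs are those defined in its methods section.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in publication counts for 500 researchers.
rng = np.random.default_rng(7)
pubs = pd.Series(rng.poisson(8, 500), name="pub_count")

# Tertile-style grouping: bottom/middle/top of the distribution.
# An individual publishing far above everyone else still just lands
# in 'high', which is the smoothing effect described above.
low_cut, high_cut = pubs.quantile([1 / 3, 2 / 3])
group = pd.cut(pubs, bins=[-np.inf, low_cut, high_cut, np.inf],
               labels=["low", "medium", "high"])
print(group.value_counts())
```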

Finally, our sample is drawn from northwestern European countries and a limited set of disciplines. We would argue that we have sufficient variation in countries and disciplines to make the results relevant for a broader audience. While our results show rather small country and discipline differences, we are aware that there might be country- or discipline-specific effects that we cannot capture due to our sampling approach. Moreover, as we had to balance sufficient variation in framework conditions against the comparability of cases, the geographical generalisation of our results has limitations.

This article investigated what motivates researchers across different research fields and countries and how this motivation influences their research performance. The analysis showed that researchers are mainly motivated by scientific curiosity and practical application and less so by career considerations. Furthermore, it showed that researchers driven by the practical application aspects of motivation have a higher probability of high productivity. Being driven by career considerations also increases productivity, but only to a certain extent, before it starts having a detrimental effect.

The article is based on a large-N survey of economists, cardiologists, and physicists in Denmark, Norway, Sweden, the Netherlands, and the UK. Building on this study, future research should expand the scope and study the relationship between motivation and productivity as well as citation impact in a broader disciplinary and geographical context. In addition, we encourage studies that develop and validate our measurement and operationalisation of aspects of researchers’ motivation.

Finally, a long-term panel study design that follows respondents throughout their academic careers and investigates how far their motivational patterns shift over time would allow for more fine-grained analysis and thereby a richer understanding of the important relationship between motivation and performance in academia.

Data availability

The data set for this study is available from the corresponding author upon reasonable request.

Aksnes DW, Sivertsen G (2019) A criteria-based assessment of the coverage of Scopus and Web of Science. J Data Inform Sci 4(1):1–21. https://doi.org/10.2478/jdis-2019-0001


Atta-Owusu K, Fitjar RD (2021) What motivates academics for external engagement? Exploring the effects of motivational drivers and organizational fairness. Sci Public Policy. https://doi.org/10.1093/scipol/scab075 . November, scab075

Baccini A, Barabesi L, Cioni M, Pisani C (2014) Crossing the hurdle: the determinants of individual scientific performance. Scientometrics 101(3):2035–2062. https://doi.org/10.1007/s11192-014-1395-3

Bornmann L, Leydesdorff L, Mutz R (2013) The use of percentiles and percentile rank classes in the analysis of bibliometric data: opportunities and limits. J Informetrics 7(1):158–165. https://doi.org/10.1016/j.joi.2012.10.001

Cruz-Castro L, Sanz-Menendez L (2021) What should be rewarded? Gender and evaluation criteria for tenure and promotion. J Informetrics 15(3):1–22. https://doi.org/10.1016/j.joi.2021.101196

Daumiller M, Stupnisky R, Janke S (2020) Motivation of higher education faculty: theoretical approaches, empirical evidence, and future directions. Int J Educational Res 99:101502. https://doi.org/10.1016/j.ijer.2019.101502

Duarte H, Lopes D (2018) Career stages and occupations impacts on workers motivations. Int J Manpow 39(5):746–763. https://doi.org/10.1108/IJM-02-2017-0026

Evans IM, Meyer LH (2003) Motivating the professoriate: why sticks and carrots are only for donkeys. High Educ Manage Policy 15(3):151–167. https://doi.org/10.1787/hemp-v15-art29-en

Finkelstein MJ (1984) The American academic profession: a synthesis of social scientific inquiry since World War II. Ohio State University, Columbus


Hammarfelt B, de Rijcke S (2015) Accountability in context: effects of research evaluation systems on publication practices, disciplinary norms, and individual working routines in the Faculty of arts at Uppsala University. Res Evaluation 24(1):63–77. https://doi.org/10.1093/reseval/rvu029

Hangel N, Schmidt-Pfister D (2017) Why do you publish? On the tensions between generating scientific knowledge and publication pressure. Aslib J Inform Manage 69(5):529–544. https://doi.org/10.1108/AJIM-01-2017-0019

Hazelkorn E (2015) Rankings and the reshaping of higher education: the battle for world-class excellence. Palgrave McMillan, Basingstoke


Hilbe JM (2017) Logistic regression models. Taylor & Francis Ltd, London

Horodnic IA, Zaiţ A (2015) Motivation and research productivity in a university system undergoing transition. Res Evaluation 24(3):282–292

Huang J, Gates AJ, Sinatra R, Barabási A-L (2020) Historical comparison of gender inequality in scientific careers across countries and disciplines. Proc Natl Acad Sci USA 117(9):4609–4616. https://doi.org/10.1073/pnas.1914221117

Jeong S, Choi JY, Kim J-Y (2014) On the drivers of international collaboration: the impact of informal communication, motivation, and research resources. Sci Public Policy 41(4):520–531. https://doi.org/10.1093/scipol/sct079

Jindal-Snape D, Snape JB (2006) Motivation of scientists in a government research institute: scientists’ perceptions and the role of management. Manag Decis 44(10):1325–1343. https://doi.org/10.1108/00251740610715678

Kivistö J, Pekkola E, Lyytinen A (2017) The influence of performance-based management on teaching and research performance of Finnish senior academics. Tert Educ Manag 23(3):260–275. https://doi.org/10.1080/13583883.2017.1328529

Kulczycki E, Engels TCE, Pölönen J, Bruun K, Dušková M, Guns R et al (2018) Publication patterns in the social sciences and humanities: evidence from eight European countries. Scientometrics 116(1):463–486. https://doi.org/10.1007/s11192-018-2711-0

Lam A (2011) What motivates academic scientists to engage in research commercialization: gold, ribbon or puzzle? Res Policy 40(10):1354–1368. https://doi.org/10.1016/j.respol.2011.09.002

Langfeldt L, Reymert I, Aksnes DW (2021) The role of metrics in peer assessments. Res Evaluation 30(1):112–126. https://doi.org/10.1093/reseval/rvaa032

Larivière V, Macaluso B, Archambault É, Gingras Y (2010) Which scientific elites? On the concentration of research funds, publications and citations. Res Evaluation 19(1):45–53. https://doi.org/10.3152/095820210X492495

Lepori B, Jongbloed B, Hicks D (2023) Introduction to the handbook of public funding of research: understanding vertical and horizontal complexities. In: Lepori B, Jongbloed B, Hicks D (eds) Handbook of public funding of research. Edward Elgar Publishing, Cheltenham, pp 1–19


Lerchenmueller MJ, Sorenson O (2018) The gender gap in early career transitions in the life sciences. Res Policy 47(6):1007–1017. https://doi.org/10.1016/j.respol.2018.02.009

Leslie DW (2002) Resolving the dispute: teaching is academe’s core value. J High Educ 73(1):49–73

Lounsbury JW, Foster N, Patel H, Carmody P, Gibson LW, Stairs DR (2012) An investigation of the personality traits of scientists versus nonscientists and their relationship with career satisfaction: relationship of personality traits and career satisfaction of scientists and nonscientists. R&D Manage 42(1):47–59. https://doi.org/10.1111/j.1467-9310.2011.00665.x

Ma L (2019) Money, morale, and motivation: a study of the output-based research support scheme in University College Dublin. Res Evaluation 28(4):304–312. https://doi.org/10.1093/reseval/rvz017

Melguizo T, Strober MH (2007) Faculty salaries and the maximization of prestige. Res High Educt 48(6):633–668

Moed HF (2005) Citation analysis in research evaluation. Springer, Dordrecht

Netherlands Observatory of Science (NOWT) (2012) Report to the Dutch Ministry of Science, Education and Culture (OC&W). Den Haag 1998

Peng J-E, Gao XA (2019) Understanding TEFL academics’ research motivation and its relations with research productivity. SAGE Open 9(3):215824401986629. https://doi.org/10.1177/2158244019866295

Piro FN, Aksnes DW, Rørstad K (2013) A macro analysis of productivity differences across fields: challenges in the measurement of scientific publishing. J Am Soc Inform Sci Technol 64(2):307–320. https://doi.org/10.1002/asi.22746

Pruvot EB, Estermann T, Popkhadze N (2023) University autonomy in Europe IV. The scorecard 2023. Retrieved from Brussels. https://eua.eu/downloads/publications/eua autonomy scorecard.pdf

Reymert I, Jungblut J, Borlaug SB (2021) Are evaluative cultures national or global? A cross-national study on evaluative cultures in academic recruitment processes in Europe. High Educ 82(5):823–843. https://doi.org/10.1007/s10734-020-00659-3

Roach M, Sauermann H (2010) A taste for science? PhD scientists’ academic orientation and self-selection into research careers in industry. Res Policy 39(3):422–434. https://doi.org/10.1016/j.respol.2010.01.004

Rørstad K, Aksnes DW (2015) Publication rate expressed by age, gender and academic position – a large-scale analysis of Norwegian academic staff. J Informetrics 9(2):317–333. https://doi.org/10.1016/j.joi.2015.02.003

Ruiz-Castillo J, Costas R (2014) The skewness of scientific productivity. J Informetrics 8(4):917–934. https://doi.org/10.1016/j.joi.2014.09.006

Ryan JC (2014) The work motivation of research scientists and its effect on research performance: work motivation of research scientists. R&D Manage 44(4):355–369. https://doi.org/10.1111/radm.12063

Ryan JC, Berbegal-Mirabent J (2016) Motivational recipes and research performance: a fuzzy set analysis of the motivational profile of high-performing research scientists. J Bus Res 69(11):5299–5304. https://doi.org/10.1016/j.jbusres.2016.04.128

Ryan RM, Deci EL (2000) Intrinsic and extrinsic motivations: classic definitions and new directions. Contemp Educ Psychol 25(1):54–67. https://doi.org/10.1006/ceps.1999.1020

Sivertsen G (2019) Understanding and evaluating research and scholarly publishing in the social sciences and humanities (SSH). Data Inform Manage 3(2):61–71. https://doi.org/10.2478/dim-2019-0008

Sivertsen G, Van Leeuwen T (2014) Scholarly publication patterns in the social sciences and humanities and their relationship with research assessment

Stephan P, Veugelers R, Wang J (2017) Reviewers are blinkered by bibliometrics. Nature 544(7651):411–412. https://doi.org/10.1038/544411a

Thomas D, Nedeva M (2012) Characterizing researchers to study research funding agency impacts: the case of the European Research Council’s starting grants. Res Evaluation 21(4):257–269. https://doi.org/10.1093/reseval/rvs020

Tien FF (2000) To what degree does the desire for promotion motivate faculty to perform research? Testing the expectancy theory. Res High Educt 41(6):723–752. https://doi.org/10.1023/A:1007020721531

Tien FF (2008) What kind of faculty are motivated to perform research by the desire for promotion? High Educ 55(1):17–32. https://doi.org/10.1007/s10734-006-9033-5

Tien FF, Blackburn RT (1996) Faculty rank system, research motivation, and faculty research productivity: measure refinement and theory testing. J High Educ 67(1):2. https://doi.org/10.2307/2943901

Vallerand RJ, Pelletier LG, Blais MR, Briere NM, Senecal C, Vallieres EF (1992) The academic motivation scale: a measure of intrinsic, extrinsic, and amotivation in education. Educ Psychol Meas 52(4):1003–1017. https://doi.org/10.1177/0013164492052004025

Van Iddekinge CH, Aguinis H, Mackey JD, DeOrtentiis PS (2018) A meta-analysis of the interactive, additive, and relative effects of cognitive ability and motivation on performance. J Manag 44(1):249–279. https://doi.org/10.1177/0149206317702220

Van Leeuwen T (2013) Bibliometric research evaluations, Web of Science and the social sciences and humanities: A problematic relationship? Bibliometrie - Praxis Und Forschung, September, Bd. 2(2013). https://doi.org/10.5283/BPF.173

Van Leeuwen T, van Wijk E, Wouters PF (2016) Bibliometric analysis of output and impact based on CRIS data: a case study on the registered output of a Dutch university. Scientometrics 106(1):1–16. https://doi.org/10.1007/s11192-015-1788-y

Waltman L, Schreiber M (2013) On the calculation of percentile-based bibliometric indicators. J Am Soc Inform Sci Technol 64(2):372–379. https://doi.org/10.1002/asi.22775

Waltman L, van Eck NJ, van Leeuwen TN, Visser MS, van Raan AFJ (2011) Towards a new crown indicator: an empirical analysis. Scientometrics 87(3):467–481. https://doi.org/10.1007/s11192-011-0354-5

Wilkesmann U, Lauer S (2020) The influence of teaching motivation and new public management on academic teaching. Stud High Educ 45(2):434–451. https://doi.org/10.1080/03075079.2018.1539960

Wilsdon J, Allen L, Belfiore E, Campbell P, Curry S, Hill S, Jones R et al (2015) The metric tide: report of the independent review of the role of metrics in research assessment and management. https://doi.org/10.13140/RG.2.1.4929.1363

Zacharewicz T, Lepori B, Reale E, Jonkers K (2019) Performance-based research funding in EU member states—A comparative assessment. Sci Public Policy 46(1):105–115. https://doi.org/10.1093/scipol/scy041

Zhang L, Sivertsen G, Du H, Huang Y, Glänzel W (2021) Gender differences in the aims and impacts of research. Scientometrics 126(11):8861–8886. https://doi.org/10.1007/s11192-021-04171-y


Acknowledgements

We are thankful to the R-QUEST team for input and comments to the paper.

The authors disclosed the receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Research Council of Norway (RCN) [grant number 256223] (R-QUEST).

Open access funding provided by University of Oslo (incl Oslo University Hospital)

Author information

Silje Marie Svartefoss

Present address: TIK Centre for Technology, Innovation and Culture, University of Oslo, 0317, Oslo, Norway

Authors and Affiliations

Nordic Institute for Studies in Innovation, Research and Education (NIFU), Økernveien 9, 0608, Oslo, Norway

Silje Marie Svartefoss & Dag W. Aksnes

Department of Political Science, University of Oslo, 0315, Oslo, Norway

Jens Jungblut & Kristoffer Kolltveit

Centre for Science and Technology Studies (CWTS), Leiden University, 2311, Leiden, The Netherlands

Thed van Leeuwen


Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Silje Marie Svartefoss, Jens Jungblut, Dag W. Aksnes, Kristoffer Kolltveit, and Thed van Leeuwen. The first draft of the manuscript was written by all authors in collaboration, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Silje Marie Svartefoss.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Informed consent

Informed consent was obtained from the participants in this study.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Svartefoss, S.M., Jungblut, J., Aksnes, D.W. et al. Explaining research performance: investigating the importance of motivation. SN Soc Sci 4 , 105 (2024). https://doi.org/10.1007/s43545-024-00895-9


Received : 14 December 2023

Accepted : 15 April 2024

Published : 23 May 2024

DOI : https://doi.org/10.1007/s43545-024-00895-9


