
Chapter 10: Analysing data and undertaking meta-analyses

Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Key Points:

  • Meta-analysis is the statistical combination of results from two or more separate studies.
  • Potential advantages of meta-analyses include an improvement in precision, the ability to answer questions not posed by individual studies, and the opportunity to settle controversies arising from conflicting claims. However, they also have the potential to mislead seriously, particularly if specific study designs, within-study biases, variation across studies, and reporting biases are not carefully considered.
  • It is important to be familiar with the type of data (e.g. dichotomous, continuous) that result from measurement of an outcome in an individual study, and to choose suitable effect measures for comparing intervention groups.
  • Most meta-analysis methods are variations on a weighted average of the effect estimates from the different studies.
  • Studies with no events contribute no information about the risk ratio or odds ratio. For rare events, the Peto method has been observed to be less biased and more powerful than other methods.
  • Variation across studies (heterogeneity) must be considered, although most Cochrane Reviews do not have enough studies to allow for the reliable investigation of its causes. Random-effects meta-analyses allow for heterogeneity by assuming that underlying effects follow a normal distribution, but they must be interpreted carefully. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation.
  • Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.

Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.

10.1 Do not start here!

It can be tempting to jump prematurely into a statistical analysis when undertaking a systematic review. The production of a diamond at the bottom of a plot is an exciting moment for many authors, but results of meta-analyses can be very misleading if suitable attention has not been given to formulating the review question; specifying eligibility criteria; identifying and selecting studies; collecting appropriate data; considering risk of bias; planning intervention comparisons; and deciding what data would be meaningful to analyse. Review authors should consult the chapters that precede this one before a meta-analysis is undertaken.

10.2 Introduction to meta-analysis

An important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention. Potential advantages of meta-analyses include the following:

  • To improve precision. Many studies are too small to provide convincing evidence about intervention effects in isolation. Estimation is usually improved when it is based on more information.
  • To answer questions not posed by the individual studies. Primary studies often involve a specific type of participant and explicitly defined interventions. A selection of studies in which these characteristics differ can allow investigation of the consistency of effect across a wider range of populations and interventions. It may also, if relevant, allow reasons for differences in effect estimates to be investigated.
  • To settle controversies arising from apparently conflicting studies or to generate new hypotheses. Statistical synthesis of findings allows the degree of conflict to be formally assessed, and reasons for different results to be explored and quantified.

Of course, the use of statistical synthesis methods does not guarantee that the results of a review are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods can be misused.

This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. The use of network meta-analysis to compare more than two interventions is addressed in Chapter 11. Formulae for most of the methods described are provided in the RevMan Web Knowledge Base under Statistical Algorithms and calculations used in Review Manager (documentation.cochrane.org/revman-kb/statistical-methods-210600101.html), and a longer discussion of many of the issues is available (Deeks et al 2001).

10.2.1 Principles of meta-analysis

The commonly used methods for meta-analysis are based on the following principles:

  • Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is calculated for each study, to describe the observed intervention effect in the same way for every study. For example, the summary statistic may be a risk ratio if the data are dichotomous, or a difference between means if the data are continuous (see Chapter 6 ).

  • In the second stage, a summary (pooled) intervention effect estimate is calculated as a weighted average of the intervention effects estimated in the individual studies.

  • The combination of intervention effect estimates across studies may optionally incorporate an assumption that the studies are not all estimating the same intervention effect, but estimate intervention effects that follow a distribution across studies. This is the basis of a random-effects meta-analysis (see Section 10.10.4 ). Alternatively, if it is assumed that each study is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
  • The standard error of the summary intervention effect can be used to derive a confidence interval, which communicates the precision (or uncertainty) of the summary estimate; and to derive a P value, which communicates the strength of the evidence against the null hypothesis of no intervention effect.
  • As well as yielding a summary quantification of the intervention effect, all methods of meta-analysis can incorporate an assessment of whether the variation among the results of the separate studies is compatible with random variation, or whether it is large enough to indicate inconsistency of intervention effects across studies (see Section 10.10 ).
  • The problem of missing data is one of the numerous practical considerations that must be thought through when undertaking a meta-analysis. In particular, review authors should consider the implications of missing outcome data from individual participants (due to losses to follow-up or exclusions from analysis) (see Section 10.12 ).

Meta-analyses are usually illustrated using a forest plot. An example appears in Figure 10.2.a. A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses (Lewis and Clarke 2001). Each study is represented by a block at the point estimate of intervention effect with a horizontal line extending either side of the block. The area of the block indicates the weight assigned to that study in the meta-analysis while the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The area of the block and the confidence interval convey similar information, but they make different contributions to the graphic. The confidence interval depicts the range of intervention effects compatible with the study’s result. The size of the block draws the eye towards the studies with larger weight (usually those with narrower confidence intervals), which dominate the calculation of the summary result, presented as a diamond at the bottom.

Figure 10.2.a Example of a forest plot from a review of interventions to promote ownership of smoke alarms (DiGuiseppi and Higgins 2001). Reproduced with permission of John Wiley & Sons


10.3 A generic inverse-variance approach to meta-analysis

A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method . This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.

The inverse-variance method is so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weights minimizes the imprecision (uncertainty) of the pooled effect estimate.

10.3.1 Fixed-effect method for meta-analysis

A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as:

pooled effect = Σ (Y_i / SE_i²) / Σ (1 / SE_i²), with standard error SE(pooled effect) = 1 / √( Σ (1 / SE_i²) )

where Y_i is the intervention effect estimated in the ith study, SE_i is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under an assumption that all effect estimates are estimating the same underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a ‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-analysis can be interpreted without making such an assumption (Rice et al 2018).
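The inverse-variance arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch, not RevMan's implementation, and the three study estimates (log odds ratios and standard errors) are invented for the example.

```python
# Sketch of fixed-effect inverse-variance pooling: each study's effect
# estimate Y_i with standard error SE_i is weighted by w_i = 1/SE_i^2.
import math

def fixed_effect_iv(estimates, std_errors):
    """Pool effect estimates with inverse-variance weights."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # 95% confidence interval using the normal approximation
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Three hypothetical studies reporting log odds ratios
pooled, se, ci = fixed_effect_iv([-0.4, -0.2, -0.3], [0.20, 0.15, 0.25])
```

Note that the pooled standard error is smaller than any individual study's standard error, which is the sense in which meta-analysis improves precision.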

10.3.2 Random-effects methods for meta-analysis

A variation on the inverse-variance method is to incorporate an assumption that the different studies are estimating different, yet related, intervention effects (Higgins et al 2009). This produces a random-effects meta-analysis, and the simplest version is known as the DerSimonian and Laird method (DerSimonian and Laird 1986). Random-effects meta-analysis is discussed in detail in Section 10.10.4 .
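The DerSimonian and Laird method mentioned above estimates the between-study variance (tau²) from Cochran's Q statistic and then widens each study's weight accordingly. A minimal sketch, using the standard moment-estimator formulae with hypothetical data:

```python
# Sketch of the DerSimonian and Laird random-effects method: estimate
# tau^2 by the method of moments, then re-weight using 1/(SE_i^2 + tau^2).
import math

def dersimonian_laird(estimates, std_errors):
    w = [1.0 / se**2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    # Cochran's Q statistic measures observed heterogeneity
    q = sum(wi * (y - fixed)**2 for wi, y in zip(w, estimates))
    k = len(estimates)
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)   # moment estimator, truncated at 0
    # Random-effects weights incorporate the between-study variance
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return tau2, pooled, pooled_se

# Hypothetical, visibly heterogeneous study estimates
tau2, pooled, pooled_se = dersimonian_laird([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
```

Because tau² is added to every study's variance, the random-effects confidence interval is wider than the fixed-effect one whenever heterogeneity is present.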

10.3.3 Performing inverse-variance meta-analyses

Most meta-analysis programs perform inverse-variance meta-analyses. Usually the user provides summary data from each intervention arm of each study, such as a 2×2 table when the outcome is dichotomous (see Chapter 6, Section 6.4 ), or means, standard deviations and sample sizes for each group when the outcome is continuous (see Chapter 6, Section 6.5 ). This avoids the need for the author to calculate effect estimates, and allows the use of methods targeted specifically at different types of data (see Sections 10.4 and 10.5 ).

When the data are conveniently available as summary statistics from each intervention group, the inverse-variance method can be implemented directly. For example, estimates and their standard errors may be entered directly into RevMan under the ‘Generic inverse variance’ outcome type. For ratio measures of intervention effect, the data must be entered into RevMan as natural logarithms (for example, as a log odds ratio and the standard error of the log odds ratio). However, it is straightforward to instruct the software to display results on the original (e.g. odds ratio) scale. It is possible to supplement or replace this with a column providing the sample sizes in the two groups. Note that the ability to enter estimates and standard errors creates a high degree of flexibility in meta-analysis. It facilitates the analysis of properly analysed crossover trials, cluster-randomized trials and non-randomized trials (see Chapter 23 ), as well as outcome data that are ordinal, time-to-event or rates (see Chapter 6 ).
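The log transformation described above, and the recovery of a standard error from published confidence limits, can be sketched as follows. The odds ratio and confidence limits are invented for illustration; the formula simply inverts the fact that a 95% CI on the log scale spans 2 × 1.96 standard errors.

```python
# Sketch of preparing a ratio measure for 'Generic inverse variance'
# entry: log-transform the estimate and recover its standard error
# from the width of the reported 95% confidence interval.
import math

def log_or_and_se(or_estimate, ci_lower, ci_upper, z=1.96):
    log_or = math.log(or_estimate)
    # Width of the CI on the log scale is 2*z standard errors
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_or, se

# Hypothetical published result: OR 0.80 (95% CI 0.65 to 0.98)
log_or, se = log_or_and_se(0.80, 0.65, 0.98)
```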

10.4 Meta-analysis of dichotomous outcomes

There are four widely used methods of meta-analysis for dichotomous outcomes, three fixed-effect methods (Mantel-Haenszel, Peto and inverse variance) and one random-effects method (DerSimonian and Laird inverse variance). All of these methods are available as analysis options in RevMan. The Peto method can only combine odds ratios, whilst the other three methods can combine odds ratios, risk ratios or risk differences. Formulae for all of the meta-analysis methods are available elsewhere (Deeks et al 2001).

Note that having no events in one group (sometimes referred to as ‘zero cells’) causes problems with computation of estimates and standard errors with some methods: see Section 10.4.4 .

10.4.1 Mantel-Haenszel methods

When data are sparse, either in terms of event risks being low or study size being small, the estimates of the standard errors of the effect estimates that are used in the inverse-variance methods may be poor. Mantel-Haenszel methods are fixed-effect meta-analysis methods using a different weighting scheme that depends on which effect measure (e.g. risk ratio, odds ratio, risk difference) is being used (Mantel and Haenszel 1959, Greenland and Robins 1985). They have been shown to have better statistical properties when there are few events. As this is a common situation in Cochrane Reviews, the Mantel-Haenszel method is generally preferable to the inverse variance method in fixed-effect meta-analyses. In other situations the two methods give similar estimates.
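As a concrete illustration of the Mantel-Haenszel weighting scheme, the sketch below computes the pooled odds ratio (the scheme differs for risk ratios and risk differences, which are not shown). The 2×2 tables are hypothetical.

```python
# Sketch of the Mantel-Haenszel pooled odds ratio. Each study is a
# 2x2 table (a, b, c, d): events and non-events in the experimental
# arm, then events and non-events in the comparator arm.
def mantel_haenszel_or(tables):
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n   # numerator and denominator are summed
        den += b * c / n   # across studies before taking the ratio
    return num / den

# Two hypothetical studies
tables = [(10, 90, 15, 85), (4, 46, 8, 42)]
or_mh = mantel_haenszel_or(tables)
```

Because the ratio is taken only after summing across studies, no per-study odds ratio (or its standard error) has to be computed, which is why the method copes better with sparse data.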

10.4.2 Peto odds ratio method

Peto’s method can only be used to combine odds ratios (Yusuf et al 1985). It uses an inverse-variance approach, but uses an approximate method of estimating the log odds ratio, and uses different weights. An alternative way of viewing the Peto method is as a sum of ‘O – E’ statistics. Here, O is the observed number of events and E is an expected number of events in the experimental intervention group of each study under the null hypothesis of no intervention effect.
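The ‘O – E’ view of the Peto method can be sketched directly from 2×2 tables. The expected count E and hypergeometric variance V follow the standard formulae; the table values are invented.

```python
# Sketch of the Peto one-step odds ratio. For each study, O is the
# observed event count in the experimental arm, E its expectation
# under no intervention effect, and V the hypergeometric variance.
import math

def peto_or(tables):
    # tables: (a, b, c, d) = events/non-events per arm, as in a 2x2 table
    sum_o_minus_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        n_exp, n_comp = a + b, c + d        # arm sizes
        m_event, m_noevent = a + c, b + d   # event / non-event totals
        e = n_exp * m_event / n             # expected events in exp arm
        v = (n_exp * n_comp * m_event * m_noevent) / (n**2 * (n - 1))
        sum_o_minus_e += a - e
        sum_v += v
    log_or = sum_o_minus_e / sum_v          # approximate log odds ratio
    return math.exp(log_or)

# Single hypothetical study
or_peto = peto_or([(10, 90, 15, 85)])
```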

The approximation used in the computation of the log odds ratio works well when intervention effects are small (odds ratios are close to 1), events are not particularly common and the studies have similar numbers in experimental and comparator groups. In other situations it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis.

Corrections for zero cell counts are not necessary when using Peto’s method. Perhaps for this reason, this method performs well when events are very rare (Bradburn et al 2007); see Section 10.4.4. Also, Peto’s method can be used to combine studies with dichotomous outcome data with studies using time-to-event analyses where log-rank tests have been used (see Section 10.9).

10.4.3 Which effect measure for dichotomous outcomes?

Effect measures for dichotomous data are described in Chapter 6, Section 6.4.1 . The effect of an intervention can be expressed as either a relative or an absolute effect. The risk ratio (relative risk) and odds ratio are relative measures, while the risk difference and number needed to treat for an additional beneficial outcome are absolute measures. A further complication is that there are, in fact, two risk ratios. We can calculate the risk ratio of an event occurring or the risk ratio of no event occurring. These give different summary results in a meta-analysis, sometimes dramatically so.

The selection of a summary statistic for use in meta-analysis depends on balancing three criteria (Deeks 2002). First, we desire a summary statistic that gives values that are similar for all the studies in the meta-analysis and subdivisions of the population to which the interventions will be applied. The more consistent the summary statistic, the greater is the justification for expressing the intervention effect as a single summary number. Second, the summary statistic must have the mathematical properties required to perform a valid meta-analysis. Third, the summary statistic would ideally be easily understood and applied by those using the review. The summary intervention effect should be presented in a way that helps readers to interpret and apply the results appropriately. Among effect measures for dichotomous data, no single measure is uniformly best, so the choice inevitably involves a compromise.

Consistency Empirical evidence suggests that relative effect measures are, on average, more consistent than absolute measures (Engels et al 2000, Deeks 2002, Rücker et al 2009). For this reason, it is wise to avoid performing meta-analyses of risk differences, unless there is a clear reason to suspect that risk differences will be consistent in a particular clinical situation. On average there is little difference between the odds ratio and risk ratio in terms of consistency (Deeks 2002). When the study aims to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002). Selecting an effect measure based on what is the most consistent in a particular situation is not a generally recommended strategy, since it may lead to a selection that spuriously maximizes the precision of a meta-analysis estimate.

Mathematical properties The most important mathematical criterion is the availability of a reliable variance estimate. The number needed to treat for an additional beneficial outcome does not have a simple variance estimator and cannot easily be used directly in meta-analysis, although it can be computed from the meta-analysis result afterwards (see Chapter 15, Section 15.4.2 ). There is no consensus regarding the importance of two other often-cited mathematical properties: the fact that the behaviour of the odds ratio and the risk difference do not rely on which of the two outcome states is coded as the event, and the odds ratio being the only statistic which is unbounded (see Chapter 6, Section 6.4.1 ).

Ease of interpretation The odds ratio is the hardest summary statistic to understand and to apply in practice, and many practising clinicians report difficulties in using it. There are many published examples where authors have misinterpreted odds ratios from meta-analyses as risk ratios. Although odds ratios can be re-expressed for interpretation (as discussed below), there must be some concern that routine presentation of the results of systematic reviews as odds ratios will lead to frequent over-estimation of the benefits and harms of interventions when the results are applied in clinical practice. Absolute measures of effect are thought to be more easily interpreted by clinicians than relative effects (Sinclair and Bracken 1994), and allow trade-offs to be made between likely benefits and likely harms of interventions. However, they are less likely to be generalizable.

It is generally recommended that meta-analyses are undertaken using risk ratios (taking care to make a sensible choice over which category of outcome is classified as the event) or odds ratios. This is because it seems important to avoid using summary statistics for which there is empirical evidence that they are unlikely to give consistent estimates of intervention effects (the risk difference), and it is impossible to use statistics for which meta-analysis cannot be performed (the number needed to treat for an additional beneficial outcome). It may be wise to plan to undertake a sensitivity analysis to investigate whether choice of summary statistic (and selection of the event category) is critical to the conclusions of the meta-analysis (see Section 10.14 ).

It is often sensible to use one statistic for meta-analysis and to re-express the results using a second, more easily interpretable statistic. For example, often meta-analysis may be best performed using relative effect measures (risk ratios or odds ratios) and the results re-expressed using absolute effect measures (risk differences or numbers needed to treat for an additional beneficial outcome – see Chapter 15, Section 15.4 . This is one of the key motivations for ‘Summary of findings’ tables in Cochrane Reviews: see Chapter 14 ). If odds ratios are used for meta-analysis they can also be re-expressed as risk ratios (see Chapter 15, Section 15.4 ). In all cases the same formulae can be used to convert upper and lower confidence limits. However, all of these transformations require specification of a value of baseline risk that indicates the likely risk of the outcome in the ‘control’ population to which the experimental intervention will be applied. Where the chosen value for this assumed comparator group risk is close to the typical observed comparator group risks across the studies, similar estimates of absolute effect will be obtained regardless of whether odds ratios or risk ratios are used for meta-analysis. Where the assumed comparator risk differs from the typical observed comparator group risk, the predictions of absolute benefit will differ according to which summary statistic was used for meta-analysis.
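The re-expression described above can be sketched with the standard conversion formulae. The assumed comparator risk and the relative effects below are illustrative, not taken from any review.

```python
# Sketch of converting a relative effect to an absolute one, given an
# assumed comparator group risk (ACR).
def absolute_from_rr(rr, acr):
    """Risk difference implied by a risk ratio at comparator risk acr."""
    return rr * acr - acr

def absolute_from_or(odds_ratio, acr):
    """Risk difference implied by an odds ratio at comparator risk acr."""
    risk_exp = odds_ratio * acr / (1 - acr + odds_ratio * acr)
    return risk_exp - acr

acr = 0.10                              # assumed comparator group risk
rd_rr = absolute_from_rr(0.80, acr)     # via a risk ratio of 0.80
rd_or = absolute_from_or(0.80, acr)     # via an odds ratio of 0.80
nnt = 1 / abs(rd_rr)                    # number needed to treat
```

At this low comparator risk the two routes give similar risk differences (about 2 fewer events per 100), consistent with the observation above that the choice of summary statistic matters most when the assumed risk departs from those observed in the studies.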

10.4.4 Meta-analysis of rare events

For rare outcomes, meta-analysis may be the only way to obtain reliable evidence of the effects of healthcare interventions. Individual studies are usually under-powered to detect differences in rare outcomes, but a meta-analysis of many studies may have adequate power to investigate whether interventions do have an impact on the incidence of the rare event. However, many methods of meta-analysis are based on large sample approximations, and are unsuitable when events are rare. Thus authors must take care when selecting a method of meta-analysis (Efthimiou 2018).

There is no single risk at which events are classified as ‘rare’. Certainly risks of 1 in 1000 constitute rare events, and many would classify risks of 1 in 100 the same way. However, the performance of methods when risks are as high as 1 in 10 may also be affected by the issues discussed in this section. What is typical is that a high proportion of the studies in the meta-analysis observe no events in one or more study arms.

Studies with no events in one or more arms

Computational problems can occur when no events are observed in one or both groups in an individual study. Inverse variance meta-analytical methods involve computing an intervention effect estimate and its standard error for each study. For studies where no events were observed in one or both arms, these computations often involve dividing by a zero count, which yields a computational error. Most meta-analytical software routines (including those in RevMan) automatically check for problematic zero counts, and add a fixed value (typically 0.5) to all cells of a 2×2 table where the problems occur. The Mantel-Haenszel methods require zero-cell corrections only if the same cell is zero in all the included studies, and hence need to use the correction less often. However, in many software applications the same correction rules are applied for Mantel-Haenszel methods as for the inverse-variance methods. Odds ratio and risk ratio methods require zero cell corrections more often than difference methods, except for the Peto odds ratio method, which encounters computation problems only in the extreme situation of no events occurring in all arms of all studies.
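The fixed 0.5 correction described above can be sketched as follows. The 2×2 table, with no events in the experimental arm, is hypothetical; the sketch shows why the correction is needed before an inverse-variance log odds ratio can be computed at all.

```python
# Sketch of the common 0.5 continuity correction: when any cell of a
# 2x2 table is zero, 0.5 is added to every cell before computing the
# log odds ratio and its large-sample standard error.
import math

def log_or_with_correction(a, b, c, d, correction=0.5):
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    # Woolf's large-sample standard error for the log odds ratio
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or, se

# Hypothetical study with zero events in the experimental arm
log_or, se = log_or_with_correction(0, 50, 3, 47)
```

Without the correction the computation fails (log of zero, division by zero); with it, the study contributes an estimate, but one pulled towards no difference and given a large variance, which is the bias discussed above.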

Whilst the fixed correction meets the objective of avoiding computational errors, it usually has the undesirable effect of biasing study estimates towards no difference and over-estimating variances of study estimates (consequently down-weighting their contribution to the meta-analysis inappropriately). Where the sizes of the study arms are unequal (which occurs more commonly in non-randomized studies than in randomized trials), the correction will introduce a directional bias in the treatment effect. Alternative non-fixed zero-cell corrections have been explored by Sweeting and colleagues, including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced (Sweeting et al 2004).

Studies with no events in either arm

The standard practice in meta-analysis of odds ratios and risk ratios is to exclude studies from the meta-analysis where there are no events in both arms. This is because such studies do not provide any indication of either the direction or magnitude of the relative treatment effect. Whilst it may be clear that events are very rare on both the experimental intervention and the comparator intervention, no information is provided as to which group is likely to have the higher risk, or on whether the risks are of the same or different orders of magnitude (when risks are very low, they are compatible with very large or very small ratios). Whilst one might be tempted to infer that the risk would be lowest in the group with the larger sample size (as the upper limit of the confidence interval would be lower), this is not justified as the sample size allocation was determined by the study investigators and is not a measure of the incidence of the event.

Risk difference methods superficially appear to have an advantage over odds ratio methods in that the risk difference is defined (as zero) when no events occur in either arm. Such studies are therefore included in the estimation process. Bradburn and colleagues undertook simulation studies which revealed that all risk difference methods yield confidence intervals that are too wide when events are rare, and have associated poor statistical power, which make them unsuitable for meta-analysis of rare events (Bradburn et al 2007). This is especially relevant when outcomes that focus on treatment safety are being studied, as the ability to identify correctly (or attempt to refute) serious adverse events is a key issue in drug development.

It is likely that outcomes for which no events occur in either arm may not be mentioned in reports of many randomized trials, precluding their inclusion in a meta-analysis. It is unclear, though, when working with published results, whether failure to mention a particular adverse event means there were no such events, or simply that such events were not included as a measured endpoint. Whilst the results of risk difference meta-analyses will be affected by non-reporting of outcomes with no events, odds and risk ratio based methods naturally exclude these data whether or not they are published, and are therefore unaffected.

Validity of methods of meta-analysis for rare events

Simulation studies have revealed that many meta-analytical methods can give misleading results for rare events, which is unsurprising given their reliance on asymptotic statistical theory. Their performance has been judged suboptimal either through results being biased, confidence intervals being inappropriately wide, or statistical power being too low to detect substantial differences.

In the following we consider the choice of statistical method for meta-analyses of odds ratios. Appropriate choices appear to depend on the comparator group risk, the likely size of the treatment effect and consideration of balance in the numbers of experimental and comparator participants in the constituent studies. We are not aware of research that has evaluated risk ratio measures directly, but their performance is likely to be very similar to the corresponding odds ratio measures. When events are rare, estimates of odds and risks are near identical, and results of both can be interpreted as ratios of probabilities.

Bradburn and colleagues found that many of the most commonly used meta-analytical methods were biased when events were rare (Bradburn et al 2007). The bias was greatest in inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel odds ratio method using a 0.5 zero-cell correction. As already noted, risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power when risks of events were low.

At event rates below 1% the Peto one-step odds ratio method was found to be the least biased and most powerful method, and provided the best confidence interval coverage, as long as there was no substantial imbalance between treatment and comparator group sizes within studies and treatment effects were not exceptionally large. This finding was consistently observed across three different meta-analytical scenarios, and was also observed by Sweeting and colleagues (Sweeting et al 2004).

This finding was noted despite the method producing only an approximation to the odds ratio. For very large effects (e.g. risk ratio=0.2) when the approximation is known to be poor, treatment effects were under-estimated, but the Peto method still had the best performance of all the methods considered for event risks of 1 in 1000, and the bias was never more than 6% of the comparator group risk.

In other circumstances (i.e. event risks above 1%, very large effects at event risks around 1%, and meta-analyses where many studies were substantially imbalanced) the best performing methods were the Mantel-Haenszel odds ratio without zero-cell corrections, logistic regression and an exact method. None of these methods is available in RevMan.

Methods that should be avoided with rare events are the inverse-variance methods (including the DerSimonian and Laird random-effects method) (Efthimiou 2018). These directly incorporate the study’s variance in the estimation of its contribution to the meta-analysis, but these are usually based on a large-sample variance approximation, which was not intended for use with rare events. We would suggest that incorporation of heterogeneity into an estimate of a treatment effect should be a secondary consideration when attempting to produce estimates of effects from sparse data – the primary concern is to discern whether there is any signal of an effect in the data.

10.5 Meta-analysis of continuous outcomes

An important assumption underlying standard methods for meta-analysis of continuous data is that the outcomes have a normal distribution in each intervention arm in each study. This assumption may not always be met, although it is unimportant in very large studies. It is useful to consider the possibility of skewed data (see Section 10.5.3 ).

10.5.1 Which effect measure for continuous outcomes?

The two summary statistics commonly used for meta-analysis of continuous data are the mean difference (MD) and the standardized mean difference (SMD). Other options are available, such as the ratio of means (see Chapter 6, Section 6.5.1 ). Selection of summary statistics for continuous data is principally determined by whether studies all report the outcome using the same scale (when the mean difference can be used) or using different scales (when the standardized mean difference is usually used). The ratio of means can be used in either situation, but is appropriate only when outcome measurements are strictly greater than zero. Further considerations in deciding on an effect measure that will facilitate interpretation of the findings appears in Chapter 15, Section 15.5 .

The different roles played in MD and SMD approaches by the standard deviations (SDs) of outcomes observed in the two groups should be understood.

For the mean difference approach, the SDs are used together with the sample sizes to compute the weight given to each study. Studies with small SDs are given relatively higher weight whilst studies with larger SDs are given relatively smaller weights. This is appropriate if variation in SDs between studies reflects differences in the reliability of outcome measurements, but is probably not appropriate if the differences in SD reflect real differences in the variability of outcomes in the study populations.

For the standardized mean difference approach, the SDs are used to standardize the mean differences to a single scale, as well as in the computation of study weights. Thus, studies with small SDs lead to relatively higher estimates of SMD, whilst studies with larger SDs lead to relatively smaller estimates of SMD. For this to be appropriate, it must be assumed that between-study variation in SDs reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations, as discussed in Chapter 6.

These assumptions of the methods should be borne in mind when unexpected variation of SDs is observed across studies.
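The contrasting roles of the SDs can be illustrated with a short sketch (all numbers hypothetical; the SMD here is Cohen's d with the usual approximate standard error):

```python
import math

def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Unstandardized mean difference and its standard error.

    The SDs enter only through the standard error, which (with the
    sample sizes) determines the study's inverse-variance weight.
    """
    md = m1 - m2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return md, se

def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: the mean difference divided by the pooled SD.

    Here the SDs also rescale the effect itself, so larger SDs
    shrink the SMD estimate as well as widening its standard error.
    """
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, se
```

With group means of 10 and 8, SDs of 4 and 50 participants per group, the MD of 2 corresponds to an SMD of 0.5; halving both SDs would leave the MD unchanged (while shrinking its standard error) but would double the SMD.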

10.5.2 Meta-analysis of change scores

In some circumstances an analysis based on changes from baseline will be more efficient and powerful than comparison of post-intervention values, as it removes a component of between-person variability from the analysis. However, calculation of a change score requires measurement of the outcome twice and in practice may be less efficient for outcomes that are unstable or difficult to measure precisely, where the measurement error may be larger than true between-person baseline variability. Change-from-baseline outcomes may also be preferred if they have a less skewed distribution than post-intervention measurement outcomes. Although analysis of change scores is sometimes used as a device to ‘correct’ for unlucky randomization (chance baseline imbalance), this practice is not recommended.

The preferred statistical approach to accounting for baseline measurements of the outcome variable is to include the baseline outcome measurements as a covariate in a regression model or analysis of covariance (ANCOVA). These analyses produce an ‘adjusted’ estimate of the intervention effect together with its standard error. Such analyses are the least frequently encountered in study reports, but since they give the most precise and least biased estimates of intervention effects they should be included in the meta-analysis when they are available. However, they can only be included using the generic inverse-variance method, since means and SDs are not available for each intervention group separately.

In practice an author is likely to discover that the studies included in a review present a mixture of change-from-baseline and post-intervention value scores. However, this mixture is not a problem for meta-analysis of MDs: there is no statistical reason why studies with change-from-baseline outcomes should not be combined in a meta-analysis with studies with post-intervention measurement outcomes when using the (unstandardized) MD method. In a randomized study, an MD based on changes from baseline can usually be assumed to be addressing exactly the same underlying intervention effect as an analysis based on post-intervention measurements: that is, the difference in mean post-intervention values will on average be the same as the difference in mean change scores. If the use of change scores does increase precision, the studies presenting change scores will, appropriately, be given higher weights in the analysis than they would have received had post-intervention values been used, since they will have smaller SDs.

When combining the data on the MD scale, authors must be careful to use the appropriate means and SDs (either of post-intervention measurements or of changes from baseline) for each study. Since the mean values and SDs for the two types of outcome may differ substantially, it may be advisable to place them in separate subgroups to avoid confusion for the reader, but the results of the subgroups can legitimately be pooled together.
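As a sketch of how such a pooled analysis works, the following applies the fixed-effect generic inverse-variance method to hypothetical MDs, some derived from change scores and some from post-intervention values (all numbers illustrative):

```python
import math

def inverse_variance_pool(estimates, ses):
    """Fixed-effect inverse-variance pooled estimate and standard error."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical studies: two report change-from-baseline MDs, one a
# post-intervention MD; on the MD scale they address the same quantity.
mds = [2.1, 1.8, 2.5]
ses = [0.5, 0.4, 0.9]   # the change-score studies have smaller SEs
pooled, se = inverse_variance_pool(mds, ses)
```

The studies with smaller standard errors (here, the change-score studies) receive larger inverse-variance weights, and the pooled standard error is smaller than any individual study's.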

In contrast, post-intervention value and change scores should not in principle be combined using standard meta-analysis approaches when the effect measure is an SMD. This is because the SDs used in the standardization reflect different things. The SD when standardizing post-intervention values reflects between-person variability at a single point in time. The SD when standardizing change scores reflects variation in between-person changes over time, so will depend on both within-person and between-person variability; within-person variability in turn is likely to depend on the length of time between measurements. Nevertheless, an empirical study of 21 meta-analyses in osteoarthritis did not find a difference between combined SMDs based on post-intervention values and combined SMDs based on change scores (da Costa et al 2013). One option is to standardize change scores using post-intervention SDs rather than change-score SDs. This would lead to valid synthesis of the two approaches, but we are not aware that an appropriate standard error for this has been derived.

A common practical problem associated with including change-from-baseline measures is that the SD of changes is not reported. Imputation of SDs is discussed in Chapter 6, Section .

10.5.3 Meta-analysis of skewed data

Analyses based on means are appropriate for data that are at least approximately normally distributed, and for data from very large trials. If the true distribution of outcomes is asymmetrical, then the data are said to be skewed. Review authors should consider the possibility and implications of skewed data when analysing continuous outcomes (see MECIR Box 10.5.a ). Skew can sometimes be diagnosed from the means and SDs of the outcomes. A rough check is available, but it is only valid if a lowest or highest possible value for an outcome is known to exist. Thus, the check may be used for outcomes such as weight, volume and blood concentrations, which have lowest possible values of 0, or for scale outcomes with minimum or maximum scores, but it may not be appropriate for change-from-baseline measures. The check involves calculating the observed mean minus the lowest possible value (or the highest possible value minus the observed mean), and dividing this by the SD. A ratio less than 2 suggests skew (Altman and Bland 1996). If the ratio is less than 1, there is strong evidence of a skewed distribution.
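The check can be written as a one-line calculation (the numbers below are purely illustrative):

```python
def skew_check(mean, sd, lowest=0.0):
    """Altman–Bland rough check for skew, valid only when a lowest
    possible value for the outcome exists (here defaulting to 0).

    Returns (observed mean − lowest possible value) / SD.  A ratio
    below 2 suggests skew; below 1 is strong evidence of skew.
    """
    return (mean - lowest) / sd

# Hypothetical example: mean length of stay 8.1 days, SD 6.9 days.
ratio = skew_check(8.1, 6.9)   # ratio < 2, so skew is likely
```

For an outcome bounded above rather than below, the same ratio would be computed as (highest possible value − observed mean) / SD.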

Transformation of the original outcome data may reduce skew substantially. Reports of trials may present results on a transformed scale, usually a log scale. Collection of appropriate data summaries from the trialists, or acquisition of individual patient data, is currently the approach of choice. Appropriate data summaries and analysis strategies for the individual patient data will depend on the situation. Consultation with a knowledgeable statistician is advised.

Where data have been analysed on a log scale, results are commonly presented as geometric means and ratios of geometric means. A meta-analysis may then be performed on the scale of the log-transformed data; an example of the calculation of the required means and SD is given in Chapter 6, Section . This approach depends on being able to obtain transformed data for all studies; methods for transforming from one scale to the other are available (Higgins et al 2008b). Log-transformed and untransformed data should not be mixed in a meta-analysis.

MECIR Box 10.5.a Relevant expectations for conduct of intervention reviews

10.6 Combining dichotomous and continuous outcomes

Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means, or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand, and more helpful, when it is dichotomized. However, deciding on a cut-point may be arbitrary, and information is lost when continuous data are transformed to dichotomous data.

There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and SDs as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.

There are statistical approaches available that will re-express odds ratios as SMDs (and vice versa), allowing dichotomous and continuous data to be combined (Anzures-Cabrera et al 2011). A simple approach is as follows. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution, but with more data in the distributional tails), and that the variability of the outcomes is the same in both experimental and comparator participants, an odds ratio can be re-expressed as an SMD according to the following simple formula (Chinn 2000):

SMD = (√3/π) × ln(odds ratio) = 0.5513 × ln(odds ratio)

The standard error of the log odds ratio can be converted to the standard error of an SMD by multiplying by the same constant (√3/π=0.5513). Alternatively, SMDs can be re-expressed as log odds ratios by multiplying by π/√3=1.814. Once SMDs (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method. Standard errors can be computed for all studies by entering the data as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and SMDs into standard errors (see Chapter 6, Section 6.3 ).
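The conversion in both directions can be sketched as follows (the example odds ratio of 2.5 and its standard error are hypothetical):

```python
import math

C = math.sqrt(3) / math.pi   # the constant √3/π ≈ 0.5513 (Chinn 2000)

def log_or_to_smd(log_or, se_log_or):
    """Re-express a log odds ratio (and its SE) as an SMD."""
    return C * log_or, C * se_log_or

def smd_to_log_or(smd, se_smd):
    """Re-express an SMD (and its SE) as a log odds ratio."""
    return smd / C, se_smd / C

# Hypothetical study reporting OR = 2.5 with SE(log OR) = 0.3:
smd, se = log_or_to_smd(math.log(2.5), 0.3)
```

The two functions are exact inverses of each other, so a study's data can be moved onto whichever scale the meta-analysis is being conducted on.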

10.7 Meta-analysis of ordinal outcomes and measurement scales

Ordinal and measurement scale outcomes are most commonly meta-analysed as dichotomous data (if so, see Section 10.4 ) or continuous data (if so, see Section 10.5 ) depending on the way that the study authors performed the original analyses.

Occasionally it is possible to analyse the data using proportional odds models. This is the case when ordinal scales have a small number of categories, the numbers falling into each category for each intervention group can be obtained, and the same ordinal scale has been used in all studies. This approach may make more efficient use of all available data than dichotomization, but requires access to statistical software and results in a summary statistic for which it is challenging to find a clinical meaning.

The proportional odds model uses the proportional odds ratio as the measure of intervention effect (Agresti 1996) (see Chapter 6, Section 6.6 ), and can be used for conducting a meta-analysis in advanced statistical software packages (Whitehead and Jones 1994). Estimates of log odds ratios and their standard errors from a proportional odds model may be meta-analysed using the generic inverse-variance method (see Section 10.3.3 ). If the same ordinal scale has been used in all studies, but in some reports has been presented as a dichotomous outcome, it may still be possible to include all studies in the meta-analysis. In the context of the three-category model, this might mean that for some studies category 1 constitutes a success, while for others both categories 1 and 2 constitute a success. Methods are available for dealing with this, and for combining data from scales that are related but have different definitions for their categories (Whitehead and Jones 1994).

10.8 Meta-analysis of counts and rates

Results may be expressed as count data when each participant may experience an event, and may experience it more than once (see Chapter 6, Section 6.7 ). For example, ‘number of strokes’, or ‘number of hospital visits’ are counts. These events may not happen at all, but if they do happen there is no theoretical maximum number of occurrences for an individual. Count data may be analysed as dichotomous data if the counts are dichotomized for each individual (see Section 10.4 ), as continuous data (see Section 10.5 ) or as time-to-event data (see Section 10.9 ), as well as being analysed as rate data.

Rate data occur if counts are measured for each participant along with the time over which they are observed. This is particularly appropriate when the events being counted are rare. For example, a woman may experience two strokes during a follow-up period of two years. Her rate of strokes is one per year of follow-up (or, equivalently 0.083 per month of follow-up). Rates are conventionally summarized at the group level. For example, participants in the comparator group of a clinical trial may experience 85 strokes during a total of 2836 person-years of follow-up. An underlying assumption associated with the use of rates is that the risk of an event is constant across participants and over time. This assumption should be carefully considered for each situation. For example, in contraception studies, rates have been used (known as Pearl indices) to describe the number of pregnancies per 100 women-years of follow-up. This is now considered inappropriate since couples have different risks of conception, and the risk for each woman changes over time. Pregnancies are now analysed more often using life tables or time-to-event methods that investigate the time elapsing before the first pregnancy.

Analysing count data as rates is not always the most appropriate approach and is uncommon in practice. This is because:

  • the assumption of a constant underlying risk may not be suitable; and
  • the statistical methods are not as well developed as they are for other types of data.

The results of a study may be expressed as a rate ratio, that is, the ratio of the rate in the experimental intervention group to the rate in the comparator group. The (natural) logarithms of the rate ratios may be combined across studies using the generic inverse-variance method (see Section 10.3.3 ). Alternatively, Poisson regression approaches can be used (Spittal et al 2015).
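A minimal sketch of computing a log rate ratio and its usual large-sample standard error, ready for generic inverse-variance pooling, is shown below. The comparator figures (85 strokes over 2836 person-years) come from the example above; the experimental-group figures are hypothetical:

```python
import math

def log_rate_ratio(events1, time1, events2, time2):
    """Log rate ratio and its usual large-sample standard error,
    sqrt(1/e1 + 1/e2); both event counts must be non-zero."""
    log_rr = math.log((events1 / time1) / (events2 / time2))
    se = math.sqrt(1 / events1 + 1 / events2)
    return log_rr, se

# Comparator group from the text: 85 strokes over 2836 person-years.
# The experimental-group figures (64 strokes, 2750 person-years) are
# hypothetical.
log_rr, se = log_rate_ratio(64, 2750, 85, 2836)
rate_ratio = math.exp(log_rr)   # < 1 favours the experimental group here
```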

In a randomized trial, rate ratios may often be very similar to risk ratios obtained after dichotomizing the participants, since the average period of follow-up should be similar in all intervention groups. Rate ratios and risk ratios will differ, however, if an intervention affects the likelihood of some participants experiencing multiple events.

It is possible also to focus attention on the rate difference (see Chapter 6, Section 6.7.1 ). The analysis again can be performed using the generic inverse-variance method (Hasselblad and McCrory 1995, Guevara et al 2004).

10.9 Meta-analysis of time-to-event outcomes

Two approaches to meta-analysis of time-to-event outcomes are readily available to Cochrane Review authors. The choice of which to use will depend on the type of data that have been extracted from the primary studies, or obtained from re-analysis of individual participant data.

If ‘O – E’ and ‘V’ statistics have been obtained (see Chapter 6, Section 6.8.2 ), either through re-analysis of individual participant data or from aggregate statistics presented in the study reports, then these statistics may be entered directly into RevMan using the ‘O – E and Variance’ outcome type. There are several ways to calculate these ‘O – E’ and ‘V’ statistics. Peto’s method applied to dichotomous data (Section 10.4.2 ) gives rise to an odds ratio; a log-rank approach gives rise to a hazard ratio; and a variation of the Peto method for analysing time-to-event data gives rise to something in between (Simmonds et al 2011). The appropriate effect measure should be specified. Only fixed-effect meta-analysis methods are available in RevMan for ‘O – E and Variance’ outcomes.
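Under the fixed-effect ‘O – E and Variance’ approach, the pooled log effect is the sum of the per-study ‘O – E’ statistics divided by the sum of the ‘V’ statistics. A minimal sketch with hypothetical study statistics:

```python
import math

def pool_o_minus_e(o_minus_e, variances):
    """Fixed-effect pooled log effect from per-study 'O – E' and 'V'
    statistics: sum(O − E)/sum(V), with standard error 1/sqrt(sum(V))."""
    total_v = sum(variances)
    log_effect = sum(o_minus_e) / total_v
    return log_effect, 1 / math.sqrt(total_v)

# Hypothetical 'O – E' and 'V' statistics from three trials:
oe = [-4.2, -1.1, -6.0]
v = [12.1, 3.5, 18.0]
log_hr, se = pool_o_minus_e(oe, v)
hr = math.exp(log_hr)   # pooled hazard ratio (here below 1)
```

Whether the exponentiated result is interpreted as an odds ratio, a hazard ratio or something in between depends on how the ‘O – E’ and ‘V’ statistics were derived, as noted above.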

Alternatively, if estimates of log hazard ratios and standard errors have been obtained from results of Cox proportional hazards regression models, study results can be combined using generic inverse-variance methods (see Section 10.3.3 ).

If a mixture of log-rank and Cox model estimates are obtained from the studies, all results can be combined using the generic inverse-variance method, as the log-rank estimates can be converted into log hazard ratios and standard errors using the approaches discussed in Chapter 6, Section 6.8 .

10.10 Heterogeneity

10.10.1 What is heterogeneity?

Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity. It can be helpful to distinguish between different types of heterogeneity. Variability in the participants, interventions and outcomes studied may be described as clinical diversity (sometimes called clinical heterogeneity), and variability in study design, outcome measurement tools and risk of bias may be described as methodological diversity (sometimes called methodological heterogeneity). Variability in the intervention effects being evaluated in the different studies is known as statistical heterogeneity, and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone. We will follow convention and refer to statistical heterogeneity simply as heterogeneity.

Clinical variation will lead to heterogeneity if the intervention effect is affected by the factors that vary across studies; most obviously, the specific interventions or patient characteristics. In other words, the true intervention effect will be different in different studies.

Differences between studies in terms of methodological factors, such as use of blinding and concealment of allocation sequence, or if there are differences between studies in the way the outcomes are defined and measured, may be expected to lead to differences in the observed intervention effects. Significant statistical heterogeneity arising from methodological diversity or differences in outcome assessments suggests that the studies are not all estimating the same quantity, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity would indicate that the studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this is not always the case. Further discussion appears in Chapter 7 and Chapter 8 .

The scope of a review will largely determine the extent to which studies included in a review are diverse. Sometimes a review will include studies addressing a variety of questions, for example when several different interventions for the same condition are of interest (see also Chapter 11 ) or when the differential effects of an intervention in different populations are of interest. Meta-analysis should only be considered when a group of studies is sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary (see MECIR Box 10.10.a. ). It is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. A common analogy is that systematic reviews bring together apples and oranges, and that combining these can yield a meaningless result. This is true if apples and oranges are of intrinsic interest on their own, but may not be if they are used to contribute to a wider question about fruit. For example, a meta-analysis may reasonably evaluate the average effect of a class of drugs by combining results from trials where each evaluates the effect of a different drug from the class.

MECIR Box 10.10.a Relevant expectations for conduct of intervention reviews

There may be specific interest in a review in investigating how clinical and methodological aspects of studies relate to their results. Where possible these investigations should be specified a priori (i.e. in the protocol for the systematic review). It is legitimate for a systematic review to focus on examining the relationship between some clinical characteristic(s) of the studies and the size of intervention effect, rather than on obtaining a summary effect estimate across a series of studies (see Section 10.11 ). Meta-regression may best be used for this purpose, although it is not implemented in RevMan (see Section 10.11.4 ).

10.10.2 Identifying and measuring heterogeneity

It is essential to consider the extent to which the results of studies are consistent with each other (see MECIR Box 10.10.b ). If confidence intervals for the results of individual studies (generally depicted graphically using horizontal lines) have poor overlap, this generally indicates the presence of statistical heterogeneity. More formally, a statistical test for heterogeneity is available. This Chi 2 (χ 2 , or chi-squared) test is included in the forest plots in Cochrane Reviews. It assesses whether observed differences in results are compatible with chance alone. A low P value (or a large Chi 2 statistic relative to its degrees of freedom) provides evidence of heterogeneity of intervention effects (variation in effect estimates beyond chance).

MECIR Box 10.10.b Relevant expectations for conduct of intervention reviews

Care must be taken in the interpretation of the Chi 2 test, since it has low power in the (common) situation of a meta-analysis when studies have small sample size or are few in number. This means that while a statistically significant result may indicate a problem with heterogeneity, a non-significant result must not be taken as evidence of no heterogeneity. This is also why a P value of 0.10, rather than the conventional level of 0.05, is sometimes used to determine statistical significance. A further problem with the test, which seldom occurs in Cochrane Reviews, is that when there are many studies in a meta-analysis, the test has high power to detect a small amount of heterogeneity that may be clinically unimportant.

Some argue that, since clinical and methodological diversity always occur in a meta-analysis, statistical heterogeneity is inevitable (Higgins et al 2003). Thus, the test for heterogeneity is irrelevant to the choice of analysis; heterogeneity will always exist whether or not we happen to be able to detect it using a statistical test. Methods have been developed for quantifying inconsistency across studies that move the focus away from testing whether heterogeneity is present to assessing its impact on the meta-analysis. A useful statistic for quantifying inconsistency is:

I² = ((Q − df) / Q) × 100%

In this equation, Q is the Chi 2 statistic and df is its degrees of freedom (Higgins and Thompson 2002, Higgins et al 2003). I 2 describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
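Q, its degrees of freedom and I² can be computed directly from the study estimates and standard errors. A minimal sketch (the truncation of I² at zero when Q is less than df follows the definition above; the input numbers are hypothetical):

```python
def cochran_q_and_i2(estimates, ses):
    """Cochran's Q heterogeneity statistic, its degrees of freedom,
    and I² = 100% × (Q − df)/Q (truncated at zero when Q < df)."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled)**2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical effect estimates and standard errors:
q, df, i2 = cochran_q_and_i2([0.5, 0.2, 0.8], [0.2, 0.2, 0.2])
```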

Thresholds for the interpretation of the I 2 statistic can be misleading, since the importance of inconsistency depends on several factors. A rough guide to interpretation in the context of meta-analyses of randomized trials is as follows:

  • 0% to 40%: might not be important;
  • 30% to 60%: may represent moderate heterogeneity*;
  • 50% to 90%: may represent substantial heterogeneity*;
  • 75% to 100%: considerable heterogeneity*.

*The importance of the observed value of I 2 depends on (1) magnitude and direction of effects, and (2) strength of evidence for heterogeneity (e.g. P value from the Chi 2 test, or a confidence interval for I 2 : uncertainty in the value of I 2 is substantial when the number of studies is small).

10.10.3 Strategies for addressing heterogeneity

Review authors must take into account any statistical heterogeneity when interpreting results, particularly when there is variation in the direction of effect (see MECIR Box 10.10.c ). A number of options are available if heterogeneity is identified among a group of studies that would otherwise be considered suitable for a meta-analysis.

MECIR Box 10.10.c  Relevant expectations for conduct of intervention reviews

  • Check again that the data are correct. Severe apparent heterogeneity can indicate that data have been incorrectly extracted or entered into meta-analysis software. For example, if standard errors have mistakenly been entered as SDs for continuous outcomes, this could manifest itself in overly narrow confidence intervals with poor overlap and hence substantial heterogeneity. Unit-of-analysis errors may also be causes of heterogeneity (see Chapter 6, Section 6.2 ).  
  • Do not do a meta-analysis. A systematic review need not contain any meta-analyses. If there is considerable variation in results, and particularly if there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect.  
  • Explore heterogeneity. It is clearly of interest to determine the causes of heterogeneity among results of studies. This process is problematic since there are often many characteristics that vary across studies from which one may choose. Heterogeneity may be explored by conducting subgroup analyses (see Section 10.11.3 ) or meta-regression (see Section 10.11.4 ). Reliable conclusions can only be drawn from analyses that are truly pre-specified before inspecting the studies’ results, and even these conclusions should be interpreted with caution. Explorations of heterogeneity that are devised after heterogeneity is identified can at best lead to the generation of hypotheses. They should be interpreted with even more caution and should generally not be listed among the conclusions of a review. Also, investigations of heterogeneity when there are very few studies are of questionable value.  
  • Ignore heterogeneity. Fixed-effect meta-analyses ignore heterogeneity. The summary effect estimate from a fixed-effect meta-analysis is normally interpreted as being the best estimate of the intervention effect. However, the existence of heterogeneity suggests that there may not be a single intervention effect but a variety of intervention effects. Thus, the summary fixed-effect estimate may be an intervention effect that does not actually exist in any population, and therefore have a confidence interval that is meaningless as well as being too narrow (see Section 10.10.4 ).  
  • Perform a random-effects meta-analysis. A random-effects meta-analysis may be used to incorporate heterogeneity among studies. This is not a substitute for a thorough investigation of heterogeneity. It is intended primarily for heterogeneity that cannot be explained. An extended discussion of this option appears in Section 10.10.4 .  
  • Reconsider the effect measure. Heterogeneity may be an artificial consequence of an inappropriate choice of effect measure. For example, when studies collect continuous outcome data using different scales or different units, extreme heterogeneity may be apparent when using the mean difference but not when the more appropriate standardized mean difference is used. Furthermore, choice of effect measure for dichotomous outcomes (odds ratio, risk ratio, or risk difference) may affect the degree of heterogeneity among results. In particular, when comparator group risks vary, homogeneous odds ratios or risk ratios will necessarily lead to heterogeneous risk differences, and vice versa. However, it remains unclear whether homogeneity of intervention effect in a particular meta-analysis is a suitable criterion for choosing between these measures (see also Section 10.4.3 ).  
  • Exclude studies. Heterogeneity may be due to the presence of one or two outlying studies with results that conflict with the rest of the studies. In general it is unwise to exclude studies from a meta-analysis on the basis of their results as this may introduce bias. However, if an obvious reason for the outlying result is apparent, the study might be removed with more confidence. Since usually at least one characteristic can be found for any study in any meta-analysis which makes it different from the others, this criterion is unreliable because it is all too easy to fulfil. It is advisable to perform analyses both with and without outlying studies as part of a sensitivity analysis (see Section 10.14 ). Whenever possible, potential sources of clinical diversity that might lead to such situations should be specified in the protocol.

10.10.4 Incorporating heterogeneity into random-effects models

The random-effects meta-analysis approach incorporates an assumption that the different studies are estimating different, yet related, intervention effects (DerSimonian and Laird 1986, Borenstein et al 2010). The approach allows us to address heterogeneity that cannot readily be explained by other factors. A random-effects meta-analysis model involves an assumption that the effects being estimated in the different studies follow some distribution. The model represents our lack of knowledge about why real, or apparent, intervention effects differ, by considering the differences as if they were random. The centre of the assumed distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any particular distributional assumption, and this is a common criticism of random-effects meta-analyses. The importance of the assumed shape for this distribution has not been widely studied.

To undertake a random-effects meta-analysis, the standard errors of the study-specific estimates (SE i in Section 10.3.1 ) are adjusted to incorporate a measure of the extent of variation, or heterogeneity, among the intervention effects observed in different studies (this variation is often referred to as Tau-squared, τ 2 , or Tau 2 ). The amount of variation, and hence the adjustment, can be estimated from the intervention effects and standard errors of the studies included in the meta-analysis.

In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. This is because small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect.
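A minimal sketch of a random-effects meta-analysis using the DerSimonian–Laird method-of-moments estimate of Tau² (hypothetical input numbers); the inflated per-study variances show how smaller studies gain relative weight:

```python
import math

def dersimonian_laird(estimates, ses):
    """Random-effects meta-analysis with the DerSimonian–Laird
    method-of-moments estimate of Tau².  Each study's variance is
    inflated by Tau², so smaller studies gain relative weight."""
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed)**2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2

# Hypothetical heterogeneous studies (the largest study has se = 0.1):
pooled, se_re, tau2 = dersimonian_laird([0.1, 0.5, 0.9], [0.1, 0.2, 0.3])
```

For these data the random-effects estimate lies closer to the two smaller studies than the fixed-effect estimate does, and its standard error is wider, illustrating both behaviours described in this section.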

Note that a random-effects model does not ‘take account’ of the heterogeneity in the sense of making it no longer an issue. It is always preferable to explore possible causes of heterogeneity, although there may be too few studies to do this adequately (see Section 10.11 ).

10.10.4.1 Fixed or random effects?

A fixed-effect meta-analysis provides a result that may be viewed as a ‘typical intervention effect’ from the studies included in the analysis. In order to calculate a confidence interval for a fixed-effect meta-analysis the assumption is usually made that the true effect of intervention (in both magnitude and direction) is the same value in every study (i.e. fixed across studies). This assumption implies that the observed differences among study results are due solely to the play of chance (i.e. that there is no statistical heterogeneity).

A random-effects model provides a result that may be viewed as an ‘average intervention effect’, where this average is explicitly defined according to an assumed distribution of effects across studies. Instead of assuming that the intervention effects are the same, we assume that they follow (usually) a normal distribution. The assumption implies that the observed differences among study results are due to a combination of the play of chance and some genuine variation in the intervention effects.

The random-effects method and the fixed-effect method will give identical results when there is no heterogeneity among the studies.

When heterogeneity is present, a confidence interval around the random-effects summary estimate is wider than a confidence interval around a fixed-effect summary estimate. This will happen whenever the I 2 statistic is greater than zero, even if the heterogeneity is not detected by the Chi 2 test for heterogeneity (see Section 10.10.2 ).

Sometimes the central estimate of the intervention effect is different between fixed-effect and random-effects analyses. In particular, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or within-study bias in smaller studies (Egger et al 1997, Poole and Greenland 1999, Kjaergard et al 2001), then a random-effects meta-analysis will exacerbate the effects of the bias (see also Chapter 13, Section ). A fixed-effect analysis will be affected less, although strictly it will also be inappropriate.

The decision between fixed- and random-effects meta-analyses has been the subject of much debate, and we do not provide a universal recommendation. Some considerations in making this choice are as follows:

  • Many have argued that the decision should be based on an expectation of whether the intervention effects are truly identical, preferring the fixed-effect model if this is likely and a random-effects model if this is unlikely (Borenstein et al 2010). Since it is generally considered to be implausible that intervention effects across studies are identical (unless the intervention has no effect at all), this leads many to advocate use of the random-effects model.
  • Others have argued that a fixed-effect analysis can be interpreted in the presence of heterogeneity, and that it makes fewer assumptions than a random-effects meta-analysis. They then refer to it as a ‘fixed-effects’ meta-analysis (Peto et al 1995, Rice et al 2018).
  • Under any interpretation, a fixed-effect meta-analysis ignores heterogeneity. If the method is used, it is therefore important to supplement it with a statistical investigation of the extent of heterogeneity (see Section 10.10.2 ).
  • In the presence of heterogeneity, a random-effects analysis gives relatively more weight to smaller studies and relatively less weight to larger studies. If there is additionally some funnel plot asymmetry (i.e. a relationship between intervention effect magnitude and study size), then this will push the results of the random-effects analysis towards the findings in the smaller studies. In the context of randomized trials, this is generally regarded as an unfortunate consequence of the model.
  • A pragmatic approach is to plan to undertake both a fixed-effect and a random-effects meta-analysis, with an intention to present the random-effects result if there is no indication of funnel plot asymmetry. If there is an indication of funnel plot asymmetry, then both methods are problematic. It may be reasonable to present both analyses or neither, or to perform a sensitivity analysis in which small studies are excluded or addressed directly using meta-regression (see Chapter 13).
  • The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.

Interpretation of random-effects meta-analyses

The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see below).

Methodological diversity creates heterogeneity through biases variably affecting the results of different studies. The random-effects summary estimate will only correctly estimate the average intervention effect if the biases are symmetrically distributed, leading to a mixture of over-estimates and under-estimates of effect, which is unlikely to be the case. In practice it can be very difficult to distinguish whether heterogeneity results from clinical or methodological diversity, and in most cases it is likely to be due to both, so these distinctions are hard to draw in the interpretation.

When there is little information, either because there are few studies or if the studies are small with few events, a random-effects analysis will provide poor estimates of the amount of heterogeneity (i.e. of the width of the distribution of intervention effects). Fixed-effect methods such as the Mantel-Haenszel method will provide more robust estimates of the average intervention effect, but at the cost of ignoring any heterogeneity.

Prediction intervals from a random-effects meta-analysis

An estimate of the between-study variance in a random-effects meta-analysis is typically presented as part of its results. The square root of this number (i.e. Tau) is the estimated standard deviation of underlying effects across studies. Prediction intervals are a way of expressing this value in an interpretable way.

To motivate the idea of a prediction interval, note that for absolute measures of effect (e.g. risk difference, mean difference, standardized mean difference), an approximate 95% range of normally distributed underlying effects can be obtained by creating an interval from 1.96×Tau below the random-effects mean, to 1.96×Tau above it. (For relative measures such as the odds ratio and risk ratio, an equivalent interval needs to be based on the natural logarithm of the summary estimate.) In reality, both the summary estimate and the value of Tau are associated with uncertainty. A prediction interval seeks to present the range of effects in a way that acknowledges this uncertainty (Higgins et al 2009). A simple 95% prediction interval can be calculated as:

M ± t_{k−2} × √(Tau² + SE(M)²)

where M is the summary mean from the random-effects meta-analysis, t_{k−2} is the 95% percentile of a t-distribution with k−2 degrees of freedom, k is the number of studies, Tau² is the estimated amount of heterogeneity and SE(M) is the standard error of the summary mean.

The term ‘prediction interval’ relates to the use of this interval to predict the possible underlying effect in a new study that is similar to the studies in the meta-analysis. A more useful interpretation of the interval is as a summary of the spread of underlying effects in the studies included in the random-effects meta-analysis.
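The formula above is straightforward to compute once the random-effects results are available. A minimal sketch in Python (the function name is our own; the t critical value is supplied by the user, e.g. from a t-table, and here taken as the two-sided 95% critical value of a t-distribution with k−2 degrees of freedom):

```python
import math

def prediction_interval(m, se_m, tau2, t_crit):
    """Approximate 95% prediction interval for the underlying effect
    in a new study, following the formula in the text.

    m      : random-effects summary mean (on the analysis scale, e.g. log OR)
    se_m   : standard error of m
    tau2   : estimated between-study variance (Tau^2)
    t_crit : two-sided 95% critical value of a t-distribution with
             k - 2 degrees of freedom, where k is the number of studies
    """
    half_width = t_crit * math.sqrt(tau2 + se_m**2)
    return m - half_width, m + half_width

# Hypothetical example: k = 6 studies, so t has 4 degrees of freedom (2.776)
low, high = prediction_interval(m=0.5, se_m=0.1, tau2=0.04, t_crit=2.776)
```

Note that the interval is always wider than the confidence interval for the mean (which would use 1.96 × SE(M) for large k), because it adds Tau² inside the square root and uses a t rather than a normal critical value.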

Prediction intervals have proved a popular way of expressing the amount of heterogeneity in a meta-analysis (Riley et al 2011). They are, however, strongly based on the assumption of a normal distribution for the effects across studies, and can be very problematic when the number of studies is small, in which case they can appear spuriously wide or spuriously narrow. Nevertheless, we encourage their use when the number of studies is reasonable (e.g. more than ten) and there is no clear funnel plot asymmetry.

Implementing random-effects meta-analyses

As introduced in Section 10.3.2 , the random-effects model can be implemented using an inverse-variance approach, incorporating a measure of the extent of heterogeneity into the study weights. RevMan implements a version of random-effects meta-analysis that is described by DerSimonian and Laird, making use of a ‘moment-based’ estimate of the between-study variance (DerSimonian and Laird 1986). The attraction of this method is that the calculations are straightforward, but it has a theoretical disadvantage in that the confidence intervals are slightly too narrow to encompass full uncertainty resulting from having estimated the degree of heterogeneity.
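The inverse-variance fixed-effect calculation and the moment-based (DerSimonian and Laird) estimate of the between-study variance can be sketched in a few lines of Python. This is an illustrative sketch only, not RevMan's implementation; the function and variable names are our own:

```python
import math

def meta_analysis_dl(estimates, ses):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects
    meta-analysis of study effect estimates (e.g. log odds ratios)."""
    w = [1 / se**2 for se in ses]                      # fixed-effect weights
    mu_fe = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    se_fe = math.sqrt(1 / sum(w))

    # Moment-based estimate of the between-study variance (Tau^2),
    # truncated at zero when Q does not exceed its degrees of freedom
    q = sum(wi * (y - mu_fe)**2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights incorporate Tau^2 into each study's variance
    w_re = [1 / (se**2 + tau2) for se in ses]
    mu_re = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return mu_fe, se_fe, mu_re, se_re, tau2
```

When the study results are homogeneous the moment estimate is truncated at zero and the two analyses coincide, as noted earlier; with heterogeneity, the random-effects standard error is larger and smaller studies receive relatively more weight.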

For many years, RevMan has implemented two random-effects methods for dichotomous data: a Mantel-Haenszel method and an inverse-variance method. Both use the moment-based approach to estimating the amount of between-studies variation. The difference between the two is subtle: the former estimates the between-study variation by comparing each study’s result with a Mantel-Haenszel fixed-effect meta-analysis result, whereas the latter estimates it by comparing each study’s result with an inverse-variance fixed-effect meta-analysis result. In practice, the difference is likely to be trivial.

There are alternative methods for performing random-effects meta-analyses that have better technical properties than the DerSimonian and Laird approach with a moment-based estimate (Veroniki et al 2016). Most notable among these is an adjustment to the confidence interval proposed by Hartung and Knapp and by Sidik and Jonkman (Hartung and Knapp 2001, Sidik and Jonkman 2002). This adjustment widens the confidence interval to reflect uncertainty in the estimation of between-study heterogeneity, and it should be used if available to review authors. An alternative option to encompass full uncertainty in the degree of heterogeneity is to take a Bayesian approach (see Section 10.13 ).
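Under our reading of Hartung and Knapp (2001), the adjustment replaces the usual standard error of the random-effects summary with one based on the weighted residuals, and refers the result to a t-distribution with k−1 degrees of freedom rather than a normal distribution. A hypothetical sketch (names are our own):

```python
import math

def hartung_knapp_se(estimates, ses, tau2):
    """Hartung-Knapp(-Sidik-Jonkman) standard error for the random-effects
    summary estimate; the confidence interval is then mu +/- t_{k-1} * se,
    using a t-distribution with k - 1 degrees of freedom."""
    w = [1 / (se**2 + tau2) for se in ses]             # random-effects weights
    mu = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    k = len(estimates)
    # Weighted mean squared deviation of study results around the summary
    q = sum(wi * (y - mu)**2 for wi, y in zip(w, estimates)) / (k - 1)
    return mu, math.sqrt(q / sum(w))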

An empirical comparison of different ways to estimate between-study variation in Cochrane meta-analyses has shown that they can lead to substantial differences in estimates of heterogeneity, but seldom have major implications for estimating summary effects (Langan et al 2015). Several simulation studies have concluded that an approach proposed by Paule and Mandel should be recommended (Langan et al 2017); whereas a comprehensive recent simulation study recommended a restricted maximum likelihood approach, although noted that no single approach is universally preferable (Langan et al 2019). Review authors are encouraged to select one of these options if it is available to them.

10.11 Investigating heterogeneity

10.11.1 Interaction and effect modification

Does the intervention effect vary with different populations or intervention characteristics (such as dose or duration)? Such variation is known as interaction by statisticians and as effect modification by epidemiologists. Methods to search for such interactions include subgroup analyses and meta-regression. All methods have considerable pitfalls.

10.11.2 What are subgroup analyses?

Subgroup analyses involve splitting all the participant data into subgroups, often in order to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study.

Subgroup analyses of subsets of participants within studies are uncommon in systematic reviews based on published literature because sufficient details to extract data about separate participant types are seldom published in reports. By contrast, such subsets of participants are easily analysed when individual participant data have been collected (see Chapter 26 ). The methods we describe in the remainder of this chapter are for subgroups of studies.

Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons. False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of people being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.

It is useful to distinguish between the notions of ‘qualitative interaction’ and ‘quantitative interaction’ (Yusuf et al 1991). Qualitative interaction exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups.

10.11.3 Undertaking subgroup analyses

Meta-analyses can be undertaken in RevMan both within subgroups of studies as well as across all studies irrespective of their subgroup membership. It is tempting to compare effect estimates in different subgroups by considering the meta-analysis results from each subgroup separately. This should only be done informally by comparing the magnitudes of effect. Noting that either the effect or the test for heterogeneity in one subgroup is statistically significant whilst that in the other subgroup is not statistically significant does not indicate that the subgroup factor explains heterogeneity. Since different subgroups are likely to contain different amounts of information and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of the results.

Is the effect different in different subgroups?

Valid investigations of whether an intervention works differently in different subgroups involve comparing the subgroups with each other. It is a mistake to compare within-subgroup inferences such as P values. If one subgroup analysis is statistically significant and another is not, then the latter may simply reflect a lack of information rather than a smaller (or absent) effect. When there are only two subgroups, non-overlap of the confidence intervals indicates statistical significance, but note that the confidence intervals can overlap to a small degree and the difference still be statistically significant.

A formal statistical approach should be used to examine differences among subgroups (see MECIR Box 10.11.a). A simple significance test to investigate differences between two or more subgroups can be performed (Borenstein and Higgins 2013). This procedure consists of undertaking a standard test for heterogeneity across subgroup results rather than across individual study results. When the meta-analysis uses a fixed-effect inverse-variance weighted average approach, the method is exactly equivalent to the test described by Deeks and colleagues (Deeks et al 2001). An I² statistic is also computed for subgroup differences. This describes the percentage of the variability in effect estimates from the different subgroups that is due to genuine subgroup differences rather than sampling error (chance). Note that these methods for examining subgroup differences should be used only when the data in the subgroups are independent (i.e. they should not be used if the same study participants contribute to more than one of the subgroups in the forest plot).
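The test for subgroup differences described by Borenstein and Higgins (2013) can be sketched as follows (illustrative only; names are our own). The Q statistic is computed across the subgroup summary results and referred to a chi-squared distribution with df degrees of freedom to obtain a P value (e.g. via scipy.stats.chi2.sf(q, df) or a chi-squared table):

```python
def subgroup_difference_test(subgroup_estimates, subgroup_ses):
    """Heterogeneity test across subgroup summary results: a standard Q
    statistic computed over the subgroup estimates rather than over
    individual study results, plus the I^2 for subgroup differences."""
    w = [1 / se**2 for se in subgroup_ses]             # inverse-variance weights
    pooled = sum(wi * m for wi, m in zip(w, subgroup_estimates)) / sum(w)
    q = sum(wi * (m - pooled)**2 for wi, m in zip(w, subgroup_estimates))
    df = len(subgroup_estimates) - 1
    # Percentage of variability across subgroups due to genuine differences
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    return q, df, i2
```

Identical subgroup estimates give Q = 0 and I² = 0; widely separated, precisely estimated subgroup results give a large Q and an I² approaching 100%.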

If fixed-effect models are used for the analysis within each subgroup, then these statistics relate to differences in typical effects across different subgroups. If random-effects models are used for the analysis within each subgroup, then the statistics relate to variation in the mean effects in the different subgroups.

An alternative method for testing for differences between subgroups is to use meta-regression techniques, in which case a random-effects model is generally preferred (see Section 10.11.4 ). Tests for subgroup differences based on random-effects models may be regarded as preferable to those based on fixed-effect models, due to the high risk of false-positive results when a fixed-effect model is used to compare subgroups (Higgins and Thompson 2004).

MECIR Box 10.11.a Relevant expectations for conduct of intervention reviews

10.11.4 Meta-regression

If studies are divided into subgroups (see Section 10.11.2 ), this may be viewed as an investigation of how a categorical study characteristic is associated with the intervention effects in the meta-analysis. For example, studies in which allocation sequence concealment was adequate may yield different results from those in which it was inadequate. Here, allocation sequence concealment, being either adequate or inadequate, is a categorical characteristic at the study level. Meta-regression is an extension to subgroup analyses that allows the effect of continuous, as well as categorical, characteristics to be investigated, and in principle allows the effects of multiple factors to be investigated simultaneously (although this is rarely possible due to inadequate numbers of studies) (Thompson and Higgins 2002). Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis.

Meta-regressions are similar in essence to simple regressions, in which an outcome variable is predicted according to the values of one or more explanatory variables. In meta-regression, the outcome variable is the effect estimate (for example, a mean difference, a risk difference, a log odds ratio or a log risk ratio). The explanatory variables are characteristics of studies that might influence the size of intervention effect. These are often called ‘potential effect modifiers’ or covariates. Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term ‘random-effects meta-regression’, since the extra variability is incorporated in the same way as in a random-effects meta-analysis (Thompson and Sharp 1999).

The regression coefficient obtained from a meta-regression analysis will describe how the outcome variable (the intervention effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between intervention effect and the explanatory variable. If the intervention effect is a ratio measure, the log-transformed value of the intervention effect should always be used in the regression model (see Chapter 6), and the exponential of the regression coefficient will give an estimate of the relative change in intervention effect with a unit increase in the explanatory variable.

Meta-regression can also be used to investigate differences for categorical explanatory variables as done in subgroup analyses. If there are J subgroups, membership of particular subgroups is indicated by using J minus 1 dummy variables (which can only take values of zero or one) in the meta-regression model (as in standard linear regression modelling). The regression coefficients will estimate how the intervention effect in each subgroup differs from a nominated reference subgroup. The P value of each regression coefficient will indicate the strength of evidence against the null hypothesis that the characteristic is not associated with the intervention effect.

Meta-regression may be performed using the ‘metareg’ macro available for the Stata statistical package, or using the ‘metafor’ package for R, as well as other packages.
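To make the mechanics concrete, a minimal fixed-effect weighted least-squares meta-regression with one covariate might look like the sketch below (names are our own). A full random-effects meta-regression, as implemented in ‘metareg’ or ‘metafor’, additionally estimates the residual between-study variance and adds it to each study's variance before weighting:

```python
def meta_regression(effects, ses, covariate):
    """Weighted least-squares regression of study effect estimates on one
    study-level covariate, with inverse-variance weights (closed form)."""
    w = [1 / se**2 for se in ses]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, covariate)) / sw   # weighted means
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, covariate, effects))
    sxx = sum(wi * (x - xbar)**2 for wi, x in zip(w, covariate))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope
```

As noted above, for a ratio measure the effects supplied should be on the log scale, so that exponentiating the slope gives the relative change in intervention effect per unit increase in the covariate.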

10.11.5 Selection of study characteristics for subgroup analyses and meta-regression

Authors need to be cautious about undertaking subgroup analyses, and interpreting any that they do. Some considerations are outlined here for selecting characteristics (also called explanatory variables, potential effect modifiers or covariates) that will be investigated for their possible influence on the size of the intervention effect. These considerations apply similarly to subgroup analyses and to meta-regressions. Further details may be obtained elsewhere (Oxman and Guyatt 1992, Berlin and Antman 1994).

Ensure that there are adequate studies to justify subgroup analyses and meta-regressions

It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. A typical rule of thumb, borrowed from advice on simple regression analyses, is that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when covariates are unevenly distributed across studies.

Specify characteristics in advance

Authors should, whenever possible, pre-specify characteristics in the protocol that later will be subject to subgroup analyses or meta-regression. The plan specified in the protocol should then be followed (data permitting), without undue emphasis on any particular findings (see MECIR Box 10.11.b ). Pre-specifying characteristics reduces the likelihood of spurious findings, first by limiting the number of subgroups investigated, and second by preventing knowledge of the studies’ results influencing which subgroups are analysed. True pre-specification is difficult in systematic reviews, because the results of some of the relevant studies are often known when the protocol is drafted. If a characteristic was overlooked in the protocol, but is clearly of major importance and justified by external evidence, then authors should not be reluctant to explore it. However, such post-hoc analyses should be identified as such.

MECIR Box 10.11.b Relevant expectations for conduct of intervention reviews

Select a small number of characteristics

The likelihood of a false-positive result among subgroup analyses and meta-regression increases with the number of characteristics investigated. It is difficult to suggest a maximum number of characteristics to look at, especially since the number of available studies is unknown in advance. If more than one or two characteristics are investigated it may be sensible to adjust the level of significance to account for making multiple comparisons.

Ensure there is scientific rationale for investigating each characteristic

Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided. For example, a relationship between intervention effect and year of publication is seldom in itself clinically informative, and if identified runs the risk of initiating a post-hoc data dredge of factors that may have changed over time.

Prognostic factors are those that predict the outcome of a disease or condition, whereas effect modifiers are factors that influence how well an intervention works in affecting the outcome. Confusion between prognostic factors and effect modifiers is common in planning subgroup analyses, especially at the protocol stage. Prognostic factors are not good candidates for subgroup analyses unless they are also believed to modify the effect of intervention. For example, being a smoker may be a strong predictor of mortality within the next ten years, but there may not be reason for it to influence the effect of a drug therapy on mortality (Deeks 1998). Potential effect modifiers may include participant characteristics (age, setting), the precise interventions (dose of active intervention, choice of comparison intervention), how the study was done (length of follow-up) or methodology (design and quality).

Be aware that the effect of a characteristic may not always be identified

Many characteristics that might have important effects on how well an intervention works cannot be investigated using subgroup analysis or meta-regression. These are characteristics of participants that might vary substantially within studies, but that can only be summarized at the level of the study. An example is age. Consider a collection of clinical trials involving adults ranging from 18 to 60 years old. There may be a strong relationship between age and intervention effect that is apparent within each study. However, if the mean ages for the trials are similar, then no relationship will be apparent by looking at trial mean ages and trial-level effect estimates. The problem is one of aggregating individuals’ results and is variously known as aggregation bias, ecological bias or the ecological fallacy (Morgenstern 1982, Greenland 1987, Berlin et al 2002). It is even possible for the direction of the relationship across studies to be the opposite of the direction of the relationship observed within each study.

Think about whether the characteristic is closely related to another characteristic (confounded)

The problem of ‘confounding’ complicates interpretation of subgroup analyses and meta-regressions and can lead to incorrect conclusions. Two characteristics are confounded if their influences on the intervention effect cannot be disentangled. For example, if those studies implementing an intensive version of a therapy happened to be the studies that involved patients with more severe disease, then one cannot tell which aspect is the cause of any difference in effect estimates between these studies and others. In meta-regression, co-linearity between potential effect modifiers leads to similar difficulties (Berlin and Antman 1994). Computing correlations between study characteristics will give some information about which study characteristics may be confounded with each other.

10.11.6 Interpretation of subgroup analyses and meta-regressions

Appropriate interpretation of subgroup analyses and meta-regressions requires caution (Oxman and Guyatt 1992).

  • Subgroup comparisons are observational. It must be remembered that subgroup analyses and meta-regressions are entirely observational in their nature. These analyses investigate differences between studies. Even if individuals are randomized to one group or other within a clinical trial, they are not randomized to go in one trial or another. Hence, subgroup analyses suffer the limitations of any observational investigation, including possible bias through confounding by other study-level characteristics. Furthermore, even a genuine difference between subgroups is not necessarily due to the classification of the subgroups. As an example, a subgroup analysis of bone marrow transplantation for treating leukaemia might show a strong association between the age of a sibling donor and the success of the transplant. However, this probably does not mean that the age of donor is important. In fact, the age of the recipient is probably a key factor and the subgroup finding would simply be due to the strong association between the age of the recipient and the age of their sibling.  
  • Was the analysis pre-specified or post hoc? Authors should state whether subgroup analyses were pre-specified or undertaken after the results of the studies had been compiled (post hoc). More reliance may be placed on a subgroup analysis if it was one of a small number of pre-specified analyses. Performing numerous post-hoc subgroup analyses to explain heterogeneity is a form of data dredging. Data dredging is condemned because it is usually possible to find an apparent, but false, explanation for heterogeneity by considering lots of different characteristics.  
  • Is there indirect evidence in support of the findings? Differences between subgroups should be clinically plausible and supported by other external or indirect evidence, if they are to be convincing.  
  • Is the magnitude of the difference practically important? If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results.  
  • Is there a statistically significant difference between subgroups? To establish whether there is a different effect of an intervention in different situations, the magnitudes of effects in different subgroups should be compared directly with each other. In particular, statistical significance of the results within separate subgroup analyses should not be compared (see Section 10.11.3).  
  • Are analyses looking at within-study or between-study relationships? For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings.

10.11.7 Investigating the effect of underlying risk

One potentially important source of heterogeneity among a series of studies is when the underlying average risk of the outcome event varies between the studies. The underlying risk of a particular event may be viewed as an aggregate measure of case-mix factors such as age or disease severity. It is generally measured as the observed risk of the event in the comparator group of each study (the comparator group risk, or CGR). The notion is controversial in its relevance to clinical practice since underlying risk represents a summary of both known and unknown risk factors. Problems also arise because comparator group risk will depend on the length of follow-up, which often varies across studies. However, underlying risk has received particular attention in meta-analysis because the information is readily available once dichotomous data have been prepared for use in meta-analyses. Sharp provides a full discussion of the topic (Sharp 2001).

Intuition would suggest that participants are more or less likely to benefit from an effective intervention according to their risk status. However, the relationship between underlying risk and intervention effect is a complicated issue. For example, suppose an intervention is equally beneficial in the sense that for all patients it reduces the risk of an event, say a stroke, to 80% of the underlying risk. Then it is not equally beneficial in terms of absolute differences in risk in the sense that it reduces a 50% stroke rate by 10 percentage points to 40% (number needed to treat=10), but a 20% stroke rate by 4 percentage points to 16% (number needed to treat=25).
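The arithmetic in this stroke example can be reproduced directly (a minimal sketch; the function name is our own):

```python
def absolute_effects(control_risk, risk_ratio):
    """Absolute risk reduction (ARR) and number needed to treat (NNT)
    for a given comparator-group risk and a constant risk ratio."""
    treated_risk = control_risk * risk_ratio
    arr = control_risk - treated_risk
    nnt = 1 / arr
    return arr, nnt

# Constant risk ratio of 0.8 applied to two different underlying risks:
# 50% risk -> ARR 10 percentage points, NNT 10
# 20% risk -> ARR  4 percentage points, NNT 25
high_risk = absolute_effects(0.50, 0.8)
low_risk = absolute_effects(0.20, 0.8)
```

This illustrates why a relative effect that is constant across underlying risks still implies very different absolute benefits in high-risk and low-risk populations.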

Use of different summary statistics (risk ratio, odds ratio and risk difference) will demonstrate different relationships with underlying risk. Summary statistics that show close to no relationship with underlying risk are generally preferred for use in meta-analysis (see Section 10.4.3 ).

Investigating any relationship between effect estimates and the comparator group risk is also complicated by a technical phenomenon known as regression to the mean. This arises because the comparator group risk forms an integral part of the effect estimate. A high risk in a comparator group, observed entirely by chance, will on average give rise to a higher than expected effect estimate, and vice versa. This phenomenon results in a false correlation between effect estimates and comparator group risks. There are methods, which require sophisticated software, that correct for regression to the mean (McIntosh 1996, Thompson et al 1997). These should be used for such analyses, and statistical expertise is recommended.

10.11.8 Dose-response analyses

The principles of meta-regression can be applied to the relationships between intervention effect and dose (commonly termed dose-response), treatment intensity or treatment duration (Greenland and Longnecker 1992, Berlin et al 1993). Conclusions about differences in effect due to differences in dose (or similar factors) are on stronger ground if participants are randomized to one dose or another within a study and a consistent relationship is found across similar studies. While authors should consider these effects, particularly as a possible explanation for heterogeneity, they should be cautious about drawing conclusions based on between-study differences. Authors should be particularly cautious about claiming that a dose-response relationship does not exist, given the low power of many meta-regression analyses to detect genuine relationships.

10.12 Missing data

10.12.1 Types of missing data

There are many potential sources of missing data in a systematic review or meta-analysis (see Table 10.12.a ). For example, a whole study may be missing from the review, an outcome may be missing from a study, summary data may be missing for an outcome, and individual participants may be missing from the summary data. Here we discuss a variety of potential sources of missing data, highlighting where more detailed discussions are available elsewhere in the Handbook .

Whole studies may be missing from a review because they are never published, are published in obscure places, are rarely cited, or are inappropriately indexed in databases. Thus, review authors should always be aware of the possibility that they have failed to identify relevant studies. There is a strong possibility that such studies are missing because of their ‘uninteresting’ or ‘unwelcome’ findings (that is, in the presence of publication bias). This problem is discussed at length in Chapter 13. Details of comprehensive search methods are provided in Chapter 4.

Some studies might not report any information on outcomes of interest to the review. For example, there may be no information on quality of life, or on serious adverse effects. It is often difficult to determine whether this is because the outcome was not measured or because the outcome was not reported. Furthermore, failure to report that outcomes were measured may be dependent on the unreported results (selective outcome reporting bias; see Chapter 7). Similarly, summary data for an outcome, in a form that can be included in a meta-analysis, may be missing. A common example is missing standard deviations (SDs) for continuous outcomes. This is often a problem when change-from-baseline outcomes are sought. We discuss imputation of missing SDs in Chapter 6. Other examples of missing summary data are missing sample sizes (particularly those for each intervention group separately), numbers of events, standard errors, follow-up times for calculating rates, and sufficient details of time-to-event outcomes. Inappropriate analyses of studies, for example of cluster-randomized and crossover trials, can lead to missing summary data. It is sometimes possible to approximate the correct analyses of such studies, for example by imputing correlation coefficients or SDs, as discussed in Chapter 23, Section 23.1, for cluster-randomized studies and Chapter 23, Section 23.2, for crossover trials. As a general rule, most methodologists believe that missing summary data (e.g. ‘no usable data’) should not be used as a reason to exclude a study from a systematic review. It is more appropriate to include the study in the review, and to discuss the potential implications of its absence from a meta-analysis.

It is likely that in some, if not all, included studies, there will be individuals missing from the reported results. Review authors are encouraged to consider this problem carefully (see MECIR Box 10.12.a). We provide further discussion of this problem in Section 10.12.3; see also Chapter 8, Section 8.5.

Missing data can also affect subgroup analyses. If subgroup analyses or meta-regressions are planned (see Section 10.11), they require details of the study-level characteristics that distinguish studies from one another. If these are not available for all studies, review authors should consider asking the study authors for more information.

Table 10.12.a Types of missing data in a meta-analysis

MECIR Box 10.12.a Relevant expectations for conduct of intervention reviews

10.12.2 General principles for dealing with missing data

There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane Review authors. It is important to think why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios.

Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be a serious problem: analyses based on the available data will often be unbiased, although based on a smaller sample size than the original data set.

Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are ‘not missing at random’, and attrition and exclusions of individuals within studies often do as well.

The principal options for dealing with missing data are:

  1. analysing only the available data (i.e. ignoring the missing data);
  2. imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis);
  3. imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard error); and
  4. using statistical models to allow for missing data, making assumptions about their relationships with the available data.

Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and typically results in confidence intervals that are too narrow. Options 3 and 4 would require involvement of a knowledgeable statistician.
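As an illustration of option 2, a common single-imputation task is replacing a missing SD with the pooled SD of the studies that do report one. The study data below are hypothetical; this is a sketch of the idea only:

```python
import numpy as np

# Hypothetical continuous-outcome studies; one study reports no SD.
n = np.array([40, 55, 60, 35])
sd = np.array([9.5, 11.0, np.nan, 10.2])  # third study's SD is missing

# Single imputation (option 2): borrow the pooled SD of the reporting studies.
have = ~np.isnan(sd)
pooled_sd = np.sqrt(np.sum((n[have] - 1) * sd[have]**2) / np.sum(n[have] - 1))
sd_imputed = np.where(have, sd, pooled_sd)
print(f"imputed SD for the third study: {pooled_sd:.2f}")

# Caveat from the text: treating the imputed SD as if observed understates
# uncertainty, so confidence intervals from the meta-analysis will be too narrow.
```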

Five general recommendations for dealing with missing data in Cochrane Reviews are as follows:

  • Whenever possible, contact the original investigators to request missing data.
  • Make explicit the assumptions of any methods used to address missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome.
  • Follow the guidance in Chapter 8 to assess risk of bias due to missing outcome data in randomized trials.
  • Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made (see Section 10.14 ).
  • Address the potential impact of missing data on the findings of the review in the Discussion section.

10.12.3 Dealing with missing outcome data from individual participants

Review authors may undertake sensitivity analyses to assess the potential impact of missing outcome data, based on assumptions about the relationship between missingness in the outcome and its true value. Several methods are available (Akl et al 2015). For dichotomous outcomes, Higgins and colleagues propose a strategy involving different assumptions about how the risk of the event among the missing participants differs from the risk of the event among the observed participants, taking account of uncertainty introduced by the assumptions (Higgins et al 2008a). Akl and colleagues propose a suite of simple imputation methods, including a similar approach to that of Higgins and colleagues based on relative risks of the event in missing versus observed participants. Similar ideas can be applied to continuous outcome data (Ebrahim et al 2013, Ebrahim et al 2014). Particular care is required to avoid double counting events, since it can be unclear whether reported numbers of events in trial reports apply to the full randomized sample or only to those who did not drop out (Akl et al 2016).
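The core of the relative-risk-based imputation strategies described above can be caricatured in a few lines. The trial arm below is hypothetical, and the ‘informative missingness’ risk ratio (the assumed ratio of the event risk in missing versus observed participants) is the quantity varied in the sensitivity analysis; note that the full methods also propagate the extra uncertainty introduced by the assumption, which this sketch omits:

```python
# Hypothetical trial arm: events among observed participants, plus dropouts.
events_obs, n_obs, n_miss = 30, 80, 20

risk_obs = events_obs / n_obs

def imputed_risk(imrr):
    """Overall risk if the risk among missing participants is imrr times
    the risk among observed participants (capped at 1)."""
    risk_miss = min(imrr * risk_obs, 1.0)
    return (events_obs + risk_miss * n_miss) / (n_obs + n_miss)

# Vary the assumed ratio of risk in missing vs observed participants.
for imrr in (0.5, 1.0, 2.0):
    print(f"IMRR={imrr}: overall risk {imputed_risk(imrr):.3f}")
```

Repeating the meta-analysis over a plausible range of such assumptions shows how sensitive the pooled result is to the missing outcome data.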

Although there is a tradition of implementing ‘worst case’ and ‘best case’ analyses clarifying the extreme boundaries of what is theoretically possible, such analyses may not be informative for the most plausible scenarios (Higgins et al 2008a).

10.13 Bayesian approaches to meta-analysis

Bayesian statistics is an approach to statistics based on a different philosophy from that which underlies significance tests and confidence intervals. It is essentially about the updating of evidence. In a Bayesian analysis, initial uncertainty is expressed through a prior distribution about the quantities of interest. Current data and assumptions concerning how they were generated are summarized in the likelihood. The posterior distribution for the quantities of interest can then be obtained by combining the prior distribution and the likelihood. The likelihood summarizes both the data from studies included in the meta-analysis (for example, 2×2 tables from randomized trials) and the meta-analysis model (for example, assuming a fixed effect or random effects). The result of the analysis is usually presented as a point estimate and 95% credible interval from the posterior distribution for each quantity of interest, which look much like classical estimates and confidence intervals. Potential advantages of Bayesian analyses are summarized in Box 10.13.a. Bayesian analysis may be performed using WinBUGS software (Smith et al 1995, Lunn et al 2000), within R (Röver 2017), or – for some applications – using standard meta-regression software with a simple trick (Rhodes et al 2016).

A difference between Bayesian analysis and classical meta-analysis is that the interpretation is directly in terms of belief: a 95% credible interval for an odds ratio is that region in which we believe the odds ratio to lie with probability 95%. This is how many practitioners actually interpret a classical confidence interval, but strictly in the classical framework the 95% refers to the long-term frequency with which 95% intervals contain the true value. The Bayesian framework also allows a review author to calculate the probability that the odds ratio has a particular range of values, which cannot be done in the classical framework. For example, we can determine the probability that the odds ratio is less than 1 (which might indicate a beneficial effect of an experimental intervention), or that it is no larger than 0.8 (which might indicate a clinically important effect). It should be noted that these probabilities are specific to the choice of the prior distribution. Different meta-analysts may analyse the same data using different prior distributions and obtain different results. It is therefore important to carry out sensitivity analyses to investigate how the results depend on any assumptions made.
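To show where such probabilities come from, here is a sketch of the simplest conjugate case on the log odds-ratio scale: a normal prior combined with a normal likelihood gives a normal posterior. The prior and likelihood values are hypothetical, and real Bayesian meta-analyses are usually fitted by simulation (e.g. in WinBUGS or the bayesmeta R package) rather than in closed form:

```python
import math

# Hypothetical inputs on the log odds-ratio scale.
prior_mean, prior_var = 0.0, 1.0**2        # vague prior centred on no effect
est, var = math.log(0.75), 0.12**2         # summary estimate and its variance

# Normal prior x normal likelihood -> normal posterior (precision-weighted).
post_var = 1.0 / (1.0 / prior_var + 1.0 / var)
post_mean = post_var * (prior_mean / prior_var + est / var)

# Probability that the odds ratio is below 1, i.e. that log OR < 0.
p_or_below_1 = 0.5 * (1.0 + math.erf((0.0 - post_mean) / math.sqrt(2.0 * post_var)))
lo = math.exp(post_mean - 1.96 * math.sqrt(post_var))
hi = math.exp(post_mean + 1.96 * math.sqrt(post_var))
print(f"95% credible interval for OR: {lo:.2f} to {hi:.2f}; P(OR < 1) = {p_or_below_1:.3f}")
```

Changing `prior_mean` and `prior_var` and re-running is exactly the kind of sensitivity analysis to the prior that the text recommends.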

In the context of a meta-analysis, prior distributions are needed for the particular intervention effect being analysed (such as the odds ratio or the mean difference) and – in the context of a random-effects meta-analysis – on the amount of heterogeneity among intervention effects across studies. Prior distributions may represent subjective belief about the size of the effect, or may be derived from sources of evidence not included in the meta-analysis, such as information from non-randomized studies of the same intervention or from randomized trials of other interventions. The width of the prior distribution reflects the degree of uncertainty about the quantity. When there is little or no information, a ‘non-informative’ prior can be used, in which all values across the possible range are equally likely.

Most Bayesian meta-analyses use non-informative (or very weakly informative) prior distributions to represent beliefs about intervention effects, since many regard it as controversial to combine objective trial data with subjective opinion. However, prior distributions are increasingly used for the extent of among-study variation in a random-effects analysis. This is particularly advantageous when the number of studies in the meta-analysis is small, say fewer than five or ten. Libraries of data-based prior distributions are available that have been derived from re-analyses of many thousands of meta-analyses in the Cochrane Database of Systematic Reviews (Turner et al 2012).

Box 10.13.a Some potential advantages of Bayesian meta-analysis

Statistical expertise is strongly recommended for review authors who wish to carry out Bayesian analyses. There are several good texts (Sutton et al 2000, Sutton and Abrams 2001, Spiegelhalter et al 2004).

10.14 Sensitivity analyses

The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if eligibility criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.

It is highly desirable to demonstrate that the findings from a systematic review are not dependent on such arbitrary or unclear decisions by using sensitivity analysis (see MECIR Box 10.14.a). A sensitivity analysis is a repeat of the primary analysis or meta-analysis in which alternative decisions or ranges of values are substituted for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: the first time including all studies and, second, including only those that are definitely known to be eligible. A sensitivity analysis asks the question, ‘Are the findings robust to the decisions made in the process of obtaining them?’
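The ‘meta-analysis twice’ idea can be sketched with a simple inverse-variance (fixed-effect) pooling. The log risk ratios below are hypothetical, with the last two studies playing the role of the studies of dubious eligibility:

```python
import math

def fixed_effect(estimates, ses):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    w = [1.0 / se**2 for se in ses]
    pooled = sum(wi * est for wi, est in zip(w, estimates)) / sum(w)
    return pooled, math.sqrt(1.0 / sum(w))

# Hypothetical log risk ratios; the last two studies have dubious eligibility.
ests = [-0.20, -0.35, -0.10, -0.50, 0.05]
ses = [0.10, 0.15, 0.12, 0.25, 0.30]

all_pooled, all_se = fixed_effect(ests, ses)
certain_pooled, certain_se = fixed_effect(ests[:3], ses[:3])
print(f"all studies:         {all_pooled:.3f} (SE {all_se:.3f})")
print(f"definitely eligible: {certain_pooled:.3f} (SE {certain_se:.3f})")
```

If the two pooled estimates lead to the same conclusion, the finding is robust to the eligibility decision; if not, the discrepancy itself is the finding to report.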

MECIR Box 10.14.a Relevant expectations for conduct of intervention reviews

There are many decision nodes within the systematic review process that can generate a need for a sensitivity analysis. Examples include:

Searching for studies:

  • Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?

Eligibility criteria:

  • Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?
  • Characteristics of the intervention: what range of doses should be included in the meta-analysis?
  • Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?
  • Characteristics of the outcome: what time point or range of time points are eligible for inclusion?
  • Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other methodological criteria?

What data should be analysed?

  • Time-to-event data: what assumptions of the distribution of censored data should be made?
  • Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on post-intervention values?
  • Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?
  • Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?
  • Crossover trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?
  • All analyses: what assumptions should be made about missing outcomes? Should adjusted or unadjusted estimates of intervention effects be used?
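For the cluster-randomized item above, a common approximate correction inflates the variance of an unadjusted analysis by the design effect 1 + (m - 1) × ICC, where m is the average cluster size. The sketch below, with hypothetical cluster counts, shows how sensitive the effective sample size is to the assumed ICC value:

```python
# Hypothetical cluster-randomized arm: 20 clusters of 30 participants each.
clusters, cluster_size = 20, 30
n = clusters * cluster_size

# The design effect inflates variance when the trial analysis ignored clustering;
# dividing n by it gives an approximate effective sample size.
for icc in (0.01, 0.05, 0.10):
    deff = 1 + (cluster_size - 1) * icc
    n_eff = n / deff
    print(f"ICC={icc}: design effect {deff:.2f}, effective sample size {n_eff:.0f}")
```

Repeating the meta-analysis across such a range of ICC values is the sensitivity analysis the bullet point describes.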

Analysis methods:

  • Should fixed-effect or random-effects methods be used for the analysis?
  • For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?
  • For continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?

Some sensitivity analyses can be pre-specified in the study protocol, but many issues suitable for sensitivity analysis are identified only during the review process, as the individual peculiarities of the studies under investigation become apparent. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try to resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual participant data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.

Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.

Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.

10.15 Chapter information

Editors: Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group

Contributing authors: Douglas Altman, Deborah Ashby, Jacqueline Birks, Michael Borenstein, Marion Campbell, Jonathan Deeks, Matthias Egger, Julian Higgins, Joseph Lau, Keith O’Rourke, Gerta Rücker, Rob Scholten, Jonathan Sterne, Simon Thompson, Anne Whitehead

Acknowledgements: We are grateful to the following for commenting helpfully on earlier drafts: Bodil Als-Nielsen, Deborah Ashby, Jesse Berlin, Joseph Beyene, Jacqueline Birks, Michael Bracken, Marion Campbell, Chris Cates, Wendong Chen, Mike Clarke, Albert Cobos, Esther Coren, Francois Curtin, Roberto D’Amico, Keith Dear, Heather Dickinson, Diana Elbourne, Simon Gates, Paul Glasziou, Christian Gluud, Peter Herbison, Sally Hollis, David Jones, Steff Lewis, Tianjing Li, Joanne McKenzie, Philippa Middleton, Nathan Pace, Craig Ramsey, Keith O’Rourke, Rob Scholten, Guido Schwarzer, Jack Sinclair, Jonathan Sterne, Simon Thompson, Andy Vail, Clarine van Oel, Paula Williamson and Fred Wolf.

Funding: JJD received support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

10.16 References

Agresti A. An Introduction to Categorical Data Analysis . New York (NY): John Wiley & Sons; 1996.

Akl EA, Kahale LA, Agoritsas T, Brignardello-Petersen R, Busse JW, Carrasco-Labra A, Ebrahim S, Johnston BC, Neumann I, Sola I, Sun X, Vandvik P, Zhang Y, Alonso-Coello P, Guyatt G. Handling trial participants with missing outcome data when conducting a meta-analysis: a systematic survey of proposed approaches. Systematic Reviews 2015; 4 : 98.

Akl EA, Kahale LA, Ebrahim S, Alonso-Coello P, Schünemann HJ, Guyatt GH. Three challenges described for identifying participants with missing data in trials reports, and potential solutions suggested to systematic reviewers. Journal of Clinical Epidemiology 2016; 76 : 147-154.

Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996; 313 : 1200.

Anzures-Cabrera J, Sarpatwari A, Higgins JPT. Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine 2011; 30 : 2967-2985.

Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology 1993; 4 : 218-228.

Berlin JA, Antman EM. Advantages and limitations of metaanalytic regressions of clinical trials data. Online Journal of Current Clinical Trials 1994; Doc No 134 .

Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman KA, Group A-LAITS. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Statistics in Medicine 2002; 21 : 371-387.

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 2010; 1 : 97-111.

Borenstein M, Higgins JPT. Meta-analysis and subgroups. Prev Sci 2013; 14 : 134-143.

Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 2007; 26 : 53-77.

Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000; 19 : 3127-3131.

da Costa BR, Nuesch E, Rutjes AW, Johnston BC, Reichenbach S, Trelle S, Guyatt GH, Jüni P. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. Journal of Clinical Epidemiology 2013; 66 : 847-855.

Deeks JJ. Systematic reviews of published evidence: Miracles or minefields? Annals of Oncology 1998; 9 : 703-709.

Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd edition. London (UK): BMJ Publication Group; 2001. p. 285-312.

Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002; 21 : 1575-1600.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7 : 177-188.

DiGuiseppi C, Higgins JPT. Interventions for promoting smoke alarm ownership and function. Cochrane Database of Systematic Reviews 2001; 2 : CD002246.

Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Johnston BC, Guyatt GH. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2013; 66 : 1014-1021 e1011.

Ebrahim S, Johnston BC, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Guyatt GH. Addressing continuous data measured with different instruments for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2014; 67 : 560-570.

Efthimiou O. Practical guide to the meta-analysis of rare events. Evidence-Based Mental Health 2018; 21 : 72-76.

Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315 : 629-634.

Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Statistics in Medicine 2000; 19 : 1707-1728.

Greenland S, Robins JM. Estimation of a common effect parameter from sparse follow-up data. Biometrics 1985; 41 : 55-68.

Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiologic Reviews 1987; 9 : 1-30.

Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American Journal of Epidemiology 1992; 135 : 1301-1309.

Guevara JP, Berlin JA, Wolf FM. Meta-analytic methods for pooling rates when follow-up duration varies: a case study. BMC Medical Research Methodology 2004; 4 : 17.

Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 2001; 20 : 3875-3889.

Hasselblad V, McCrory DC. Meta-analytic tools for medical decision making: A practical guide. Medical Decision Making 1995; 15 : 81-96.

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002; 21 : 1539-1558.

Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327 : 557-560.

Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Statistics in Medicine 2004; 23 : 1663-1682.

Higgins JPT, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clinical Trials 2008a; 5 : 225-239.

Higgins JPT, White IR, Anzures-Cabrera J. Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Statistics in Medicine 2008b; 27 : 6072-6092.

Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172 : 137-159.

Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Annals of Internal Medicine 2001; 135 : 982-989.

Langan D, Higgins JPT, Simmonds M. An empirical comparison of heterogeneity variance estimators in 12 894 meta-analyses. Research Synthesis Methods 2015; 6 : 195-205.

Langan D, Higgins JPT, Simmonds M. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Research Synthesis Methods 2017; 8 : 181-198.

Langan D, Higgins JPT, Jackson D, Bowden J, Veroniki AA, Kontopantelis E, Viechtbauer W, Simmonds M. A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Research Synthesis Methods 2019; 10 : 83-98.

Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001; 322 : 1479-1480.

Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 2000; 10 : 325-337.

Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959; 22 : 719-748.

McIntosh MW. The population risk as an explanatory variable in research synthesis of clinical trials. Statistics in Medicine 1996; 15 : 1713-1728.

Morgenstern H. Uses of ecologic analysis in epidemiologic research. American Journal of Public Health 1982; 72 : 1336-1344.

Oxman AD, Guyatt GH. A consumers guide to subgroup analyses. Annals of Internal Medicine 1992; 116 : 78-84.

Peto R, Collins R, Gray R. Large-scale randomized evidence: large, simple trials and overviews of trials. Journal of Clinical Epidemiology 1995; 48 : 23-40.

Poole C, Greenland S. Random-effects meta-analyses are not always conservative. American Journal of Epidemiology 1999; 150 : 469-475.

Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ, Higgins JPT. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data. Statistics in Medicine 2016; 35 : 5495-5511.

Rice K, Higgins JPT, Lumley T. A re-evaluation of fixed effect(s) meta-analysis. Journal of the Royal Statistical Society Series A (Statistics in Society) 2018; 181 : 205-227.

Riley RD, Higgins JPT, Deeks JJ. Interpretation of random effects meta-analyses. BMJ 2011; 342 : d549.

Röver C. Bayesian random-effects meta-analysis using the bayesmeta R package 2017. https://arxiv.org/abs/1711.08683 .

Rücker G, Schwarzer G, Carpenter J, Olkin I. Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Statistics in Medicine 2009; 28 : 721-738.

Sharp SJ. Analysing the relationship between treatment benefit and underlying risk: precautions and practical recommendations. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd edition. London (UK): BMJ Publication Group; 2001. p. 176-188.

Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002; 21 : 3153-3159.

Simmonds MC, Tierney J, Bowden J, Higgins JPT. Meta-analysis of time-to-event data: a comparison of two-stage methods. Research Synthesis Methods 2011; 2 : 139-149.

Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. Journal of Clinical Epidemiology 1994; 47 : 881-889.

Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Statistics in Medicine 1995; 14 : 2685-2699.

Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation . Chichester (UK): John Wiley & Sons; 2004.

Spittal MJ, Pirkis J, Gurrin LC. Meta-analysis of incidence rate data in the presence of zero events. BMC Medical Research Methodology 2015; 15 : 42.

Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-analysis in Medical Research . Chichester (UK): John Wiley & Sons; 2000.

Sutton AJ, Abrams KR. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research 2001; 10 : 277-303.

Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004; 23 : 1351-1375.

Thompson SG, Smith TC, Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Statistics in Medicine 1997; 16 : 2741-2758.

Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Statistics in Medicine 1999; 18 : 2693-2708.

Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 2002; 21 : 1559-1574.

Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. International Journal of Epidemiology 2012; 41 : 818-827.

Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuss O, Higgins JPT, Langan D, Salanti G. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods 2016; 7 : 55-79.

Whitehead A, Jones NMB. A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 1994; 13 : 2503-2515.

Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases 1985; 27 : 335-371.

Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991; 266 : 93-98.


Study Design 101: Meta-Analysis

A subset of systematic reviews: a method for systematically combining pertinent quantitative data from several selected studies to develop a single conclusion that has greater statistical power. This conclusion is statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, and accumulated effects and results.

Meta-analysis would be used for the following purposes:

  • To establish statistical significance with studies that have conflicting results
  • To develop a more correct estimate of effect magnitude
  • To provide a more complex analysis of harms, safety data, and benefits
  • To examine subgroups with individual numbers that are not statistically significant

If the individual studies are randomized controlled trials (RCTs), combining several selected RCT results represents the highest level of evidence on the evidence hierarchy, followed by systematic reviews, which analyze all available studies on a topic.

Advantages

  • Greater statistical power
  • Confirmatory data analysis
  • Greater ability to extrapolate to general population affected
  • Considered an evidence-based resource


Disadvantages

  • Difficult and time consuming to identify appropriate studies
  • Not all studies provide adequate data for inclusion and analysis
  • Requires advanced statistical techniques
  • Heterogeneity of study populations

Design pitfalls to look out for

The studies pooled for review should be similar in type (i.e. all randomized controlled trials).

Are the studies being reviewed all the same type of study or are they a mixture of different types?

The analysis should include published and unpublished results to avoid publication bias.

Does the meta-analysis include any appropriate relevant studies that may have had negative outcomes?

Fictitious Example

Do individuals who wear sunscreen have fewer cases of melanoma than those who do not? A MEDLINE search was conducted using the terms melanoma, sunscreening agents, and zinc oxide, resulting in 8 randomized controlled studies, each with between 100 and 120 subjects. All of the studies showed a protective effect of wearing sunscreen against melanoma. The subjects from all eight studies (total: 860 subjects) were pooled and statistically analyzed to determine the relationship between wearing sunscreen and melanoma. This meta-analysis showed a 50% reduction in melanoma diagnoses among sunscreen-wearers.

Real-life Examples

Goyal, A., Elminawy, M., Kerezoudis, P., Lu, V., Yolcu, Y., Alvi, M., & Bydon, M. (2019). Impact of obesity on outcomes following lumbar spine surgery: A systematic review and meta-analysis. Clinical Neurology and Neurosurgery, 177 , 27-36. https://doi.org/10.1016/j.clineuro.2018.12.012

This meta-analysis was interested in determining whether obesity affects the outcome of spinal surgery. Some previous studies have shown higher perioperative morbidity in patients with obesity, while other studies have not shown this effect. This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive surgery, but patients with obesity who had open surgery experienced higher blood loss and longer operative times (though not clinically meaningful), as well as higher complication and reoperation rates. Further research is needed to explore this issue in patients with morbid obesity.

Nakamura, A., van Der Waerden, J., Melchior, M., Bolze, C., El-Khoury, F., & Pryor, L. (2019). Physical activity during pregnancy and postpartum depression: Systematic review and meta-analysis. Journal of Affective Disorders, 246 , 29-41. https://doi.org/10.1016/j.jad.2018.12.009

This meta-analysis explored whether physical activity during pregnancy prevents postpartum depression. Seventeen studies were included (93,676 women) and analysis showed a "significant reduction in postpartum depression scores in women who were physically active during their pregnancies when compared with inactive women." Possible limitations or moderators of this effect include intensity and frequency of physical activity, type of physical activity, and timepoint in pregnancy (e.g. trimester).

Related Terms

Systematic Review

A document often written by a panel that provides a comprehensive review of all relevant studies on a particular clinical or health-related topic/question.

Publication Bias

A phenomenon in which studies with positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Therefore, conclusions based exclusively on published studies can be misleading.

Now test yourself!

1. A Meta-Analysis pools together the sample populations from different studies, such as Randomized Controlled Trials, into one statistical analysis and treats them as one large sample population with one conclusion.

a) True b) False

2. One potential design pitfall of Meta-Analyses that is important to pay attention to is:

a) Whether it is evidence-based. b) If the authors combined studies with conflicting results. c) If the authors appropriately combined studies so they did not compare apples and oranges. d) If the authors used only quantitative data.


  • Last Updated: Sep 25, 2023 10:59 AM
  • URL: https://guides.himmelfarb.gwu.edu/studydesign101


The Role of Meta-Analysis in Scientific Studies



At a glance.

Psychological researchers can use meta-analysis to review and analyze many studies on the same subject. While it can be a very helpful way to get a “big picture” view of a topic, meta-analysis also has limitations.

A meta-analysis is a type of statistical analysis in which researchers review, combine, and analyze the results of multiple studies. Researchers can perform this type of study when many previous studies have looked at the same topic or asked the same question.

This article discusses when meta-analysis might be used and why it’s important. It also covers some advantages and disadvantages of using meta-analysis in psychology research.

What Is Meta-Analysis?

A simple definition of meta-analysis in psychology is that it’s a study of past studies on a subject that can give researchers a “big picture” view of the topic. To do a meta-analysis, a researcher reviews the published studies on a topic and then analyzes all the results to look for trends. Meta-analysis is used in  psychology , medicine, and other fields.

New studies from around the world are constantly being published, so the amount of research that’s out there on any given topic can be overwhelming. A meta-analysis is helpful because it's designed to summarize all the research information on a subject. There are a few general principles that a meta-analysis follows:

  • It is done systematically.
  • It uses certain criteria.
  • It contains a pool of results.
  • It is based on quantitative analysis (mathematical and statistical techniques to measure, model, and understand aspects of human behavior).

Why Is Meta-Analysis Important?

The data provided by a meta-analysis is bigger-picture than a single study, so it gives psychology researchers a better sense of the magnitude of the effect of whatever it is that is being studied—for example, a treatment. A meta-analysis also makes important conclusions clear and can identify trends that can inform future studies, policy decisions, and patient care.

Reasons Researchers Do Meta-Analysis

In addition to summarizing and analyzing integrated results, a meta-analysis also has other uses. For example, psychology researchers can use a meta-analysis to:

  • Evaluate effects in different subsets of participants.
  • Create new hypotheses to be studied in future research.
  • Overcome the limitations of small sample sizes.
  • Establish statistical significance.

Increasing Sample Size

One of the reasons why meta-analyses are used is to overcome a very common problem in research: small  sample sizes .

Even though researchers would often prefer to have a large sample size for a study, it requires more resources, such as funds and personnel, than a small sample size does. When individual studies do not use a large number of subjects, it can be harder to draw reliable and valid conclusions from the findings. 

A meta-analysis helps overcome the issue of small sample sizes because it reviews multiple studies from the same subject area, essentially creating a larger sample size.

Establishing Statistical Significance

Meta-analyses can also help establish statistical significance across studies that might otherwise seem to have conflicting results. Statistical significance refers to the probability that a study's results are due to random chance rather than a real difference.

When multiple studies are considered together, the statistical significance that can be established is much greater than in any one study on its own. This is important because statistical significance increases the validity of any observed differences, which in turn increases the reliability of the information researchers may glean from the findings.
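The article keeps this conceptual, but the standard mechanism behind pooling is inverse-variance weighting: each study's effect estimate is weighted by the inverse of its squared standard error, so precise studies count more and the pooled standard error shrinks below any single study's. A minimal Python sketch with invented numbers (the function name and all values are illustrative, not from any real study):

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted average of study effect estimates.

    Each study is weighted by 1/SE^2, so larger, more precise
    studies contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # 95% confidence interval for the pooled effect
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Three hypothetical studies: each effect's own 95% CI crosses zero
estimates = [0.25, 0.10, 0.22]   # e.g. standardized mean differences
std_errors = [0.15, 0.12, 0.20]

pooled, se, ci = fixed_effect_pool(estimates, std_errors)
print(round(pooled, 3), round(se, 3), [round(x, 3) for x in ci])
```

With these invented numbers, the pooled standard error (about 0.085) is smaller than any single study's, and the pooled confidence interval excludes zero even though every individual study's interval includes it — which is how pooling can make a consistent but individually non-significant effect reach significance.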

Benefits of a Meta-Analysis

Meta-analyses offer many advantages over individual studies. Here are just a few benefits of meta-analysis:

  • It has greater statistical power and the ability to extrapolate to the broader population.
  • It is evidence-based.
  • It is more likely to show an effect because smaller studies are combined into one larger study.
  • It has better accuracy (because smaller studies are pooled and analyzed).
  • It is more efficient (because researchers can collect a large amount of data without spending a lot of time, money, and resources since the bulk of the data collection work has already been completed).

Meta-analysis provides a view of the research that has been done in a particular field, summarizes and integrates the different findings, and provides possible directions for future research.

A meta-analysis also reduces the amount of work required to research a topic for other researchers and policymakers. For example, instead of having to look at the results of many smaller studies, people can get a more accurate view of what might be happening in a population by looking at the results of one meta-analysis.

Although it can be a powerful research tool, meta-analysis does have disadvantages:

  • It can be difficult and time-consuming to find all of the appropriate studies to look at.
  • It requires complex statistical skills and techniques (which can be intimidating and challenging for researchers who may lack experience with this type of research).
  • It may have the effect of halting research on a particular topic (for example, rather than giving directions for future research, a meta-analysis may imply that a specific question has been answered sufficiently and no more research is needed).

Types of Bias in Meta-Analysis

The way researchers do a meta-analysis (procedure) can affect the results. Following certain principles is crucial to making sure they draw valid and reliable conclusions from their work.

Even straying slightly from the protocol can produce biased and misleading results. The three main types of bias that can be a problem in meta-analysis are:

  • Publication bias :   When "positive" studies are more likely to be accepted and printed.
  • Search bias : When the search for studies produces unintentionally biased results. This includes using an incomplete set of keywords or varying strategies to search databases. Also, the search engine used can be a factor.
  • Selection bias : When researchers do not clearly define criteria for choosing from the long list of potential studies to be included in the meta-analysis to make sure they get unbiased results.

Examples of Meta-Analysis in Psychology

It can be helpful to look at how a meta-analysis might be used in psychology to research specific topics. For example, imagine that a small study showed that consuming sugar before an exam was correlated to decreased test performance. Taken alone, such results would imply that students should avoid sugar consumption before taking an exam. However, a meta-analysis that pools data looking at eating behavior and subsequent test results might demonstrate that this previous study was an outlier.

Here are a few examples of meta-analysis that have been published on topics in psychology:

  • Sokouti M, Shafiee-Kandjani AR, Sokouti M, Sokouti B. A meta-analysis of systematic reviews and meta-analyses to evaluate the psychological consequences of COVID-19. BMC Psychology. 2023;11(1). doi:10.1186/s40359-023-01313-0
  • Cuijpers P, Franco P, Čihařová M, et al. Psychological treatment of perinatal depression: a meta-analysis. Psychological Medicine. 2021;53(6):2596-2608. doi:10.1017/s0033291721004529
  • Xu C, Miao LL, Turner DA, DeRubeis RJ. Urbanicity and depression: A global meta-analysis. Journal of Affective Disorders. 2023;340:299-311. doi:10.1016/j.jad.2023.08.030
  • Pauley D, Cuijpers P, Papola D, Miguel C, Karyotaki E. Two decades of digital interventions for anxiety disorders: a systematic review and meta-analysis of treatment effectiveness. Psychological Medicine. 2021:1-13. doi:10.1017/s0033291721001999
  • Bhattacharya S, Goicoechea C, Heshmati S, Carpenter JK, Hofmann S. Efficacy of cognitive behavioral therapy for anxiety-related disorders: A meta-analysis of recent literature. Current Psychiatry Reports. 2022;25(1):19-30. doi:10.1007/s11920-022-01402-8

A meta-analysis can be a useful research tool in psychology. In addition to providing an accurate, big-picture view of a specific topic, meta-analyses can also make it easier for policymakers and other decision-makers to see a summary of findings quickly. Meta-analysis can run into problems with bias and may suggest that more research is needed on a particular topic, but researchers can avoid these pitfalls by closely following established procedures for conducting a meta-analysis.


By Kristalyn Salters-Pedneault, PhD  Kristalyn Salters-Pedneault, PhD, is a clinical psychologist and associate professor of psychology at Eastern Connecticut State University.

  • Open access
  • Published: 18 June 2022

Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer

Faith Wavinya Mutinda, Kongmeng Liew, Shuntaro Yada, Shoko Wakamiya & Eiji Aramaki

BMC Medical Informatics and Decision Making, volume 22, Article number: 158 (2022)


Background

Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis.

Materials and methods

Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We propose a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis.

Results

The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 for most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts sometimes lack all the required information.

Conclusions

We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis; the system achieved relatively good performance, though more validation is required.



A meta-analysis is a statistical analysis that combines the results of different studies that are all focused on the same disease, treatment, or outcome to determine whether a treatment is effective. Meta-analyses provide the best form of medical evidence and are an essential tool for enabling evidence-based medicine and clinical and health policy decision-making [ 1 ]. Meta-analyses are time-consuming, labor-intensive, and expensive, as they require domain experts to manually search, read, and extract data from hundreds of research articles written in unstructured natural language. The number of research articles is increasing exponentially, and it is becoming almost impossible to keep up with the volume of biomedical literature [ 2 ]. For instance, a recent study showed that more than 50,000 research articles related to the COVID-19 pandemic have been published, with more articles being published every day [ 3 ]. The large number of research articles increases the time required to conduct a meta-analysis. Previous research showed that on average it takes about 67 weeks, from registration to publication, to finalize a meta-analysis [ 4 ]. This poses a challenge for practitioners in the infectious disease field, where informed decisions have to be made promptly. Moreover, most meta-analyses are outdated shortly after publication as they have not incorporated new evidence, which might alter the results [ 5 ].

Automatic meta-analysis systems have the benefit of reducing the time taken to conduct a meta-analysis, helping in the timely dissemination of medical evidence and allowing for automatic updates when new evidence becomes available. According to surveys on the automation of meta-analysis, different strategies for automating the various meta-analysis stages (searching the databases for relevant literature, screening, data extraction, and statistical analysis) have been proposed [ 6 , 7 ]. Marshall et al. [ 7 ] suggest that systems for searching literature, identifying randomized controlled trials (RCTs), and screening articles have attained good performance and are ready for use. Systems for data extraction and statistical analysis, on the other hand, are still not readily available.

Techniques for data extraction from research abstracts and full-text articles have been widely studied [ 6 ]. Although various methods for extracting Participants, Intervention, Control, and Outcomes (PICO) information from research articles have been proposed, fewer attempts have been made to extract detailed information for the outcomes, especially numeric texts identifying the number of patients having certain outcomes [ 8 , 9 ]. Extraction of numeric texts is important for statistical analysis to determine the effectiveness of the intervention. Summerscales et al. [ 9 ] used a conditional random field-based approach to extract various named entities, including treatment groups, group sizes, outcomes, and outcome numbers, from research abstracts. Pradhan et al. [ 8 ] developed a Web application for extracting data from ClinicalTrials.gov, a clinical trials database. Although ClinicalTrials.gov is an important source of clinical trials data, it contains a relatively small number of studies and mainly focuses on clinical trials in the United States [ 8 ].

Figure 1: Proposed system architecture

Figure 2: A sample abstract with PICO elements highlighted. The top part shows the abstract while the bottom part shows the PICO elements transformed into a structured format

Figure 3: Visualization system interface

The goal of this work is to provide a system that automates data extraction in order to support the statistical analysis stage of meta-analysis. We utilize current state-of-the-art natural language processing (NLP) models to extract PICO information from research abstracts. We use abstracts because they are easily accessible and provide a concise summary of the full-text article, especially the main results. The proposed system (shown in Fig. 1) performs several steps: extracting data from research abstracts, parsing numeric outcomes to identify the number of patients having specific outcomes, converting the extracted data into a structured format for statistical analysis, and visualizing the results. We assess the performance of the proposed system by using it to reproduce the results of an existing meta-analysis. The results show potential for automating these tasks, and we hope to increase interest in research on automating the entire integrated meta-analysis process.

The corpus consists of 1011 abstracts of breast cancer randomized controlled trials extracted from PubMed, a free search engine that gives access to the MEDLINE database, which indexes abstracts of biomedical and life science research articles. An annotator marked text spans that describe the PICO elements, i.e., Participants (P), Interventions (I), Control (C), and Outcomes (O).

Participants: text snippets that describe the characteristics of the participants. These include the total number of participants, number of participants in the intervention group, number of participants in the control group, condition, age, ethnicity, location of the study, and eligibility.

Intervention and Control: text snippets that identify the intervention and control treatments.

Outcomes: text snippets that identify the outcomes in a study. These include outcomes that were measured, outcome measures, the number of events in the intervention group, and the number of events in the control group.

Outcomes can be classified into binary outcomes and continuous outcomes. Binary outcomes take one of two values, such as whether the treatment was successful or not. Continuous outcomes take multiple values, such as pain measured on a numerical scale (e.g. pain scores on a scale of 0–10). Continuous outcomes are mostly reported as a mean, standard deviation, median, or quartiles. The corpus is annotated with different entities to capture the different types of outcomes and their values.

The corpus consists of 1011 manually annotated abstracts. Table 1 shows the frequency of each entity in the corpus. The tags iv, cv, bin, and cont represent the intervention group, control group, binary outcomes, and continuous outcomes, respectively. Since binary outcome numeric texts tend to be absolute values or percentage values, abs and percent are used to represent absolute and percentage values, respectively. Furthermore, for the continuous outcomes we use mean, sd, median, q1, and q3 to represent the mean, standard deviation, median, first quartile, and third quartile values, respectively. The corpus is publicly available on our GitHub page.

The architecture of the proposed system is shown in Fig.  1 . The proposed system consists of five major components: research abstracts, data extraction, PICO elements normalization, creating structured data, and aggregation and visualization. The system input is free-text research abstracts. The research abstracts are passed to the data extraction module for pre-processing and extraction of PICO elements. The extracted PICO elements are then normalized using Unified Medical Language System (UMLS) and dictionary string matching techniques. After normalization, numeric texts are parsed to identify the number of patients having certain outcomes and convert the data into a structured format for statistical analysis. Finally, similar studies (same intervention and same outcome) are grouped together and the results are visualized using forest plots which provide a summary and the extent to which results from different studies overlap.

Data extraction


The pre-processing step mainly involves acronym expansion. In research articles, acronyms are frequently used to avoid repeating long terms and to save space. Even though acronyms simplify writing and reading, they are a major obstacle to natural language text understanding tasks [ 10 ]. Generally, an acronym can have multiple common expansions, and the correct one depends on the context. An acronym typically first occurs in parentheses immediately after its expansion, for example, “Randomized controlled trials (RCT) of scalp cooling (SC) to prevent chemotherapy induced alopecia (CIA)”. In this study, we employ a rule-based method using regular expressions for acronym expansion. The first step in identifying acronyms is to look for terms in parentheses that are between two and ten characters long. Regular expressions are then used to find expansion candidates in the surrounding text.
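The paper gives these rules only in outline, so the sketch below fills them in with one common heuristic: accept the words immediately preceding the parenthesis as the expansion when their initials spell out the acronym. The function and its matching rule are our simplification, not the authors' exact implementation:

```python
import re

def find_acronyms(text):
    """Find (acronym, expansion) pairs using a simple rule:
    a parenthesized term of 2-10 letters is treated as an acronym,
    and the words immediately before the parenthesis are accepted
    as its expansion if their initials spell out the acronym."""
    pairs = {}
    for match in re.finditer(r"\(([A-Za-z]{2,10})\)", text):
        acronym = match.group(1)
        # candidate expansion: the last len(acronym) words before "("
        preceding = text[:match.start()].split()
        candidate = preceding[-len(acronym):]
        # accept if the word initials spell the acronym (case-insensitive)
        initials = "".join(w[0].lower() for w in candidate)
        if initials == acronym.lower():
            pairs[acronym] = " ".join(candidate)
    return pairs

text = ("Randomized controlled trials (RCT) of scalp cooling (SC) "
        "to prevent chemotherapy induced alopecia (CIA)")
print(find_acronyms(text))
```

On the paper's example sentence this recovers all three pairs (RCT, SC, CIA); a production system would need more rules for acronyms whose letters do not align one-to-one with word initials.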

PICO elements extraction

Data extraction aims to extract PICO elements from research abstracts. This task is formulated as a sequence labelling task, i.e., given a token, classify it with one of a set of pre-defined named entity recognition (NER) tags. As deep learning models have gained a lot of attention in NLP tasks, we adopt Bidirectional Encoder Representations from Transformers (BERT)-based models for this task. BERT has achieved state-of-the-art performance in various NLP tasks, including NER, and has also proven to be effective for small datasets [ 11 ]. BERT is a language model pre-trained on huge amounts of unlabelled data that can be fine-tuned for specific tasks. It uses the encoder structure of the transformer, an attention mechanism that learns contextual relations between words (or subwords) in a text.

We chose three pre-trained transformer-based models, i.e., BioBERT [ 12 ], BlueBERT [ 13 ], and Longformer [ 14 ]. BioBERT is pre-trained on different combinations of general and biomedical domain corpora. It is initialized with BERT [ 11 ] and further pre-trained on biomedical domain texts (PubMed abstracts and PubMed Central full-text articles). BlueBERT is also initialized with BERT and further pre-trained on PubMed abstracts and clinical notes from MIMIC-III [ 15 ]. Longformer is initialized with the RoBERTa model [ 16 ] and further pre-trained with books, wikipedia, realnews, and stories.

Traditional transformer-based language models such as BioBERT and BlueBERT cannot attend to long sequences and are limited to a maximum of 512 tokens at a time. This is due to the self-attention operation which grows quadratically with sequence length. Modified transformer models, such as Longformer, have been created to overcome this problem. In Longformer model, the self-attention pattern scales linearly with sequence length enabling it to process longer documents. It can attend to long sequences of up to 4096 tokens, which is 8 times longer than BERT.

PICO elements normalization

Meta-analysis involves combining similar studies to assess the effectiveness of the intervention (treatment). To automatically group similar studies together and compare them within a meta-study, it is necessary to normalize the extracted PICO elements. We focus on the normalization of the intervention, control, and outcome elements. Our corpus consists of RCTs related to breast cancer, hence all participants are breast cancer patients.

We utilize the UMLS Metathesaurus for the normalization of the intervention and control elements. UMLS comprehensively covers most interventions and controls, especially medications, and hence we did not need to manually create a normalization dictionary. We use MetaMap [ 17 ], a state-of-the-art NLP tool that maps biomedical text to concepts in the UMLS Metathesaurus. For each text, MetaMap splits the text into phrases and identifies possible mappings for each phrase based on lexical look-up and variants.

A dictionary-based approach was employed for outcome normalization. We extracted all the outcomes from the corpus and manually created a dictionary of the outcomes and their normalizations. For example, pain, breast pain, less pain, and mild pain are all normalized to pain. After creating the dictionary in this manner, we use dictionary string matching techniques to match outcomes and their normalized versions.

The task of matching an outcome with its normalization is defined as follows: given a predefined set of normalized outcomes N and an input string o (an outcome), find the normalized outcome n ∈ N that is most similar to o. For this task, we utilize a technique that combines Term Frequency-Inverse Document Frequency (TF-IDF), n-grams, and cosine similarity. TF-IDF creates features from text by multiplying the frequency of a term in a document (term frequency) by the importance (inverse document frequency) of the term in the entire corpus. In TF-IDF the term is usually a word but, depending on the corpus, n-grams have been shown to achieve high performance. For each outcome, we represent the outcome as a vector using TF-IDF, calculate the cosine similarity between the outcome vector and the normalized outcome vectors, and select the normalized outcome with the highest cosine similarity score.
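As a concrete sketch of this matching step, the following self-contained Python uses TF-IDF over character 3-grams with cosine similarity. The paper does not state which n-grams it uses, and the dictionary below is a toy one (the pain example comes from the normalization description above):

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def normalize_outcome(outcome, normalized_outcomes):
    """Match an extracted outcome to the most similar normalized outcome
    using TF-IDF over character 3-grams and cosine similarity."""
    docs = [Counter(ngrams(t)) for t in normalized_outcomes]
    # inverse document frequency over the dictionary of normalized outcomes
    vocab = set(g for d in docs for g in d)
    idf = {g: math.log(len(docs) / sum(1 for d in docs if g in d)) + 1.0
           for g in vocab}

    def tfidf(counts):
        return {g: c * idf.get(g, 1.0) for g, c in counts.items()}

    def cosine(a, b):
        dot = sum(a[g] * b[g] for g in a if g in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = tfidf(Counter(ngrams(outcome)))
    scores = [cosine(query, tfidf(d)) for d in docs]
    return normalized_outcomes[scores.index(max(scores))]

dictionary = ["pain", "overall survival", "alopecia", "nausea"]
print(normalize_outcome("mild breast pain", dictionary))
```

Character n-grams make the matching robust to modifiers and minor wording differences, which is why "mild breast pain" still maps to "pain" even though no full word beyond "pain" overlaps.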

Although BERT-based models are currently widely used for NLP tasks, we adopted a traditional string matching approach for outcome normalization. The corpus contains many different outcomes whose frequencies vary greatly: some occur frequently and others rarely. While BERT models achieve high performance for the high-frequency outcomes, they fail for the low-frequency ones. We therefore adopted TF-IDF with cosine similarity, which achieves relatively good performance for both high-frequency and low-frequency outcomes.

Outcome event matching and creating structured data

Once PICO elements are extracted and normalized, studies with the same intervention and outcome are pooled together to compute the overall effect of the intervention. Before computing the overall effect, each study's treatment effect is determined, usually as a summary statistic such as the risk ratio, odds ratio, or risk difference. In this study, the extracted and normalized PICO elements are converted into a structured format as shown in Fig. 2. To compute the summary statistics, four values are required for each outcome: Ee, Ne, Ec, and Nc. Ee is the number of participants in the intervention group who demonstrated an effect of the treatment (intervention events), Ne is the total number of participants in the intervention group, Ec is the number of participants in the control group who demonstrated an effect of the treatment (control events), and Nc is the total number of participants in the control group. The summary statistics (risk ratio, odds ratio, and risk difference) used in this study are intended for binary outcomes. Ee and Ec are absolute values corresponding to bin-abs-iv and bin-abs-cv respectively (Table 1); they can also be calculated from bin-percent-iv and bin-percent-cv, as explained in the example below.
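The three summary statistics are simple functions of the four structured values. The following is a minimal sketch of those textbook formulas (the system itself delegates this to the R `meta` package); the example counts are made up.

```python
def summary_statistics(Ee, Ne, Ec, Nc):
    """Binary-outcome summary statistics for one study.

    Ee/Ne: events and total in the intervention group.
    Ec/Nc: events and total in the control group.
    """
    risk_e = Ee / Ne                       # event risk, intervention group
    risk_c = Ec / Nc                       # event risk, control group
    return {
        "risk_ratio": risk_e / risk_c,
        "odds_ratio": (Ee * (Nc - Ec)) / (Ec * (Ne - Ee)),
        "risk_difference": risk_e - risk_c,
    }

# Hypothetical study: 15/100 events vs 10/100 events.
stats = summary_statistics(Ee=15, Ne=100, Ec=10, Nc=100)
# risk ratio 1.5, odds ratio (15*90)/(10*85) ≈ 1.59, risk difference 0.05
```

Note that these formulas break down when a cell is zero (e.g. Ec = 0), which is one reason meta-analysis packages offer continuity corrections and alternative pooling methods.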

Extracting the number of participants having certain outcomes is challenging because of the lack of uniformity in how results are reported across articles. We use a rule-based approach for this task and assume that an outcome and its events are reported within the same sentence. If only one outcome is present in a sentence, we assume that the intervention and control events reported in that sentence belong to that outcome. If two or more outcomes are present, the first occurrence of intervention and control events is assigned to the first outcome, the second occurrence to the second outcome, and so on. For example, from the sentence “Overall survival (100% treated, 90.6% controls at 5 years) and disease-free survival (96.2% treated, 86.8% controls at 5 years) were not significantly different in the 2 groups”, we extract (outcome: overall survival, intervention events: 100%, control events: 90.6%) and (outcome: disease-free survival, intervention events: 96.2%, control events: 86.8%). Here only percentage values are reported, so we need the numbers of participants in the intervention and control groups to calculate the absolute values (Ee and Ec). In some studies, the numbers of participants in the intervention and control groups (Ne and Nc) are reported in a different sentence within the abstract (as in the sample abstract in Fig. 2), while in other studies they are not reported at all. In the rule-based approach, if the numbers of participants are not mentioned in the outcome sentence, we check whether they are mentioned in other sentences. Moreover, some studies use words instead of numbers, for instance, “Sixty-three percent achieved a complete response ...”, so we also convert words to numbers. Once the abstracts have been processed in this manner, we obtain structured data as shown in the bottom part of Fig. 2.
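A toy version of this pairing rule can be written with a single regular expression. This is a deliberately simplified sketch, not the authors' rule set: the regex only covers the "X% treated, Y% controls" pattern from the example sentence, and the group sizes Ne = Nc = 53 are invented for illustration.

```python
import re

def match_outcome_events(sentence, outcomes, Ne=None, Nc=None):
    """Assign the i-th (intervention %, control %) pair to the i-th outcome.

    If the group sizes Ne and Nc are known (possibly from another sentence),
    percentages are converted to absolute event counts Ee and Ec.
    """
    # Matches pairs like "96.2% treated, 86.8% controls".
    pairs = re.findall(r"([\d.]+)%\s*treated,\s*([\d.]+)%\s*controls", sentence)
    records = []
    for outcome, (iv, cv) in zip(outcomes, pairs):
        record = {"outcome": outcome,
                  "iv_percent": float(iv), "cv_percent": float(cv)}
        if Ne is not None and Nc is not None:
            record["Ee"] = round(float(iv) / 100 * Ne)
            record["Ec"] = round(float(cv) / 100 * Nc)
        records.append(record)
    return records

sentence = ("Overall survival (100% treated, 90.6% controls at 5 years) and "
            "disease-free survival (96.2% treated, 86.8% controls at 5 years) "
            "were not significantly different in the 2 groups")
# Ne and Nc here are hypothetical group sizes taken from another sentence.
print(match_outcome_events(sentence,
                           ["overall survival", "disease-free survival"],
                           Ne=53, Nc=53))
```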

Meta-analysis results visualization system

We developed a web-based visualization system (Footnote 4) for meta-analysis results, implemented in Python and R. R is a powerful and flexible tool that is commonly used for conducting meta-analyses; the summary statistics are calculated with meta [ 18 ], an R package commonly used for standard meta-analysis. The results are visualized as forest plots, which summarize the studies and show the extent to which their results overlap. A forest plot shows the effect size of each study, with the average effect at the bottom; each study is represented by a square whose area reflects the study's weight in the meta-analysis, together with a horizontal line representing its 95% confidence interval.

When using the visualization system, shown in Fig. 3, the user first uploads a CSV file. The file must contain columns for study_name, intervention, control, outcome, Ee, Ne, Ec, and Nc, as shown in the bottom part of Fig. 2. The user then selects a summary measure and a method for pooling the studies. The available summary measures are risk ratio, odds ratio, and risk difference, which are commonly used for binary outcomes. The available pooling methods are inverse variance (Inverse), Mantel-Haenszel (MH), Peto, generalised linear mixed model (GLMM), and the sample size method (SSW). For risk ratio and risk difference, only the Inverse and MH pooling methods are available; for odds ratio, the Inverse, MH, Peto, GLMM, and SSW methods are available. Finally, the user selects the interventions and outcomes for which the results should be visualized. The system groups similar studies according to the selected intervention(s) and outcome(s), computes the summary statistics, and returns forest plots. Each forest plot summarizes the studies with the same intervention and the same outcome.

Results and discussion

Experimental settings

Our corpus consists of 1011 PubMed abstracts annotated with PICO elements; the frequency of the elements is shown in Table 1. The dataset was split into an 80% training set and a 20% test set. We developed BERT-based models for data extraction (NER) and compared the performance of a general-purpose model (Longformer) and biomedical domain models (BioBERT, BlueBERT). The BioBERT and BlueBERT models cannot attend to sequences longer than 512 tokens (as discussed in the “PICO elements extraction” section). BERT uses WordPiece tokenization, so a word can be broken down into more than one sub-word, and some abstracts in the corpus were found to have more than 512 tokens. The default strategy for the BioBERT and BlueBERT models is to truncate long sequences, ignoring all tokens after the maximum is reached. Since truncation leads to loss of information, we instead split sequences longer than the maximum length into multiple chunks so as to preserve all the information. The split was done in a sentence-wise manner: if the number of tokens in an abstract exceeds 512, we split the abstract into individual sentences and then group the sentences into two almost equal chunks. If the number of tokens is greater than 1024, the abstract is split into three chunks, and so on.
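The sentence-wise chunking above can be sketched as a greedy grouping of whole sentences into roughly equal-sized chunks. This is an assumed reconstruction of the procedure, not the authors' code; the whitespace tokenizer stands in for BERT's WordPiece tokenizer, so real token counts would differ.

```python
import math

def split_abstract(sentences, max_tokens=512,
                   count=lambda s: len(s.split())):
    """Split a list of sentences into ceil(total/max_tokens) chunks of
    whole sentences with roughly equal token counts."""
    total = sum(count(s) for s in sentences)
    n_chunks = max(1, math.ceil(total / max_tokens))
    target = total / n_chunks              # tokens aimed at per chunk
    chunks, current, size = [], [], 0
    for sent in sentences:
        # Close the current chunk once it would exceed the target,
        # unless this is already the final chunk.
        if current and size + count(sent) > target and len(chunks) < n_chunks - 1:
            chunks.append(current)
            current, size = [], 0
        current.append(sent)
        size += count(sent)
    chunks.append(current)
    return chunks
```

With, say, twelve 100-token sentences (1200 tokens total) and a 512-token limit, this produces three chunks of four sentences each, so no sentence is ever cut in half and no token is discarded.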

In the experiments, we followed the standard fine-tuning procedure for pre-trained BERT models. The pre-trained models were fine-tuned on our corpus with the maximum sequence length set to 512 tokens for the BioBERT and BlueBERT models and 4096 tokens for the Longformer model. The number of epochs was set to 10, the batch size to 2, and the learning rate to 2e-5 for the BioBERT model and 5e-5 for the BlueBERT and Longformer models.

Data extraction results

The performance of the NER models was evaluated using precision, recall, and F1 score on the test set; the results are shown in Table 2. BioBERT_split and BlueBERT_split denote the models in which sequences longer than 512 tokens were split into multiple chunks. The Longformer model did not require splitting because its maximum sequence length is 4096 tokens and no abstract exceeded this limit.

Performance was relatively high, with sub-categories such as total-participants and outcome-measure achieving F1-scores greater than 0.90, and most other sub-categories achieving F1-scores greater than 0.80. The F1-score was zero for the lowest-frequency entities, such as cont-q1-iv, cont-q1-cv, cont-q3-iv, and cont-q3-cv. Overall, the BioBERT and Longformer models achieved the highest performance on almost all entities.

The Longformer model, a general-purpose model, performed well compared to the biomedical domain BERT models (BioBERT and BlueBERT). One likely explanation is that the biomedical domain models have a maximum sequence length of 512 tokens, so longer sequences are truncated, resulting in loss of important contextual information, whereas the Longformer model's 4096-token maximum allows it to build a contextual representation of the entire abstract.

Splitting long sequences was expected to improve model performance; however, performance was unchanged, possibly because splitting itself discards useful context at the chunk boundaries. Nevertheless, splitting served its purpose in this study, where information must be extracted from the entire abstract: the default BERT strategy of truncating long texts loses important information, whereas splitting the abstracts into multiple chunks enabled extraction from the whole abstract. So even though splitting did not improve performance, it avoided the loss of information caused by truncation.

Even though automatic extraction of PICO elements from abstracts has been studied widely, only a few studies have attempted to extract the numeric text that identifies the number of patients experiencing specific outcomes. We developed a rule-based approach (discussed in the “Outcome event matching and creating structured data” section) to parse numeric text and identify the patients having certain outcomes. The approach extracted outcomes and their events from 77% of the outcome sentences in the gold test set. It cannot, however, extract outcomes and their events when they are reported in different sentences, or in studies other than double-arm studies (one intervention group and one control group).

System evaluation

To evaluate the performance of the proposed system, we selected a published meta-analysis and used our system to reproduce the results. The selected meta-analysis was conducted by Feng et al. [ 19 ] and examines the effect of platinum-based neoadjuvant chemotherapy on resectable triple-negative breast cancer patients. The meta-analysis consists of nine studies, Alba et al. [ 20 ], Ando et al. [ 21 ], Gluz et al. [ 22 ], Loibl et al. [ 23 ], Sikov et al. [ 24 ], Tung et al. [ 25 ], Minckwitz et al. [ 26 ], Wu et al. [ 27 ], and Zhang et al. [ 28 ].

The results are shown in Table 3. The NER model successfully extracted data from the abstracts of the nine studies, with one prediction error, shown in bold underlined text in Table 3: for the Gluz et al. [ 22 ] study and the pathological complete response outcome, the model misclassified Ne as Nc and vice versa. In this study, the Ee and Ec values were reported as percentages, so their absolute values were calculated from the Ne and Nc values (as discussed in the “Outcome event matching and creating structured data” section). Since the extracted Ne and Nc values were incorrect, the calculated Ee and Ec values were also incorrect.

Although the NER model had high accuracy, other factors prevented full reproduction of the meta-analysis. The italic and underlined texts in Table 3 represent studies where extra post-processing steps were required. For instance, for the pathological complete response outcome, the Loibl et al. [ 23 ] and Sikov et al. [ 24 ] studies have multiple intervention and control groups, and the abstracts of the Gluz et al. [ 22 ] and Minckwitz et al. [ 26 ] studies report results for different sub-groups. The current system considers only double-arm studies (studies with one intervention group and one control group) and does not perform subgroup analysis; addressing these limitations is important future work. Moreover, in some studies the total numbers of participants in the intervention and control groups (Ne and Nc) were not reported in the abstracts; these cases are indicated as NA in Table 3. For the Sikov et al. [ 24 ] and Tung et al. [ 25 ] studies, we could not calculate the absolute values of Ee and Ec because their calculation depends on the Ne and Nc values, which were not reported in the abstracts.

Error analysis

We performed an error analysis and identified misclassified entities and boundary detection as the major error types.

Misclassified entities: the model detected the correct boundaries for entities but assigned them the wrong classes. For example, the model sometimes misclassified bin-abs-iv as bin-abs-cv and vice versa (as discussed in the “ System evaluation ” section).

Boundary detection: the model identifies shorter or longer entities than those marked in the gold set. This error was common for the outcome and eligibility entities. Human annotation could contribute to it, because it is sometimes difficult to decide the start and end spans of some entities.

Limitations and future work

Our study has several limitations. It uses abstracts only, and as seen in the “System evaluation” section, abstracts sometimes lack information that is present in the full-text article. For instance, a manual check of our corpus found that a significant number of abstracts do not mention the number of participants in the intervention and control groups, which makes it difficult to determine the number of patients having certain outcomes for statistical analysis. We also do not account for participants who drop out of a study, which might affect the final results. For future work, it is important to consider extracting information from full-text articles.

We proposed a rule-based system for matching outcomes and their events (discussed in the “Outcome event matching and creating structured data” section). The rule-based approach considers only double-arm studies, i.e., studies with one intervention group and one control group; single-arm studies and studies with multiple intervention or control groups are ignored. In future, it is necessary to explore other approaches, such as relation extraction.

In the statistical analysis step, we consider only binary outcomes: the summary statistics (odds ratio, risk ratio, and risk difference) used in our results visualization system apply only to binary outcomes. Incorporating continuous outcomes and their summary statistics is important future work. Moreover, some meta-analyses perform subgroup analysis, comparing results across subgroups of participants, for example by age or cancer type; annotating and incorporating such information is also necessary in future. Finally, we assessed the performance of the proposed system by replicating the results of an existing meta-analysis. To substantiate the usefulness of the system, it is important to test it on larger and more complex meta-analyses.

In this paper, we proposed a system for automating data extraction to support the statistical analysis stage of meta-analysis. Our objective is to provide a system that automates data extraction and statistical analysis, shortening the time it takes to carry out a meta-analysis and allowing automatic updates when new results become available. The proposed system extracts PICO elements from research abstracts, parses numeric outcomes to extract the number of patients experiencing certain outcomes, transforms the extracted information into a structured format, performs statistical analysis, and visualizes the results in forest plots. We evaluated the system by attempting to reproduce the results of an existing meta-analysis. The system extracted PICO elements from the studies with high accuracy; the statistical analysis step performed less well, owing to information missing from some abstracts and to the lack of uniformity across abstracts, some of which required extra post-processing. These results nevertheless show that there is potential to automate these tasks, and we hope they motivate more research towards fully automating the entire meta-analysis process.

Availability of data and materials

The dataset used in this article can be freely and openly accessed at our github page: https://github.com/sociocom/PICO-Corpus .

https://www.nlm.nih.gov/bsd/pmresources.html .

https://www.nlm.nih.gov/medline/medline_overview.html .

https://github.com/sociocom/PICO-Corpus .

https://aoi.naist.jp/autometavisualization/ .


Participants, intervention, control, and outcomes

Named entity recognition

Natural language processing

Randomized controlled trials

Unified Medical Language System

Term-frequency inverse document frequency

Bidirectional encoder representations from transformers


Generalised linear mixed model

Sample size method

Number of events in the intervention group

Number of participants in the control group

Number of events in the control group

Gopalakrishnan S, Ganeshkumar P. Systematic reviews and meta-analysis: understanding the best evidence in primary healthcare. J Fam Med Primary Care. 2013;2(1):9.


Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: How will we ever keep up? PLoS Med. 2010;7(9): e1000326.


Wang LL, Lo K. Text mining approaches for dealing with the rapidly expanding literature on COVID-19. Brief Bioinform. 2021;22(2):781–99.

Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open. 2017;7(2): e012545.

Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015;4(1):1–16.

Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8(1):1–10.

Pradhan R, Hoaglin DC, Cornell M, Liu W, Wang V, Yu H. Automatic extraction of quantitative data from ClinicalTrials.gov to conduct meta-analyses. J Clin Epidemiol. 2019;105:92–100.

Summerscales RL, Argamon S, Bai S, Hupert J, Schwartz A. Automatic summarization of results from clinical trials. In: 2011 IEEE international conference on bioinformatics and biomedicine. IEEE; 2011. p. 372–7.

Pouran Ben Veyseh A, Dernoncourt F, Nguyen TH, Chang W, Celi LA. Acronym identification and disambiguation shared tasks for scientific document understanding. arXiv e-prints. 2020;p. arXiv-2012.

Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 . 2018.

Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234–40.


Peng Y, Yan S, Lu Z. Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474 . 2019.

Beltagy I, Peters ME, Cohan A. Longformer: the long-document transformer. arXiv preprint arXiv:2004.05150 . 2020.

Johnson AE, Pollard TJ, Shen L, Li-Wei HL, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1–9.

Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 . 2019.

Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA symposium. American Medical Informatics Association; 2001. p. 17.

Schwarzer G, et al. meta: an R package for meta-analysis. R News. 2007;7(3):40–5.


Feng W, He Y, Zhang H, Si Y, Xu J, Xu J, et al. A meta-analysis of the effect and safety of platinum-based neoadjuvant chemotherapy in treatment of resectable triple-negative breast cancer. Anti-cancer Drugs. 2022;33(1):e52–60.

Alba E, Chacon J, Lluch A, Anton A, Estevez L, Cirauqui B, et al. A randomized phase II trial of platinum salts in basal-like breast cancer patients in the neoadjuvant setting. Results from the GEICAM/2006-03, multicenter study. Breast Cancer Res Treat. 2012;136(2):487–93.

Ando M, Yamauchi H, Aogi K, Shimizu S, Iwata H, Masuda N, et al. Randomized phase II study of weekly paclitaxel with and without carboplatin followed by cyclophosphamide/epirubicin/5-fluorouracil as neoadjuvant chemotherapy for stage II/IIIA breast cancer without HER2 overexpression. Breast Cancer Res Treat. 2014;145(2):401–9.

Gluz O, Nitz U, Liedtke C, Christgen M, Grischke EM, Forstbauer H, et al. Comparison of neoadjuvant nab-paclitaxel+ carboplatin vs nab-paclitaxel+ gemcitabine in triple-negative breast cancer: randomized WSG-ADAPT-TN trial results. J Natl Cancer Inst. 2018;110(6):628–37.

Loibl S, O’Shaughnessy J, Untch M, Sikov WM, Rugo HS, McKee MD, et al. Addition of the PARP inhibitor veliparib plus carboplatin or carboplatin alone to standard neoadjuvant chemotherapy in triple-negative breast cancer (BrighTNess): a randomised, phase 3 trial. Lancet Oncol. 2018;19(4):497–509.

Sikov WM, Berry DA, Perou CM, Singh B, Cirrincione CT, Tolaney SM, et al. Impact of the addition of carboplatin and/or bevacizumab to neoadjuvant once-per-week paclitaxel followed by dose-dense doxorubicin and cyclophosphamide on pathologic complete response rates in stage II to III triple-negative breast cancer: CALGB 40603 (Alliance). J Clin Oncol. 2015;33(1):13.

Tung N, Arun B, Hacker MR, Hofstatter E, Toppmeyer DL, Isakoff SJ, et al. TBCRC 031: randomized phase II study of neoadjuvant cisplatin versus doxorubicin-cyclophosphamide in germline BRCA carriers with HER2-negative breast cancer (the INFORM trial). J Clin Oncol. 2020;38(14):1539.

Von Minckwitz G, Schneeweiss A, Loibl S, Salat C, Denkert C, Rezai M, et al. Neoadjuvant carboplatin in patients with triple-negative and HER2-positive early breast cancer (GeparSixto; GBG 66): a randomised phase 2 trial. Lancet Oncol. 2014;15(7):747–56.

Wu X, Tang P, Li S, Wang S, Liang Y, Zhong L, et al. A randomized and open-label phase II trial reports the efficacy of neoadjuvant lobaplatin in breast cancer. Nat Commun. 2018;9(1):1–8.

Zhang P, Yin Y, Mo H, Zhang B, Wang X, Li Q, et al. Better pathologic complete response and relapse-free survival after carboplatin plus paclitaxel compared with epirubicin plus paclitaxel as neoadjuvant chemotherapy for locally advanced triple-negative breast cancer: a randomized phase 2 trial. Oncotarget. 2016;7(37):60647.



This work was supported by JST, AIP Trilateral AI Research, Grant Number JPMJCR20G9, Japan.

Not applicable.

Author information

Authors and affiliations.

Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan

Faith Wavinya Mutinda, Kongmeng Liew, Shuntaro Yada, Shoko Wakamiya & Eiji Aramaki



E.A. and F.M. proposed the original idea of the study. F.M., S.Y., and S.W. developed the corpus. F.M. conducted the experiments. All authors discussed and analyzed the results. F.M. took the lead in drafting the manuscript. K.L., S.Y., S.W., and E.A. provided critical feedback that helped shape the manuscript. S.W. and E.A. supervised the project. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Eiji Aramaki .

Ethics declarations

Ethics approval and consent to participate.

All experiments were performed in accordance with relevant guidelines and regulations.

Consent for publication

Competing interests.

The authors have no competing interests to declare.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Mutinda, F.W., Liew, K., Yada, S. et al. Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer. BMC Med Inform Decis Mak 22 , 158 (2022). https://doi.org/10.1186/s12911-022-01897-4


Received : 22 March 2022

Accepted : 07 June 2022

Published : 18 June 2022

DOI : https://doi.org/10.1186/s12911-022-01897-4


  • Automatic meta-analysis
  • Natural language processing (NLP)
  • Automatic data extraction
  • Named entity recognition (NER)
  • Evidence-based medicine

BMC Medical Informatics and Decision Making

ISSN: 1472-6947



Open Access


Research Article

Combined and progestagen-only hormonal contraceptives and breast cancer risk: A UK nested case–control study and meta-analysis

Roles Formal analysis, Investigation, Methodology, Project administration, Writing – original draft

Affiliations Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom, Adelaide Medical School, Faculty of Health and Medical Sciences, University of Adelaide, South Australia, Australia


Roles Formal analysis, Validation, Visualization, Writing – review & editing

* E-mail: [email protected]

Affiliation Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom

Roles Funding acquisition, Supervision, Writing – review & editing

Roles Conceptualization, Funding acquisition, Methodology, Project administration, Writing – review & editing

† Deceased.

Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

  • Danielle Fitzpatrick, 
  • Kirstin Pirie, 
  • Gillian Reeves, 
  • Jane Green, 
  • Valerie Beral


  • Published: March 21, 2023
  • https://doi.org/10.1371/journal.pmed.1004188

Table 1

Current or recent use of combined oral contraceptives (containing oestrogen+progestagen) has been associated with a small increase in breast cancer risk. Progestagen-only contraceptive use is increasing, but information on associated risks is limited. We aimed to assess breast cancer risk associated with current or recent use of different types of hormonal contraceptives in premenopausal women, with particular emphasis on progestagen-only preparations.

Methods and findings

Hormonal contraceptive prescriptions recorded prospectively in a UK primary care database (Clinical Practice Research Datalink [CPRD]) were compared in a nested case–control study for 9,498 women aged <50 years with incident invasive breast cancer diagnosed in 1996 to 2017, and for 18,171 closely matched controls. On average, 7.3 (standard deviation [SD] 4.6) years of clinical records were available for each case and their matched controls prior to the date of diagnosis. Conditional logistic regression yielded odds ratios (ORs) and 95% confidence intervals (CIs) of breast cancer by the hormonal contraceptive type last prescribed, controlled for age, GP practice, body mass index, number of recorded births, time since last birth, and alcohol intake. MEDLINE and Embase were searched for observational studies published between 01 January 1995 and 01 November 2022 that reported on the association between current or recent progestagen-only contraceptive use and breast cancer risk in premenopausal women. Fixed effects meta-analyses combined the CPRD results with previously published results from 12 observational studies for progestagen-only preparations.

Overall, 44% (4,195/9,498) of women with breast cancer and 39% (7,092/18,171) of matched controls had a hormonal contraceptive prescription an average of 3.1 (SD 3.7) years before breast cancer diagnosis (or equivalent date for controls). About half the prescriptions were for progestagen-only preparations. Breast cancer ORs were similarly and significantly raised if the last hormonal contraceptive prescription was for oral combined, oral progestagen-only, injected progestagen, or progestagen-releasing intrauterine devices (IUDs): ORs = 1.23 (95% CI [1.14 to 1.32]; p < 0.001), 1.26 (95% CI [1.16 to 1.37]; p < 0.001), 1.25 (95% CI [1.07 to 1.45]; p = 0.004), and 1.32 (95% CI [1.17 to 1.49]; p < 0.001), respectively. Our meta-analyses yielded significantly raised relative risks (RRs) for current or recent use of progestagen-only contraceptives: oral = 1.29 (95% CI [1.21 to 1.37]; heterogeneity χ 2 5 = 6.7; p = 0.2), injected = 1.18 (95% CI [1.07 to 1.30]; heterogeneity χ 2 8 = 22.5; p = 0.004), implanted = 1.28 (95% CI [1.08 to 1.51]; heterogeneity χ 2 3 = 7.3; p = 0.06), and IUDs = 1.21 (95% CI [1.14 to 1.28]; heterogeneity χ 2 4 = 7.9; p = 0.1). When the CPRD results were combined with those from previous published findings (which included women from a wider age range), the resulting 15-year absolute excess risk associated with 5 years use of oral combined or progestagen-only contraceptives in high-income countries was estimated at: 8 per 100,000 users from age 16 to 20 years and 265 per 100,000 users from age 35 to 39 years. The main limitation of the study design was that, due to the nature of the CPRD data and most other prescription databases, information on contraceptive use was recorded during a defined period only, with information before entry into the database generally being unavailable. 
This means that although our findings provide evidence about the short-term associations between hormonal contraceptives and breast cancer risk, they do not provide information regarding longer-term associations, or the impact of total duration of contraceptive use on breast cancer risk.


This study provides important new evidence that current or recent use of progestagen-only contraceptives is associated with a slight increase in breast cancer risk, which does not appear to vary by mode of delivery, and is similar in magnitude to that associated with combined hormonal contraceptives. Given that the underlying risk of breast cancer increases with advancing age, the absolute excess risk associated with use of either type of oral contraceptive is estimated to be smaller in women who use it at younger rather than at older ages. Such risks need be balanced against the benefits of using contraceptives during the childbearing years.

Author summary

Why was this study done?

  • Use of combined oral contraceptives has been associated with a small transient increase in breast cancer risk, but data on the effect of progestagen-only contraceptives on breast cancer risk are limited.
  • Use of progestagen-only hormonal contraceptives has increased substantially over the last decade, and in 2020, there were almost as many prescriptions in England for oral progestagen-only contraceptives as for combined oral contraceptives.
  • Given the increasing use of progestagen-only contraceptives, it is important to understand how their use is associated with breast cancer risk.

What did the researchers do and find?

  • We carried out a nested case–control study in the Clinical Practice Research Datalink (CPRD), including almost 10,000 women aged <50 years with breast cancer, to assess the relationship between a woman’s recent use of hormonal contraceptives and her subsequent risk of breast cancer.
  • In our study, current or recent use of hormonal contraceptives was associated with a similarly increased risk of breast cancer regardless of whether the preparation last used was oral combined, oral progestagen-only, injectable progestagen, progestagen implant, or progestagen intrauterine device.
  • When our findings for progestagen-only contraceptives were combined with those of previous studies, there was evidence of a broadly similar increased risk of breast cancer in current and recent users of all four types of progestagen-only preparations.

What do these findings mean?

  • Our findings suggest that there is a relative increase of around 20% to 30% in breast cancer risk associated with current or recent use of either combined oral or progestagen-only contraceptives.
  • When our findings for oral contraceptives are combined with results from previous studies (which included women in a wider age range), they suggest that the 15-year absolute excess risk of breast cancer associated with use of oral contraceptives ranges from 8 per 100,000 users (an increase in incidence from 0.084% to 0.093%) for use from age 16 to 20 to about 265 per 100,000 users (from 2.0% to 2.2%) for use from age 35 to 39.
  • These excess risks must be viewed in the context of the well-established benefits of contraceptive use in women’s reproductive years.
  • The lack of complete information on a woman’s prescription history means that this study was unable to assess the long-term associations of contraceptive use on breast cancer risk, but this should not have unduly affected the findings regarding their short-term associations.

Citation: Fitzpatrick D, Pirie K, Reeves G, Green J, Beral V (2023) Combined and progestagen-only hormonal contraceptives and breast cancer risk: A UK nested case–control study and meta-analysis. PLoS Med 20(3): e1004188. https://doi.org/10.1371/journal.pmed.1004188

Received: September 26, 2022; Accepted: February 1, 2023; Published: March 21, 2023

Copyright: © 2023 Fitzpatrick et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data used in this study were obtained from the Clinical Practice Research Datalink (CPRD). The licensing agreement between the University of Oxford and CPRD, and the data governance of CPRD, prevent the authors from distributing these data to other persons. Access to the data is available from CPRD for researchers who meet the criteria for access at https://cprd.com/data-access .

Funding: Central data collection, checking, analysis, and manuscript preparation (DF, KP, GR, JG, VB) was supported by the core funding of the Cancer Epidemiology Unit by Cancer Research UK (C570/A16491 and A29186) and the Medical Research Council (MR/K02700X/1). DF was funded through the Rhodes Trust. The funders had no role in study design, data collection, analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BMI, body mass index; CI, confidence interval; CPRD, Clinical Practice Research Datalink; ER, oestrogen-receptor; GP, general practitioner; IUD, intrauterine device; NHS, National Health Service; OR, odds ratio; RR, relative risk; SD, standard deviation


A meta-analysis of the worldwide evidence on breast cancer risk associated with use of combined (containing oestrogens plus progestagens) oral contraceptives in 1996 found a slightly increased risk in current or recent users that declined after use ceased, with no apparent excess risk 10 or more years after cessation [ 1 ]. At that time, there was limited information on risks associated with hormonal contraceptives containing only progestagens. Published evidence since then on premenopausal breast cancer risk associated with use of progestagen-only contraceptives is limited [ 2 – 12 ].

Use of the different types of hormonal contraceptives has changed over time, with recent increases in use of progestagen-only preparations, both as oral and as long-acting parenteral formulations such as injectables, implants, and progestagen-releasing intrauterine devices (IUDs). In England, for example, prescriptions for oral progestagen-only contraceptives almost doubled in the last decade (from 1.9 to 3.3 million from 2010 to 2020); and in 2020, there were almost as many prescriptions for oral progestagen-only contraceptives as for oral combined contraceptives (3.3 million of each) [ 13 ]. Given the trend towards increasing use of progestagen-only contraceptives, it is important to reliably quantify their effects on breast cancer risk.

We aimed to assess breast cancer risk associated with current or recent use of different types of hormonal contraceptives in premenopausal women, with particular emphasis on progestagen-only preparations. We present new data on breast cancer risk associated with prospectively recorded prescriptions for hormonal contraceptives in women aged <50 years in the United Kingdom (UK) primary care Clinical Practice Research Datalink (CPRD) and conduct meta-analyses of breast cancer risk associated with current or recent progestagen-only hormonal contraceptives, combining the new and previously published findings.

This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline ( S1 STROBE Checklist ) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline ( S1 PRISMA Checklist ). All analyses were done in STATA version 17.0, and graphs were generated using the R package Jasper [ 14 ].

Clinical Practice Research Datalink (CPRD)

The CPRD is a computerised UK primary care database containing anonymised, linked, and prospective medical records for approximately 11 million individuals registered with a National Health Service (NHS) general practitioner (GP) [ 15 ]. As of 2013, approximately 7% of the UK population were active participants in CPRD [ 15 ]. CPRD’s Independent Scientific Advisory Committee approved the study protocol in 2011 (10_152), with an amendment for an updated dataset approved in 2017 ( S1 Protocol ).

Study design

The association between use of hormonal contraceptives and invasive breast cancer risk in CPRD was studied using a nested case–control design. Although the original protocol allowed for the assessment of hormonal contraceptive use in relation to both in situ and invasive breast cancer, we present findings here for invasive breast cancer only since this was the primary outcome of interest. Cases were all women aged 20 to 49 years with incident invasive breast cancer recorded between 1 January 1996 and 20 September 2017, with no prior record of incident in situ breast cancer. Invasive breast cancer was defined using CPRD Read codes for the disease ( S1 File ) [ 16 ]. Oestrogen-receptor (ER) status of the tumours was not recorded, and so this was assessed by the presence of one or more prescriptions for tamoxifen and/or aromatase inhibitors up to 3 years after the cancer diagnosis date; for those with <12 months follow-up after diagnosis date, ER status was classified as unknown.

For each case, the “observation period” (the period during which reliable prescription data were available before diagnosis) was defined as starting either from 1 January 1995 or from the date of entry into an up-to-standard CPRD practice (whichever was later) and ending at the date of diagnosis. Two controls were selected for each case, matched on index date (date of diagnosis of the case), year of birth (+/−2 years), general practice, and observation period (the duration of observation prior to the index date for the control had to be at least as long as that of the case). Controls were selected from women with no record of invasive or in situ breast cancer before 20 September 2017. The resulting sample size was deemed sufficient to detect relevant effect sizes ( S1 Protocol ). To ensure identical opportunities for ascertainment of prescribing in cases and controls, the observation period for each matched control was truncated to be exactly the same time period as for the matched case. Both cases and controls were required to have a minimum of 12 months of follow-up prior to the index date.

Women were defined as having a prescription for hormonal contraceptives if they had one or more prescriptions for any hormonal contraceptive during the observation period. Nonusers were defined as women having no such prescription. We used the British National Formulary system (BNF sections 7.3.1 and 7.3.2 [ 17 ]) to classify the hormonal contraceptive preparation last prescribed: oral combined contraceptive, oral progestagen-only contraceptive, injectable progestagen, progestagen implant, or progestagen-releasing IUD. Current users of oral contraceptives were defined as women whose last prescription was <12 months prior to the index date; their duration of use during the observation window was calculated as the time between the first and last recorded prescription. Prescriptions for nonhormonal copper IUDs (BNF section 7.3.4 [ 17 ]) were also extracted. The small number of women whose last prescription was the combined contraceptive vaginal ring or the combined contraceptive patch were classified as other users. Prescriptions for emergency contraceptives were not included. Cases and controls with one or more prescriptions for hormone therapy for the menopause (BNF section [ 17 ]; 2,032 women in total) were excluded since such women are likely to be postmenopausal, which would confound comparisons.

Statistical analysis

A matched analysis was done using conditional logistic regression to calculate odds ratios (ORs) and 95% confidence intervals (CIs) for incident invasive breast cancer in women with one or more hormonal contraceptive prescriptions compared to women with no such prescription during the observation period. We also examined ORs separately in women with one or more hormonal contraceptive prescriptions by type of preparation last prescribed. Previous evidence suggests that the effect of hormonal contraceptive use on breast cancer risk lasts for up to 10 years after use ceases [ 1 ]. In order to assess the impact of potential confounding by prior use of other types of hormonal contraceptives, therefore, we further examined risks in the subset of women with an observation period of at least 10 years and no recorded use of other hormonal contraceptives within the observation window. All analyses were adjusted for number of recorded births (0, 1 to 2, 3+ births recorded before the index date, which included births before the observation period), time since last recorded birth (<5, 5 to 10 years, no record of birth within the observation period), body mass index (BMI <20, 20 to 24.9, 25 to 29.9, 30+ kg/m²), and alcohol intake (non/past drinker, drinker). For alcohol intake and BMI, we used the most recent record in the 10 years prior to 6 months before the index date. All of these adjustment variables had been specified in the original study protocol except for time since last birth, which was additionally adjusted for due to its observed association with case–control status in these data, and its likely relationship with recent use of hormonal contraceptives. In a slight deviation from the original protocol, no adjustment was made for smoking status as it was not considered to be a substantial risk factor for breast cancer in premenopausal women.
Women with missing values were assigned to a separate category, and sensitivity analyses were done restricting analyses to women with known values for these variables. Likelihood ratio tests were used to assess evidence of heterogeneity in risks across subgroups of women.
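The conditional logistic regression used for the matched analysis maximizes, within each matched set, the probability that the case (rather than one of her two matched controls) is the member with the observed exposure profile. The following is a minimal illustrative sketch with synthetic 1:2 matched sets and a single binary exposure; it is not the authors' code, uses simulated data rather than the CPRD, and omits the adjustment covariates:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
TRUE_BETA = np.log(2.0)  # simulated true OR of 2 (illustrative only)

# Simulate 2,000 matched sets of 1 case + 2 controls with a binary exposure.
sets = []
for _ in range(2000):
    x = rng.integers(0, 2, size=3)       # exposure status of the 3 members
    p = np.exp(TRUE_BETA * x)
    case = rng.choice(3, p=p / p.sum())  # which member becomes the case
    sets.append((x, case))

def neg_cond_loglik(beta):
    # Conditional log-likelihood: the case's linear predictor minus the
    # log of the summed exponentiated predictors over the matched set.
    ll = 0.0
    for x, case in sets:
        ll += beta * x[case] - np.log(np.exp(beta * x).sum())
    return -ll

beta_hat = minimize_scalar(neg_cond_loglik).x
print(f"estimated OR = {np.exp(beta_hat):.2f}")  # close to the simulated OR of 2
```

In practice this model is fitted with standard software (the paper's analyses were run in Stata), with the full set of adjustment covariates entering the linear predictor.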

Other sensitivity analyses assessed robustness of results with respect to use of any hormonal contraceptive: by restricting the definition of hormonal contraceptive exposure to at least 2 prescriptions for hormonal contraceptives; by restricting analyses to women with an observation period of ≥5 years; and by excluding women with a history of hysterectomy, tubal ligation, and/or bilateral oophorectomy. To assess possible bias associated with those prescribed hormonal contraceptives having more frequent contact with GPs than the average, we also examined the relationship between breast cancer risk and other regularly prescribed medications that were not expected to be associated with breast cancer risk: nonsedating antihistamines, antibacterial eye preparations, and corticosteroids for asthma and other respiratory conditions.

The absolute excess incidence of breast cancer in women who used any type of oral contraceptive at different ages was estimated by applying relative risks (RRs) by time since last oral contraceptive use (combining the CPRD results with previously published findings; [ 1 ]) to breast cancer incidence rates in nonusers of hormonal contraceptives, using UK age-specific breast cancer rates, which are typical for rates in high-income countries ( S2 File ) [ 18 ].
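The excess-incidence arithmetic applies period-specific RRs (by time since last use) to baseline incidence in nonusers across the 15 years after starting use. A simplified sketch is below; it assumes a constant annual baseline rate and an illustrative allocation of the 15 years across RR periods (5 years of current use at RR 1.27, then 4, 5, and 1 years at RRs 1.16, 1.08, and 1.00), whereas the paper's estimates use actual UK age-specific rates [ 18 ]:

```python
def excess_per_100k(baseline_annual_per_100k):
    """15-year absolute excess breast cancer cases per 100,000 women for
    5 years of oral contraceptive use. Assumes a constant annual baseline
    rate (a simplification; real rates rise steeply with age), and an
    illustrative allocation of the 15 years across the RR periods."""
    rr_by_year = [1.27] * 5 + [1.16] * 4 + [1.08] * 5 + [1.00] * 1
    return sum(baseline_annual_per_100k * (rr - 1.0) for rr in rr_by_year)

# e.g., at a constant baseline of 100 cases per 100,000 women per year:
print(excess_per_100k(100))  # about 239 excess cases per 100,000 users
```

Because the true baseline rate rises with age within the 15-year window, the published estimates differ from what this constant-rate sketch would give.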


In November 2022, two authors (DF and KP) independently searched MEDLINE/PubMed and Embase for studies published between 1 January 1995 and 1 November 2022 that reported RRs and CIs for breast cancer associated with current and/or recent use of progestagen-only contraceptives compared to never-use in premenopausal women. Information was extracted independently by both DF and KP, and reference lists of included studies and relevant systematic reviews were searched for further references. The following search terms were used:


(("contraceptive agents"[MeSH Terms] OR "contraceptive devices"[MeSH Terms] OR "contracept*"[Title/Abstract] OR "intrauterine device*"[Title/Abstract] OR "IUD"[Title/Abstract]) AND ("breast neoplasms"[MeSH Terms] OR ("breast"[Title/Abstract] AND ("tumour*"[Title/Abstract] OR "tumor*"[Title/Abstract] OR "cancer*"[Title/Abstract] OR "carcinoma*"[Title/Abstract] OR "malignan*"[Title/Abstract]))) AND 1995/01/01:2022/11/01[Date—Publication]) NOT ("case reports"[Publication Type] OR "editorial"[Publication Type] OR "letter"[Publication Type] OR "comment"[Publication Type])

  • intrauterine contraceptive device/
  • contraception/ or contraceptive agent/ or contracept*.mp.
  • 1 or 2 or 3
  • breast cancer/
  • limit 6 to human
  • limit 7 to yr = "1995 -Current"
  • limit 8 to adult <18 to 64 years>

Eligible studies were those that reported RRs and CIs for breast cancer associated with current or recent use of progestagen-only contraceptives in premenopausal women. Studies that lacked information on recency of use, were restricted to specific patient populations, or included postmenopausal women only were excluded. Information was extracted from each of the included study reports by one reviewer (DF) and checked by another (KP). Information was sought on study characteristics (country, year of publication, study design), study participants (age at diagnosis, menopausal status), analysis characteristics (exposure definition, adjustment factors used), and study results (number of exposed/unexposed cases, RRs and CIs). This information was tabulated in order to confirm that each study met the eligibility criteria, and to enable assessment of each study in terms of factors that may lead to biased estimates, such as study design and adjustment for confounding. Sensitivity analyses explored the likely impact of potential sources of bias by restricting analyses to studies with particular characteristics. Funnel plots were produced to assess small-study effects. This literature review was not registered, and a protocol was not prepared. Where necessary, results reported by fine categories of time since last use were combined in order to produce categories that were more comparable with other studies [ 19 ]. Summary RRs, combining study-specific results, were calculated as weighted averages with weights proportional to the inverse of the variance of the study-specific log RR. Chi-squared tests were used to assess heterogeneity across studies.
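The pooling just described (a fixed-effect inverse-variance weighted average of log RRs, with a chi-squared test for heterogeneity) can be sketched as follows; the study values shown are hypothetical, not the paper's data:

```python
import numpy as np
from scipy.stats import chi2

def pool_rrs(rrs, ci_lo, ci_hi):
    """Fixed-effect inverse-variance pooling of log RRs, with Cochran's Q
    chi-squared heterogeneity test (df = number of studies - 1)."""
    logs = np.log(rrs)
    # Recover each study's SE(log RR) from its reported 95% CI.
    ses = (np.log(ci_hi) - np.log(ci_lo)) / (2 * 1.96)
    w = 1.0 / ses ** 2                     # inverse-variance weights
    pooled = np.sum(w * logs) / np.sum(w)  # weighted average of log RRs
    se = np.sqrt(1.0 / np.sum(w))
    q = float(np.sum(w * (logs - pooled) ** 2))  # heterogeneity statistic
    p_het = chi2.sf(q, df=len(rrs) - 1)
    ci = np.exp(pooled + np.array([-1.96, 1.96]) * se)
    return np.exp(pooled), ci, q, p_het

# Three hypothetical studies (RR with 95% CI bounds):
rr, ci, q, p_het = pool_rrs([1.2, 1.4, 1.1],
                            [1.0, 1.1, 0.9],
                            [1.44, 1.78, 1.34])
print(f"pooled RR = {rr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}); "
      f"Q = {q:.1f}, p for heterogeneity = {p_het:.2f}")
```

This is the standard fixed-effect approach; a large Q relative to its degrees of freedom (as for the injected-progestagen studies pooled below) signals heterogeneity that warrants cautious interpretation of the pooled estimate.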

The main analyses of CPRD data included 9,498 breast cancer cases and 18,171 closely matched controls ( Table 1 ). By design, cases and controls were the same age at index date (mean 43 [SD 5] years) and had identical observation periods (mean 7.3 [SD 4.6] years, range 1 to 22 years). Overall, 2% of the cases and controls were aged below 30 years, 21% were aged 30 to 39 years, and 77% aged 40 to 49 years. The characteristics of the cases and controls were similar, except that cases were somewhat more likely than controls to have had a recent birth recorded within the observation period; this was adjusted for in the main analyses.




During the observation period, 4,195 cases (44%) and 7,092 controls (39%) had one or more prescriptions for a hormonal contraceptive, and around two-thirds of these women (7,511/11,287; 67%) had been prescribed only one type of hormonal contraceptive during their observation window. Prescriptions varied considerably by age: for example, among 20- to 29-year-olds, 67% had received a prescription for any hormonal contraceptive in the previous 5 years (among whom the type last prescribed was: 77% oral combined, 12% oral progestagen-only, and 2% progestagen-releasing IUD); among 30- to 39-year-olds, 48% had received a prescription (60% oral combined, 23% oral progestagen-only, and 7% progestagen-releasing IUD); and among 40- to 49-year-olds, 25% had received a prescription (34% oral combined, 38% oral progestagen-only, and 17% progestagen-releasing IUD).

Compared to women with no hormonal contraceptive prescriptions during the observation period, women with at least 1 prescription had significantly increased odds of incident breast cancer (unadjusted OR = 1.33; 95% CI [1.26 to 1.41]; adjusted OR = 1.25; 95% CI [1.18 to 1.33]; p < 0.001). The mean time between the last hormonal contraceptive prescription and diagnosis of breast cancer (or equivalent date for controls) was 3.1 (SD 3.7) years. Fig 1 shows the OR for breast cancer associated with one or more prescriptions by the type of hormonal contraceptive last prescribed. All ORs were increased and did not vary by the type of hormonal contraceptive last prescribed (test for heterogeneity p = 0.9). For the oral combined, oral progestagen-only, injectable progestagen, progestagen implant, and progestagen IUD, the ORs were, respectively, 1.23 (95% CI [1.14 to 1.32]; p < 0.001), 1.26 (95% CI [1.16 to 1.37]; p < 0.001), 1.25 (95% CI [1.07 to 1.45]; p = 0.004), 1.22 (95% CI [0.93 to 1.59]; p = 0.2), and 1.32 (95% CI [1.17 to 1.49]; p < 0.001); every OR was significantly elevated, except for implanted progestagens, where the numbers were small and the CI correspondingly wide. To examine the extent to which these associations may have been affected by confounding with prior use of other types of hormonal contraceptives, we repeated this analysis among the 7,473 women with an observation period of at least 10 years, restricting further to women with no recorded use of any other hormonal contraceptive type within their observation period. The results were broadly similar for each type of hormonal contraceptive, respectively, with estimated ORs of 1.32 (95% CI [1.15 to 1.52]; p < 0.001), 1.35 (95% CI [1.09 to 1.65]; p = 0.005), 1.17 (95% CI [0.82 to 1.68]; p = 0.4), 1.39 (95% CI [0.55 to 3.52]; p = 0.7), and 1.40 (95% CI [1.04 to 1.87]; p = 0.03).
All of these ORs were significantly elevated with the exception of those for injected progestagens and implanted progestagens, for which the numbers of exposed cases were extremely small (47 and 7, respectively), but there was no evidence that these ORs were materially different from the corresponding estimates from the main analysis.


Data from the CPRD. Adjusted ORs are adjusted for time since last birth, number of recorded births, BMI, and alcohol intake. P values are based on the relevant Wald tests. BMI, body mass index; CI, confidence interval; CPRD, Clinical Practice Research Datalink; IUD, intrauterine device; OR, odds ratio.


Oral contraceptives (either combined or progestagen-only) are effective only while they are being used, whereas injected, implanted, and intrauterine hormone-releasing contraceptives can be effective for months or even years [ 20 ]. To examine for any persistent effects after exposure to the hormones ceased, analyses focussed only on women who were last prescribed oral preparations as it is unclear when hormonal exposure would have ceased for those who last used nonoral preparations ( Fig 2 ). Among current users of these oral preparations (among whom the last prescription was an average of 0.3 years prior to the index date), there was a 33% excess risk of breast cancer compared to women with no hormonal contraceptive prescription (OR = 1.33; 95% CI [1.23 to 1.44]; p < 0.001). The ORs declined by time since last use (test for heterogeneity p = 0.01), although only around a quarter of all cases had their last prescription more than 5 years previously. In every category of time since last use, ORs did not differ between users of oral combined and of oral progestagen-only contraceptives: for example, among the current users, the ORs were 1.38 (95% CI [1.24 to 1.52]; p < 0.001) and 1.28 (95% CI [1.15 to 1.42]; p < 0.001), respectively.


Data from the CPRD. Adjusted ORs are adjusted for time since last birth, number of recorded births, BMI, and alcohol intake. P values are based on the relevant Wald tests. BMI, body mass index; CI, confidence interval; CPRD, Clinical Practice Research Datalink; OR, odds ratio.


Fig 3 shows the ORs for breast cancer in current users of oral preparations in various subgroups. Duration of use had only a small effect ( p = 0.02 for 1 year versus longer durations), and there was no significant variation in the ORs between phasic and nonphasic formulations, by the progestagenic component of the preparations, by whether or not other hormonal contraceptive types had been used beforehand, by estimated ER status of the breast tumours, or across categories of women defined by their age or BMI.



ORs for breast cancer were increased in women last prescribed a progestagen-releasing IUD ( Fig 1 ). To investigate these findings further, we assessed whether this OR differed according to whether or not there was a previous prescription for other hormonal contraceptives and found no evidence of any difference in the magnitude of the effect ( Fig 4 ; test for heterogeneity p = 0.8). We also examined whether use of nonhormonal (i.e., copper) IUDs was associated with breast cancer risk; the reference group for these analyses was women with no prescription for either a hormonal contraceptive or a nonhormonal IUD during the observation period. For women last prescribed nonhormonal IUDs, the OR for breast cancer was not significantly elevated (OR = 1.10; 95% CI [0.89 to 1.35]; p = 0.4, based on just 142 cases and 264 controls), although this OR was not significantly different from that associated with progestagen-releasing IUDs (OR = 1.33; 95% CI [1.17 to 1.50]; p < 0.001) ( Fig 4 ).


Data from the CPRD. All ORs are versus women with no recorded prescriptions for either a hormonal contraceptive or nonhormonal IUD during the observation period. Numbers may vary from previous analyses as women whose last contraceptive prescription was for a nonhormonal IUD are considered under this category, even if they had previously received a hormonal contraceptive prescription. Adjusted ORs are adjusted for time since last birth, number of recorded births, BMI, and alcohol intake. P values are based on the relevant Wald tests. BMI, body mass index; CI, confidence interval; CPRD, Clinical Practice Research Datalink; IUD, intrauterine device; OR, odds ratio.
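The statement above that the nonhormonal-IUD OR was not significantly different from the progestagen-releasing-IUD OR can be checked with an approximate two-sided z-test on the log scale, recovering each SE from its reported 95% CI. This sketch treats the two ORs as independent, which is only an approximation here because both share the same reference group:

```python
import math
from scipy.stats import norm

def p_diff_ors(or1, lo1, hi1, or2, lo2, hi2):
    """Approximate two-sided p-value for the difference between two ORs,
    treating them as independent and recovering SE(log OR) from each
    reported 95% CI."""
    se1 = (math.log(hi1) - math.log(lo1)) / (2 * 1.96)
    se2 = (math.log(hi2) - math.log(lo2)) / (2 * 1.96)
    z = (math.log(or1) - math.log(or2)) / math.hypot(se1, se2)
    return 2 * norm.sf(abs(z))

# ORs reported above: nonhormonal IUDs vs progestagen-releasing IUDs.
p = p_diff_ors(1.10, 0.89, 1.35, 1.33, 1.17, 1.50)
print(f"p = {p:.2f}")  # well above 0.05, consistent with the text
```

The wide CI around the nonhormonal-IUD OR (only 142 exposed cases) is what drives the non-significant comparison.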


The findings for those currently or recently prescribed any type of hormonal contraceptive were not materially altered in various sensitivity analyses: by defining exposure as 2 or more hormonal contraceptive prescriptions; by restricting analyses to women with an observation period of more than 5 years; by restricting analyses to women with no missing records for any adjustment variables; or by excluding women with tubal ligation, hysterectomy, or bilateral salpingo-oophorectomy ( S1 Table ). Nor was breast cancer risk associated with other commonly repeated noncontraceptive prescriptions: nonsedating antihistamines (OR = 1.01; 95% CI [0.95 to 1.08]; p = 0.7), antibacterial eye preparations (OR = 1.04; 95% CI [0.97 to 1.11]; p = 0.3), or corticosteroids for asthma and other respiratory diseases (OR = 1.02; 95% CI [0.94 to 1.11]; p = 0.6) ( S2 Table ).

Our meta-analysis was restricted to evidence for progestagen-only contraceptives, as ample evidence already exists for oral combined contraceptives [ 1 ]. We identified 1 previous meta-analysis and 11 other eligible studies ( S1 Fig and S3 Table ) [ 1 – 12 ]. Combining published results with the new results from CPRD yielded significant excess risks ( p < 0.001) for all 4 preparation types ( Fig 5 ): oral progestagen-only contraceptives (RR = 1.29; 95% CI [1.21 to 1.37]; heterogeneity χ²₅ = 6.7; p = 0.2), injected progestagens (RR = 1.18; 95% CI [1.07 to 1.30]; heterogeneity χ²₈ = 22.5; p = 0.004), implanted progestagens (RR = 1.28; 95% CI [1.08 to 1.51]; heterogeneity χ²₃ = 7.3; p = 0.06), and progestagen-releasing IUDs (RR = 1.21; 95% CI [1.14 to 1.28]; heterogeneity χ²₄ = 7.9; p = 0.1). There was no heterogeneity between the RRs for the 4 types of progestagen-only contraceptives (test for heterogeneity p = 0.3). Results were similar when restricted to studies that only included premenopausal women, to studies with prospectively recorded information, and to studies where women in the reference group had used neither progestagen-only nor combined oral contraceptives ( S2 Fig ). There was no evidence of publication bias based on funnel plots ( S3 Fig ).


Results are presented separately for studies that recorded information prospectively, i.e., where information on contraceptive use was recorded prior to breast cancer diagnosis, and for studies that recorded information retrospectively. CI, confidence interval; RR, relative risk.


Combining the CPRD results on time since last use of oral contraceptives, be they combined or progestagen-only, with previously published findings [ 1 ] yielded RRs of 1.27 (95% CI [1.21 to 1.33]) for current use; 1.16 (95% CI [1.11 to 1.22]) for last use 1 to 4 years ago; and 1.08 (95% CI [1.04 to 1.13]) for last use 5 to 9 years ago, with no excess risk 10 or more years after stopping use. Based on these RR estimates, the absolute excess incidence of breast cancer among women in Western countries who use oral combined or progestagen-only contraceptives for 5 years can be estimated ( S4 Table ). These estimates are of absolute risk over a 15-year period after starting oral contraceptive use and include both the excess risks during 5 years of current use and the excess risks during the 10 years after use stopped. Breast cancer is extremely rare in nonusers before about age 30, and its incidence increases sharply with age thereafter. This is reflected in the estimated absolute excess incidence ( S4 Table ), where the 15-year excess absolute risk of breast cancer for use at ages 16 to 20 years is about 8 per 100,000 users (an increase in incidence from 0.084% to 0.093%); for use at ages 25 to 29 is about 61 per 100,000 users (from 0.50% to 0.57%); and for use at ages 35 to 39 is about 265 per 100,000 users (from 2.0% to 2.2%). Fig 6 shows the estimated age-specific absolute risks associated with hormonal contraceptive use at these ages.


Absolute risks include the excess risks in current users during the 5 years when the OC is used and the excess risks in the 10 years after stopping. There is no excess risk more than 10 years after stopping. OC, oral contraceptive.


In a large nested case–control study, which included almost 10,000 UK women aged <50 years with breast cancer, those prescribed oral combined contraceptives (containing oestrogen+progestagen), oral progestagen-only contraceptives, injectable progestagens, and progestagen-releasing IUD contraceptives were found to be at increased risk of breast cancer. The ORs for each of these hormonal contraceptives were statistically significant but comparatively small, at around 1.2 to 1.3, with no material difference between the different hormonal contraceptive types. The average time between the last prescription and breast cancer diagnosis was about 3 years, so these results generally apply to current or recent use of these hormonal preparations.

The ORs for breast cancer among current or recent users of each of these hormonal contraceptives were of similar magnitude to previously reported risks associated with use of oral combined oestrogen-progestagen contraceptives [ 1 ]. Fewer studies have been published on the risks associated with progestagen-only contraceptive use, however, and so our meta-analysis aimed to bring together the totality of the available evidence. In the meta-analysis, breast cancer risks were similarly elevated among current or recent users of oral progestagens, injectable progestagens, progestagen implants, and progestagen-releasing IUDs, with respective RRs of 1.29, 1.18, 1.28, and 1.21. Every RR was significantly elevated ( p < 0.005), although with substantial heterogeneity in risks for injected progestagens.

Doses of oestrogen and progestagen constituents in combined oral contraceptives are generally lower than they were in previous decades [ 21 ]. Although preparations used by women in the CPRD data are likely to have been of lower dose, on average, than those used by women in the previously published meta-analysis [ 1 ], findings from both studies are consistent. One puzzling finding in the current analysis, however, is the excess breast cancer risk associated with use of progestagen-releasing contraceptive IUDs, which is of similar magnitude to the excess risks found for oral and for other parenteral progestagens. This excess did not appear to be due to prior prescriptions for other hormonal contraceptives and is also consistent with the only other published prospective evidence restricted to premenopausal women [ 10 , 12 ]. Phase II and III trials and pharmacokinetic analyses suggest that serum levonorgestrel levels associated with levonorgestrel-releasing IUDs are considerably lower than the levels associated with other levonorgestrel-containing oral or parenteral contraceptives [ 22 – 25 ]. We attempted to investigate whether breast cancer risk associated with use of hormonal IUDs was greater than with use of nonhormonal IUDs, but too few women had been prescribed nonhormonal IUDs for reliable comparison.

We considered the possibility that breast cancers might be selectively diagnosed among women who regularly seek prescriptions from their GPs. Any such selective detection might be expected to be greater for oral preparations that require more frequent prescriptions than for long-acting preparations (such as IUDs), but no such differences were found. Further evidence against possible detection bias is that no excess breast cancer risk was associated with other common prescriptions that are often repeated (for antihistamines, antibacterial eye preparations, or corticosteroids for asthma and other respiratory disease).

A major strength of the CPRD analysis presented here is that information on hormonal contraceptive prescribing was recorded prospectively, with reliable information on specific preparations, thus avoiding bias associated with selective recall of contraceptive use after breast cancer has been diagnosed, and misclassification of exposures due to reporting errors. The analyses of CPRD data presented here were matched on GP practice, providing some degree of adjustment for socioeconomic status, and were also adjusted for established risk factors for breast cancer, which might be expected to confound the association between contraceptive use and breast cancer risk. The only exception was family history of breast cancer: information on this factor was relatively incomplete, particularly for controls, because it was often recorded only around the time of breast cancer diagnosis. While it is unclear what effect, if any, adjustment for family history of breast cancer would have made to our findings, previously published findings for combined oral contraceptives [ 26 ] were unaltered after adjustment for family history, and 2 studies of progestagen-only contraceptives included in our meta-analysis [ 3 , 11 ] found little change in associated risks after adjustment for a number of additional factors including family history. Data on BMI and alcohol use were missing for some women, but sensitivity analyses restricted to those with complete data yielded similar results. Although information on earlier births may be missing for some women, information on recent births, which are more likely to confound any association of recent contraceptive use with breast cancer risk, should be relatively complete. To avoid potential biases associated with the menopause, analyses were restricted to cases and controls younger than 50 years, excluding women with prescriptions for menopausal hormonal therapy.

A limitation of the CPRD data, shared with some other prescription databases, is that the prescriptions are recorded during a defined period only, with information before entry into such databases generally being unavailable. While a lack of complete prescription data makes it difficult to assess the long-term effects of contraceptive use, it does not unduly affect estimates of the short-term effects of such use, which is the main focus of these investigations. The lack of information on hormonal contraceptive use prior to the start of the observation period also means we were unable to allow for the effect of possible differences in duration of use of the different types of hormonal contraceptives on their relative associations with breast cancer risk. Since our analyses of risks associated with specific types of contraceptives categorised women according to the preparation last prescribed, it is possible that prior patterns of contraceptive use could have confounded associations of breast cancer risk with type of preparation last prescribed. However, when we examined risks in women with an observation period of at least 10 years and no prior use of other types of hormonal contraceptives, the results were similar, suggesting that any such confounding had little effect on our results. It is possible that some hormonal contraceptives were prescribed outside UK general practices, but this is uncommon and would, if anything, attenuate any estimated ORs. Furthermore, although some women may not fill the prescriptions they received, those with repeated prescriptions would be more likely to have done so, and their ORs were similar to those in the main findings. 
While use of primary care data, as opposed to cancer registration data, for ascertainment of breast cancer cases may have led to a small degree of misclassification with respect to breast cancer status, a recent validation study found that the vast majority of breast cancers identified through primary care data (approximately 96%) can be verified through cancer registry data [ 27 ].

Perhaps the greatest potential source of bias in the findings from studies included in our meta-analysis is that due to recall bias in studies where information on hormonal contraceptives was recorded after a diagnosis of breast cancer. However, restriction of the meta-analysis to studies with prospectively collected information on contraceptive use yielded similar results, and so recall bias is unlikely to have had a major impact on our findings. Inadequate adjustment for important confounders could also have biased the results of certain studies, but in those studies that assessed the impact of adjustment for other factors in addition to age, ethnicity, and sociodemographic status ( S3 Table ), it appeared to have little impact on the results, suggesting that residual confounding by other risk factors has not unduly affected the findings. There is likely to be some degree of misclassification in the recording of hormonal contraceptive use, particularly in those studies that rely on self-reported information, but this should not have materially affected the findings since the 3 largest studies that contributed to the meta-analysis were based on prescription records. We found no good evidence of publication bias based on funnel plots for each association of interest ( S3 Fig ), although there is likely to be limited power to detect any such bias given the small number of studies included.

The excess RRs for breast cancer appear to be of similar magnitude in current or recent users of combined oestrogen-progestagen and of progestagen-only contraceptives, be they orally or parenterally administered. In nonusers of hormonal contraceptives, breast cancer is extremely rare before age 30, but incidence increases sharply with age. The estimated absolute excess incidence of breast cancer in current or recent users of oral contraceptives is, therefore, much smaller for use at younger than at older ages (for example, for women in high-income countries, the excess breast cancer incidence would be about 8 per 100,000 for use at age 16 to 20 years, but 265 per 100,000 for use at age 35 to 39 years). Given that the RRs are similar for oral preparations, injected progestagens, implanted progestagens and progestagen-releasing IUDs, these estimated absolute excess risks would be broadly similar for all types of hormonal contraceptives. These results, which are based on the combination of data from a large number of worldwide studies, are expected to be generalisable to other populations; however, breast cancer incidence is lower in middle-income and low-income than in high-income countries, and, thus, the absolute excess risks would also be expected to be lower. These risks need, of course, to be considered in the context of the benefits of contraceptive use in the childbearing years.
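The arithmetic behind such figures can be sketched as follows: the absolute excess incidence is the baseline incidence in never users multiplied by (RR − 1). The baseline values below are purely illustrative, chosen to reproduce the quoted figures; the study's actual baseline incidences are given in S2 File.

```python
# Excess incidence = baseline incidence in never users x (RR - 1).
# Baselines here are illustrative values only (the study's actual
# estimates are in S2 File); the RR of 1.25 is within the range of
# reported RRs for the different progestagen preparations.
rr = 1.25

baseline_age_16_20 = 32     # per 100,000 never users (illustrative)
baseline_age_35_39 = 1060   # per 100,000 never users (illustrative)

excess_young = baseline_age_16_20 * (rr - 1)   # about 8 per 100,000
excess_older = baseline_age_35_39 * (rr - 1)   # about 265 per 100,000
print(excess_young, excess_older)
```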

The mechanisms underlying the effects of progestagens on the development of breast cancer are poorly understood. It is clear that among postmenopausal women, breast cancer RRs are considerably greater with use of hormonal therapies containing both progestagens and oestrogens than oestrogens alone [ 28 ]. However, menopausal hormone therapies are usually taken during a period when ovarian function has ceased, and endogenous oestrogen and progesterone levels are relatively low, whereas oral contraceptives are taken during the reproductive years when levels of these hormones are much higher, and so their likely impact on overall exposure to hormones during use is less clear. Risks associated with hormonal contraceptive use did not appear to differ by estimated ER status (based on the presence or absence of a Tamoxifen prescription in the months after diagnosis), whereas a clear excess of ER-positive tumours was found for menopausal hormone therapy [ 28 ]. Further research is therefore needed to elucidate the mechanisms behind the similar associations of recent use of combined and progestagen-only contraceptives with breast cancer risk observed here.

Supporting information

S1 STROBE Checklist. STROBE checklist.


S1 PRISMA Checklist. PRISMA checklist.


S1 Protocol. ISAC protocol.


S1 File. Additional information about CPRD and list of breast cancer Read codes.


S2 File. Breast cancer risk per 100,000 never users of hormonal contraceptives.


S1 Table. Sensitivity analyses.


S2 Table. ORs for breast cancer associated with use of other medications.


S3 Table. Summary of studies included in meta-analyses.


S4 Table. Estimated excess incidence of breast cancer per 100,000 women.


S1 Fig. PRISMA flow diagram.


S2 Fig. Sensitivity analyses for meta-analyses.


S3 Fig. Funnel plots for assessment of publication bias.




A brief introduction of meta‐analyses in clinical practice and research

Xiao‐Meng Wang

1 Department of Epidemiology, School of Public Health, Southern Medical University, Guangzhou Guangdong, China

Xi‐Ru Zhang

Zhi‐Hao Li, Wen‐Fang Zhong

Associated Data

Data sharing is not applicable to this article because no datasets were generated or analyzed during the current study.

With the explosive growth of medical information, it is almost impossible for healthcare providers to review and evaluate all relevant evidence to make the best clinical decisions. Meta‐analyses, which summarize all existing evidence and quantitatively synthesize individual studies, have become the best available evidence for informing clinical practice. This article introduces the common methods, steps, principles, strengths and limitations of meta‐analyses and aims to help healthcare providers and researchers obtain a basic understanding of meta‐analyses in clinical practice and research.



With the explosive growth of medical information, it has become almost impossible for healthcare providers to review and evaluate all related evidence to inform their decision making. 1 , 2 Furthermore, the inconsistent and often even conflicting conclusions of different studies can confuse these individuals. Systematic reviews, which comprehensively and systematically summarize all relevant empirical evidence, were developed to resolve such situations. 3 Many systematic reviews contain a meta‐analysis, which uses statistical methods to combine the results of individual studies. 4 Through meta‐analyses, researchers can objectively and quantitatively synthesize results from different studies and increase the statistical power and precision of effect estimates. 5 In the late 1970s, meta‐analyses began to appear regularly in the medical literature. 6 Subsequently, a plethora of meta‐analyses has emerged, and their number has grown exponentially over time. 7 When conducted properly, a meta‐analysis of medical studies is considered decisive evidence because it occupies a top level in the hierarchy of evidence. 8

An understanding of the principles, performance, advantages and weaknesses of meta‐analyses is important. Therefore, we aim to provide a basic understanding of meta‐analyses for clinicians and researchers in the present article by introducing the common methods, principles, steps, strengths and limitations of meta‐analyses.


There are many types of meta‐analysis methods (Table  1 ). In this article, we mainly introduce five meta‐analysis methods commonly used in clinical practice.

Meta‐analysis methods

2.1. Aggregated data meta‐analysis

Although more information can be obtained from individual participant‐level data from original studies, it is usually impossible to obtain these data from all studies included in a meta‐analysis because such data may have been corrupted, or the main investigator may no longer be contactable or may refuse to release the data. Therefore, an aggregate data meta‐analysis (AD‐MA), which extracts the summary results of studies available in published accounts, is the most commonly used of all the quantitative approaches. 9 A study found that > 95% of published meta‐analyses were AD‐MAs. 10 In addition, AD‐MA is the mainstay of systematic reviews conducted by the US Preventive Services Task Force, the Cochrane Collaboration and many professional societies. 9 Moreover, an AD‐MA can be completed relatively quickly at a low cost, and the data are relatively easy to obtain. 11 , 12 However, AD‐MA offers very limited control over the data. One challenge is that the association between an individual participant‐level covariate and the effect of the interventions at the study level may not reflect the individual‐level effect modification of that covariate. 13 It is also difficult to extract sufficient compatible data to undertake meaningful subgroup analyses in AD‐MA. 14 Furthermore, AD‐MA is prone to ecological bias, as well as to confounding from variables not included in the model, and may have limited power. 15

2.2. Individual participant data meta‐analysis

An individual participant data meta‐analysis (IPD‐MA) is considered the “gold standard” for meta‐analysis; this type of analysis collects individual participant‐level data from original studies. 15 Compared with AD‐MA, IPD‐MA has many advantages, including improved data quality, a greater variety of analytical types that can be performed and the ability to obtain more reliable results. 16 , 17

It is crucial to maintain clusters of participants within studies in the statistical implementation of an IPD‐MA. Clusters can be retained during the analysis using a one‐step or two‐step approach. 18 In the one‐step approach, the individual participant data from all studies are modeled simultaneously, at the same time as accounting for the clustering of participants within studies. 19 This approach requires a model specific to the type of data being synthesized and an appropriate account of the meta‐analysis assumptions (e.g. fixed or random effects across studies). Cheng et al . 20 proposed using a one‐step IPD‐MA to handle binary rare events and found that this method was superior to traditional methods of inverse variance, the Mantel–Haenszel method and the Yusuf‐Peto method. In the two‐step approach, the individual participant data from each study are analyzed independently to produce aggregate data for each study (e.g. a mean treatment effect estimate and its standard error) using a statistical method appropriate for the type of data being analyzed (e.g. a linear regression model might be fitted for continuous responses, or a Cox regression might be applied for time‐to‐event data). The aggregate data are then combined to obtain a summary effect in the second step using a suitable model, such as weighting studies by the inverse of the variance. 21 For example, using a two‐step IPD‐MA, Grams et al . 22 found that apolipoprotein‐L1 kidney‐risk variants were not associated with incident cardiovascular disease or death independent of kidney measures.
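The two‐step approach can be sketched with simulated data. Everything below (study sizes, the true effect of 2.0, the noise level) is hypothetical; a real two‐step IPD‐MA would fit a model appropriate to each study's design in the first step.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: analyse each simulated "study" separately to produce aggregate
# data (a mean difference and its standard error). The true treatment
# effect is 2.0 and the outcome noise has SD 3 -- all hypothetical.
study_effects, study_ses = [], []
for n in (40, 60, 80):
    treat = rng.integers(0, 2, size=n)            # 0 = control, 1 = treatment
    y = 2.0 * treat + rng.normal(0.0, 3.0, size=n)
    diff = y[treat == 1].mean() - y[treat == 0].mean()
    se = np.sqrt(y[treat == 1].var(ddof=1) / (treat == 1).sum()
                 + y[treat == 0].var(ddof=1) / (treat == 0).sum())
    study_effects.append(diff)
    study_ses.append(se)

# Step 2: combine the per-study estimates with inverse-variance weights.
w = 1 / np.array(study_ses) ** 2
pooled = np.sum(w * np.array(study_effects)) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
print(f"pooled effect = {pooled:.2f} (SE {pooled_se:.2f})")
```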

Compared to the two‐step approach, the one‐step IPD‐MA is recommended for small meta‐analyses 23 and, conveniently, requires specifying only one model; however, it demands a careful distinction between within‐study and between‐study variability. 24 The two‐step IPD‐MA is more laborious, although it allows the use of traditional, well‐known meta‐analysis techniques in the second step, such as those used by the Cochrane Collaboration (e.g. the Mantel–Haenszel method).

2.3. Cumulative meta‐analysis

Meta‐analyses are traditionally used retrospectively to review existing evidence. However, current evidence often undergoes several updates as new studies become available. Thus, updated data must be continuously obtained to simplify and digest the ever‐expanding literature. Therefore, cumulative meta‐analysis was developed, which adds studies to a meta‐analysis based on a predetermined order and then tracks the magnitude of the mean effect and its variance. 25 A cumulative meta‐analysis can be performed multiple times; not only can it obtain summary results and provide a comparison of the dynamic results, but also it can assess the impact of newly added studies on the overall conclusions. 26 For example, initial observational studies and systematic reviews and meta‐analyses suggested that frozen embryo transfer was better for mothers and babies; however, recent primary studies have begun to challenge these conclusions. 27 Maheshwari et al . 27 therefore conducted a cumulative meta‐analysis to investigate whether these conclusions have remained consistent over time and found that the decreased risks of harmful outcomes associated with pregnancies conceived from frozen embryos have been consistent in terms of direction and magnitude of effect over several years, with an increasing precision around the point estimates. Furthermore, continuously updated cumulative meta‐analyses may avoid unnecessary large‐scale randomized controlled trials (RCTs) and prevent wasted research efforts. 28

2.4. Network meta‐analysis

Although RCTs can directly compare the effectiveness of interventions, most of them compare the effectiveness of an intervention with a placebo, and there is almost no direct comparison between different interventions. 29 , 30 Network meta‐analyses comprise a relatively recent development that combines direct and indirect evidence to compare the effectiveness between different interventions. 31 Evidence obtained from RCTs is considered as direct evidence, whereas evidence obtained through one or more common comparators is considered as indirect evidence. For example, when comparing interventions A and C, direct evidence refers to the estimate of the relative effects between A and C. When no RCTs have directly compared interventions A and C, these interventions can be compared indirectly if both have been compared with B (placebo or some standard treatments) in other studies (forming an A–B–C “loop” of evidence). 32 , 33
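A minimal sketch of such an indirect comparison (often called the Bucher method) under the transitivity assumption: the indirect estimate for A versus C is the difference between the two direct estimates against the common comparator B, and its variance is the sum of their variances. The numbers below are hypothetical.

```python
import math

# Hypothetical direct estimates (log odds ratios) from trials sharing
# placebo B as the common comparator.
d_AB, se_AB = -0.40, 0.12   # A vs B
d_CB, se_CB = -0.15, 0.10   # C vs B

# Indirect comparison of A vs C through the common comparator B.
d_AC = d_AB - d_CB
se_AC = math.sqrt(se_AB**2 + se_CB**2)
lo, hi = d_AC - 1.96 * se_AC, d_AC + 1.96 * se_AC
print(f"indirect log OR (A vs C) = {d_AC:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Note that the indirect estimate is less precise than either direct estimate, since the uncertainties add.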

A valid network meta‐analysis can correctly combine the relative effects of more than two studies and obtain a consistent estimate of the relative effectiveness of all interventions in one analysis. 34 This meta‐analysis may lead to a greater accuracy of estimating intervention effectiveness and the ability to compare all available interventions to calculate the rank of different interventions. 34 , 35 For example, phosphodiesterase type 5 inhibitors (PDE5‐Is) are the first‐line therapy for erectile dysfunction, although there are limited available studies on the comparative effects of different types of PDE5‐Is. 36 Using a network meta‐analysis, Yuan et al . 36 calculated the absolute effects and the relative rank of different PDE5‐Is to provide an overview of the effectiveness and safety of all PDE5‐Is.

Notably, a network meta‐analysis should satisfy the transitivity assumption, in which there are no systematic differences between the available comparisons other than the interventions being compared 37 ; in other words, the participants could be randomized to any of the interventions in a hypothetical RCT consisting of all the interventions included in the network meta‐analysis.

2.5. Meta‐analysis of diagnostic test accuracy

Sensitivity and specificity are commonly used to assess diagnostic accuracy. However, diagnostic tests in clinical practice are rarely 100% specific or sensitive. 38 It is difficult to obtain accurate estimates of sensitivity and specificity in small diagnostic accuracy studies. 39 , 40 Even in a large sample size study, the number of cases may still be small as a result of the low prevalence. By identifying and synthesizing evidence on the accuracy of tests, the meta‐analysis of diagnostic test accuracy (DTA) provides insight into the ability of medical tests to detect the target diseases 41 ; it also can provide estimates of test performance, allow comparisons of the accuracy of different tests and facilitate the identification of sources of variability. 42 For example, the FilmArray® (Biomerieux, Marcy‐l'Étoile, France) meningitis/encephalitis (ME) panel can detect the most common pathogens in central nervous system infections, although reports of false positives and false negatives are confusing. 43 Based on meta‐analysis of DTA, Tansarli et al . 43 calculated that the sensitivity and specificity of the ME panel were both > 90%, indicating that the ME panel has high diagnostic accuracy.
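As an illustration of the paired nature of DTA data, the sketch below computes per‐study sensitivity and specificity from hypothetical 2x2 counts and a naive pooled estimate by summing counts. This naive pooling is shown only for illustration; a proper DTA meta‐analysis uses bivariate or HSROC models that respect the within‐study pairing of sensitivity and specificity.

```python
# Hypothetical 2x2 counts (TP, FN, TN, FP) from three diagnostic
# accuracy studies of the same test.
studies = [(45, 5, 90, 10), (38, 2, 80, 20), (50, 10, 95, 5)]

# Per-study sensitivity and specificity.
for tp, fn, tn, fp in studies:
    print(f"sens = {tp / (tp + fn):.2f}, spec = {tn / (tn + fp):.2f}")

# Naive pooled estimates by summing counts -- illustration only, not a
# substitute for bivariate/HSROC modelling.
tp, fn, tn, fp = (sum(col) for col in zip(*studies))
sens, spec = tp / (tp + fn), tn / (tn + fp)
print(f"pooled sens = {sens:.2f}, pooled spec = {spec:.2f}")
```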


3.1. Frame a question

Researchers must formulate an appropriate research question at the beginning. A well‐formulated question will guide many aspects of the review process, including determining eligibility criteria, searching for studies, collecting data from included studies, structuring the syntheses and presenting results. 44 There are some tools that may facilitate the construction of research questions, including PICO, as used in clinical practice 45 ; PEO and SPICE, as used for qualitative research questions 46 , 47 ; and SPIDER, as used for mixed‐methods research. 48

3.2. Form the search strategy

It is crucial for researchers to formulate a search strategy in advance that includes inclusion and exclusion criteria, as well as a standardized data extraction form. The definition of the inclusion and exclusion criteria depends on the established question elements, such as publication dates, research design, population and results. Reasonable inclusion and exclusion criteria will reduce the risk of bias, increase transparency and make the review systematic. Broad criteria may increase the heterogeneity between studies, and narrow criteria may make it difficult to find studies; therefore, a compromise should be found. 49

3.3. Search of the literature databases

To minimize bias and avoid hampering the interpretation of outcomes, the search strategy should be as comprehensive as possible, employing multiple databases, such as PubMed, Embase, the Cochrane Central Register of Controlled Trials, Scopus, Web of Science and Google Scholar. 50 , 51 Removing language restrictions and actively searching non‐English bibliographic databases may also help researchers to perform a comprehensive meta‐analysis. 52

3.4. Select the articles

The selection or rejection of articles should be guided by the predefined criteria. 53 Two independent reviewers may screen the identified articles, and any disagreements should be resolved by consensus through discussion. First, the titles and abstracts of all retrieved papers should be read and the inclusion and exclusion criteria applied to determine whether each paper qualifies. Then, the full texts of the remaining articles should be reviewed and the criteria applied once more. Finally, the reference lists of the included articles should be searched to widen the search as much as possible. 54

3.5. Data extraction

A pre‐formed standardized data extraction form should be used to extract data of included studies. All data should be carefully converted using uniform standards. Simultaneous extraction by multiple researchers might also make the extracted data more accurate.

3.6. Assess quality of articles

Checklists and scales are often used to assess the quality of articles. For example, the Cochrane Collaboration's tool 55 is usually used to assess the quality of RCTs, whereas the Newcastle Ottawa Scale 56 is one of the most common methods for assessing the quality of non‐randomized studies. In addition, Quality Assessment of Diagnostic Accuracy Studies 2 57 is often used to evaluate the quality of diagnostic accuracy studies.

3.7. Test for heterogeneity

Several methods have been proposed to detect and quantify heterogeneity, such as Cochran's Q and I 2 values. Cochran's Q test is used to determine whether there is heterogeneity in primary studies or whether the variation observed is due to chance, 58 but it may be underpowered because of the inclusion of a small number of studies or low event rates. 59 Therefore, p < 0.10 (not 0.05) indicates the presence of heterogeneity given the low statistical strength and insensitivity of Cochran's Q test. 60 Another common method for testing heterogeneity is the I 2 value, which describes the percentage of variation across studies that is attributable to heterogeneity rather than chance; this value does not depend on the number of studies. 61 I 2 values of 25%, 50% and 75% are considered to indicate low, moderate and high heterogeneity, respectively. 60
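Both statistics can be computed directly from the inverse‐variance weights. The sketch below uses hypothetical effect estimates and standard errors; Q is compared against a chi‐squared distribution on k − 1 degrees of freedom, and I² is derived from Q.

```python
import numpy as np

# Hypothetical effect estimates (e.g. log risk ratios) and standard errors.
effects = np.array([0.10, 0.35, 0.22, 0.55, 0.08])
ses = np.array([0.12, 0.15, 0.10, 0.20, 0.18])

w = 1 / ses**2                               # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)     # fixed effects pooled estimate

Q = np.sum(w * (effects - pooled) ** 2)      # Cochran's Q statistic
df = len(effects) - 1                        # chi-squared df under homogeneity
I2 = max(0.0, (Q - df) / Q) * 100            # I-squared as a percentage
print(f"Q = {Q:.2f} on {df} df, I2 = {I2:.0f}%")
```

With these illustrative numbers, I² falls in the "low" range by the thresholds quoted above.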

3.8. Estimate the summary effect

Fixed effects and random effects models are commonly used to estimate the summary effect in a meta‐analysis. 62 Fixed effects models, which treat the variability between study results as random variation, simply weight individual studies by their precision (the inverse of the variance). Conversely, random effects models assume a different underlying effect for each study and treat this as an additional source of variation that is randomly distributed. A substantial difference between the summary effects calculated by the two models will be observed only if the studies are markedly heterogeneous (heterogeneity p < 0.10); the random effects model typically provides wider confidence intervals than the fixed effects model. 63 , 64
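Both models can be sketched on the same hypothetical data. The random effects fit below uses the DerSimonian–Laird estimator of the between‐study variance τ², one common choice among several; note how the random effects standard error is larger.

```python
import numpy as np

# Hypothetical effect estimates (e.g. log risk ratios) and standard errors.
effects = np.array([0.10, 0.35, 0.22, 0.55, 0.08])
ses = np.array([0.12, 0.15, 0.10, 0.20, 0.18])

# Fixed effects model: weight each study by its precision (1/variance).
w = 1 / ses**2
fixed = np.sum(w * effects) / np.sum(w)
se_fixed = np.sqrt(1 / np.sum(w))

# Random effects model (DerSimonian-Laird): estimate the between-study
# variance tau2 from Cochran's Q and add it to each study's variance.
Q = np.sum(w * (effects - fixed) ** 2)
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (len(effects) - 1)) / C)
w_re = 1 / (ses**2 + tau2)
random = np.sum(w_re * effects) / np.sum(w_re)
se_random = np.sqrt(1 / np.sum(w_re))

print(f"fixed: {fixed:.3f} (SE {se_fixed:.3f}); "
      f"random: {random:.3f} (SE {se_random:.3f})")
```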

3.9. Evaluate sources of heterogeneity

Several methods have been proposed to explore the possible reasons for heterogeneity. According to factors such as ethnicity, the number of studies or clinical features, subgroup analyses can be performed that divide the total data into several groups to assess the impact of a potential source of heterogeneity. Sensitivity analysis is a common approach for examining the sources of heterogeneity on a case‐by‐case basis. 65 In sensitivity analysis, one or more studies are excluded at a time and the impact of removing each or several studies is evaluated on the summary results and the between‐study heterogeneity. Sequential and combinatorial algorithms are usually implemented to evaluate the change in between‐study heterogeneity as one or more studies are excluded from the calculations. 66 Moreover, a meta‐regression model can explain heterogeneity based on study‐level covariates. 67
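The simplest sequential version, a leave‐one‐out sensitivity analysis, can be sketched as follows on hypothetical data (a fixed effects pool is used here only for brevity):

```python
import numpy as np

# Hypothetical effect estimates and standard errors.
effects = np.array([0.10, 0.35, 0.22, 0.55, 0.08])
ses = np.array([0.12, 0.15, 0.10, 0.20, 0.18])

def fixed_effect(e, s):
    """Inverse-variance (fixed effects) pooled estimate."""
    w = 1 / s**2
    return np.sum(w * e) / np.sum(w)

full = fixed_effect(effects, ses)

# Remove one study at a time and re-pool, to see how much any single
# study drives the summary estimate.
loo_estimates = []
for i in range(len(effects)):
    keep = np.arange(len(effects)) != i
    loo_estimates.append(fixed_effect(effects[keep], ses[keep]))
    print(f"without study {i + 1}: {loo_estimates[-1]:.3f} (full: {full:.3f})")
```

A study whose removal shifts the pooled estimate markedly (or sharply reduces heterogeneity) is a candidate source of heterogeneity.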

3.10. Assess publication bias

A funnel plot is a scatterplot that is commonly used to assess publication bias. In a funnel plot, the x ‐axis indicates the study effect and the y ‐axis indicates the study precision, such as the standard error or sample size. 68 , 69 If there is no publication bias, the plot will resemble a symmetrical inverted funnel; conversely, asymmetry indicates the possibility of publication bias.
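Beyond visual inspection, funnel‐plot asymmetry is often quantified with Egger's regression test, which regresses the standardized effect on precision; an intercept far from zero suggests small‐study effects. A rough sketch on hypothetical data (a formal test would also compare the intercept against its standard error):

```python
import numpy as np

# Hypothetical effects and standard errors; the less precise studies
# have deliberately been given larger effects to mimic asymmetry.
effects = np.array([0.05, 0.12, 0.20, 0.25, 0.30,
                    0.35, 0.42, 0.50, 0.60, 0.72])
ses = np.array([0.05, 0.08, 0.10, 0.12, 0.15,
                0.18, 0.22, 0.26, 0.30, 0.35])

precision = 1 / ses
z = effects / ses                       # standardized effects

# Egger's regression: z = intercept + slope * precision.
slope, intercept = np.polyfit(precision, z, 1)
print(f"intercept = {intercept:.2f}")   # far from zero, as constructed
```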

3.11. Present results

A forest plot is a valid and useful tool for summarizing the results of a meta‐analysis. In a forest plot, the results from each individual study are shown as a blob or square; the confidence interval, usually representing 95% confidence, is shown as a horizontal line that passes through the square; and the summary effect is shown as a diamond. 70


There are five important principles of meta‐analysis performance that should be emphasized. First, the search scope of a meta‐analysis should be expanded as much as possible to contain all relevant research, and it is important to remove language restrictions and actively search non‐English bibliographic databases. Second, any meta‐analysis should include studies selected based on strict criteria established in advance. Third, appropriate tools must be selected to evaluate the quality of evidence according to the different types of primary studies. Fourth, the most suitable statistical model should be chosen for the meta‐analysis and a weighted mean estimate of the effect size should be calculated. Fifth, the possible causes of heterogeneity should be identified and publication bias in the meta‐analysis must be assessed.


Meta‐analyses have several strengths. First, a major advantage is their ability to improve the precision of effect estimates with considerably increased statistical power, which is particularly important when the power of a primary study is limited as a result of its small sample size. Second, a meta‐analysis has more power to detect small but clinically significant effects and to examine the effectiveness of interventions in demographic or clinical subgroups of participants, which can help researchers identify beneficial (or harmful) effects in specific groups of patients. 71 , 72 Third, meta‐analyses can be used to analyze rare outcomes and outcomes that individual studies were not designed to test (e.g. adverse events). Fourth, meta‐analyses can be used to examine heterogeneity in study results and explore its possible sources, guarding against the bias that arises from "mixing apples and oranges". 73 Furthermore, meta‐analyses can compare the effectiveness of various interventions, supplement the existing evidence, and then offer a rational and helpful way of addressing a series of practical difficulties that plague healthcare providers and researchers. Lastly, meta‐analyses may resolve disputes caused by apparently conflicting studies, determine whether new studies are necessary for further investigation and generate new hypotheses for future studies. 7 , 74


6.1. Missing related research

The primary limitation of a meta‐analysis is missing related research. Even in the ideal case in which all relevant studies are available, a faulty search strategy can miss some of these studies. Small differences in search strategies can produce large differences in the set of studies found. 75 When searching databases, relevant research can be missed as a result of the omission of keywords. The search engine (e.g. PubMed, Google) may also affect the type and number of studies that are found. 76 Moreover, it may be impossible to identify all relevant evidence if the search scope is limited to one or two databases. 51 , 77 Finally, language restrictions and the failure to search non‐English bibliographic databases may also lead to an incomplete meta‐analysis. 52 Comprehensive search strategies for different databases and languages might help solve this issue.

6.2. Publication bias

Publication bias arises because positive findings are more likely than ambiguous or negative findings to be published and subsequently identified through literature searches. 78 It is a key source of bias that is recognized as a potential threat to the validity of results. 79 The true effect may be exaggerated, or even appear falsely positive, if only published articles are included. 80 For example, based on studies registered with the US Food and Drug Administration, Turner et al. 81 reviewed 74 trials of 12 antidepressants to assess publication bias and its influence on apparent efficacy. They found that antidepressant studies with favorable outcomes were 16 times more likely to be published than those with unfavorable outcomes, and that the apparent efficacy of antidepressants increased by between 11% and 69% when non‐published studies were excluded from the analysis. 81 Moreover, failing to identify and include non‐English language studies may also increase publication bias. 82 Therefore, all relevant studies should be identified to reduce the impact of publication bias on the meta‐analysis.
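The inflation described by Turner et al. can be illustrated with a toy calculation in which only "favorable" trials survive to publication; all the numbers below are invented for illustration.

```python
# Hypothetical standardized effect sizes from eight trials of similar
# precision; half are unfavorable (near zero or negative).
all_effects = [0.45, 0.38, 0.30, 0.41, 0.05, -0.02, 0.08, -0.10]

# Suppose only clearly "positive" trials reach publication.
published = [e for e in all_effects if e > 0.2]

mean_all = sum(all_effects) / len(all_effects)
mean_published = sum(published) / len(published)

print(f"all trials: {mean_all:.3f}; published only: {mean_published:.3f}")
# Restricting the analysis to published trials exaggerates the apparent effect.
assert mean_published > mean_all
```

Here the published-only mean is roughly double the true mean across all trials, which is why a thorough search for unpublished and grey-literature studies matters.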

6.3. Selection bias

Because many of the studies identified are not directly related to the subject of the meta‐analysis, it is crucial for researchers to select studies for inclusion based on defined criteria. Failing to evaluate, select or reject relevant studies against strict criteria for study quality increases the possibility of selection bias. Missing or inappropriate quality assessment tools may lead to the inclusion of low‐quality studies. If a meta‐analysis includes low‐quality studies, its results will be biased and unreliable, a problem often summarized as “garbage in, garbage out”. 83 Strictly defined inclusion criteria and independent scoring by at least two researchers can help reduce the possibility of selection bias. 84, 85

6.4. Unavailability of information

The best‐case scenario for a meta‐analysis is the availability of individual participant data. However, most individual research reports contain only summary results, such as means, standard deviations, proportions, relative risks and odds ratios. In addition to the possibility of reporting errors, this lack of information can severely limit the types of analyses and conclusions that can be achieved in a meta‐analysis. For example, the unavailability of information from individual studies may preclude the comparison of effects in predetermined subgroups of participants. Therefore, where feasible, researchers should contact the authors of the primary studies to request individual participant data.
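When only summary results are reported, reviewers often back-calculate the statistics they need. A common example is recovering a standard deviation from a reported 95% confidence interval; the sketch below uses made-up numbers and the normal approximation.

```python
import math

# A hypothetical trial reports a mean of 12.4 (95% CI 10.2 to 14.6) for
# n = 60, but not the standard deviation needed for meta-analysis.
n = 60
ci_lower, ci_upper = 10.2, 14.6

# Under the normal approximation, CI width = 2 * 1.96 * SE.
se = (ci_upper - ci_lower) / (2 * 1.96)
sd = se * math.sqrt(n)

print(f"SE = {se:.3f}, SD = {sd:.3f}")
```

For small samples, a t-distribution multiplier should replace 1.96; this shortcut is still no substitute for individual participant data when subgroup effects are of interest.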

6.5. Heterogeneity

Although the studies included in a meta‐analysis share the same research hypothesis, there is still the potential for heterogeneity in several areas. 86 Heterogeneity may exist in various aspects of study design and conduct, including participant selection, the interventions/exposures or outcomes studied, data collection, data analysis and selective reporting of results. 87 Although differences in results can often be addressed by assessing heterogeneity and performing subgroup analyses, 88 the results of a meta‐analysis may become meaningless, and may even obscure the real effect, if the selected studies are too heterogeneous to be comparable. For example, Nicolucci et al. 89 reviewed 150 published randomized trials on the treatment of lung cancer. The review revealed serious methodological drawbacks and concluded that the heterogeneity of the existing trials made meta‐analysis unlikely to be constructive. 89 Therefore, combining data from studies with large heterogeneity in a meta‐analysis is not recommended.
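Heterogeneity is usually quantified before deciding whether pooling is sensible. Below is a minimal sketch of Cochran's Q, the I² statistic and the DerSimonian-Laird between-study variance, computed from invented effect estimates.

```python
# Hypothetical effect estimates (standardized mean differences) and
# standard errors from five studies of the same question.
effects = [0.10, 0.55, -0.05, 0.70, 0.30]
ses = [0.15, 0.20, 0.18, 0.25, 0.22]

w = [1 / s**2 for s in ses]
pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# Cochran's Q: weighted squared deviations from the pooled estimate.
Q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
df = len(effects) - 1

# I^2: proportion of total variation attributable to between-study
# heterogeneity rather than chance.
I2 = max(0.0, (Q - df) / Q) * 100

# DerSimonian-Laird estimate of the between-study variance tau^2,
# which widens the weights in a random-effects model.
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.1f}%, tau^2 = {tau2:.4f}")
```

An I² above roughly 50%, as in this toy example, would typically prompt subgroup analyses, a random-effects model, or a decision not to pool at all.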

6.6. Misleading funnel plot

Funnel plots are appealing because they are a simple technique for investigating the possibility of publication bias. However, they are used to detect a complex effect, and they can be misleading. For example, asymmetry in a funnel plot can also be caused by heterogeneity. 90 Another problem with funnel plots is the difficulty of interpreting them when few studies are included. Readers may also be misled by the choice of axes or of the outcome measure. 91 Therefore, in the absence of a consensus on how the plot should be constructed, asymmetrical funnel plots should be interpreted cautiously. 91
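A common quantitative complement to visual inspection is Egger's regression test, which regresses the standardized effect on precision; an intercept far from zero suggests small-study asymmetry (which, as noted above, may reflect heterogeneity rather than publication bias). A minimal sketch with invented data, in which small studies report systematically larger effects:

```python
# Invented effect sizes and standard errors: the high-SE (small) studies
# have inflated effects, mimicking small-study bias.
effects = [0.80, 0.65, 0.45, 0.30, 0.25, 0.20]
ses = [0.40, 0.35, 0.25, 0.15, 0.12, 0.10]

y = [e / s for e, s in zip(effects, ses)]  # standardized effects
x = [1 / s for s in ses]                   # precisions

# Ordinary least squares fit of y on x.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
intercept = my - slope * mx

print(f"Egger intercept = {intercept:.2f}, slope = {slope:.3f}")
# The clearly positive intercept reflects the built-in small-study effect;
# a full test would also compute a significance level for the intercept.
assert intercept > 0
```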

6.7. Inevitable subjectivity

Researchers must make numerous judgments when performing meta‐analyses, 92 which inevitably introduces considerable subjectivity into the meta‐analysis review process. For example, there is often a certain amount of subjectivity when deciding how similar studies should be before it is appropriate to combine them. To minimize subjectivity, at least two researchers should jointly conduct a meta‐analysis and reach a consensus.

The explosion of medical information and the differences between individual studies make it almost impossible for healthcare providers to appraise all the evidence relevant to a clinical decision. Meta‐analyses, which summarize all eligible evidence and quantitatively synthesize individual results on a specific clinical question, have become the best available evidence for informing clinical practice and are increasingly important in medical research. This article has described the basic concepts, common methods, principles, steps, strengths and limitations of meta‐analyses to help clinicians and investigators better understand them and make clinical decisions based on the best evidence.


CM designed and directed the study. XMW and XRZ had primary responsibility for drafting the manuscript. CM, ZHL, WFZ and PY provided insightful discussions and suggestions. All authors critically reviewed the manuscript for important intellectual content.


The authors declare that they have no conflicts of interest.


This work was supported by the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019 to CM) and the Construction of High‐level University of Guangdong (G820332010, G618339167 and G618339164 to CM). The funders played no role in the study design or implementation; manuscript preparation, review or approval; or the decision to submit the manuscript for publication.

Wang X‐M, Zhang X‐R, Li Z‐H, Zhong W‐F, Yang P, Mao C. A brief introduction of meta‐analyses in clinical practice and research. J Gene Med. 2021;23:e3312. 10.1002/jgm.3312

Xiao‐Meng Wang and Xi‐Ru Zhang contributed equally to this work.


Published: 08 May 2024

A meta-analysis on global change drivers and the risk of infectious disease

  • Michael B. Mahon   ORCID: orcid.org/0000-0002-9436-2998 1 , 2   na1 ,
  • Alexandra Sack 1 , 3   na1 ,
  • O. Alejandro Aleuy 1 ,
  • Carly Barbera 1 ,
  • Ethan Brown   ORCID: orcid.org/0000-0003-0827-4906 1 ,
  • Heather Buelow   ORCID: orcid.org/0000-0003-3535-4151 1 ,
  • David J. Civitello 4 ,
  • Jeremy M. Cohen   ORCID: orcid.org/0000-0001-9611-9150 5 ,
  • Luz A. de Wit   ORCID: orcid.org/0000-0002-3045-4017 1 ,
  • Meghan Forstchen 1 , 3 ,
  • Fletcher W. Halliday 6 ,
  • Patrick Heffernan 1 ,
  • Sarah A. Knutie 7 ,
  • Alexis Korotasz 1 ,
  • Joanna G. Larson   ORCID: orcid.org/0000-0002-1401-7837 1 ,
  • Samantha L. Rumschlag   ORCID: orcid.org/0000-0003-3125-8402 1 , 2 ,
  • Emily Selland   ORCID: orcid.org/0000-0002-4527-297X 1 , 3 ,
  • Alexander Shepack 1 ,
  • Nitin Vincent   ORCID: orcid.org/0000-0002-8593-1116 1 &
  • Jason R. Rohr   ORCID: orcid.org/0000-0001-8285-4912 1 , 2 , 3   na1  

Nature (2024)


Anthropogenic change is contributing to the rise in emerging infectious diseases, which are significantly correlated with socioeconomic, environmental and ecological factors 1 . Studies have shown that infectious disease risk is modified by changes to biodiversity 2 , 3 , 4 , 5 , 6 , climate change 7 , 8 , 9 , 10 , 11 , chemical pollution 12 , 13 , 14 , landscape transformations 15 , 16 , 17 , 18 , 19 , 20 and species introductions 21 . However, it remains unclear which global change drivers most increase disease and under what contexts. Here we amassed a dataset from the literature that contains 2,938 observations of infectious disease responses to global change drivers across 1,497 host–parasite combinations, including plant, animal and human hosts. We found that biodiversity loss, chemical pollution, climate change and introduced species are associated with increases in disease-related end points or harm, whereas urbanization is associated with decreases in disease end points. Natural biodiversity gradients, deforestation and forest fragmentation are comparatively unimportant or idiosyncratic as drivers of disease. Overall, these results are consistent across human and non-human diseases. Nevertheless, context-dependent effects of the global change drivers on disease were found to be common. The findings uncovered by this meta-analysis should help target disease management and surveillance efforts towards global change drivers that increase disease. Specifically, reducing greenhouse gas emissions, managing ecosystem health, and preventing biological invasions and biodiversity loss could help to reduce the burden of plant, animal and human diseases, especially when coupled with improvements to social and economic determinants of health.


Data availability

All the data for this Article have been deposited at Zenodo ( https://doi.org/10.5281/zenodo.8169979 ) 52 and GitHub ( https://github.com/mahonmb/GCDofDisease ) 53 .

Code availability

All the code for this Article has been deposited at Zenodo ( https://doi.org/10.5281/zenodo.8169979 ) 52 and GitHub ( https://github.com/mahonmb/GCDofDisease ) 53 . R markdown is provided in Supplementary Data 1 .

Jones, K. E. et al. Global trends in emerging infectious diseases. Nature 451 , 990–994 (2008).


Civitello, D. J. et al. Biodiversity inhibits parasites: broad evidence for the dilution effect. Proc. Natl Acad. Sci. USA 112 , 8667–8671 (2015).

Halliday, F. W., Rohr, J. R. & Laine, A.-L. Biodiversity loss underlies the dilution effect of biodiversity. Ecol. Lett. 23 , 1611–1622 (2020).


Rohr, J. R. et al. Towards common ground in the biodiversity–disease debate. Nat. Ecol. Evol. 4 , 24–33 (2020).


Johnson, P. T. J., Ostfeld, R. S. & Keesing, F. Frontiers in research on biodiversity and disease. Ecol. Lett. 18 , 1119–1133 (2015).

Keesing, F. et al. Impacts of biodiversity on the emergence and transmission of infectious diseases. Nature 468 , 647–652 (2010).

Cohen, J. M., Sauer, E. L., Santiago, O., Spencer, S. & Rohr, J. R. Divergent impacts of warming weather on wildlife disease risk across climates. Science 370 , eabb1702 (2020).


Rohr, J. R. et al. Frontiers in climate change-disease research. Trends Ecol. Evol. 26 , 270–277 (2011).

Altizer, S., Ostfeld, R. S., Johnson, P. T. J., Kutz, S. & Harvell, C. D. Climate change and infectious diseases: from evidence to a predictive framework. Science 341 , 514–519 (2013).


Rohr, J. R. & Cohen, J. M. Understanding how temperature shifts could impact infectious disease. PLoS Biol. 18 , e3000938 (2020).

Carlson, C. J. et al. Climate change increases cross-species viral transmission risk. Nature 607 , 555–562 (2022).

Halstead, N. T. et al. Agrochemicals increase risk of human schistosomiasis by supporting higher densities of intermediate hosts. Nat. Commun. 9 , 837 (2018).


Martin, L. B., Hopkins, W. A., Mydlarz, L. D. & Rohr, J. R. The effects of anthropogenic global changes on immune functions and disease resistance. Ann. N. Y. Acad. Sci. 1195 , 129–148 (2010).

Rumschlag, S. L. et al. Effects of pesticides on exposure and susceptibility to parasites can be generalised to pesticide class and type in aquatic communities. Ecol. Lett. 22 , 962–972 (2019).

Allan, B. F., Keesing, F. & Ostfeld, R. S. Effect of forest fragmentation on Lyme disease risk. Conserv. Biol. 17 , 267–272 (2003).


Brearley, G. et al. Wildlife disease prevalence in human‐modified landscapes. Biol. Rev. 88 , 427–442 (2013).

Rohr, J. R. et al. Emerging human infectious diseases and the links to global food production. Nat. Sustain. 2 , 445–456 (2019).

Bradley, C. A. & Altizer, S. Urbanization and the ecology of wildlife diseases. Trends Ecol. Evol. 22 , 95–102 (2007).

Allen, T. et al. Global hotspots and correlates of emerging zoonotic diseases. Nat. Commun. 8 , 1124 (2017).

Sokolow, S. H. et al. Ecological and socioeconomic factors associated with the human burden of environmentally mediated pathogens: a global analysis. Lancet Planet. Health 6 , e870–e879 (2022).

Young, H. S., Parker, I. M., Gilbert, G. S., Guerra, A. S. & Nunn, C. L. Introduced species, disease ecology, and biodiversity–disease relationships. Trends Ecol. Evol. 32 , 41–54 (2017).

Barouki, R. et al. The COVID-19 pandemic and global environmental change: emerging research needs. Environ. Int. 146 , 106272 (2021).


Nova, N., Athni, T. S., Childs, M. L., Mandle, L. & Mordecai, E. A. Global change and emerging infectious diseases. Ann. Rev. Resour. Econ. 14 , 333–354 (2021).

Zhang, L. et al. Biological invasions facilitate zoonotic disease emergences. Nat. Commun. 13 , 1762 (2022).

Olival, K. J. et al. Host and viral traits predict zoonotic spillover from mammals. Nature 546 , 646–650 (2017).

Guth, S. et al. Bats host the most virulent—but not the most dangerous—zoonotic viruses. Proc. Natl Acad. Sci. USA 119 , e2113628119 (2022).

Nelson, G. C. et al. in Ecosystems and Human Well-Being (Millennium Ecosystem Assessment) Vol. 2 (eds Rola, A. et al) Ch. 7, 172–222 (Island Press, 2005).

Read, A. F., Graham, A. L. & Raberg, L. Animal defenses against infectious agents: is damage control more important than pathogen control? PLoS Biol. 6 , 2638–2641 (2008).


Medzhitov, R., Schneider, D. S. & Soares, M. P. Disease tolerance as a defense strategy. Science 335 , 936–941 (2012).

Torchin, M. E. & Mitchell, C. E. Parasites, pathogens, and invasions by plants and animals. Front. Ecol. Environ. 2 , 183–190 (2004).

Bellay, S., de Oliveira, E. F., Almeida-Neto, M. & Takemoto, R. M. Ectoparasites are more vulnerable to host extinction than co-occurring endoparasites: evidence from metazoan parasites of freshwater and marine fishes. Hydrobiologia 847 , 2873–2882 (2020).

Scheffer, M. Critical Transitions in Nature and Society Vol. 16 (Princeton Univ. Press, 2020).

Rohr, J. R. et al. A planetary health innovation for disease, food and water challenges in Africa. Nature 619 , 782–787 (2023).

Reaser, J. K., Witt, A., Tabor, G. M., Hudson, P. J. & Plowright, R. K. Ecological countermeasures for preventing zoonotic disease outbreaks: when ecological restoration is a human health imperative. Restor. Ecol. 29 , e13357 (2021).

Hopkins, S. R. et al. Evidence gaps and diversity among potential win–win solutions for conservation and human infectious disease control. Lancet Planet. Health 6 , e694–e705 (2022).

Mitchell, C. E. & Power, A. G. Release of invasive plants from fungal and viral pathogens. Nature 421 , 625–627 (2003).

Chamberlain, S. A. & Szöcs, E. taxize: taxonomic search and retrieval in R. F1000Research 2 , 191 (2013).

Newman, M. Fundamentals of Ecotoxicology (CRC Press/Taylor & Francis Group, 2010).

Rohatgi, A. WebPlotDigitizer v.4.5 (2021); automeris.io/WebPlotDigitizer .

Lüdecke, D. esc: effect size computation for meta analysis (version 0.5.1). Zenodo https://doi.org/10.5281/zenodo.1249218 (2019).

Lipsey, M. W. & Wilson, D. B. Practical Meta-Analysis (SAGE, 2001).

R Core Team. R: A Language and Environment for Statistical Computing Vol. 2022 (R Foundation for Statistical Computing, 2020); www.R-project.org/ .

Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36 , 1–48 (2010).

Pustejovsky, J. E. & Tipton, E. Meta-analysis with robust variance estimation: Expanding the range of working models. Prev. Sci. 23 , 425–438 (2022).

Lenth, R. emmeans: estimated marginal means, aka least-squares means. R package v.1.5.1 (2020).

Bartoń, K. MuMIn: multi-model inference. Model selection and model averaging based on information criteria (AICc and alike) (2019).

Burnham, K. P. & Anderson, D. R. Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33 , 261–304 (2004).


Marks‐Anglin, A. & Chen, Y. A historical review of publication bias. Res. Synth. Methods 11 , 725–742 (2020).

Nakagawa, S. et al. Methods for testing publication bias in ecological and evolutionary meta‐analyses. Methods Ecol. Evol. 13 , 4–21 (2022).

Gurevitch, J., Koricheva, J., Nakagawa, S. & Stewart, G. Meta-analysis and the science of research synthesis. Nature 555 , 175–182 (2018).

Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67 , 1–48 (2015).

Mahon, M. B. et al. Data and code for ‘A meta-analysis on global change drivers and the risk of infectious disease’. Zenodo https://doi.org/10.5281/zenodo.8169979 (2024).

Mahon, M. B. et al. Data and code for ‘A meta-analysis on global change drivers and the risk of infectious disease’. GitHub github.com/mahonmb/GCDofDisease (2024).



We thank C. Mitchell for contributing data on enemy release; L. Albert and B. Shayhorn for assisting with data collection; J. Gurevitch, M. Lajeunesse and G. Stewart for providing comments on an earlier version of this manuscript; and C. Carlson and two anonymous reviewers for improving this paper. This research was supported by grants from the National Science Foundation (DEB-2109293, DEB-2017785, DEB-1518681, IOS-1754868), National Institutes of Health (R01TW010286) and US Department of Agriculture (2021-38420-34065) to J.R.R.; a US Geological Survey Powell grant to J.R.R. and S.L.R.; University of Connecticut Start-up funds to S.A.K.; grants from the National Science Foundation (IOS-1755002) and National Institutes of Health (R01 AI150774) to D.J.C.; and an Ambizione grant (PZ00P3_202027) from the Swiss National Science Foundation to F.W.H. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

These authors contributed equally: Michael B. Mahon, Alexandra Sack, Jason R. Rohr

Authors and Affiliations

Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA

Michael B. Mahon, Alexandra Sack, O. Alejandro Aleuy, Carly Barbera, Ethan Brown, Heather Buelow, Luz A. de Wit, Meghan Forstchen, Patrick Heffernan, Alexis Korotasz, Joanna G. Larson, Samantha L. Rumschlag, Emily Selland, Alexander Shepack, Nitin Vincent & Jason R. Rohr

Environmental Change Initiative, University of Notre Dame, Notre Dame, IN, USA

Michael B. Mahon, Samantha L. Rumschlag & Jason R. Rohr

Eck Institute of Global Health, University of Notre Dame, Notre Dame, IN, USA

Alexandra Sack, Meghan Forstchen, Emily Selland & Jason R. Rohr

Department of Biology, Emory University, Atlanta, GA, USA

David J. Civitello

Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA

Jeremy M. Cohen

Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA

Fletcher W. Halliday

Department of Ecology and Evolutionary Biology, Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA

Sarah A. Knutie



J.R.R. conceptualized the study. All of the authors contributed to the methodology. All of the authors contributed to investigation. Visualization was performed by M.B.M. The initial study list and related information were compiled by D.J.C., J.M.C., F.W.H., S.A.K., S.L.R. and J.R.R. Data extraction was performed by M.B.M., A.S., O.A.A., C.B., E.B., H.B., L.A.d.W., M.F., P.H., A.K., J.G.L., E.S., A.S. and N.V. Data were checked for accuracy by M.B.M. and A.S. Analyses were performed by M.B.M. and J.R.R. Funding was acquired by D.J.C., J.R.R., S.A.K. and S.L.R. Project administration was done by J.R.R. J.R.R. supervised the study. J.R.R. and M.B.M. wrote the original draft. All of the authors reviewed and edited the manuscript. J.R.R. and M.B.M. responded to reviewers.

Corresponding author

Correspondence to Jason R. Rohr .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Colin Carlson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 PRISMA flowchart.

The PRISMA flow diagram of the search and selection of studies included in this meta-analysis. Note that 77 studies came from the Halliday et al. 3 database on biodiversity change.

Extended Data Fig. 2 Summary of the number of studies (A-F) and parasite taxa (G-L) in the infectious disease database across ecological contexts.

The contexts are global change driver ( A , G ), parasite taxa ( B , H ), host taxa ( C , I ), experimental venue ( D , J ), study habitat ( E , K ), and human parasite status ( F , L ).

Extended Data Fig. 3 Summary of the number of effect sizes (A-I), studies (J-R), and parasite taxa (S-a) in the infectious disease database for various parasite and host contexts.

Shown are parasite type ( A , J , S ), host thermy ( B , K , T ), vector status ( C , L , U ), vector-borne status ( D , M , V ), parasite transmission ( E , N , W ), free living stages ( F , O , X ), host (e.g. disease, host growth, host survival) or parasite (e.g. parasite abundance, prevalence, fecundity) endpoint ( G , P , Y ), micro- vs macroparasite ( H , Q , Z ), and zoonotic status ( I , R , a ).

Extended Data Fig. 4 The effects of global change drivers and subsequent subcategories on disease responses with Log Response Ratio instead of Hedge’s g.

Here, Log Response Ratio shows similar trends to those of Hedge's g presented in the main text. The displayed points represent the mean predicted values (with 95% confidence intervals) from a meta-analytical model with separate random intercepts for study. Points that do not share letters are significantly different from one another (p < 0.05) based on a two-sided Tukey's posthoc multiple comparison test with adjustment for multiple comparisons. See Table S 3 for pairwise comparison results. Effects of the five common global change drivers ( A ) have the same directionality, similar magnitude, and significance as those presented in Fig. 2 . Global change driver effects are significant when confidence intervals do not overlap with zero, as explicitly tested with a two-tailed t-test (indicated by asterisks; t 80.62  = 2.16, p = 0.034 for CP; t 71.42  = 2.10, p = 0.039 for CC; t 131.79  = −3.52, p < 0.001 for HLC; t 61.9  = 2.10, p = 0.040 for IS). The subcategories ( B ) also show similar patterns to those presented in Fig. 3 . Subcategories are significant when confidence intervals do not overlap with zero and were explicitly tested with a two-tailed one-sample t-test (t 30.52  = 2.17, p = 0.038 for CO 2 ; t 40.03  = 4.64, p < 0.001 for Enemy Release; t 47.45  = 2.18, p = 0.034 for Mean Temperature; t 110.81  = −4.05, p < 0.001 for Urbanization); all other subcategories have p > 0.20. Note that effect size and study numbers are lower here than in Figs. 3 and 4 , because log response ratios cannot be calculated for studies that provide coefficients (e.g., odds ratios) rather than raw data; as such, none of the observations within BC had associated RR values. Despite strong differences in sample size, patterns are consistent across effect sizes, and we can therefore be confident that the results presented in the main text are not biased by effect size selection.
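For intuition, both effect sizes compared above can be computed from group-level summaries. The sketch below is a bare-bones version with invented means, SDs and sample sizes; the actual analyses used R packages (esc, metafor) that handle the full variance bookkeeping.

```python
import math

# Invented group summaries: a disease endpoint under a "driver" treatment
# versus a control.
m_t, sd_t, n_t = 14.2, 4.1, 20   # treatment mean, SD, n
m_c, sd_c, n_c = 11.5, 3.8, 22   # control mean, SD, n

# Pooled SD and Hedge's g: a standardized mean difference with the
# small-sample correction factor J.
sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
d = (m_t - m_c) / sp
J = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
g = J * d

# Log response ratio: requires positive group means, which is why it cannot
# be computed from coefficient-only reports such as odds ratios.
lrr = math.log(m_t / m_c)

print(f"Hedge's g = {g:.3f}, log response ratio = {lrr:.3f}")
```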

Extended Data Fig. 5 Average standard errors of the effect sizes (A) and sample sizes per effect size (B) for each of the five global change drivers.

The displayed points represent the mean predicted values (with 95% confidence intervals) from the generalized linear mixed effects models with separate random intercepts for study (Gaussian distribution for standard error model, A ; Poisson distribution for sample size model, B ). Points that do not share letters are significantly different from one another (p < 0.05) based on a two-sided Tukey’s posthoc multiple comparison test with adjustment for multiple comparisons. Sample sizes (number of studies, n, and effect sizes, k) for each driver are as follows: n = 77, k = 392 for BC; n = 124, k = 364 for CP; n = 202, k = 380 for CC; n = 517, k = 1449 for HLC; n = 96, k = 355 for IS.

Extended Data Fig. 6 Forest plots of effect sizes, associated variances, and relative weights (A), Funnel plots (B), and Egger’s Test plots (C) for each of the five global change drivers and leave-one-out publication bias analyses (D).

In panel A , points are the individual effect sizes (Hedge's g), error bars are standard errors of the effect size, and size of the points is the relative weight of the observation in the model, with larger points representing observations with higher weight in the model. Sample sizes are provided for each effect size in the meta-analytic database. Effect sizes were plotted in a random order. Egger's tests indicated significant asymmetries (p < 0.05) in Biodiversity Change (worst asymmetry – likely not bias, just real effect of positive relationship between diversity and disease), Climate Change (weak asymmetry, again likely not bias, climate change generally increases disease), and Introduced Species (relatively weak asymmetry – unclear whether this is a bias, may be driven by some outliers). No significant asymmetries (p > 0.05) were found in Chemical Pollution and Habitat Loss/Change, suggesting negligible publication bias in reported disease responses across these global change drivers ( B , C ). Egger's test included publication year as a moderator but found no significant relationship between Hedge's g and publication year (p > 0.05), implying no temporal bias in effect size magnitude or direction. In panel D , the horizontal red lines denote the grand mean and SE of Hedge's g (g = 0.1009, SE = 0.0338). Grey points and error bars indicate the Hedge's g and SEs, respectively, using the leave-one-out method (grand mean is recalculated after a given study is removed from the dataset). While the removal of certain studies resulted in values that differed from the grand mean, all estimated Hedge's g values fell well within the standard error of the grand mean. This sensitivity analysis indicates that our results were robust to the iterative exclusion of individual studies.
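The leave-one-out procedure in panel D amounts to recomputing the summary estimate with each study excluded in turn. A simplified sketch, using an unweighted mean of invented effect sizes in place of the full weighted meta-analytic model:

```python
# Leave-one-out sensitivity sketch: drop each study in turn and compare the
# recomputed summary against the grand mean. Effect sizes are invented; a
# real analysis would refit the weighted meta-analytic model at every step.
effects = [0.12, 0.08, 0.15, 0.05, 0.11, 0.09, 0.13]

grand_mean = sum(effects) / len(effects)
loo_means = [
    (sum(effects) - e) / (len(effects) - 1)  # mean with study i excluded
    for e in effects
]

print(f"grand mean = {grand_mean:.4f}")
print(f"leave-one-out range: {min(loo_means):.4f} to {max(loo_means):.4f}")
# If no single study dominates, every leave-one-out mean stays close.
assert all(abs(m - grand_mean) < 0.02 for m in loo_means)
```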

Extended Data Fig. 7 The effects of habitat loss/change on disease depend on parasite taxa and land use conversion contexts.

A) Enemy type influences the magnitude of the effect of urbanization on disease: helminths, protists, and arthropods were all negatively associated with urbanization, whereas viruses were non-significantly positively associated with urbanization. B) Reference (control) land use type influences the magnitude of the effect of urbanization on disease: disease was reduced in urban settings compared to rural and peri-urban settings, whereas there were no differences in disease along urbanization gradients or between urban and natural settings. C) The effect of forest fragmentation depends on whether a large/continuous habitat patch is compared to a small patch or whether disease is measured along an increasing fragmentation gradient (Z = −2.828, p = 0.005). Conversely, the effect of deforestation on disease does not depend on whether the habitat has been destroyed and allowed to regrow (e.g., clearcutting, second growth forests, etc.) or whether it has been replaced with agriculture (e.g., row crop, agroforestry, livestock grazing; Z = 1.809, p = 0.0705). The displayed points represent the mean predicted values (with 95% confidence intervals) from a metafor model where the response variable was a Hedge's g (representing the effect on an infectious disease endpoint relative to control), study was treated as a random effect, and the independent variables included enemy type (A), reference land use type (B), or land use conversion type (C). Data for (A) and (B) were only those studies that were within the “urbanization” subcategory; data for (C) were only those studies that were within the “deforestation” and “forest fragmentation” subcategories. Sample sizes (number of studies, n, and effect sizes, k) in (A) for each enemy are n = 48, k = 98 for Virus; n = 193, k = 343 for Protist; n = 159, k = 490 for Helminth; n = 10, k = 24 for Fungi; n = 103, k = 223 for Bacteria; and n = 30, k = 73 for Arthropod.
Sample sizes in (B) for each reference land use type are n = 391, k = 1073 for Rural; n = 29, k = 74 for Peri-urban; n = 33, k = 83 for Natural; and n = 24, k = 58 for Urban Gradient. Sample sizes in (C) for each land use conversion type are n = 7, k = 47 for Continuous Gradient; n = 16, k = 44 for High/Low Fragmentation; n = 11, k = 27 for Clearcut/Regrowth; and n = 21, k = 43 for Agriculture.

Extended Data Fig. 8 The effects of common global change drivers on mean infectious disease responses in the literature depends on whether the endpoint is the host or parasite; whether the parasite is a vector, is vector-borne, has a complex or direct life cycle, or is a macroparasite; whether the host is an ectotherm or endotherm; or the venue and habitat in which the study was conducted.

A ) Parasite endpoints. B ) Vector-borne status. C ) Parasite transmission route. D ) Parasite size. E ) Venue. F ) Habitat. G ) Host thermy. H ) Parasite type (ecto- or endoparasite). See Table S 2 for number of studies and effect sizes across ecological contexts and global change drivers. See Table S 3 for pairwise comparison results. The displayed points represent the mean predicted values (with 95% confidence intervals) from a metafor model where the response variable was a Hedge’s g (representing the effect on an infectious disease endpoint relative to control), study was treated as a random effect, and the independent variables included the main effects and an interaction between global change driver and the focal independent variable (whether the endpoint measured was a host or parasite, whether the parasite is vector-borne, has a complex or direct life cycle, is a macroparasite, whether the study was conducted in the field or lab, habitat, the host is ectothermic, or the parasite is an ectoparasite).

Extended Data Fig. 9 The effects of five common global change drivers on mean infectious disease responses in the literature only occasionally depend on location, host taxon, and parasite taxon.

A) Continent in which the field study occurred. Lack of replication in chemical pollution precluded us from including South America, Australia, and Africa in this analysis. B) Host taxa. C) Enemy taxa. See Table S2 for the number of studies and effect sizes across ecological contexts and global change drivers. See Table S3 for pairwise comparison results. The displayed points represent the mean predicted values (with 95% confidence intervals) from a metafor model where the response variable was Hedges' g (representing the effect on an infectious disease endpoint relative to control), study was treated as a random effect, and the independent variables included the main effects of, and interactions between, global change driver and continent, host taxon, and enemy taxon.

Extended Data Fig. 10 The effects of human vs. non-human endpoints for the zoonotic disease subset of the database, and of wild vs. domesticated animal endpoints for the non-human animal subset of the database, are consistent across global change drivers.

(A) Zoonotic disease responses measured on human hosts responded less positively (closer to zero when positive, further from zero when negative) than those measured on non-human (animal) hosts (Z = 2.306, p = 0.021). Note, IS studies were removed because of missing cells. (B) Disease responses measured on domestic animal hosts responded less positively (closer to zero when positive, further from zero when negative) than those measured on wild animal hosts (Z = 2.636, p = 0.008). These results were consistent across global change drivers (i.e., no significant interaction between endpoint and global change driver). As many of the global change drivers increase zoonotic parasites in non-human animals and all parasites in wild animals, this may suggest that anthropogenic change might increase the occurrence of parasite spillover from animals to humans and thus also pandemic risk. The displayed points represent the mean predicted values (with 95% confidence intervals) from a metafor model where the response variable was Hedges' g (representing the effect on an infectious disease endpoint relative to control), study was treated as a random effect, and the independent variables included global change driver and human/non-human host status. Data for (A) were only those diseases that are considered “zoonotic”; data for (B) were only those endpoints that were measured on non-human animals. Sample sizes in (A) for zoonotic disease measured on human endpoints across global change drivers are n = 3, k = 17 for BC; n = 2, k = 6 for CP; n = 25, k = 39 for CC; and n = 175, k = 331 for HLC. Sample sizes in (A) for zoonotic disease measured on non-human endpoints across global change drivers are n = 25, k = 52 for BC; n = 2, k = 3 for CP; n = 18, k = 29 for CC; n = 126, k = 289 for HLC. Sample sizes in (B) for wild animal endpoints across global change drivers are n = 28, k = 69 for BC; n = 21, k = 44 for CP; n = 50, k = 89 for CC; n = 121, k = 360 for HLC; and n = 29, k = 45 for IS.
Sample sizes in (B) for domesticated animal endpoints across global change drivers are n = 2, k = 4 for BC; n = 4, k = 11 for CP; n = 7, k = 20 for CC; n = 78, k = 197 for HLC; and n = 1, k = 2 for IS.

Supplementary information

Supplementary information.

Supplementary Discussion, Supplementary References and Supplementary Tables 1–3.

Reporting Summary

Peer review file, supplementary data 1.

R markdown code and output associated with this paper.

Supplementary Table 4

EcoEvo PRISMA checklist.

About this article

Cite this article.

Mahon, M.B., Sack, A., Aleuy, O.A. et al. A meta-analysis on global change drivers and the risk of infectious disease. Nature (2024). https://doi.org/10.1038/s41586-024-07380-6

Received : 02 August 2022

Accepted : 03 April 2024

Published : 08 May 2024

DOI : https://doi.org/10.1038/s41586-024-07380-6


Impact of the use of cannabis as a medicine in pregnancy, on the unborn child: a systematic review and meta-analysis protocol


Introduction: The use of cannabis for medicinal purposes is on the rise. As more people place their trust in the safety of prescribed plant-based medicines and find them easily accessible, there is growing concern that pregnant women may increasingly be using cannabis for medicinal purposes to manage pregnancy symptoms and other health conditions. The aim of this review is to investigate the use of cannabis for medicinal purposes during pregnancy, describe the characteristics of the population using it, and measure the impact on the unborn child and up to twelve months postpartum. Methods and analyses: Research on pregnant women who use cannabis for medicinal purposes only, and on infants up to one year after birth who experienced in utero exposure to cannabis for medicinal purposes, will be included in this review. Reviews, randomised controlled trials, case-control, cross-sectional and cohort studies that have been peer reviewed and published between 1996 and April 2024 as primary research investigating the effects of prenatal use of cannabis for medicinal purposes on foetal, perinatal, and neonatal outcomes will be selected for review. Editorials, letters, commentaries, protocols, conference papers and book chapters will be excluded. The effects of illicit drug use, alcohol misuse and nicotine exposure on neonatal outcomes will be controlled for by excluding studies reporting concomitant use of such substances with cannabis for medicinal purposes during pregnancy. All titles and abstracts will be reviewed independently and in duplicate by at least two researchers. Records will be excluded on the basis of title and abstract screening as well as publication type. Where reviewers initially disagree about the inclusion of a study, team members will review the disputed article until consensus is reached. Selected studies will then be assessed by at least two independent researchers for risk of bias using validated tools.
Data will be extracted and analysed following a systematic review and meta-analysis methodology. The statistical analysis will combine three or more outcomes that are reported in a consistent manner. The systematic review and meta-analysis will follow the PRISMA guidelines to facilitate transparent reporting [1].
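Pooling three or more consistently reported outcomes, as the protocol describes, is typically done with an inverse-variance random-effects model. A minimal sketch, assuming the classic DerSimonian-Laird estimator of between-study variance (the protocol does not specify an estimator; the function name is illustrative):

```python
from math import sqrt

def random_effects(effects, variances):
    """Inverse-variance random-effects pool with the DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]                         # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = 1 / sqrt(sum(w_star))
    return pooled, se, tau2
```

When the study estimates agree exactly, Q falls below k − 1, tau² is truncated to zero, and the model collapses to the fixed-effect pool; heterogeneous estimates inflate tau² and widen the standard error.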

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study will use ONLY openly available human data from studies published in biomedical and scientific journals.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Data Availability

All data produced in the present work are contained in the manuscript.


Association of Breastfeeding and Early Childhood Caries: A Systematic Review and Meta-Analysis


  • 1 School of Health Sciences, Western Sydney University, Locked Bag 1797, Penrith, NSW 2751, Australia.
  • 2 Health Equity Laboratory, Campbelltown, NSW 2560, Australia.
  • 3 Discipline of Child and Adolescent Health, The Children's Hospital at Westmead Clinical School, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia.
  • 4 Translational Health Research Institute, Western Sydney University, Campbelltown, NSW 2560, Australia.
  • 5 Oral Health Services, Sydney Local Health District and Sydney Dental Hospital, NSW Health, Surry Hills, NSW 2010, Australia.
  • 6 Blackdog Institute, Hospital Road, Randwick, NSW 2031, Australia.
  • 7 University of Sydney Library, The University of Sydney, Camperdown, NSW 2006, Australia.
  • 8 School of Nursing and Midwifery, Western Sydney University, Locked Bag 1797, Penrith, NSW 2751, Australia.
  • 9 School of Nursing and Midwifery, University of Canberra, Bruce, ACT 2617, Australia.
  • 10 Ingham Research Institute, Liverpool, NSW 2170, Australia.
  • PMID: 38732602
  • PMCID: PMC11085424
  • DOI: 10.3390/nu16091355

Early childhood caries (ECC) is a growing public health concern worldwide. Although numerous systematic reviews have been published regarding the association between breastfeeding and ECC, the results remain inconclusive and equivocal. This systematic review synthesises the evidence on the association between breastfeeding and ECC. Five electronic databases were searched, and backward citation chasing was performed, from inception until May 2023. A total of 31 studies (22 cohort studies and 9 case-control studies) were included in this review. The meta-analysis of the case-control studies showed significantly fewer dental caries in children who were breastfed for <6 months compared to those who were breastfed for ≥6 months (OR = 0.53, 95% CI 0.41-0.67, p < 0.001). There was a statistically significant difference in dental caries between children who were breastfed for <12 months and those who were breastfed for ≥12 months (RR = 0.65, 95% CI 0.50-0.86, p < 0.002). Similarly, there was a statistically significant difference in dental caries in children who were breastfed for <18 months compared to those who were breastfed for ≥18 months (RR = 0.41, 95% CI 0.18-0.92, p = 0.030). Nocturnal breastfeeding increases the risk of ECC compared with no nocturnal breastfeeding (RR = 2.35, 95% CI 1.42-3.89, p < 0.001). The findings suggest breastfeeding for more than 12 months and nocturnal breastfeeding increase the risk of ECC.
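The odds ratios quoted above are reported with Wald-type 95% confidence intervals, which are computed on the log-odds scale. For illustration only, a minimal sketch from a 2x2 table (the counts in the example are invented, not taken from the review):

```python
from math import log, exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    or_ = (a * d) / (b * c)
    se = sqrt(1/a + 1/b + 1/c + 1/d)     # standard error of log(OR)
    lo = exp(log(or_) - z * se)
    hi = exp(log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: 10/20 exposed cases/non-cases, 5/40 unexposed
or_, lo, hi = odds_ratio_ci(10, 20, 5, 40)
```

An interval that excludes 1 corresponds to the "statistically significant" differences reported in the abstract; this Wald approximation breaks down when any cell count is very small.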

Keywords: breastfeeding; dental caries; early childhood caries; oral health; preschool children.

Publication types

  • Systematic Review
  • Meta-Analysis
  • Breast Feeding* / statistics & numerical data
  • Case-Control Studies
  • Child, Preschool
  • Dental Caries* / epidemiology
  • Dental Caries* / etiology
  • Risk Factors

  • Open access
  • Published: 04 May 2024

Impacts of heat exposure in utero on long-term health and social outcomes: a systematic review

  • Nicholas Brink 1 ,
  • Darshnika P. Lakhoo 1 ,
  • Ijeoma Solarin 1 ,
  • Gloria Maimela 1 ,
  • Peter von Dadelszen 2 ,
  • Shane Norris 3 ,
  • Matthew F. Chersich 1 &

Climate and Heat-Health Study Group

BMC Pregnancy and Childbirth volume 24, Article number: 344 (2024)


Climate change, particularly global warming, is amongst the greatest threats to human health. While short-term effects of heat exposure in pregnancy, such as preterm birth, are well documented, long-term effects have received less attention. This review aims to systematically assess evidence on the long-term impacts on the foetus of heat exposure in utero.

A search was conducted in August 2019 and updated in April 2023 in MEDLINE (PubMed). We included studies on the relationship between environmental heat exposure during pregnancy and any long-term outcomes. Risk of bias was assessed using tools developed by the Joanna Briggs Institute, and the evidence was appraised using the GRADE approach. Synthesis Without Meta-analysis (SWiM) guidelines were used.

A total of 18 621 records were screened, with 29 studies included across six outcome groups. Studies were mostly conducted in high-income countries ( n  = 16/25), in cooler climates. All studies were observational, with 17 cohort, 5 case-control and 8 cross-sectional studies. The timeline of the data is from 1913 to 2019, and individuals ranged in age from neonates to adults and the elderly. Increasing heat exposure during pregnancy was associated with decreased earnings and lower educational attainment ( n  = 4/6), as well as worsened cardiovascular ( n  = 3/6), respiratory ( n  = 3/3), psychiatric ( n  = 7/12) and anthropometric ( n  = 2/2) outcomes, possibly culminating in increased overall mortality ( n  = 2/3). The effect on female infants was greater than on males in 8 of 9 studies differentiating by sex. The quality of evidence ranged from low in the respiratory and longevity outcome groups to very low in all others.


Increasing heat exposure was associated with a multitude of detrimental outcomes across diverse body systems. The biological pathways involved are yet to be elucidated, but could include epigenetic and developmental perturbations through interactions with the placenta and inflammation. This highlights the need for further research into the long-term effects of heat exposure, biological pathways, and possible adaptation strategies, particularly in neglected regions. Heat exposure in-utero has the potential to compound existing health and social inequalities. Poor design of the included studies constrains the conclusions of this review, with heterogeneous exposure measures and outcomes rendering comparisons across contexts/studies difficult.

Trial Registration

PROSPERO CRD 42019140136.

Peer Review reports


Climate change is one of the most significant threats to human health [ 1 ], characterized by an increase in global temperatures amongst other environmental changes. Global temperatures have increased by approximately 1·2 °C, and are projected to increase beyond a critical threshold of 1·5 °C in the next 5–10 years [ 2 ]. Increasingly, heat exposure is being linked with a multitude of short- and long-term health effects in vulnerable populations, including children [ 3 ], the elderly, and pregnant women [ 4 ]. The effect on pregnant women extends to the health of the foetus, with significant detrimental effects associated with heat exposure including preterm birth, stillbirth, and decreased birth weight [ 5 ]. Impacts of heat exposure are increasingly important in populations in resource-constrained settings, where heat adaptation measures such as active (air-conditioning) and passive cooling (water, green and blue spaces) are limited, and often inaccessible [ 6 ]. These populations are often found in some of the hottest climates and in areas whose contribution to global warming is negligible, thus compounding inequities [ 7 ]. In addition, research in this field is biased towards Europe, North America and Asia and is profoundly underrepresented in Africa and South America [ 8 ]. Understanding the scope and distribution of research conducted is key to guiding future research, including biological studies to explore possible mechanisms, and interventional studies to alleviate any observed negative effects. Multiple previous systematic reviews have explored the short-term impacts of heat on the foetus [ 3 , 5 , 9 ] but only one has explored the long-term impacts of heat exposure on mental health [ 10 ]. The in-utero environment has long been considered important in the long-term health and wellbeing of individuals [ 11 , 12 ], although it has been challenging to delineate specific causal pathways. 
This study aims to systematically review the literature on the long-term effects on the foetus of heat exposure in-utero, and to explore possible causal pathways.

Materials and methods

This review forms part of a larger systematic mapping survey of the effect of heat exposure, and adaptation interventions on health (PROSPERO CRD 42019140136) [ 13 ]. The initial literature search was conducted in September 2018, where the authors searched MEDLINE (PubMed), Science Citation Index Expanded, Social Sciences Citation Index, and Arts and Humanities Citation Index using a validated search strategy (Supplementary Text 1 ). This search was updated in April 2023 through a MEDLINE search, as all previous articles were located in this database. Screening of titles and abstracts was done independently in duplicate, with any differences reconciled by MFC, with subsequent updates conducted by NB and DL. The authors only included studies on humans, published in Chinese, English, German, or Italian. Studies on heat exposure from artificial and endogenous sources were excluded, and only exogenous, weather-related heat exposure during pregnancy was included. All study designs were eligible except modelling studies and systematic reviews. No date restrictions were applied. EPPI-Reviewer software [ 14 ] provided a platform for screening, reviewing of full text articles, and for data extraction. No additional information was requested or provided by the authors. Long-term effects were defined as any outcomes that were not apparent at birth.

Articles meeting the eligibility criteria were extracted in duplicate after the initial search and then by a single reviewer in the subsequent update (NB/DL). Data were extracted to include characteristics outlined in Supplementary file 1 .

This systematic review was conducted according to the Synthesis Without Meta-analysis (SWiM) guidelines, broadly based on PRISMA [ 15 ], as the outcomes, statistical techniques, and heat exposure measurements were heterogeneous, rendering a meta-analysis untenable. Outcomes were grouped clinically and reviewed for the magnitude, direction, and statistical significance of effects, including negative or null findings where reported. A text-based summary of these findings was made. ‘Vote-counting’ was used to summarise direction-of-effect findings. Analysis was conducted on the geographical areas, climate zones [ 16 ], mean annual temperature and socioeconomic classification of the countries where the studies were conducted. Furthermore, an attempt was made to identify at-risk population sub-groups.
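The 'vote-counting' step described above simply tallies direction-of-effect findings within each outcome group. A minimal sketch of that tally (the +1/-1/0 encoding and the function name are illustrative, not from the review's own analysis):

```python
from collections import Counter

def vote_count(directions):
    """Tally direction-of-effect votes within an outcome group:
    +1 = detrimental effect, -1 = beneficial effect, 0 = null/no direction."""
    tally = Counter(directions)
    return tally[1], tally[-1], tally[0]

# e.g. a respiratory-style group where all three studies found detrimental effects
detrimental, beneficial, null = vote_count([1, 1, 1])
```

Vote counting preserves only direction, not magnitude or precision, which is why SWiM treats it as a fallback when heterogeneity rules out a pooled estimate.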

The principal investigator assessed each study for risk of bias using the tools developed by the Joanna Briggs Institute (JBI) [ 17 ] (Supplementary file 1 ). Each study was classified as being at high or low risk of bias. Studies that did not score ‘yes’ on two or more applicable parameters were classified as high risk of bias [ 5 ]. Due to the limited research in this field, no studies were excluded on the basis of risk of bias. The certainty of the evidence was assessed using the GRADE approach, with the body of evidence rated on a scale of certainty: very low, low, moderate and high [ 18 ]. Due to the heterogeneity of outcomes, and the reporting thereof, assessment of publication bias was not possible.

The funder of the study had no role in study design, data collection, analysis, interpretation, or writing of the report.

The updated search identified 18 621 non-duplicate records, and after screening, 229 full-text articles were reviewed for inclusion, with a total of 29 studies included in the final analysis (Fig.  1 : flow chart). The included studies were conducted in 25 countries across six continents, including six Low-Income Countries (LIC), two Lower-Middle Income Countries (LMIC), one Upper-Middle Income Country (UMIC) and 16 High Income Countries (HIC) [ 19 ]. They spanned 25 Köppen-Geiger climate zones [ 16 ], with mean annual temperatures ranging from 2.1 °C in Norway to 30.0 °C in Burkina Faso [ 20 ] (Figs.  2 and 3 ). All studies were observational, with 17 cohort, five case-control and eight cross-sectional studies. The timeline of the data is from 1913 to 2019, and individuals included ranged in age from neonates to adults and the elderly. The studies were grouped by outcomes as follows: behavioural, educational and socioeconomic ( n  = 6), cardiovascular disease ( n  = 6), respiratory disease ( n  = 3), growth and anthropometry ( n  = 2), mental health ( n  = 12) and longevity and mortality ( n  = 3). The measures of heat exposure were variable, with minimum, mean, maximum, and apparent temperature being utilized, as well as temperature variability, heat wave days and discrete shocks (the number of times exposure exceeded a specific threshold). The majority of studies measured heat using mean temperature ( n  = 27/29). In addition, the statistical comparisons were diverse, with some studies making a continuous linear comparison per degree Celsius, while others compared heat exposure by quartiles, amongst other categorical comparisons. Furthermore, heat exposure by any definition was not reported over the same timeframes, with some studies including variable periods before birth, during pregnancy and at birth in their analysis.
Levels of temporal resolution of heat exposure were also diverse, ranging from monthly effects to effects observed over the entire gestational period or year of birth. In addition, differing use of heat adaptation mechanisms was not uniformly described and adjusted for. Various confounders were adjusted for and, although not uniform, these were generally inadequate. The effect on female infants was greater than on male infants in eight of nine studies differentiating by sex, with increased effects on marginalised groups (African-Americans) in one further study. Overall, the quality of the evidence, as assessed by the GRADE approach, ranged from low in the respiratory and longevity outcome groups to very low in all other groups, primarily as a result of their observational nature and high risk of bias, due to insufficient consideration of confounders and inadequate measures of heat exposure.

figure 1

PRISMA flow diagram

figure 2

Map showing countries where studies were conducted relative to mean annual temperature [ 21 ]

figure 3

Map showing countries where studies were conducted relative to climate zones [ 16 ]

A total of six studies reported on behavioural, educational and socioeconomic outcomes, which were detrimentally affected by increases in heat exposure (Fig.  4 ; Table  1 ), although the quality of the evidence was very low. Endpoints were not uniform, but included earnings, completion of secondary school or higher education, number of years of schooling, and gamified cooperation rates in a public-goods game (where test scores represent achieving maximal public benefit in hypothetical situations).

Two large studies reported a detrimental effect of heat exposure on adult income, with the greatest effect noted in first trimester exposure. These studies noted a reduction in earnings of up to 1·2% per 1 °C increase in temperature, with greater effects in females [ 22 ], and a decrease of $55.735 (standard error (SE): 15·425, P  < 0·01) in annual earnings at 29–31 years old, per day of exposure > 32 °C [ 26 ]. Two studies reported worse educational outcomes, with the greatest effect noted in the second trimester [ 23 ]. Rates of completing secondary education were found to be reduced by 0·2% per 1 °C increase in temperature ( P  = 0·05) [ 22 ], illiteracy was increased by 0·18% (SE = 0·0009; P  < 0·05) and mean years of schooling was lowered by 0·02 (SE = 0·009; P  = 0·07) [ 23 ]. Two studies reported a beneficial effect of heat exposure on educational outcomes, although both studies suffered from significant methodological flaws, and effects were < 0·01% when effect estimates were noted [ 24 , 27 ]. One small study reported lower cooperation rates by 20% ( P  < 0·01) in a public-goods game, with lower predicted public wellbeing [ 25 ].

The studies generally exhibited a dose-response effect with evidence for a critical threshold of effect of 28 °C in one study [ 22 ]. All studies were at a high risk of bias.

figure 4

Figure showing vote counting across all outcome groups. No Effect = No direction of effect noted in study

Six studies reported on cardiovascular pathology and its risk factors, which were detrimentally affected by increased exposure to heat (Fig.  4 ; Table  2 ), although measures and surrogates of this outcome were heterogeneous. The quality of the evidence was very low, and the sample sizes were small. Outcomes included blood pressure, a composite cardiovascular disease indicator, and specific cardiovascular disease risk factors such as diabetes mellitus (type I), insulin resistance, waist circumference, and triglyceride levels.

Three studies found a detrimental effect of heat exposure on hypertension rates, and increased blood pressure [ 31 ], with a maximum of 1·6 mm Hg increase noted per interquartile range (IQR) increase (95% Confidence interval (CI) = 0·2, 2·9, P  = 0·024) in children [ 30 ], with increased effects on women in the largest study ( N  = 11,237) [ 32 ]. Another study found increasing heat exposure at conception was detrimentally associated with an increase in coronary heart disease ( P  = 0·08) [ 32 ], although one of the smaller studies ( N  = 4286) found a beneficial effect of heat exposure at birth on diverse cardiovascular outcomes, including coronary heart disease ( P  = 0·03 for trend), triglyceride levels ( P  = 0·06 for trend) and insulin resistance ( P  = 0·04 for trend) [ 27 ]. One study found lower odds of type I diabetes mellitus with increasing heat exposure, with odds ratio (OR) = 0·73 (95%CI = 0·48, 1·09, P -value not stated) [ 28 ]. Another study did not detect statistically significant relationships between heat exposure and hypertension or a composite cardiovascular disease indicator, but did not provide effect estimates [ 29 ]. Five studies were at a high risk of bias [ 27 , 29 , 30 , 31 , 32 ], with only one case-control study at a low risk of bias [ 28 ].

Respiratory pathology was reported by three studies, assessing different outcomes. Outcomes were detrimentally associated with increasing heat (Fig.  4 ; Table  3 ), however the quality of the evidence was low . The outcomes were primarily measured in infants and children, with no studies on adult outcomes. The largest study ( N  = 1681) found that increasing heat exposure increased the odds of having childhood asthma [ 33 ], and another small study ( N  = 343) noted worsened lung function with increasing heat exposure [ 34 ].

An additional study noted increased odds of childhood pneumonia with increasing diurnal temperature variation (DTV) in pregnancy, with a maximum OR = 1·85 (95%CI = 1·24, 2·76) in the third trimester [ 35 ].

Exposure in the third trimester had the greatest effect across all three studies [ 33 , 34 , 35 ]. Females showed an increased susceptibility to heat exposure’s effects on lung function, but males were more susceptible to heat’s effect on childhood pneumonia. There was a critical threshold noted in the asthma study of 24·6 °C, with a dose-response effect. The asthma study was assessed as low risk of bias, however the other studies were at high risk.

Growth and anthropometry were reported on by two studies with differing outcomes, although in both, heat exposure was associated with detrimental, if heterogeneous, outcomes (Fig.  4 ; Table  4 ). The overall quality of the evidence was very low. One study found a positive association between heat exposure and increased body mass index (BMI), r  = 0·22 ( P  < 0·05), in the third trimester, with greater effects noted in females and in African-Americans [ 36 ]. Another large study ( N  = 23 026) found increased odds of stunting (OR = 1·28, 95%CI = not stated, p  < 0·001), with a negative correlation with height noted ( r  = -0·083, P  < 0·01) [ 37 ]. Effects were greatest in the first and third trimesters. Both studies were at a high risk of bias.

Mental health was reported on by 12 studies. Increasing heat exposure generally had a detrimental association with mental health outcomes (Fig.  4 ; Table  5 ), although these were heterogeneous. The overall quality of the evidence was very low. Five studies reported on schizophrenia rates, with only one study showing a strongly positive association of heat exposure at conception with schizophrenia rates ( r  = 0·50, p  < 0·025) [ 38 ]. Another study noted the same effect with increasing heat in the summer before birth, although this was not statistically significant [ 39 ]. The third study reported no association for this outcome [ 40 ], with another small study ( N  = 2985) showing a negative correlation with temperatures at birth, without reporting on heat exposure during other periods of gestation [ 41 ]. The fifth study failed to report the direction of effect, but noted non-significant findings [ 42 ]. Six studies reported on eating disorders, with all six showing a detrimental effect with increasing heat exposure. Of the three studies on clinical anorexia nervosa, one reported increasing rates of anorexia nervosa compared to other eating disorders (χ² = 4·48, P  = 0·017) [ 43 ], another reported increasing rates of a restrictive subtype (χ² = 3·18, P  = 0·04) as well as worse assessments of restrictive behaviours [ 44 ], which was supported by a third study in a different setting [ 45 ]. Three studies examined non-clinical settings, with some inconsistent effects. The first study showed a weak positive association between heat exposure and drive for thinness (Spearman’s ⍴ = 0·46, P  < 0·05) and bulimia scores (Spearman’s ⍴ = 0·25, P  < 0·05) [ 46 ], which was supported by a replication study [ 47 ] and one other study [ 48 ]. The most significant and consistent effects were noted in the third trimester, at birth, and in females [ 47 , 48 ].
One study reported a beneficial effect of increased temperatures in the first trimester on rates of depression; however, no directions of effect were reported for other periods of exposure [ 49 ]. These studies were at a high risk of bias.

Increasing heat exposure had a detrimental effect on longevity and mortality across various outcomes (Fig. 4; Table 6), although, despite large sample sizes, the quality of the evidence was low. One study found a negative correlation of heat exposure with longevity (r = −0·667, P < 0·001), with a greater effect on females [ 50 ]. A second study showed a detrimental effect on telomere length, as a predictor of longevity, with the greatest effect towards the end of gestation (3·29% shorter TL, 95% CI = −4·67, −1·88, per 1 °C increase above the 95th centile) [ 51 ]. Conversely, a third study noted no correlation with mortality [ 24 ]. All but the study on telomere length [ 51 ] were at a high risk of bias.

This study establishes significant patterns of effects amongst the outcomes reviewed, with increasing heat exposure being associated with an overall detrimental effect on multiple, diverse, long-term outcomes. These effects are likely to increase with rising temperatures; however, modelling this is beyond the scope of this review.

The most notable detrimental outcomes are related to neurodevelopmental pathways, with behavioural, educational, socioeconomic and mental health outcomes consistently associated with increasing heat exposure, in addition to having the greatest body of literature to support this. Importantly, other systems, such as the respiratory and cardiovascular systems, also suggest harmful effects of heat exposure, culminating in detrimental associations with longevity and mortality. Some studies illustrated a possible beneficial effect in some disease processes, such as coronary heart disease and depression, showing the potential for shifting disease profiles with rising temperatures.

The detrimental effects of heat exposure became more significant with increasing temperatures, with many studies describing increasing effects beyond critical thresholds which, although varied across studies, suggest that there is a limit to heat adaptation strategies, both biological and behavioural [ 52 , 53 ].

In addition, increasing heat exposure was associated with worse outcomes in already marginalised communities, such as women [ 22 , 32 , 34 , 36 , 44 , 47 , 48 , 50 ] and certain ethnic groups (African-Americans) [ 46 ]. The reasons for sub-population vulnerabilities are unclear and likely complex. In the case of female foetuses appearing more susceptible to changes in the in-utero environment, it is possible that there is a ‘survivorship bias’. This would occur if women with harmful exposure lose male infants during pregnancy at a higher rate, and thus the surviving female infants appear more at risk. However, despite an increased risk of early pregnancy loss, no studies have assessed this differential vulnerability. This still has the effect of potentially increasing the burden of disease on an already marginalised group.

In the case of certain population groups being more at risk, it is likely that both physiological differences in vulnerability and socio-economic effect-modifiers exist to explain these differences; however, the included literature lacks sufficient evidence to assess this. The vulnerabilities of different populations to the long-term effects of heat exposure in utero likely contribute to the unequal impacts of climate change that have already been established [ 54 ], and will be an important contributor to inequality with future increases in temperature. Further research in this area is critical to inform targeted redistributive interventions.

Although the associations may be clear, establishing causality is fraught with difficulty, with no consensus on an infallible approach [ 55 , 56 , 57 ]. However, it is prudent to highlight supporting evidence in this review.

The hypothesis that the in-utero environment had significant long-term impacts on the foetus was first suggested by Barker, in the context of maternal nutrition and cardiovascular disease [ 11 ]. Further studies supported this hypothesis, and expanded on the effects the in-utero environment has on the foetus and its long-term wellbeing [ 58 ]. Long-term heat exposure may also be associated with changes in nutritional availability [ 11 ], and is likely one of many complex but important environmental exposures in-utero.

Maternal comorbidities associated with increasing heat exposure, such as hypertensive disorders of pregnancy and gestational diabetes mellitus, are known to negatively affect the foetus in the long-term [ 59 , 60 ]. These comorbidities may be part of the long-term pathogenicity of heat exposure, through short-term exposure-outcome pathways. Placental dysfunction is central to the pathology of pre-eclampsia and is a significant cause of foetal pathology [ 61 , 62 ]. The placental circulation is not auto-regulated and is therefore acutely affected by changes in blood volume, heart rate and blood pressure, which together determine the cardiac output delivered to the placenta as an end-organ, with resultant negative effects on the foetus [ 63 ]. Heat-acclimatisation mechanisms are hypothesised to affect this delicate balance [ 52 , 64 ], with observational studies supporting this [ 64 ]. The inflammation induced by heat exposure has been suggested as a possible causative mechanism for pre-term birth [ 5 , 52 ], but inflammation has numerous additional effects on the immune system and could prove an insult to the mother and developing foetus [ 62 , 65 ]. These effects may only manifest in the long-term.

Heat was one of the earliest described teratogens [ 66 ], with significant effects on neurodevelopment noted in animal models, in keeping with the associations observed in this review [ 67 ]. Biological organisms are extremely dependent on heat as a trigger for various processes. Plants and animals undergo significant change in response to the seasons, often guided by fluctuations in temperature. These changes are often mediated by epigenetic mechanisms, allowing the modification and modulation of gene expression [ 68 , 69 ].

Thus, from an evolutionary perspective, DNA is sensitive to changes in temperature. The mechanism of this sensitivity has been shown to be primarily epigenetic in nature [ 69 ]: increasing heat results in modifications to histone deacetylation and DNA methylation [ 69 ]. This is required to provide fast-acting adaptations to acute stressors, but can have long-term effects too [ 70 ]. It is therefore likely that humans are sensitive to changes in temperature, which can alter epigenetic modifications and thus our exposome. This sensitivity may have provided a survival benefit in times of increasing heat, or it may simply be a vestigial function which provides no survival benefit and may in fact have detrimental effects [ 71 ]. Epigenetic changes have been shown to have significant effects on metabolic diseases and risk profiles; an in-depth review is provided by Wu et al. [ 72 ]. The exact processes and genes involved require further research; similar research exists on the epigenetic pathways affected by nutrition [ 73 ]. An important pattern requiring further research is the effect heat may have on neurodevelopment [ 67 , 74 ]. The above pathways provide additional mechanisms for the long lag between exposure and outcome. In addition, acute heat exposure at the time of birth has been associated with various possibly pathogenic mechanisms such as preterm birth [ 5 ], low APGAR scores [ 75 ] and foetal distress [ 76 ], as well as a possible effect on the maternal microbiome and the seeding thereof to the neonate [ 10 , 64 , 77 , 78 ]. These effects can all provide plausible causes for the long-term outcomes observed, through short-term insults. The interplay of these and additional factors is highlighted in Fig. 5 [ 79 ]. Importantly, the periods of vulnerability are likely different for these various pathways, but specific outcomes may have multiple periods of vulnerability through different pathways.

Figure 5: Causal pathways

The outcomes associated with increasing heat exposure highlight the health, social, and economic cost of global warming. Establishing current estimates and future predictions of this cost is beyond the scope of this research but would provide a valuable area for future work, entailing the estimation of disease-burden due to climate change through attribution studies. Traditional health impact studies conflate adverse outcomes from natural variations in climate (‘noise’) with adverse outcomes from anthropogenic climate change; however, not every climate-related adverse outcome is the result of anthropogenic climate change, and these effects are likely different in vulnerable populations. This highlights the benefit of studying and implementing effective heat adaptation strategies in areas where the greatest effect is likely to be observed, and where the greatest impacts in lessening the economic and human cost of global warming are possible [ 80 , 81 ].


The difficulty in assessing the data is compounded by the heterogeneous measures of heat exposure. No studies used widely accepted heat exposure indices that consider important environmental modifying factors such as humidity and wind speed [ 82 , 83 ]. In addition, effect modifiers, heat acclimatisation and adaptation strategies were seldom considered [ 84 , 85 , 86 ]. It may be prudent for future studies to consider the measurement of ionizing radiation exposure as an analogous environmental exposure, for which different measures exist for intensity, total quantity (a function of duration of exposure) and the biologically-adjusted quantity absorbed [ 87 ]. Differing time-periods of exposure made it difficult to evaluate specific periods of sensitivity, which are likely different for various outcomes, depending on critical periods of development.
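One example of the kind of composite index that accounts for such modifying factors is the apparent temperature (the ‘AT’ of the abbreviation list). The sketch below is a minimal illustration, using the Steadman-based, non-radiative formula published by the Australian Bureau of Meteorology; the constants come from that published definition, not from any of the included studies, and other indices (e.g. wet-bulb globe temperature) weight these inputs differently.

```python
import math

def apparent_temperature(temp_c, rel_humidity_pct, wind_speed_ms):
    """Steadman-style apparent temperature (non-radiative form, as published
    by the Australian Bureau of Meteorology), combining air temperature (degC),
    relative humidity (%) and 10 m wind speed (m/s) into a single index."""
    # Water vapour pressure (hPa) estimated from relative humidity and temperature.
    e = (rel_humidity_pct / 100.0) * 6.105 * math.exp(17.27 * temp_c / (237.7 + temp_c))
    return temp_c + 0.33 * e - 0.70 * wind_speed_ms - 4.00

# The same 35 degC dry-bulb day yields very different exposure values
# depending on humidity and wind: humid still air raises the index,
# while a dry breeze lowers it.
humid_still = apparent_temperature(35.0, 80.0, 0.5)  # roughly 45 degC
dry_breezy = apparent_temperature(35.0, 20.0, 5.0)   # roughly 31 degC
```

Using a single dry-bulb temperature, as most included studies did, would assign identical exposure to both of these days, which is one reason comparisons across study settings are difficult.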

Despite consistency across different contexts in this review, the analysis of the distribution of the included studies highlights the unequal weighting of studies towards relatively cooler climates, in regions with higher socioeconomic levels and likely greater uptake of heat adaptation; the findings must therefore be interpreted in this context. Myriad factors that differ geographically, including physiological and socio-economic differences, are likely to influence the effects of heat, and thus there is likely no single universal association or effect estimate.

Quantifying, describing and comparing effect sizes across studies was rendered more difficult by heterogeneous statistical analyses.

Although some studies adjusted for possible confounding variables, not all reported on this, and the effects of seasonal, foetal, and maternal biological factors that may not lie on the causal pathway were seldom considered [ 3 , 5 , 9 , 88 , 89 , 90 , 91 , 92 ].

Data extraction and assessment of risk of bias were not uniformly undertaken in duplicate due to resource constraints, which may predispose to extraction errors or bias. The high risk of bias of included studies limits the utility of the overall assessment of effects and suggestions for further action. In addition, publication bias is likely skewing the results towards statistically significant detrimental findings, with studies with smaller sample sizes not showing the wider distribution of findings that would be expected.

Climate change, and in particular global warming, is a significant emerging global public health threat, with far-reaching and disproportionate effects on the most vulnerable populations. Increasing heat exposure in utero is associated with, and possibly causal in, wide-ranging long-term impacts on socioeconomic and health outcomes, with a significant cost associated with increasing global temperatures. This association results from a complex interplay of factors, including direct and indirect effects on the mother and foetus. Further research is urgently required to elucidate biological pathways and targets for intervention, as well as to predict future disease-burden and economic impacts through attribution studies.

Availability of data and materials

This study was a review of publicly available data, with references to data sources provided in the reference list.


AT: Apparent Temperature

BMI: Body Mass Index

BP: Blood Pressure

CHD: Coronary Heart Disease

CI: Confidence Interval

DBP: Diastolic Blood Pressure

EDI: Eating Disorder Inventory

FRC: Functional Residual Capacity

IQR: Interquartile Range

RR: Respiratory Rate

SBP: Systolic Blood Pressure

SE: Standard Error

TL: Telomere Length

WHO: World Health Organisation

Atwoli L, Baqui AH, Benfield T, et al. Call for emergency action to limit global temperature increases, restore biodiversity and protect health. BMJ Lead. 2022;6(1):1–3.


World Meteorological Organization. WMO global annual to decadal climate update (Target years: 2023–2027). Geneva: WMO; 2023. p. 24.


Lakhoo DP, Blake HA, Chersich MF, Nakstad B, Kovats S. The effect of high and low ambient temperature on infant health: a systematic review. Int J Environ Res Public Health. 2022;19(15):9109.


Cissé G, McLeman R, Adams H, et al. Health, wellbeing and the changing structure of communities. In: Pörtner HO, Roberts DC, Tignor MMB, Poloczanska ES, Mintenbeck K, Alegría A, et al. editors. Climate change 2022: impacts, adaptation and vulnerability contribution of working group II to the sixth assessment report of the intergovernmental panel on climate change. Cambridge: Cambridge University Press; 2022.

Chersich MF, Pham MD, Areal A, et al. Associations between high temperatures in pregnancy and risk of preterm birth, low birth weight, and stillbirths: systematic review and meta-analysis. BMJ. 2020;371:m3811.

Birkmann J, Liwenga E, Pandey R, et al. Poverty, livelihoods and sustainable development. In: Pörtner HO, Roberts DC, Tignor MMB, Poloczanska ES, Mintenbeck K, Alegría A, et al. editors. Climate change 2022: impacts, adaptation and vulnerability contribution of working group II to the sixth assessment report of the intergovernmental panel on climate change. Cambridge: Cambridge University Press; 2022.

Watts N, Amann M, Ayeb-Karlsson S, et al. The Lancet countdown on health and climate change: from 25 years of inaction to a global transformation for public health. Lancet. 2018;391(10120):581–630.

Campbell-Lendrum D, Manga L, Bagayoko M, Sommerfeld J. Climate change and vector-borne diseases: what are the implications for public health research and policy? Philos Trans R Soc Lond B Biol Sci. 2015;370(1665):20130552.

Haghighi MM, Wright CY, Ayer J, et al. Impacts of high environmental temperatures on congenital anomalies: a systematic review. Int J Environ Res Public Health. 2021;18(9):4910.

Puthota J, Alatorre A, Walsh S, Clemente JC, Malaspina D, Spicer J. Prenatal ambient temperature and risk for schizophrenia. Schizophr Res. 2022;247:67–83.

Barker DJ. Intrauterine programming of coronary heart disease and stroke. Acta Paediatr Suppl. 1997;423:178–82; discussion 83.


Hales CN, Barker DJ. The thrifty phenotype hypothesis. Br Med Bull. 2001;60:5–20.

Manyuchi A, Dhana A, Areal A, et al. Systematic review to quantify the impacts of heat on health, and to assess the effectiveness of interventions to reduce these impacts. PROSPERO: International prospective register of systematic reviews. 2019, 42019140136. Available from: https://www.crd.york.ac.uk/PROSPEROFILES/118113_PROTOCOL_20181129.pdf .

Thomas J, Brunton J, Graziosi S. EPPI-Reviewer 4.0: software for research synthesis. EPPI Centre Software. London: Social Science Research Unit, Institute of Education, University of London; 2010.

Campbell M, McKenzie JE, Sowden A, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368:l6890.

Beck HE, Zimmermann NE, McVicar TR, Vergopolan N, Berg A, Wood EF. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci Data. 2018;5(1):180214.

Barker TH, Stone JC, Sears K, et al. Revising the JBI quantitative critical appraisal tools to improve their applicability: an overview of methods and the development process. JBI Evid Synth. 2023;21(3):478.

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

World Bank Country and Lending Groups – World Bank Data Help Desk. Retrieved: June 01 2023. Available from: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups-files/112/906519-world-bank-country-and-lending-groups.html .

Home | Climate change knowledge portal. Retrieved: June 01 2023. Available from: https://climateknowledgeportal.worldbank.org/files/26/climateknowledgeportal.worldbank.org.html .

Harris I, Osborn TJ, Jones P, Lister D. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci Data. 2020;7(1):109.

Fishman R, Carrillo P, Russ J. Long-term impacts of exposure to high temperatures on human capital and economic productivity. J Environ Econ Manag. 2019;93:221–38.


Hu Z, Li T. Too hot to handle: the effects of high temperatures during pregnancy on adult welfare outcomes. J Environ Econ Manag. 2019;94:236–53.

Wilde J, Apouey BH, Jung T. The effect of ambient temperature shocks during conception and early pregnancy on later life outcomes. Eur Econ Rev. 2017;97:87–107.

Duchoslav J. Prenatal temperature shocks reduce cooperation: evidence from public goods games in Uganda. Front Behav Neurosci. 2017;11: 249.

Isen A, Rossin-Slater M, Walker R. Relationship between season of birth, temperature exposure, and later life wellbeing. Proc Natl Acad Sci U S A. 2017;114(51):13447–52.


Lawlor DA, Davey Smith G, Mitchell R, Ebrahim S. Temperature at birth, coronary heart disease, and insulin resistance: cross sectional analyses of the British women’s heart and health study. Heart. 2004;90(4):381–8.

Taha-Khalde A, Haim A, Karakis I, et al. Air pollution and meteorological conditions during gestation and type 1 diabetes in offspring. Environ Int. 2021;154: 106546.

Ho JY. Early-life environmental exposures and height, hypertension, and cardiovascular risk factors among older adults in India. Biodemography Soc Biol. 2015;61(2):121–46.

Warembourg C, Maitre L, Tamayo-Uria I, et al. Early-life environmental exposures and blood pressure in children. J Am Coll Cardiol. 2019;74(10):1317–28.

Warembourg C, Nieuwenhuijsen M, Ballester F, et al. Urban environment during early-life and blood pressure in young children. Environ Int. 2021;146: 106174.

Schreier N, Moltchanova E, Forsen T, Kajantie E, Eriksson JG. Seasonality and ambient temperature at time of conception in term-born individuals - influences on cardiovascular disease and obesity in adult life. Int J Circumpolar Health. 2013;72: 21466.

Zhang J, Bai S, Lin S, et al. Maternal apparent temperature during pregnancy on the risk of offspring asthma and wheezing: effect, critical window, and modifiers. Environ Sci Pollut Res Int. 2023;30(22):62924–37.

Guilbert A, Hough I, Seyve E, et al. Association of prenatal and postnatal exposures to warm or cold air temperatures with lung function in young infants. JAMA Netw Open. 2023;6(3):e233376.

Zheng X, Kuang J, Lu C, et al. Preconceptional and prenatal exposure to diurnal temperature variation increases the risk of childhood pneumonia. BMC Pediatr. 2021;21(1):192.

van Hanswijck de Jonge L, Stettler N, Kumanyika S, Stoa Birketvedt G, Waller G. Environmental temperature during gestation and body mass index in adolescence: new etiologic clues? Int J Obes Relat Metab Disord. 2002;26(6):765–9.

Randell H, Gray C, Grace K. Stunted from the start: early life weather conditions and child undernutrition in Ethiopia. Soc Sci Med. 2020;261: 113234.

Templer DI, Austin RK. Confirmation of relationship between temperature and the conception and birth of schizophrenics. J Orthomolecular Psychiatr. 1980;9(3):220-2.

Hare E, Moran P. A relation between seasonal temperature and the birth rate of schizophrenic patients. Acta Psychiatr Scand. 1981;63(4):396–405.

Watson CG, Kucala T, Tilleskjor C, Jacobs L. Schizophrenic birth seasonality in relation to the incidence of infectious diseases and temperature extremes. Arch Gen Psychiatry. 1984;41(1):85–90.

Tatsumi M, Sasaki T, Iwanami A, Kosuga A, Tanabe Y, Kamijima K. Season of birth in Japanese patients with schizophrenia. Schizophr Res. 2002;54(3):213–8.

McNeil T, Dalén P, Dzierzykray-Rogalska M, Kaij L. Birthrates of schizophrenics following relatively warm versus relatively cool summers. Arch Psychiat Nervenkr. 1975;221(1):1–10.

Watkins B, Willoughby K, Waller G, Serpell L, Lask B. Pattern of birth in anorexia nervosa I: early-onset cases in the United Kingdom. Int J Eat Disord. 2002;32(1):11–7.

Waller G, Watkins B, Potterton C, et al. Pattern of birth in adults with anorexia nervosa. J Nerv Ment Dis. 2002;190(11):752.

Willoughby K, Watkins B, Beumont P, Maguire S, Lask B, Waller G. Pattern of birth in anorexia nervosa. II: a comparison of early-onset cases in the southern and northern hemispheres. Int J Eat Disord. 2002;32(1):18–23.

van Hanswijck L, Meyer C, Smith K, Waller G. Environmental temperature during pregnancy and eating attitudes during teenage years: a replication and extension study. Int J Eat Disord. 2001;30(4):413–20.

van Hanswijck L, Waller G. Influence of environmental temperatures during gestation and at birth on eating characteristics in adolescence: a replication and extension study. Appetite. 2002;38(3):181–7.

Waller G, Meyer C, van Hanswijck de Jonge L. Early environmental influences on restrictive eating pathology among nonclinical females: the role of temperature at birth. Int J Eat Disord. 2001;30(2):204–8.

Boland MR, Parhi P, Li L, et al. Uncovering exposures responsible for birth season – disease effects: a global study. J Am Med Inform Assoc. 2018;25(3):275–88.

Flouris AD, Spiropoulos Y, Sakellariou GJ, Koutedakis Y. Effect of seasonal programming on fetal development and longevity: links with environmental temperature. Am J Hum Biol. 2009;21(2):214–6.

Martens DS, Plusquin M, Cox B, Nawrot TS. Early biological aging and fetal exposure to high and low ambient temperature: a birth cohort study. Environ Health Perspect. 2019;127(11):117001.

Samuels L, Nakstad B, Roos N, et al. Physiological mechanisms of the impact of heat during pregnancy and the clinical implications: review of the evidence from an expert group meeting. Int J Biometeorol. 2022;66(8):1505–13.

Ravanelli N, Casasola W, English T, Edwards KM, Jay O. Heat stress and fetal risk. Environmental limits for exercise and passive heat stress during pregnancy: a systematic review with best evidence synthesis. Br J Sports Med. 2019;53(13):799–805.

Islam SN, Winkel J. Climate change and social inequality, DESA working paper no. 152, United Nations Department of Economic and Social Affairs (2017). Available from: https://www.un.org/esa/desa/papers/2017/wp152_2017.pdf .

Rothman KJ, Greenland S. Causation and causal inference in epidemiology. Am J Public Health. 2005;95(S1):S144–150.

Fedak KM, Bernal A, Capshaw ZA, Gross S. Applying the Bradford Hill criteria in the 21st century: how data integration has changed causal inference in molecular epidemiology. Emerg Themes Epidemiol. 2015;12(1):14.

Hill AB. The environment and disease: association or causation? J R Soc Med. 2015;108(1):32–7.

Gluckman PD, Hanson MA, Cooper C, Thornburg KL. Effect of in Utero and early-life conditions on adult health and disease. N Engl J Med. 2008;359(1):61–73.

Duley L. The global impact of pre-eclampsia and eclampsia. Semin Perinatol. 2009;33(3):130–7.

Bianco ME, Kuang A, Josefson JL, et al. Hyperglycemia and adverse pregnancy outcome follow-up study: newborn anthropometrics and childhood glucose metabolism. Diabetologia. 2021;64(3):561–70.

Fisher SJ. The placental problem: linking abnormal cytotrophoblast differentiation to the maternal symptoms of preeclampsia. Reprod Biol Endocrinol. 2004;2: 53.

Steegers EAP, von Dadelszen P, Duvekot JJ, Pijnenborg R. Pre-eclampsia. Lancet. 2010;376(9741):631–44.

von Dadelszen P, Ornstein MP, Bull SB, Logan AG, Koren G, Magee LA. Fall in mean arterial pressure and fetal growth restriction in pregnancy hypertension: a meta-analysis. Lancet. 2000;355(9198):87–92.

Bonell A, Sonko B, Badjie J, et al. Environmental heat stress on maternal physiology and fetal blood flow in pregnant subsistence farmers in the Gambia, West Africa: an observational cohort study. Lancet Planet Health. 2022;6(12):e968–76.

Redline RW. Placental inflammation. Semin Neonatol. 2004;9(4):265–74.

Alsop FM. The effect of abnormal temperatures upon the developing nervous system in the chick embryos. Anat Rec. 1919;15(6):306–31.

Edwards MJ. Hyperthermia as a teratogen: a review of experimental studies and their clinical significance. Teratog Carcinog Mutagen. 1986;6(6):563–82.

Nicotra AB, Atkin OK, Bonser SP, et al. Plant phenotypic plasticity in a changing climate. Trends Plant Sci. 2010;15(12):684–92.

McCaw BA, Stevenson TJ, Lancaster LT. Epigenetic responses to temperature and climate. Integr Comp Biol. 2020;60(6):1469–80.

Horowitz M. Epigenetics and cytoprotection with heat acclimation. J Appl Physiol (1985). 2016;120(6):702–10.

Murray KO, Clanton TL, Horowitz M. Epigenetic responses to heat: from adaptation to maladaptation. Exp Physiol. 2022;107(10):1144–58.

Wu YL, Lin ZJ, Li CC, et al. Epigenetic regulation in metabolic diseases: mechanisms and advances in clinical study. Sig Transduct Target Ther. 2023;8:98.

Sookoian S, Gianotti TF, Burgueño AL, Pirola CJ. Fetal metabolic programming and epigenetic modifications: a systems biology approach. Pediatr Res. 2013;73(2):531–42.

Graham JM, Marshall J. Edwards: discoverer of maternal hyperthermia as a human teratogen. Birth Defects Res Clin Mol Teratol. 2005;73(11):857–64.


Andalón M, Azevedo JP, Rodríguez-Castelán C, Sanfelice V, Valderrama-González D. Weather shocks and health at Birth in Colombia. World Dev. 2016;82:69–82.

Cil G, Cameron TA. Potential climate change health risks from increases in heat waves: abnormal birth outcomes and adverse maternal health conditions. Risk Anal. 2017;37(11):2066–79.

Huus KE, Ley RE. Blowing hot and cold: body temperature and the microbiome. mSystems. 2021;6(5):e00707–21.

Wen C, Wei S, Zong X, Wang Y, Jin M. Microbiota-gut-brain axis and nutritional strategy under heat stress. Anim Nutr. 2021;7(4):1329–36.

ACOG Committee. Physical activity and exercise during pregnancy and the postpartum period, ACOG Committee Opinion No. 804. Obstet Gynecol. 2020;135e:88.


Lee H, Calvin K, Dasgupta D, et al. Synthesis report of the IPCC Sixth Assessment Report (AR6). Geneva: IPCC; 2023. p. 35–115.

Stern N. Economics: current climate models are grossly misleading. Nature. 2016;530(7591):407–9.

McGregor GR, Vanos JK. Heat: a primer for public health researchers. Public Health. 2018;161:138–46.

Gao C, Kuklane K, Östergren P-O, Kjellstrom T. Occupational heat stress assessment and protective strategies in the context of climate change. Int J Biometeorol. 2018;62(3):359–71.

Alele F, Malau-Aduli B, Malau-Aduli A, Crowe M. Systematic review of gender differences in the epidemiology and risk factors of exertional heat illness and heat tolerance in the armed forces. BMJ Open. 2020;10(4): e031825.

Khosla R, Jani A, Perera R. Health risks of extreme heat. BMJ. 2021;375:n2438.

Spector JT, Masuda YJ, Wolff NH, Calkins M, Seixas N. Heat exposure and occupational injuries: review of the literature and implications. Curr Environ Health Rep. 2019;6(4):286–96.

Measuring Radiation. Centers for disease control and prevention. 2015. Retrieved: May 15 2023. Available from: https://www.cdc.gov/nceh/radiation/measuring.html .

Molina-Vega M, Gutiérrez-Repiso C, Muñoz-Garach A, et al. Relationship between environmental temperature and the diagnosis and treatment of gestational diabetes mellitus: an observational retrospective study. Sci Total Environ. 2020;744: 140994.

Su W-L, Lu C-L, Martini S, Hsu Y-H, Li C-Y. A population-based study on the prevalence of gestational diabetes mellitus in association with temperature in Taiwan. Sci Total Environ. 2020;714: 136747.

Part C, le Roux J, Chersich M, et al. Ambient temperature during pregnancy and risk of maternal hypertensive disorders: a time-to-event study in Johannesburg, South Africa. Environ Res. 2022;212(Pt D):113596.

Shashar S, Kloog I, Erez O, et al. Temperature and preeclampsia: epidemiological evidence that perturbation in maternal heat homeostasis affects pregnancy outcome. PLoS One. 2020;15(5): e0232877.

Hajdu T, Hajdu G. Post-conception heat exposure increases clinically unobserved pregnancy losses. Sci Rep. 2021;11(1):1987.


This research was funded through the HE2AT Centre, a grant supported by the NIH Common Fund and NIEHS, which is managed by the Fogarty International Centre NIH award number: 1U54TW012083-01, and has received funding through the HIGH Horizons project from the European Union’s Horizon Framework Programme under Grant Agreement No. 101057843. Neither funding group influenced the methodology or reporting of this review.

Author information

Authors and affiliations

Climate and Health Directorate, Wits RHI, University of the Witwatersrand, Johannesburg, South Africa

Nicholas Brink, Darshnika P. Lakhoo, Ijeoma Solarin, Gloria Maimela & Matthew F. Chersich

King’s College, London, United Kingdom

Peter von Dadelszen

MRC Developmental Pathways for Health Research Unit, University of the Witwatersrand, Johannesburg, South Africa

Shane Norris


Admire Chikandiwa, Britt Nakstad, Caradee Y. Wright, Lois Harden, Nathalie Roos, Stanley M. F. Luchters, Cherie Part, Ashtyn Areal, Marjan Mosalam Haghighi, Albert Manyuchi, Melanie Boeckmann, Minh Duc Pham, Robyn Hetem & Dilara Durusu


NB updated the literature search and data extraction, created figures, and compiled initial and final drafts; DL updated the literature search and data extraction and reviewed initial and final drafts; IS updated the literature search and data extraction and reviewed initial and final drafts; GM reviewed initial and final drafts; PvD and SN reviewed the final draft and provided input on the causal pathways; MFC conceptualised the research, conducted the initial literature search and data extraction, and reviewed initial and final drafts. The Climate and Heat-Health Study Group contributed to conceptualisation, the literature search and data extraction, and reviewed the final draft. All authors have reviewed and approved the final manuscript.

Corresponding author

Correspondence to Nicholas Brink .

Ethics declarations

Ethics approval and consent to participate.

This study was a review of publicly available information, and did not require review or approval by an ethics board.

Consent for publication

Not applicable.

Competing interests

NB holds investments in companies involved in the production, distribution and use of fossil-fuels through managed funds and indices. MFC and DL hold investments in the fossil fuel industry through their pension fund, as per the policy of the Wits Health Consortium. The University of the Witwatersrand holds investments in the fossil fuel industry through their endowments and other financial reserves.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Full data extraction table including risk of bias and GRADE assessments.

Additional file 2.

Search terms for Medline(PubMed) and Web of Science.

Additional file 3.

Author list for the Climate and Heat-Health Study Group. Individual JBI risk-of-bias assessment forms and excluded-studies metadata from EPPI-Reviewer are available on request.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Brink, N., Lakhoo, D.P., Solarin, I. et al. Impacts of heat exposure in utero on long-term health and social outcomes: a systematic review. BMC Pregnancy Childbirth 24, 344 (2024). https://doi.org/10.1186/s12884-024-06512-0

Download citation

Received : 26 October 2023

Accepted : 11 April 2024

Published : 04 May 2024

DOI : https://doi.org/10.1186/s12884-024-06512-0


  • Climate change
  • Heat exposure
  • Long-term effects
  • Socioeconomic impact
  • Maternal health
  • Child health
  • Epigenetics
  • Metabolic disease

BMC Pregnancy and Childbirth

ISSN: 1471-2393

  1. How to conduct a meta-analysis in eight steps: a practical guide

    In a qualitative meta-analysis, the identified case studies are systematically coded in a meta-synthesis protocol, ... Wood JA (2008) Methodology for dealing with duplicate study effects in a meta-analysis. Organ Res Methods 11(1):79-95.

  2. Case study meta‐analysis in the social sciences. Insights on data

    5 LESSONS FOR DESIGN CHOICES IN CASE STUDY META-ANALYSIS. Data derived from our case survey proved largely robust, displaying high degrees of inter-rater reliability and agreement, and only limited effects of the distorting factors tested for. Based on these findings, we identify a number of critical design choices that will influence the quality ...

  3. How to Perform a Meta-analysis: a Practical Step-by-step Guide Using R

    By typing forest (meta-analysis name), RStudio will create a forest plot of the meta-analysis. In this case type in the console: forest (metanalisetestex) If you want to omit the result/diamond of the fixed model from the forest plot (Figure 12), set the comb.fixed argument to false by typing the following command line in the console:

  4. Chapter 10: Analysing data and undertaking meta-analyses

    Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions. Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses.

  5. Introduction to systematic review and meta-analysis

    A systematic review collects all possible studies related to a given topic and design, and reviews and analyzes their results [ 1 ]. During the systematic review process, the quality of studies is evaluated, and a statistical meta-analysis of the study results is conducted on the basis of their quality. A meta-analysis is a valid, objective ...

  6. PDF How to conduct a meta-analysis in eight steps: a practical guide

    who have little experience with meta-analysis as a method but plan to conduct one in the future. 2 Eight steps in conducting a meta-analysis. 2.1 Step 1: defining the research question. The first step in conducting a meta-analysis, as with any other empirical study, is the definition of the research question.

  7. Understanding the Practice, Application, and Limitations of Meta-Analysis

    What the meta-analysis should provide is a commentary of the relevant limitations. The function of meta-analysis in this case becomes an agenda setting function by pointing out how to design the 201st study to advance the understanding of the issue (Eagly & Wood, 1994). The limitations discussion in a meta-analysis should immediately point to ...

  8. Meta-analysis and the science of research synthesis

    Meta-analysis is the quantitative, scientific synthesis of research results. Since the term and modern approaches to research synthesis were first introduced in the 1970s, meta-analysis has had a ...

  9. The 5 min meta-analysis: understanding how to read and ...

    Tip 1: Know the type of outcome. There are differences in a forest plot depending on the type of outcome. For a continuous outcome, the mean, standard deviation and number of patients are ...

  10. Meta-Analytic Methodology for Basic Research: A Practical Guide

    The goal of this study is to present a brief theoretical foundation, computational resources and workflow outline along with a working example for performing systematic or rapid reviews of basic research followed by meta-analysis. Conventional meta-analytic techniques are extended to accommodate methods and practices found in basic research.

  11. Research Guides: Study Design 101: Meta-Analysis

    This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive ...

  12. PDF Meta-Analysis

    Meta-analysis is applicable to collections of research that. examine the same constructs and relationships. have findings that can be configured in a comparable statistical form (e.g., as effect sizes, correlation coefficients, odds-ratios, etc.) are "comparable" given the question at hand. Objective of study (effect or variability ...

  13. The Role of Meta-Analysis in Scientific Studies

    Here are just a few benefits of meta-analysis: It has greater statistical power and the ability to extrapolate to the broader population. It is evidence-based. It is more likely to show an effect because smaller studies are combined into one larger study. It has better accuracy (because smaller studies are pooled and analyzed).

  14. A case study of an individual participant data meta-analysis of

    This study aimed to investigate heterogeneity via prediction regions in an individual participant data meta-analysis of the sensitivity and specificity of the Patient Health Questionnaire-9 for ...

  15. Meta-analysis: a case study

    Meta-analysis: a case study. Eval Rev. 2005 Apr;29(2):87-127. doi: 10.1177/0193841X04272555. Derek C Briggs, University of Colorado, Boulder, USA. PMID: 15731508. This article raises some questions about the usefulness of meta-analysis as a means of reviewing quantitative research in ...

  16. Recommendations for light-dosimetry field studies based on a meta

    Firstly, a case study meta-analysis is presented that aimed to identify from published studies whether office workers receive the most recently recommended daytime PLL within offices. Light-dosimetry field studies that report in-office PLL were gathered through a literature search, their data were aggregated and compared with the recently ...

  17. Causally-interpretable meta-analysis: clearly-defined causal effects

    Traditional meta-analysis methods Traditional meta-analysis models include common-effect, fixed-effects, and random-effects meta-analysis. 2 Each model's weighting method and target parameter depend on the assumptions made about the effect-size parameters in individual studies. The common-effect approach assumes all studies share the same effect.

  18. Automatic data extraction to support meta-analysis statistical analysis

    A meta-analysis is a statistical analysis that combines the results of different studies that are all focused on the same disease, treatment, or outcome to determine if a treatment is effective or not. Meta-analyses provide the best form of medical evidence and are an essential tool for enabling evidence-based medicine and clinical and health ...

  19. Systematic Reviews and Meta-analysis: Understanding the Best Evidence

    Meta-analyses are studies of studies. Meta-analysis provides a logical framework to a research review where similar measures from comparable studies are listed systematically and the available effect measures are combined wherever possible. The fundamental rationale of meta-analysis is that it reduces the quantity of data by summarizing data ...

  20. Combined and progestagen-only hormonal contraceptives and breast ...

    In a UK nested case-control study and subsequent meta-analysis, Kirstin Pirie and colleagues explore the association between combined and progestogen-only hormonal contraceptives and the risk of breast cancer.

  21. The use of fixed study main effects in arm‐based network meta‐analysis

    Arm-based network meta-analysis may recover inter-study information when all study-specific effects are modelled as random. Recovery of such information violates the principle of concurrent control. ... This is easiest to discuss further by first considering the special case of the random-effects specification where the variance-covariance ...

  22. Meta-Analysis

    This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive ...

  23. A brief introduction of meta‐analyses in clinical practice and research

    Therefore, cumulative meta‐analysis was developed, which adds studies to a meta‐analysis based on a predetermined order and then tracks the magnitude of the mean effect and its variance. 25 A cumulative meta‐analysis can be performed multiple times; ... Even in the ideal case in which all relevant studies are available, a faulty search ...

  24. A meta-analysis on global change drivers and the risk of infectious

    The list of studies associated with biodiversity change was based on a previous study 3, which combined studies from four meta-analyses (details are provided in Supplementary Table 1 and ref. 3 ...

  25. Impact of the use of cannabis as a medicine in pregnancy, on the unborn

    Selected studies will then be assessed by at least two independent researchers for risk bias assessment using validated tools. Data will be extracted and analysed following a systematic review and meta-analysis methodology. The statistical analysis will combine three or more outcomes that are reported in a consistent manner.

  26. Association of Breastfeeding and Early Childhood Caries: A ...

    A total of 31 studies (22 cohort studies and 9 case-control studies) were included in this review. The meta-analysis of the case-control studies showed statistically significant fewer dental caries in children who were breastfed for < 6 months compared to those who were breastfed for ≥6 months (OR = 0.53, 95% CI 0.41-0.67, p < 0.001).

  27. Impacts of heat exposure in utero on long-term health and social

    A total of six studies reported on behaviour, educational and socioeconomic outcomes, which were detrimentally affected by increases in heat exposure (Fig. 4; Table 1), although the quality of the evidence was very low.End-points were not uniform, but included earnings, completion of secondary school or higher education, number of years of schooling, and gamified cooperation-rates in a public ...
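Several of the guides listed above (e.g. items 3, 12 and 17) describe inverse-variance pooling under common-effect and random-effects models. A minimal sketch of both, in Python with invented effect sizes and standard errors (illustrative numbers only, not taken from any study listed here), using the classic DerSimonian-Laird moment estimator for the between-study variance:

```python
import math

# Illustrative effect estimates (e.g. log risk ratios) and standard errors --
# made-up numbers, shown only to contrast common-effect and random-effects pooling.
y  = [0.30, 0.10, 0.55, -0.10, 0.25]
se = [0.15, 0.12, 0.20, 0.18, 0.10]

w = [1 / s**2 for s in se]                        # inverse-variance weights
mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# DerSimonian-Laird estimate of the between-study variance tau^2,
# derived from Cochran's Q heterogeneity statistic.
Q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
df = len(y) - 1
C = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights add tau^2 to each study's variance, which evens out
# the weights across studies and widens the pooled confidence interval.
w_re = [1 / (s**2 + tau2) for s in se]
mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
se_re = math.sqrt(1 / sum(w_re))
print(f"tau^2 = {tau2:.3f}; common-effect = {mu_fixed:.3f}; "
      f"random-effects = {mu_re:.3f} (SE {se_re:.3f})")
```

DerSimonian-Laird is shown here because it is the traditional closed-form choice; modern software (e.g. the R metafor package or statsmodels) often defaults to restricted maximum likelihood instead.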