Front Psychol

Quantitative and Qualitative Approaches to Generalization and Replication–A Representationalist View

In this paper, we provide a reinterpretation of qualitative and quantitative modeling from a representationalist perspective. In this view, both approaches attempt to construct abstract representations of empirical relational structures. Whereas quantitative research uses variable-based models that abstract from individual cases, qualitative research favors case-based models that abstract from individual characteristics. Variable-based models are usually stated in the form of quantified sentences (scientific laws). This syntactic structure implies that sentences about individual cases are derived using deductive reasoning. In contrast, case-based models are usually stated using context-dependent existential sentences (qualitative statements). This syntactic structure implies that sentences about other cases are justifiable by inductive reasoning. We apply this representationalist perspective to the problems of generalization and replication. Using the analytical framework of modal logic, we argue that these modes of reasoning are often applied not only within the context that has been studied empirically, but also at a between-contexts level. Consequently, quantitative researchers mostly adhere to a top-down strategy of generalization, whereas qualitative researchers usually follow a bottom-up strategy of generalization. Depending on which strategy is employed, the role of replication attempts is very different. In deductive reasoning, replication attempts serve as empirical tests of the underlying theory; therefore, failed replications imply a faulty theory. From an inductive perspective, however, replication attempts serve to explore the scope of the theory. Consequently, failed replications do not question the theory per se, but help to shape its boundary conditions. We conclude that quantitative research may benefit from a bottom-up generalization strategy as it is employed in most qualitative research programs. Inductive reasoning forces us to think about the boundary conditions of our theories and provides a framework for generalization beyond statistical testing. In this perspective, failed replications are just as informative as successful replications, because they help to explore the scope of our theories.

Introduction

Qualitative and quantitative research strategies have long been treated as opposing paradigms. In recent years, there have been attempts to integrate both strategies. These “mixed methods” approaches treat qualitative and quantitative methodologies as complementary, rather than opposing, strategies (Creswell, 2015). However, although it acknowledges that both strategies have their benefits, this “integration” remains purely pragmatic. Hence, mixed methods methodology does not provide a conceptual unification of the two approaches.

Lacking a common methodological background, qualitative and quantitative research methodologies have developed rather distinct standards with regard to the aims and scope of empirical science (Freeman et al., 2007). These different standards affect the way researchers handle contradictory empirical findings. For example, many empirical findings in psychology have failed to replicate in recent years (Klein et al., 2014; Open Science Collaboration, 2015). This “replication crisis” has been discussed on statistical, theoretical, and social grounds. It continues to have a wide impact on quantitative research practices, for example through open science initiatives, pre-registered studies, and a re-evaluation of statistical significance testing (Everett and Earp, 2015; Maxwell et al., 2015; Shrout and Rodgers, 2018; Trafimow, 2018; Wiggins and Christopherson, 2019).

Qualitative research, however, seems hardly affected by this discussion. In this paper, we argue that this difference is a direct consequence of how the concept of generalizability is conceived in the two approaches. Whereas most of quantitative psychology is committed to a top-down strategy of generalization based on the idea of random sampling from an abstract population, qualitative studies usually rely on a bottom-up strategy of generalization that is grounded in the successive exploration of the field by means of theoretically sampled cases.

Here, we show that a common methodological framework for qualitative and quantitative research methodologies is possible. We accomplish this by introducing a formal description of quantitative and qualitative models from a representationalist perspective: both approaches can be reconstructed as special kinds of representations for empirical relational structures. We then use this framework to analyze the generalization strategies used in the two approaches. These turn out to be logically independent of the type of model. This has wide implications for psychological research. First, a top-down generalization strategy is compatible with a qualitative modeling approach. This implies that mainstream psychology may benefit from qualitative methods when a numerical representation turns out to be difficult or impossible, without the need to commit to a “qualitative” philosophy of science. Second, quantitative research may exploit the bottom-up generalization strategy that is inherent to many qualitative approaches. This offers a new perspective on unsuccessful replications by treating them not as scientific failures, but as a valuable source of information about the scope of a theory.

The Quantitative Strategy–Numbers and Functions

Quantitative science is about finding valid mathematical representations for empirical phenomena. In most cases, these mathematical representations have the form of functional relations between a set of variables. One major challenge of quantitative modeling consists in constructing valid measures for these variables. Formally, to measure a variable means to construct a numerical representation of the underlying empirical relational structure (Krantz et al., 1971). For example, take the behaviors of a group of students in a classroom: “to listen,” “to take notes,” and “to ask critical questions.” One may now ask whether it is possible to assign numbers to the students such that the relations between the assigned numbers are of the same kind as the relations between the values of an underlying variable, e.g., “engagement.” The observed behaviors in the classroom constitute an empirical relational structure, in the sense that for every student–behavior pair, one can observe whether or not the student shows the behavior. These observations can be represented in a person × behavior matrix¹ (compare Figure 1). Given that this relational structure satisfies certain conditions (i.e., the axioms of a measurement model), one can assign numbers to the students and the behaviors such that the relations between the numbers resemble the corresponding empirical relations. For example, if there is a unique ordering in the empirical observations with regard to which person shows which behavior, the assigned numbers have to constitute a corresponding unique ordering as well. Such an ordering coincides with the person × behavior matrix forming a triangular relation and is formally represented by a Guttman scale (Guttman, 1944). Various measurement models are available for different empirical structures (Suppes et al., 1971). In the case of probabilistic relations, item response models may be considered a special kind of measurement model (Borsboom, 2005).

Figure 1

Constructing a numerical representation from an empirical relational structure. Due to the unique ordering of persons with regard to behaviors (indicated by the triangular shape of the relation), it is possible to construct a Guttman scale by assigning a number to each individual that represents the number of relevant behaviors the individual shows. The resulting variable (“engagement”) can then be described by means of statistical analyses, e.g., by plotting its frequency distribution.
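The scaling step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' procedure. The five students and their responses are hypothetical and are not the data shown in Figure 1. The function names `is_guttman_scale`, `guttman_scores`, and `score_distribution` are our own. The check relies on the fact that a binary person × behavior matrix forms a perfect (error-free) Guttman scale exactly when the behavior sets of any two persons are nested, i.e., one is a subset of the other.

```python
from collections import Counter

# Hypothetical person x behavior matrix (1 = behavior observed, 0 = not observed).
BEHAVIORS = ["listen", "take notes", "ask critical questions"]
DATA = {
    "A": [1, 1, 1],
    "B": [1, 1, 0],
    "C": [1, 1, 0],
    "D": [1, 0, 0],
    "E": [0, 0, 0],
}


def is_guttman_scale(rows):
    """Return True if the rows form a perfect (error-free) Guttman scale.

    This holds exactly when, for every pair of persons, the set of behaviors
    shown by one is contained in the set shown by the other; sorting rows and
    columns by their sums then yields the triangular shape.
    """
    sets = [frozenset(j for j, x in enumerate(row) if x) for row in rows]
    return all(a <= b or b <= a for a in sets for b in sets)


def guttman_scores(data):
    """Assign each person the number of behaviors shown (the row sum)."""
    return {person: sum(row) for person, row in data.items()}


def score_distribution(scores):
    """Frequency distribution of the scale values."""
    return Counter(scores.values())


if is_guttman_scale(DATA.values()):
    engagement = guttman_scores(DATA)
    print(engagement)                      # {'A': 3, 'B': 2, 'C': 2, 'D': 1, 'E': 0}
    print(score_distribution(engagement))  # Counter({2: 2, 3: 1, 1: 1, 0: 1})
```

Suppose a student asked critical questions without taking notes. Their profile would not be nested with B's, and no error-free ordering would exist. With real data, one would typically turn to a probabilistic measurement model instead.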

Although essential, measurement is only the first step of quantitative modeling. Consider a slightly richer empirical structure, in which we observe three additional behaviors: “to doodle,” “to chat,” and “to play.” As above, one may ask whether there is a unique ordering of the students with regard to these behaviors that can be represented by an underlying variable (i.e., whether the matrix forms a Guttman scale). If this is the case, we may assign corresponding numbers to the students and call this variable “distraction.” In our example, such a representation is possible. We can thus assign two numbers to each student, one representing his or her “engagement” and one representing his or her “distraction” (compare Figure 2). These measurements can now be used to construct a quantitative model by relating the two variables through a mathematical function. In the simplest case, this may be a linear function. This functional relation (e.g., a linear regression) constitutes a quantitative model of the empirical relational structure under study. The model equation and the rules for assigning the numbers (i.e., the instrumentations of the two variables) restrict the set of admissible empirical structures from all possible structures to a rather small subset. This constitutes the empirical content of the model² (Popper, 1935).
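A small count makes the notion of empirical content concrete. The sketch below takes a tiny hypothetical classroom and counts how many of all conceivable binary person × behavior matrices a perfect Guttman scale admits. It covers only the measurement part of the model, not the functional relation. The function names are our own, and brute-force enumeration is feasible only for very small matrices.

```python
from itertools import product


def is_guttman_scale(rows):
    """True if the behavior sets of all persons are nested (perfect Guttman scale)."""
    sets = [frozenset(j for j, x in enumerate(row) if x) for row in rows]
    return all(a <= b or b <= a for a in sets for b in sets)


def admissible_structures(n_persons, n_behaviors):
    """Return (admissible, total): how many binary matrices of the given size
    conform to a perfect Guttman scale, out of all 2**(n_persons*n_behaviors)."""
    admissible = total = 0
    for cells in product((0, 1), repeat=n_persons * n_behaviors):
        rows = [cells[i * n_behaviors:(i + 1) * n_behaviors] for i in range(n_persons)]
        total += 1
        admissible += is_guttman_scale(rows)  # bool counts as 0 or 1
    return admissible, total


print(admissible_structures(3, 3))  # (230, 512)
```

For three persons and three behaviors, the scale alone rules out 282 of the 512 conceivable observation patterns. Adding a functional relation between two such scales can only shrink the admissible set further.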

Figure 2

Constructing a numerical model from an empirical relational structure. Since there are two distinct classes of behaviors that each form a Guttman scale, it is possible to assign two numbers to each individual, one for each scale. The resulting variables (“engagement” and “distraction”) can then be related by a mathematical function, which is indicated by the scatterplot and red line on the right-hand side.
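An ordinary least-squares line illustrates the functional relation in Figure 2. The scores below are hypothetical Guttman scores, not the values plotted in the figure. A straight line is only the simplest choice of f mentioned in the text. `fit_line` is our own helper, written out so that no external dependency is needed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (intercept, slope) for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical scores for five students on the two Guttman scales.
engagement = {"A": 3, "B": 2, "C": 2, "D": 1, "E": 0}
distraction = {"A": 0, "B": 1, "C": 2, "D": 2, "E": 3}

students = sorted(engagement)
intercept, slope = fit_line([engagement[s] for s in students],
                            [distraction[s] for s in students])
print(f"distraction = {intercept:.2f} + ({slope:.2f}) * engagement")
# distraction = 3.08 + (-0.92) * engagement
```

The negative slope expresses the claim that distraction decreases with engagement. Whether a linear f is adequate remains an empirical question.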

The Qualitative Strategy–Categories and Typologies

The predominant type of analysis in qualitative research is category formation. By constructing descriptive systems for empirical phenomena, it is possible to analyze the underlying empirical structure at a higher level of abstraction. The resulting categories (or types) constitute a conceptual frame for the interpretation of the observations. Qualitative researchers differ considerably in the way they collect and analyze data (Miles et al., 2014). However, despite the diverse research strategies followed by different qualitative methodologies, from a formal perspective most approaches build on some kind of categorization of cases that share common features. The process of category formation is essential in many qualitative methodologies, such as qualitative content analysis, thematic analysis, and grounded theory (see Flick, 2014 for an overview). Sometimes these features are directly observable (as in our classroom example); sometimes they are themselves the result of an interpretative process (e.g., Scheunpflug et al., 2016).

In contrast to quantitative methodologies, there have been few attempts to formalize qualitative research strategies (compare, however, Rihoux and Ragin, 2009). However, there are several statistical approaches to non-numerical data that deal with constructing abstract categories and establishing relations between these categories (Agresti, 2013). Some of these methods are very similar to qualitative category formation on a conceptual level. For example, cluster analysis groups cases into homogeneous categories (clusters) based on their similarity according to a distance metric.
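To illustrate the cluster-analytic analogue, the sketch groups hypothetical students by the Hamming distance between their binary behavior profiles. It uses single-linkage clustering cut at a fixed distance threshold. The profiles, the threshold, and the helper names are assumptions for illustration. Applied work would typically use an established implementation, such as SciPy's hierarchical clustering routines, with a distance metric chosen on substantive grounds.

```python
# Hypothetical behavior profiles in the order:
# listen, take notes, ask critical questions, doodle, chat, play
PROFILES = {
    "A": (1, 1, 1, 0, 0, 0),
    "B": (1, 1, 0, 1, 0, 0),
    "C": (1, 1, 0, 1, 0, 0),
    "D": (1, 0, 0, 1, 1, 0),
    "E": (0, 0, 0, 1, 1, 1),
    "F": (0, 0, 0, 0, 1, 1),
}


def hamming(u, v):
    """Number of behaviors on which two profiles differ."""
    return sum(a != b for a, b in zip(u, v))


def single_linkage_clusters(profiles, threshold):
    """Clusters obtained by cutting a single-linkage dendrogram at `threshold`:
    cases end up together if they are connected by a chain of pairwise
    distances <= threshold. Implemented with a small union-find."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


print(single_linkage_clusters(PROFILES, 1))  # [['A'], ['B', 'C'], ['D'], ['E', 'F']]
```

At threshold 2, all six students fall into a single cluster through the chain A, B, D, E, F. This is the chaining tendency of single linkage, and it shows that the choice of threshold is itself a substantive decision.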

Although category formation can be formalized in a mathematically rigorous way (Ganter and Wille, 1999), qualitative research hardly acknowledges these approaches.³ However, in order to find a common ground with quantitative science, it is certainly helpful to provide a formal interpretation of category systems.

Let us reconsider the above example of students in a classroom. The quantitative strategy was to assign numbers to the students with regard to variables and to relate these variables via a mathematical function. We can analyze the same empirical structure by grouping the behaviors to form abstract categories. If the aim is to construct an empirically valid category system, this grouping is subject to constraints, analogous to those used to specify a measurement model. The first and most important constraint is that the behaviors must form equivalence classes. That is, within categories, behaviors need to be equivalent, and across categories, they need to be distinct (formally, the relational structure must obey the axioms of an equivalence relation). When objects are grouped into equivalence classes, it is essential to specify the criterion for empirical equivalence. In qualitative methodology, this is sometimes referred to as the tertium comparationis (Flick, 2014). One possible criterion is to group behaviors such that they constitute a set of specific common attributes of a group of people. In our example, we might group the behaviors “to listen,” “to take notes,” and “to doodle.” These behaviors are common to the cases B, C, and D, and they are also specific to these cases, because no other person shows this particular combination of behaviors. The set of common behaviors then forms an abstract concept (e.g., “moderate distraction”), while the set of persons that show this configuration forms a type (e.g., “the silent dreamer”). Formally, this amounts to identifying the maximal rectangles in the underlying empirical relational structure (see Figure 3). This procedure is very similar to the way we constructed a Guttman scale, the only difference being that we now use different aspects of the empirical relational structure.⁴ In fact, the set of maximal rectangles can be determined by an automated algorithm (Ganter, 2010), just as the dimensionality of an empirical structure can be explored by psychometric scaling methods. Consequently, we can identify the empirical content of a category system or a typology as the set of empirical structures that conform to it.⁵ The quantitative strategy was to search for scalable sub-matrices and then relate the constructed variables by a mathematical function. The qualitative strategy, by contrast, is to construct an empirical typology by grouping cases based on their specific similarities. These types can then be related to one another by a conceptual model that describes their semantic and empirical overlap (see Figure 3, right-hand side).
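Identifying maximal rectangles can be sketched as enumerating formal concepts in the sense of formal concept analysis (Ganter and Wille, 1999). The brute-force version below closes every subset of behaviors, which is feasible only for a handful of behaviors. The algorithm cited above is considerably more efficient. The classroom data are hypothetical. They were chosen so that “to listen,” “to take notes,” and “to doodle” are shared by, and specific to, cases B, C, and D, as in the example.

```python
from itertools import combinations

# Hypothetical classroom: each student mapped to the behaviors observed.
CONTEXT = {
    "A": {"listen", "take notes", "ask critical questions"},
    "B": {"listen", "take notes", "doodle"},
    "C": {"listen", "take notes", "doodle"},
    "D": {"listen", "take notes", "doodle", "chat"},
    "E": {"doodle", "chat", "play"},
}


def formal_concepts(context):
    """Return all (extent, intent) pairs, i.e., the maximal rectangles of the
    person x behavior relation: no person or behavior can be added to either
    side without leaving the observed relation."""
    persons = list(context)
    behaviors = sorted(set().union(*context.values()))

    def extent(bs):  # persons who show every behavior in bs
        return frozenset(p for p in persons if bs <= context[p])

    def intent(ps):  # behaviors shared by every person in ps
        shared = set(behaviors)
        for p in ps:
            shared &= context[p]
        return frozenset(shared)

    concepts = set()
    for k in range(len(behaviors) + 1):
        for bs in combinations(behaviors, k):
            ext = extent(frozenset(bs))
            concepts.add((ext, intent(ext)))  # closing bs yields a concept
    return concepts


for persons, shared in sorted(formal_concepts(CONTEXT), key=lambda c: (-len(c[0]), sorted(c[0]))):
    if persons and shared:  # skip concepts with an empty extent or intent
        print(sorted(persons), sorted(shared))
```

Among the printed rectangles is ({B, C, D}, {doodle, listen, take notes}), the candidate for a type such as “the silent dreamer.” Naming and relating the types remains an interpretive step outside the algorithm.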

Figure 3

Constructing a conceptual model from an empirical relational structure. Individual behaviors are grouped to form abstract types based on their being shared among a specific subset of the cases. Each type constitutes a set of specific commonalities of a class of individuals (indicated by the rectangles on the left-hand side). The resulting types (“active learner,” “silent dreamer,” “distracted listener,” and “troublemaker”) can then be related to one another to explicate their semantic and empirical overlap, as indicated by the Venn diagram on the right-hand side.

Variable-Based Models and Case-Based Models

In the previous sections, we argued that qualitative category formation and quantitative measurement can both be characterized as methods to construct abstract representations of empirical relational structures. Instead of focusing on different philosophical approaches to empirical science, we tried to stress the formal similarities between both approaches. However, it is also worth exploring their dissimilarities from a formal perspective.

Following the above analysis, the quantitative approach can be characterized by the use of variable-based models, whereas the qualitative approach is characterized by case-based models (Ragin, 1987). Formally, we can identify the rows of an empirical person × behavior matrix with a person-space, and the columns with a corresponding behavior-space. A variable-based model abstracts from the single individuals in a person-space to describe the structure of behaviors on a population level. A case-based model, by contrast, abstracts from the single behaviors in a behavior-space to describe individual case configurations on the level of abstract categories (see Table 1).

Table 1. Variable-based models and case-based models.

From a representational perspective, there is no a priori reason to favor one type of model over the other. Both approaches provide different analytical tools to construct an abstract representation of an empirical relational structure. However, the two modeling approaches make use of different information (person-space vs. behavior-space). The choice between them therefore has important implications for the researcher, which concern the roles of deductive and inductive reasoning.

In variable-based models, empirical structures are represented by functional relations between variables. These are usually stated as scientific laws (Carnap, 1928). Formally, these laws correspond to logical expressions of the form

$$\forall i : y_i = f(x_i)$$

In plain text, this means that y is a function of x for all objects i in the relational structure under consideration. For instance, in the classroom example one may formulate the following law: for all students in the classroom, “distraction” is a monotonically decreasing function of “engagement.” Such a law can be used to derive predictions for single individuals by means of logical deduction. If the law applies to all students in the classroom, it is possible to calculate the expected distraction from a student's engagement. An empirical observation can then be evaluated against this prediction. If the prediction turns out to be false, the law can be refuted based on the principle of falsification (Popper, 1935). If a scientific law repeatedly withstands such empirical tests, it may be considered valid with regard to the relational structure under consideration.
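The deductive use of a variable-based law can be sketched as follows. The law f(x) = 3 - x and the scores are hypothetical. The check is deterministic for simplicity. With the probabilistic models common in psychology, the comparison would be a statistical test rather than an exact equality.

```python
# Hypothetical Guttman scores for the students in one classroom.
engagement = {"A": 3, "B": 2, "C": 2, "D": 1, "E": 0}
distraction = {"A": 0, "B": 1, "C": 2, "D": 2, "E": 3}


def law(x):
    """Hypothetical law: distraction = 3 - engagement (monotonically decreasing)."""
    return 3 - x


def falsifying_cases(x, y, f):
    """Deduce the prediction f(x[i]) for every case i and return the cases whose
    observation contradicts it. The universal sentence 'for all i: y_i = f(x_i)'
    is true in this relational structure iff the returned list is empty."""
    return [i for i in sorted(x) if y[i] != f(x[i])]


print(falsifying_cases(engagement, distraction, law))  # ['C']
```

A single contradicting case suffices to refute the law in this classroom, which is the logic of falsification described above.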

In case-based models, there are no laws about a population, because the model does not abstract from the cases but from the observed behaviors. A case-based model describes the underlying structure in terms of existential sentences. Formally, this corresponds to a logical expression of the form

$$\exists i : \mathrm{XYZ}(i)$$

In plain text, this means that there is at least one case i for which the condition XYZ holds. For example, the above category system implies that there is at least one active learner. This is a statement about a singular observation. It is impossible to deduce a statement about another person from an existential sentence like this. Therefore, the strategy of falsification cannot be applied to test the model's validity in a specific context. If one wishes to generalize to other cases, this is accomplished by inductive reasoning instead. If we observe a person who fulfills the criteria for being called an active learner, we can hypothesize that there may be other persons who are identical to the observed case in this respect. However, we do not arrive at this conclusion by logical deduction, but by induction.

Despite this important distinction, it would be wrong to conclude that variable-based models are intrinsically deductive and case-based models are intrinsically inductive.⁶ Both types of reasoning apply to both types of models, but on different levels. Because a variable-based model is based on a person-space, one can use deduction to derive statements about individual persons from abstract population laws. There is an analogous way of reasoning for case-based models: because they are based on a behavior-space, it is possible to deduce statements about singular behaviors. For example, if we know that Peter is an active learner, we can deduce that he takes notes in the classroom. This kind of deductive reasoning can also be applied on a higher level of abstraction to deduce thematic categories from theoretical assumptions (Braun and Clarke, 2006). Similarly, there is an analog of inductive generalization in variable-based modeling. Since the laws are only quantified over the person-space, generalizations to other behaviors rely on inductive reasoning. For example, it is plausible to assume that highly engaged students tend to do their homework properly; however, in our example this behavior has never been observed. Hence, in variable-based models we usually generalize to other behaviors by means of induction. This kind of inductive reasoning is very common when empirical results are generalized from the laboratory to other behavioral domains.
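A short sketch contrasts the two directions of reasoning in a case-based model. The type definitions and observations are hypothetical, and the helper names are our own. A single observed instance verifies an existential sentence, but the sentence licenses no conclusion about an unobserved person. Type membership, in contrast, allows the defining behaviors to be deduced.

```python
# Hypothetical types, each defined by a set of behaviors (a region of behavior-space).
TYPES = {
    "active learner": frozenset({"listen", "take notes", "ask critical questions"}),
    "silent dreamer": frozenset({"listen", "take notes", "doodle"}),
}

# Hypothetical observations in one classroom.
OBSERVED = {
    "A": {"listen", "take notes", "ask critical questions"},
    "B": {"listen", "take notes", "doodle"},
}


def exists_instance(type_name, cases):
    """Existential sentence: at least one observed case shows every defining behavior."""
    return any(TYPES[type_name] <= behaviors for behaviors in cases.values())


def deduce_behaviors(type_name):
    """Deduction within behavior-space: membership in a type entails each
    behavior that defines the type."""
    return set(TYPES[type_name])


print(exists_instance("active learner", OBSERVED))         # True
print("take notes" in deduce_behaviors("active learner"))  # True
# Nothing in OBSERVED says whether an unobserved student such as "Peter" is an
# active learner; that generalization would be inductive, not deductive.
```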

Although inductive and deductive reasoning are used in both qualitative and quantitative research, it is important to stress the different roles of induction and deduction when models are applied to cases. A variable-based approach draws conclusions about cases by means of logical deduction; a case-based approach draws conclusions about cases by means of inductive reasoning. In the following, we build on this distinction to differentiate between qualitative (bottom-up) and quantitative (top-down) strategies of generalization.

Generalization and the Problem of Replication

We will now extend the formal analysis of quantitative and qualitative approaches to the question of generalization and replicability of empirical findings. To this end, we have to introduce some concepts of formal logic. Formal logic is concerned with the validity of arguments. It provides conditions to evaluate whether certain sentences (conclusions) can be derived from other sentences (premises). In this context, a theory is nothing but a set of sentences (also called axioms). Formal logic provides tools to derive new sentences that must be true, given that the axioms are true (Smith, 2020). These derived sentences are called theorems or, in the context of empirical science, predictions or hypotheses. On the syntactic level, the rules of logic only state how to evaluate the truth of a sentence relative to its premises. Whether or not sentences are actually true is formally specified by logical semantics.

On the semantic level, formal logic is intrinsically linked to set theory. For example, a logical statement like “all dogs are mammals” is true if and only if the set of dogs is a subset of the set of mammals. Similarly, the sentence “all chatting students doodle” is true if and only if the set of chatting students is a subset of the set of doodling students (compare Figure 3). The first sentence is analytically true due to the way we define the words “dog” and “mammal.” The second, by contrast, can be either true or false, depending on the relational structure we actually observe. We can thus interpret an empirical relational structure as the truth criterion of a scientific theory. From a logical point of view, this corresponds to the semantics of a theory. As shown above, variable-based and case-based models both give a formal representation of the same kinds of empirical structures. Accordingly, both types of models can be stated as formal theories. In the variable-based approach, this corresponds to a set of scientific laws that are quantified over the members of an abstract population (these are the axioms of the theory). In the case-based approach, this corresponds to a set of abstract existential statements about a specific class of individuals.

In contrast to mathematical axiom systems, empirical theories are usually not considered to be necessarily true. This means that even if we find no evidence against a theory, it is still possible that it is actually wrong. We may know that a theory is valid in some contexts, yet it may fail when applied to a new set of behaviors (e.g., if we use a different instrumentation to measure a variable) or a new population (e.g., if we draw a new sample).

From a logical perspective, the possibility that a theory may turn out to be false stems from the problem of contingency. A statement is contingent if it is both possibly true and possibly false. Formally, we introduce two modal operators: □ to designate logical necessity, and ◇ to designate logical possibility. Semantically, these operators are very similar to the universal quantifier, ∀, and the existential quantifier, ∃, respectively. Whereas ∀ and ∃ refer to the individual objects within one relational structure, the modal operators □ and ◇ range over so-called possible worlds. A statement is possibly true if and only if it is true in at least one accessible possible world, and a statement is necessarily true if and only if it is true in every accessible possible world (Hughes and Cresswell, 1996). Logically, possible worlds are mathematical abstractions, each consisting of a relational structure. Taken together, the relational structures of all accessible possible worlds constitute the formal semantics of necessity, possibility, and contingency.⁷

In the context of an empirical theory, each possible world may be identified with an empirical relational structure like the above classroom example. Given the set of intended applications of a theory (the scope of the theory, one may say), we can now construct possible world semantics for an empirical theory: each intended application of the theory corresponds to a possible world. For example, a quantified sentence like “all chatting students doodle” may be true in one classroom and false in another one. In terms of possible worlds, this would correspond to a statement of contingency: “it is possible that all chatting students doodle in one classroom, and it is possible that they don't in another classroom.” Note that in the above expression, “all students” refers to the students in only one possible world, whereas “it is possible” refers to the fact that there is at least one possible world for each of the specified cases.
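A minimal sketch of these possible world semantics treats each intended application, here a hypothetical classroom, as one world. In each world, it evaluates “all chatting students doodle” via the subset condition described earlier. It assumes that every listed world is accessible, with no restrictive accessibility relation. Under that assumption, □ reduces to “true in all listed worlds” and ◇ to “true in at least one.” The data and function names are illustrative.

```python
# Each possible world is a hypothetical classroom: behavior -> students showing it.
WORLDS = {
    "classroom 1": {"chat": {"D", "E"}, "doodle": {"B", "C", "D", "E"}},
    "classroom 2": {"chat": {"F", "G"}, "doodle": {"G", "H"}},
}


def all_chatters_doodle(world):
    """True in a world iff the set of chatting students is a subset of the
    set of doodling students."""
    return world["chat"] <= world["doodle"]


def necessarily(sentence, worlds):
    """Box: the sentence holds in every accessible world."""
    return all(sentence(w) for w in worlds.values())


def possibly(sentence, worlds):
    """Diamond: the sentence holds in at least one accessible world."""
    return any(sentence(w) for w in worlds.values())


def contingent(sentence, worlds):
    """Possibly true and possibly false."""
    return possibly(sentence, worlds) and possibly(lambda w: not sentence(w), worlds)


print(necessarily(all_chatters_doodle, WORLDS))  # False
print(possibly(all_chatters_doodle, WORLDS))     # True
print(contingent(all_chatters_doodle, WORLDS))   # True
```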

To apply these possible world semantics to quantitative research, let us reconsider how generalization to other cases works in variable-based models. Due to the syntactic structure of quantitative laws, we can deduce predictions for singular observations from an expression of the form $\forall i : y_i = f(x_i)$. Formally, the logical quantifier ∀ ranges only over the objects of the corresponding empirical relational structure (in our example, the students in the observed classroom). But what if we want to generalize beyond the empirical structure we actually observed? The standard procedure is to assume an infinitely large, abstract population from which a random sample is drawn. Given the truth of the theory, we can deduce predictions about what we may observe in the sample. Since we usually deal with probabilistic models, we can evaluate our theory by means of the conditional probability of the observations, given that the theory holds. This concept of conditional probability is the foundation of statistical significance tests (Hogg et al., 2013), as well as of Bayesian estimation (Watanabe, 2018). In terms of possible world semantics, the random sampling model implies that all possible worlds (i.e., all intended applications) can be conceived as empirical sub-structures of a greater population structure. For example, the empirical relational structure constituted by the observed behaviors in a classroom would be conceived as a sub-matrix of the population person × behavior matrix. It follows that, if a scientific law is true in the population, it will be true in all possible worlds, i.e., it will be necessarily true. Formally, this corresponds to an expression of the form

$$\Box(\forall i : y_i = f(x_i))$$

The statistical generalization model thus constitutes a top-down strategy for dealing with individual contexts that is analogous to the way variable-based models are applied to individual cases (compare Table 1). Consequently, if we apply a variable-based model to a new context and find that it does not fit the data (i.e., there is a statistically significant deviation from the model predictions), we have reason to doubt the validity of the theory. This is what makes the problem of low replicability so important. We observe that the predictions are wrong in a new study, and because we apply a top-down strategy of generalization to contexts beyond the ones we observed, we see our whole theory at stake.

Qualitative research, by contrast, follows a different strategy of generalization. Since case-based models are formulated as a set of context-specific existential sentences, there is no need for universal truth or necessity. Case-based modeling does not rely on statistical generalization to other cases by means of random sampling from an abstract population. Instead, it usually employs a bottom-up strategy of generalization that is analogous to the way case-based models are applied to individual cases. Formally, this may be expressed by stating that the observed qualia exist in at least one possible world, i.e., the theory is possibly true:

$$\Diamond(\exists i : \mathrm{XYZ}(i))$$

This statement is analogous to the way we apply case-based models to individual cases (compare Table 1). Consequently, the set of intended applications of the theory does not follow from a sampling model, but from theoretical assumptions about which cases may be similar to the observed cases with respect to certain relevant characteristics. For example, if we observe that certain behaviors occur together in one classroom, following a bottom-up strategy of generalization, we will hypothesize why this might be the case. If we do not replicate this finding in another context, this does not question the model itself, since it was a context-specific theory all along. Instead, we will revise our hypothetical assumptions about why the new context is apparently less similar to the first one than we originally thought. Therefore, if an empirical finding does not replicate, we are more concerned about our understanding of the cases than about the validity of our theory.

Statistical generalization provides us with a formal (and thus somewhat more objective) apparatus to evaluate the universal validity of our theories. The bottom-up strategy, by contrast, forces us to think about the class of intended applications on theoretical grounds. This means that we have to ask: what are the boundary conditions of our theory? In the above classroom example, following a bottom-up strategy, we would build on our preliminary understanding of the cases in one context (e.g., a public school) to search for similar and contrasting cases in other contexts (e.g., a private school). We would then re-evaluate our theoretical description of the data and explore what makes cases similar or dissimilar with regard to our theory. This enables us to expand the class of intended applications alongside the theory.

Of course, neither of these strategies is superior per se. Nevertheless, they rely on different assumptions and may thus be more or less adequate in different contexts. The statistical strategy relies on the assumption of a universal population and invariant measurements. That is, we assume that (a) all samples are drawn from the same population and (b) all variables refer to the same behavioral classes. If these assumptions are true, statistical generalization is valid and therefore provides a valuable tool for testing empirical theories. The bottom-up strategy of generalization relies on the idea that contexts may be classified as more or less similar based on characteristics that are not part of the model being evaluated. If such a similarity relation across contexts is feasible, the bottom-up strategy is valid as well.

Depending on the strategy of generalization, replication of empirical research serves two very different purposes. Following the (top-down) principle of generalization by deduction from scientific laws, replications are empirical tests of the theory itself, and failed replications question the theory on a fundamental level. Following the (bottom-up) principle of generalization by induction to similar contexts, replications are a means to explore the boundary conditions of a theory. Consequently, failed replications question the scope of the theory and help to shape the set of intended applications.

We have argued that quantitative and qualitative research are best understood by means of the structure of the employed models. Quantitative science mainly relies on variable-based models and usually employs a top-down strategy of generalization from an abstract population to individual cases. Qualitative science prefers case-based models and usually employs a bottom-up strategy of generalization. We further showed that failed replications have very different implications depending on the underlying strategy of generalization. Whereas in the top-down strategy, replications are used to test the universal validity of a model, in the bottom-up strategy, replications are used to explore the scope of a model. We will now address the implications of this analysis for psychological research with regard to the problem of replicability.

Modern-day psychology almost exclusively follows a top-down strategy of generalization. Given the quantitative background of most psychological theories, this is hardly surprising. Following the general structure of variable-based models, the individual case is not the focus of the analysis. Instead, scientific laws are stated at the level of an abstract population. Therefore, when applying the theory to a new context, a statistical sampling model seems to be the natural consequence. However, this is not the only possible strategy. From a logical point of view, there is no reason to assume that a quantitative law like $\forall i: y_i = f(x_i)$ implies that the law is necessarily true, i.e., $\Box(\forall i: y_i = f(x_i))$. Instead, one might just as well define the scope of the theory following an inductive strategy (see footnote 8). Formally, this would correspond to the assumption that the observed law is possibly true, i.e., $\Diamond(\forall i: y_i = f(x_i))$. For example, we may discover a functional relation between “engagement” and “distraction” without referring to an abstract universal population of students. Instead, we may hypothesize under which conditions this functional relation may be valid and use these assumptions to inductively generalize to other cases.
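The difference between $\Box$ and $\Diamond$ can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that each empirical context plays the role of a possible world and that every context is accessible from every other; a law is then necessarily true if it holds in all contexts and possibly true if it holds in at least one. The contexts, the observations, and the linear law `f` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# One context = one "possible world" holding observed (x_i, y_i) pairs,
# e.g. distraction (x) and engagement (y) scores. All data are hypothetical.
Observations = List[Tuple[float, float]]
Law = Callable[[float], float]


def law_holds(obs: Observations, f: Law, tol: float = 1e-9) -> bool:
    """Truth of the quantified sentence  forall i: y_i = f(x_i)  in one context."""
    return all(abs(y - f(x)) <= tol for x, y in obs)


def necessarily(contexts: Dict[str, Observations], f: Law) -> bool:
    """Box: the law holds in every context (all contexts mutually accessible)."""
    return all(law_holds(obs, f) for obs in contexts.values())


def possibly(contexts: Dict[str, Observations], f: Law) -> bool:
    """Diamond: the law holds in at least one context."""
    return any(law_holds(obs, f) for obs in contexts.values())


# Hypothetical law: engagement = 10 - 2 * distraction
def f(x: float) -> float:
    return 10 - 2 * x


contexts: Dict[str, Observations] = {
    "public_school": [(1.0, 8.0), (2.0, 6.0), (3.0, 4.0)],
    "private_school": [(1.0, 8.0), (2.5, 5.0)],
    "online_course": [(1.0, 7.0), (2.0, 7.5)],  # the law fails here
}

print(necessarily(contexts, f))  # False
print(possibly(contexts, f))     # True
```

Under this reading, the failure in `online_course` refutes the $\Box$ claim but leaves the $\Diamond$ claim intact, mirroring the contrast drawn in the text.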

If we take this seriously, we have to specify the intended applications of the theory: in which contexts do we expect the theory to hold? Or, equivalently, what are the boundary conditions of the theory? These boundary conditions may be specified either intensionally, by giving external criteria for when a context is similar enough to the ones already studied to expect a successful application of the theory, or extensionally, by enumerating the contexts in which the theory has already been shown to be valid. These boundary conditions need not be restricted to the population we refer to; they may include all kinds of contextual factors. Therefore, adopting a bottom-up strategy forces us to think about these factors and make them an integral part of our theories.
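The two ways of stating boundary conditions can be contrasted in a short Python sketch. The extensional version is simply a set of contexts in which the theory has been shown to hold; the intensional version is a predicate over external context features, i.e., features that are not part of the model itself. The context names, the features `school_type` and `class_size`, and the threshold of 30 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    name: str
    school_type: str  # hypothetical external feature
    class_size: int   # hypothetical external feature


# Extensional: enumerate the contexts where the theory has already been shown to hold.
EXTENSIONAL_SCOPE = {"public_school_A", "private_school_B"}


def in_scope_extensional(ctx: Context) -> bool:
    return ctx.name in EXTENSIONAL_SCOPE


# Intensional: external criteria for a context being similar enough to those studied.
def in_scope_intensional(ctx: Context) -> bool:
    return ctx.school_type in {"public", "private"} and ctx.class_size <= 30


new_ctx = Context("public_school_C", "public", 25)
print(in_scope_extensional(new_ctx))  # False: not studied yet
print(in_scope_intensional(new_ctx))  # True: meets the similarity criteria
```

Only the intensional version licenses an inductive expectation about a context that has not yet been studied; a replication in such a context would then either extend the enumerated list or prompt a revision of the criteria.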

In fact, there is good reason to believe that bottom-up generalization may be more adequate in many psychological studies. Apart from the pitfalls associated with statistical generalization that have been extensively discussed in recent years (e.g., p-hacking, underpowered studies, publication bias), it is worth reflecting on whether the underlying assumptions are met in a particular context. For example, many samples used in experimental psychology are not randomly drawn from a large population but are convenience samples. If we use statistical models with non-random samples, we have to assume that the observations vary as if drawn from a random sample. This may indeed be the case for randomized experiments, because all variation between the experimental conditions apart from the independent variable will be random due to the randomization procedure. In this case, a classical significance test may be regarded as an approximation to a randomization test (Edgington and Onghena, 2007). However, if we interpret a significance test as an approximate randomization test, we test not for generalization but for internal validity. Hence, even if we use statistical significance tests when assumptions about random sampling are violated, we still need a different strategy of generalization. This issue has been discussed in the context of small-N studies, where variable-based models are applied to very small samples, sometimes consisting of only one individual (Dugard et al., 2012). The bottom-up strategy of generalization employed by qualitative researchers provides such an alternative.
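The randomization-test reading can be illustrated with a minimal exact permutation test in Python, written in the spirit of Edgington and Onghena (2007) but not taken from them. Under the null hypothesis of no treatment effect, every reassignment of the observed scores to the two conditions is equally likely, so the p-value is the share of reassignments whose mean difference is at least as extreme as the observed one. The scores are hypothetical. The resulting p-value refers only to the units that were actually randomized (internal validity), not to any wider population.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided randomization test for a difference in means.

    Enumerates every split of the pooled scores into groups of the original
    sizes, so it is only feasible for small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(b) - mean(a))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point round-off.
        if abs(mean(group_b) - mean(group_a)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


control = [1.0, 2.0, 3.0]    # hypothetical scores
treatment = [4.0, 5.0, 6.0]
print(permutation_test(control, treatment))  # 0.1 (2 of 20 reassignments)
```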

Another important issue in this context is the question of measurement invariance. If we construct a variable-based model in one context, the variables refer to those behaviors that constitute the underlying empirical relational structure. For example, we may construct an abstract measure of “distraction” using the observed behaviors in a certain context. We will then use the term “distraction” as a theoretical term referring to the variable we have just constructed to represent the underlying empirical relational structure. Let us now imagine we apply this theory to a new context. Even if the individuals in our new context are part of the same population, we may still get into trouble if the observed behaviors differ from those used in the original study. How do we know whether these behaviors constitute the same variable? We have to ensure that in any new context, our measures are valid for the variables in our theory. Without a proper measurement model, this will be hard to achieve (Buntins et al., 2017). Again, we are faced with the necessity to think of the boundary conditions of our theories. In which contexts (i.e., for which sets of individuals and behaviors) do we expect our theory to work?

If we follow the rationale of inductive generalization, we can explore the boundary conditions of a theory with every new empirical study. We thus widen the scope of our theory by comparing successful applications in different contexts and unsuccessful applications in similar contexts. This may ultimately lead to a more general theory, maybe even one of universal scope. However, unless we have such a general theory, we might be better off treating unsuccessful replications not as a sign of failure but as a chance to learn.

Author Contributions

MB conceived the original idea and wrote the first draft of the paper. MS helped to further elaborate and scrutinize the arguments. All authors contributed to the final version of the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank Annette Scheunpflug for helpful comments on an earlier version of the manuscript.

1 A person × behavior matrix constitutes a very simple relational structure that is common in psychological research. This is why it is chosen here as a minimal example. However, more complex structures are possible, e.g., relating individuals to behaviors over time, or nesting individuals within groups. For a systematic overview, see Coombs (1964).

2 This notion of empirical content applies only to deterministic models. The empirical content of a probabilistic model consists in the probability distribution over all possible empirical structures.

3 For example, neither the SAGE Handbook of Qualitative Data Analysis edited by Flick (2014) nor the Oxford Handbook of Qualitative Research edited by Leavy (2014) mentions formal approaches to category formation.

4 Note also that the described structure is empirically richer than a nominal scale. Therefore, reducing qualitative category formation to a special (and somewhat trivial) kind of measurement is not adequate.

5 It is possible to extend this notion of empirical content to the probabilistic case (this would correspond to applying a latent class analysis). However, since qualitative research usually does not rely on formal algorithms (either deterministic or probabilistic), such a concept is currently of little practical use.

6 We do not elaborate on abductive reasoning here, since, given an empirical relational structure, the concept can be applied to both types of models in the same way (Schurz, 2008). One could argue, however, that the underlying relational structure is not given a priori but has to be constructed by the researcher and will itself be influenced by theoretical expectations. In that case, abductive reasoning may be necessary to establish an empirical relational structure in the first place.

7 We shall not elaborate on the metaphysical meaning of possible worlds here, since we are only concerned with empirical theories [but see Tooley (1999) for an overview].

8 Of course, this also means that it would be equally reasonable to employ a top-down strategy of generalization using a case-based model by postulating that $\Box(\exists i: XYZ_i)$. The implications for case-based models are certainly worth exploring but lie beyond the scope of this article.

  • Agresti A. (2013). Categorical Data Analysis, 3rd Edn. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley.
  • Borsboom D. (2005). Measuring the Mind: Conceptual Issues in Contemporary Psychometrics. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511490026
  • Braun V., Clarke V. (2006). Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101. doi: 10.1191/1478088706qp063oa
  • Buntins M., Buntins K., Eggert F. (2017). Clarifying the concept of validity: from measurement to everyday language. Theory Psychol. 27, 703–710. doi: 10.1177/0959354317702256
  • Carnap R. (1928). The Logical Structure of the World. Berkeley, CA: University of California Press.
  • Coombs C. H. (1964). A Theory of Data. New York, NY: Wiley.
  • Creswell J. W. (2015). A Concise Introduction to Mixed Methods Research. Los Angeles, CA: Sage.
  • Dugard P., File P., Todman J. B. (2012). Single-Case and Small-N Experimental Designs: A Practical Guide to Randomization Tests, 2nd Edn. New York, NY: Routledge. doi: 10.4324/9780203180938
  • Edgington E., Onghena P. (2007). Randomization Tests, 4th Edn. Statistics. Hoboken, NJ: CRC Press. doi: 10.1201/9781420011814
  • Everett J. A. C., Earp B. D. (2015). A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Front. Psychol. 6:1152. doi: 10.3389/fpsyg.2015.01152
  • Flick U. (Ed.) (2014). The Sage Handbook of Qualitative Data Analysis. London: Sage. doi: 10.4135/9781446282243
  • Freeman M., Demarrais K., Preissle J., Roulston K., St. Pierre E. A. (2007). Standards of evidence in qualitative research: an incitement to discourse. Educ. Res. 36, 25–32. doi: 10.3102/0013189X06298009
  • Ganter B. (2010). Two basic algorithms in concept analysis, in Formal Concept Analysis, Lecture Notes in Computer Science, Vol. 5986, eds Hutchison D., Kanade T., Kittler J., Kleinberg J. M., Mattern F., Mitchell J. C., et al. (Berlin, Heidelberg: Springer), 312–340. doi: 10.1007/978-3-642-11928-6_22
  • Ganter B., Wille R. (1999). Formal Concept Analysis. Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642-59830-2
  • Guttman L. (1944). A basis for scaling qualitative data. Am. Sociol. Rev. 9:139. doi: 10.2307/2086306
  • Hogg R. V., Mckean J. W., Craig A. T. (2013). Introduction to Mathematical Statistics, 7th Edn. Boston, MA: Pearson.
  • Hughes G. E., Cresswell M. J. (1996). A New Introduction to Modal Logic. London; New York, NY: Routledge. doi: 10.4324/9780203290644
  • Klein R. A., Ratliff K. A., Vianello M., Adams R. B., Bahník Š., Bernstein M. J., et al. (2014). Investigating variation in replicability. Soc. Psychol. 45, 142–152. doi: 10.1027/1864-9335/a000178
  • Krantz D. H., Luce D., Suppes P., Tversky A. (1971). Foundations of Measurement Volume I: Additive and Polynomial Representations. New York, NY; London: Academic Press. doi: 10.1016/B978-0-12-425401-5.50011-8
  • Leavy P. (2014). The Oxford Handbook of Qualitative Research. New York, NY: Oxford University Press. doi: 10.1093/oxfordhb/9780199811755.001.0001
  • Maxwell S. E., Lau M. Y., Howard G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? Am. Psychol. 70, 487–498. doi: 10.1037/a0039400
  • Miles M. B., Huberman A. M., Saldaña J. (2014). Qualitative Data Analysis: A Methods Sourcebook, 3rd Edn. Los Angeles, CA; London; New Delhi; Singapore; Washington, DC: Sage.
  • Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science 349:aac4716. doi: 10.1126/science.aac4716
  • Popper K. (1935). Logik der Forschung. Wien: Springer. doi: 10.1007/978-3-7091-4177-9
  • Ragin C. (1987). The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley, CA: University of California Press.
  • Rihoux B., Ragin C. (2009). Configurational Comparative Methods: Qualitative Comparative Analysis (QCA) and Related Techniques. Thousand Oaks, CA: Sage Publications, Inc. doi: 10.4135/9781452226569
  • Scheunpflug A., Krogull S., Franz J. (2016). Understanding learning in world society: qualitative reconstructive research in global learning and learning for sustainability. Int. J. Dev. Educ. Glob. Learn. 7, 6–23. doi: 10.18546/IJDEGL.07.3.02
  • Schurz G. (2008). Patterns of abduction. Synthese 164, 201–234. doi: 10.1007/s11229-007-9223-4
  • Shrout P. E., Rodgers J. L. (2018). Psychology, science, and knowledge construction: broadening perspectives from the replication crisis. Annu. Rev. Psychol. 69, 487–510. doi: 10.1146/annurev-psych-122216-011845
  • Smith P. (2020). An Introduction to Formal Logic. Cambridge: Cambridge University Press. doi: 10.1017/9781108328999
  • Suppes P., Krantz D. H., Luce D., Tversky A. (1971). Foundations of Measurement Volume II: Geometrical, Threshold, and Probabilistic Representations. New York, NY; London: Academic Press.
  • Tooley M. (Ed.) (1999). Necessity and Possibility: The Metaphysics of Modality. New York, NY; London: Garland Publishing.
  • Trafimow D. (2018). An a priori solution to the replication crisis. Philos. Psychol. 31, 1188–1214. doi: 10.1080/09515089.2018.1490707
  • Watanabe S. (2018). Mathematical Foundations of Bayesian Statistics. CRC Monographs on Statistics and Applied Probability. Boca Raton, FL: Chapman and Hall.
  • Wiggins B. J., Christopherson C. D. (2019). The replication crisis in psychology: an overview for theoretical and philosophical psychology. J. Theor. Philos. Psychol. 39, 202–217. doi: 10.1037/teo0000137


Open Access

Peer-reviewed

Research Article

Recent quantitative research on determinants of health in high income countries: A scoping review

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Centre for Health Economics Research and Modelling Infectious Diseases, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium


Roles Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing

  • Vladimira Varbanova, 
  • Philippe Beutels

PLOS

  • Published: September 17, 2020
  • https://doi.org/10.1371/journal.pone.0239031

Fig 1

Identifying determinants of health and understanding their role in health production constitutes an important research theme. We aimed to document the state of recent multi-country research on this theme in the literature.

We followed the PRISMA-ScR guidelines to systematically identify, triage and review literature (January 2013—July 2019). We searched for studies that performed cross-national statistical analyses aiming to evaluate the impact of one or more aggregate level determinants on one or more general population health outcomes in high-income countries. To assess in which combinations and to what extent individual (or thematically linked) determinants had been studied together, we performed multidimensional scaling and cluster analysis.
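The abstract does not state which (dis)similarity measure fed the multidimensional scaling and cluster analysis. As one plausible preparatory step, the Python sketch below builds a pairwise Jaccard dissimilarity between determinants from which studies included them; the Jaccard choice, the study identifiers, and the determinant names are all assumptions for illustration, not the authors' procedure. A matrix of this kind is a common input to MDS or hierarchical clustering.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical extraction: which determinants each study included.
studies: Dict[str, Set[str]] = {
    "study_1": {"GDP", "health_spending", "education"},
    "study_2": {"GDP", "health_spending"},
    "study_3": {"GDP", "income_inequality"},
    "study_4": {"air_pollution"},
}


def jaccard_dissimilarity(studies: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """1 - |studies with both| / |studies with either|, for every determinant pair."""
    determinants = sorted(set().union(*studies.values()))
    # For each determinant, the set of studies in which it appears
    occurs = {d: {s for s, dets in studies.items() if d in dets} for d in determinants}
    dist = {}
    for d1, d2 in combinations(determinants, 2):
        shared = occurs[d1] & occurs[d2]
        either = occurs[d1] | occurs[d2]
        dist[(d1, d2)] = 1 - len(shared) / len(either)
    return dist


dist = jaccard_dissimilarity(studies)
print(dist[("GDP", "health_spending")])  # about 0.333: 2 shared studies out of 3
print(dist[("GDP", "air_pollution")])    # 1.0: never studied together
```

Determinants that are always studied together end up close to 0, while those studied in isolation sit at 1, which is the kind of "tight core" versus "limited contexts" pattern an MDS map would then display.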

Sixty studies were selected, out of an original yield of 3686. Life-expectancy and overall mortality were the most widely used population health indicators, while determinants came from the areas of healthcare, culture, politics, socio-economics, environment, labor, fertility, demographics, life-style, and psychology. The family of regression models was the predominant statistical approach. Results from our multidimensional scaling showed that a relatively tight core of determinants has received much attention, as main covariates of interest or controls, whereas the majority of other determinants were studied in very limited contexts. We consider findings from these studies regarding the importance of any given health determinant inconclusive at present. Across a multitude of model specifications, different country samples, and varying time periods, effects fluctuated between statistically significant and not significant, and between beneficial and detrimental to health.

Conclusions

We conclude that efforts to understand the underlying mechanisms of population health are far from settled, and the present state of research on the topic leaves much to be desired. It is essential that future research considers multiple factors simultaneously and takes advantage of more sophisticated methodology with regard to quantifying health as well as analyzing determinants’ influence.

Citation: Varbanova V, Beutels P (2020) Recent quantitative research on determinants of health in high income countries: A scoping review. PLoS ONE 15(9): e0239031. https://doi.org/10.1371/journal.pone.0239031

Editor: Amir Radfar, University of Central Florida, UNITED STATES

Received: November 14, 2019; Accepted: August 28, 2020; Published: September 17, 2020

Copyright: © 2020 Varbanova, Beutels. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Funding: This study (and VV) is funded by the Research Foundation Flanders ( https://www.fwo.be/en/ ), FWO project number G0D5917N, award obtained by PB. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Identifying the key drivers of population health is a core subject in public health and health economics research. Between-country comparative research on the topic is challenging. In order to be relevant for policy, it requires disentangling different interrelated drivers of “good health”, each having different degrees of importance in different contexts.

“Good health”–physical and psychological, subjective and objective–can be defined and measured using a variety of approaches, depending on which aspect of health is the focus. A major distinction can be made between health measurements at the individual level or some aggregate level, such as a neighborhood, a region or a country. In view of this, a great diversity of specific research topics exists on the drivers of what constitutes individual or aggregate “good health”, including those focusing on health inequalities, the gender gap in longevity, and regional mortality and longevity differences.

The current scoping review focuses on determinants of population health. Stated as such, this topic is quite broad. Indeed, we are interested in the very general question of what methods have been used to make the most of increasingly available region- or country-specific databases to understand the drivers of population health through inter-country comparisons. Existing reviews indicate that researchers thus far tend to adopt a narrower focus. Usually, attention is given to only one health outcome at a time, with further geographical and/or population [1, 2] restrictions. In some cases, the impact of one or more interventions is at the core of the review [3–7], while in others it is the relationship between health and just one particular predictor, e.g., income inequality, access to healthcare, or government mechanisms [8–13]. Some relatively recent reviews on the subject of social determinants of health [4–6, 14–17] have considered a number of indicators potentially influencing health as opposed to a single one. One review defines “social determinants” as “the social, economic, and political conditions that influence the health of individuals and populations” [17], while another refers even more broadly to “the factors apart from medical care” [15].

In the present work, we aimed to be more inclusive, setting no limitations on the nature of possible health correlates, as well as making use of a multitude of commonly accepted measures of general population health. The goal of this scoping review was to document the state of the art in the recent published literature on determinants of population health, with a particular focus on the types of determinants selected and the methodology used. In doing so, we also report the main characteristics of the results these studies found. The materials collected in this review are intended to inform our (and potentially other researchers’) future analyses on this topic. Since the production of health is subject to the law of diminishing marginal returns, we focused our review on those studies that included countries where a high standard of wealth has been achieved for some time, i.e., high-income countries belonging to the Organisation for Economic Co-operation and Development (OECD) or Europe. Adding similar reviews for other country income groups is of limited interest to the research we plan to do in this area.

In view of its focus on data and methods, rather than results, a formal protocol was not registered prior to undertaking this review, but the procedure followed the guidelines of the PRISMA statement for scoping reviews [18].

We focused on multi-country studies investigating the potential associations between any aggregate level (region/city/country) determinant and general measures of population health (e.g., life expectancy, mortality rate).

Within the query itself, we listed well-established population health indicators as well as the six world regions as defined by the World Health Organization (WHO). We searched only in the publications’ titles in order to keep the number of hits manageable and the ratio of broadly relevant abstracts to all abstracts on the order of 10% (based on a series of time-focused trial runs). The search strategy was developed iteratively between the two authors and is presented in S1 Appendix. The search was performed by VV in PubMed and Web of Science on the 16th of July, 2019, without any language restrictions, and with a start date set to the 1st of January, 2013, as we were interested in the latest developments in this area of research.

Eligibility criteria

Records obtained via the search methods described above were screened independently by the two authors. Consistency between inclusion/exclusion decisions was approximately 90%, and the 43 instances where uncertainty existed were resolved through discussion. Articles were included subject to meeting the following requirements: (a) the paper was a full published report of an original empirical study investigating the impact of at least one aggregate level (city/region/country) factor on at least one health indicator (or self-reported health) of the general population (the only admissible “sub-populations” were those based on gender and/or age); (b) the study employed statistical techniques (calculating correlations, at the very least) and was not purely descriptive or theoretical in nature; (c) the analysis involved at least two countries or at least two regions or cities (or another aggregate level) in at least two different countries; (d) the health outcome was not differentiated according to some socio-economic factor and thus studied in terms of inequality (with the exception of gender and age differentiations); (e) mortality, in case it was one of the health indicators under investigation, was strictly “total” or “all-cause” (no cause-specific or determinant-attributable mortality).
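The consistency figure reported above is a simple percent agreement. As an illustration only, the Python sketch below computes it from two screeners' include/exclude decisions and lists the records on which they disagreed, i.e., the candidates for discussion. The decisions are hypothetical; the paper does not describe how the figure was computed.

```python
from typing import List, Tuple


def percent_agreement(r1: List[bool], r2: List[bool]) -> Tuple[float, List[int]]:
    """Share of records with identical include/exclude decisions, plus the disagreements."""
    if len(r1) != len(r2):
        raise ValueError("both screeners must rate the same records")
    disagreements = [i for i, (a, b) in enumerate(zip(r1, r2)) if a != b]
    return 1 - len(disagreements) / len(r1), disagreements


# Hypothetical decisions for 10 records (True = include)
screener_1 = [True, False, False, True, False, False, True, False, False, False]
screener_2 = [True, False, False, True, False, True, True, False, False, False]
agreement, to_discuss = percent_agreement(screener_1, screener_2)
print(agreement, to_discuss)  # 0.9 [5]
```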

Data extraction

The following pieces of information were extracted in an Excel table from the full text of each eligible study (primarily by VV, consulting with PB in case of doubt): health outcome(s), determinants, statistical methodology, level of analysis, results, type of data, data sources, time period, countries. The evidence is synthesized according to these extracted data (often directly reflected in the section headings), using a narrative form accompanied by a “summary-of-findings” table and a graph.

Search and selection

The initial yield contained 4583 records, reduced to 3686 after removal of duplicates (Fig 1). Based on title and abstract screening, 3271 records were excluded because they focused on specific medical condition(s) or specific populations (based on morbidity or some other factor), dealt with intervention effectiveness, with theoretical or non-health related issues, or with animals or plants. Of the remaining 415 papers, roughly half were disqualified upon full-text consideration, mostly due to using an outcome not of interest to us (e.g., health inequality), measuring and analyzing determinants and outcomes exclusively at the individual level, performing analyses one country at a time, employing indices that are a mixture of both health indicators and health determinants, or not utilizing potential health determinants at all. After this second stage of the screening process, 202 papers were deemed eligible for inclusion. This group was further dichotomized according to the level of economic development of the countries or regions under study, using membership of the OECD or Europe as a reference “cut-off” point. Sixty papers were judged to include high-income countries, and the remaining 142 included either low- or middle-income countries or a mix of both these levels of development. The rest of this report outlines findings in relation to high-income countries only, reflecting our own primary research interests. Nonetheless, we chose to report our search yield for the other income groups for two reasons: first, to gauge the relative interest in applied published research for these different income levels; and second, to enable other researchers with a focus on determinants of health in other countries to use the extraction we made here.
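The screening counts reported above can be checked for internal consistency with a few lines of Python. Only numbers stated in the text are used as inputs; the number of duplicates removed (897) and of full-text exclusions (213) are derived here, not reported.

```python
# Counts reported in the text
records_identified = 4583
after_deduplication = 3686
excluded_title_abstract = 3271
eligible = 202
high_income = 60
other_income = 142

# Derived quantities (not reported directly)
duplicates_removed = records_identified - after_deduplication
full_text_assessed = after_deduplication - excluded_title_abstract
excluded_full_text = full_text_assessed - eligible

assert full_text_assessed == 415                 # matches "the remaining 415 papers"
assert high_income + other_income == eligible    # 60 + 142 = 202
share_excluded = round(excluded_full_text / full_text_assessed, 2)
print(duplicates_removed, excluded_full_text, share_excluded)  # 897 213 0.51
```

The derived share of 0.51 is consistent with "roughly half" of the full-text papers being disqualified.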

Fig 1. https://doi.org/10.1371/journal.pone.0239031.g001

Health outcomes

The most frequent population health indicator, life expectancy (LE), was present in 24 of the 60 studies. Apart from “life expectancy at birth” (the average life-span a newborn is expected to have if current mortality rates remain constant), also called “period LE” by some [19, 20], we also encountered LE at 40 years of age [21], at 60 [22], and at 65 [21, 23, 24]. In two papers, the age to which life expectancy referred (at birth or another age) was not stated [25, 26].
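Period life expectancy at birth and at later ages can be sketched as a simplified life table in Python. The sketch assumes single-year age intervals, annual probabilities of dying `q[x]` held constant at current levels, deaths occurring on average at mid-interval, and a final interval in which everyone dies; published life tables treat the youngest and oldest ages more carefully. The input probabilities are hypothetical.

```python
from typing import List


def life_expectancy(q: List[float], age: int = 0) -> float:
    """Period life expectancy at `age` from single-year probabilities of dying q[x].

    Simplifying assumptions: the rates stay constant for a synthetic cohort,
    deaths fall on average halfway through each year of age, and the last
    interval closes the table (q[-1] == 1).
    """
    if q[-1] != 1.0:
        raise ValueError("the last age interval must close the table (q = 1)")
    if not 0 <= age < len(q):
        raise ValueError("age outside the table")
    survivors = [1.0]  # l_x: share of the synthetic cohort alive at exact age x
    for qx in q:
        survivors.append(survivors[-1] * (1 - qx))
    # L_x: person-years lived between ages x and x + 1
    person_years = [(survivors[x] + survivors[x + 1]) / 2 for x in range(len(q))]
    # e_x = T_x / l_x, where T_x sums the person-years from age x onwards
    return sum(person_years[age:]) / survivors[age]


print(life_expectancy([0.0, 1.0]))              # 1.5
print(life_expectancy([0.5, 0.5, 1.0]))         # 1.25
print(life_expectancy([0.5, 0.5, 1.0], age=1))  # 1.0
```

The `age` parameter corresponds to the LE at 40, 60, or 65 mentioned above: the same table is used, but person-years are summed only from that age onward and divided by the survivors at that age.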

Some studies considered male and female LE separately [21, 24, 25, 27–33]. This distinction was also often made for the second most commonly used health index [28–30, 34–38], termed “total”, “overall”, or “all-cause” mortality rate (MR), which was included in 22 of the 60 studies. In addition to gender, this index was also sometimes broken down by age group [30, 39, 40], as well as by gender-age group [38].

While the majority of studies under review here focused on a single health indicator, 23 out of the 60 studies made use of multiple outcomes, although these outcomes were always considered one at a time, and sometimes not all of them fell within the scope of our review. An easily discernible group of indices that typically went together [25, 37, 41] was that of neonatal (deaths occurring within 28 days postpartum), perinatal (fetal or early neonatal, i.e., first-7-days, deaths), and post-neonatal (deaths between the 29th day and completion of one year of life) mortality. More often than not, these indices were also accompanied by “stand-alone” indicators, such as infant mortality (deaths within the first year of life; our third most common index, found in 16 of the 60 studies), maternal mortality (deaths during pregnancy or within 42 days of termination of pregnancy), and child mortality rates. Child mortality has conventionally been defined as mortality within the first 5 years of life and is thus often also called “under-5 mortality”. Nonetheless, Pritchard & Wallace used the term “child mortality” to denote deaths of children younger than 14 years [42].

As previously stated, inclusion criteria did allow for self-reported health status to be used as a general measure of population health. Within our final selection of studies, seven utilized some form of subjective health as an outcome variable [25, 43–48]. Additionally, the Health Human Development Index [49], healthy life expectancy [50], old-age survival [51], potential years of life lost [52], and disability-adjusted life expectancy [25] were used.

We note that while in most cases the indicators mentioned above (and/or the covariates considered, see below) were taken in their absolute or logarithmic form, as a (typically annual) number, sometimes they were used in the form of differences, change rates, averages over a given time period, or even z-scores of rankings [19, 22, 40, 42, 44, 53–57].
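These alternative forms of an indicator can be illustrated with a short Python sketch applied to a hypothetical series of annual values: the logarithm, year-on-year change rates, a period average, and z-scores of cross-country rankings. The country codes and values are invented, and the reviewed studies may have defined these transformations differently (for example, the direction of ranking or the treatment of ties).

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical annual life expectancy (years) for three invented countries
series: Dict[str, List[float]] = {
    "AAA": [80.0, 80.5, 81.0],
    "BBB": [78.0, 78.2, 78.9],
    "CCC": [82.0, 82.1, 82.3],
}

# Logarithmic form
logged = {c: [math.log(v) for v in vals] for c, vals in series.items()}
# Year-on-year change rates: (value_t - value_{t-1}) / value_{t-1}
change_rates = {c: [(b - a) / a for a, b in zip(vals, vals[1:])] for c, vals in series.items()}
# Average over the whole period
period_avg = {c: mean(vals) for c, vals in series.items()}


def rank_zscores(values: Dict[str, float]) -> Dict[str, float]:
    """Rank countries (1 = highest value), then standardize the ranks.

    Ties are not handled specially; they keep dictionary order.
    """
    ordered = sorted(values, key=lambda c: values[c], reverse=True)
    ranks = {c: i + 1 for i, c in enumerate(ordered)}
    rank_list = list(ranks.values())
    mu, sd = mean(rank_list), pstdev(rank_list)
    return {c: (r - mu) / sd for c, r in ranks.items()}


latest = {c: vals[-1] for c, vals in series.items()}
print(rank_zscores(latest))
```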

Regions, countries, and populations

Despite our decision to confine this review to high-income countries, some variation in the countries and regions studied was still present. Selection seemed to be most often conditioned on the European Union, or the European continent more generally, and the Organisation for Economic Co-operation and Development (OECD), though typically not all member nations were included in a given study (judging from the instances where these were explicitly listed). Some of the stated reasons for omitting certain nations included data unavailability [30, 45, 54] or inconsistency [20, 58], Gross Domestic Product (GDP) too low [40], differences in economic development and political stability from the rest of the sampled countries [59], and national population too small [24, 40]. On the other hand, the rationales for selecting a group of countries included having similar above-average infant mortality [60], similar healthcare systems [23], and being randomly drawn from a social spending category [61]. Some researchers were explicitly interested in a specific geographical region, such as Eastern Europe [50], Central and Eastern Europe [48, 60], the Visegrad (V4) group [62], or the Asia/Pacific area [32]. In certain instances, national regions or cities, rather than countries, constituted the units of investigation [31, 51, 56, 62–66]. In two particular cases, a mix of countries and cities was used [35, 57]. In another two [28, 29], due to the long time periods under study, some of the included countries no longer exist. Finally, besides “European” and “OECD”, the terms “developed”, “Western”, and “industrialized” were also used to describe the group of selected nations [30, 42, 52, 53, 67].

As stated above, we were interested in the health status of the general population, and during screening we made a concerted effort to exclude research using data based on a more narrowly defined group of individuals. All studies included in this review adhere to this general rule, albeit with two caveats. First, as cities (or even neighborhoods) were the unit of analysis in three of the included studies [ 56 , 64 , 65 ], the populations investigated there can be more accurately described as general urban, rather than just general. Second, health indicators were often stratified by gender and/or age; we therefore also admitted one study that, due to its specific research question, focused on men and women of early retirement age [ 35 ] and another that considered adult males only [ 68 ].

Data types and sources

A great diversity of sources was used for data collection. The accessible reference databases of the OECD ( https://www.oecd.org/ ), WHO ( https://www.who.int/ ), World Bank ( https://www.worldbank.org/ ), United Nations ( https://www.un.org/en/ ), and Eurostat ( https://ec.europa.eu/eurostat ) were among the top choices. Other international sources included the Human Mortality Database [ 30 , 39 , 50 ], Transparency International [ 40 , 48 , 50 ], the Quality of Government [ 28 , 69 ] and World Income Inequality [ 30 ] databases, the International Labor Organization [ 41 ], and the International Monetary Fund [ 70 ]. A number of national databases were referred to as well, for example the US Bureau of Statistics [ 42 , 53 ], Korean Statistical Information Services [ 67 ], Statistics Canada [ 67 ], Australian Bureau of Statistics [ 67 ], and Health New Zealand Tobacco control and Health New Zealand Food and Nutrition [ 19 ]. Well-known surveys, such as the World Values Survey [ 25 , 55 ], the European Social Survey [ 25 , 39 , 44 ], the Eurobarometer [ 46 , 56 ], the European Value Survey [ 25 ], and the European Statistics of Income and Living Conditions Survey [ 43 , 47 , 70 ], were used as data sources, too. Finally, in some cases [ 25 , 28 , 29 , 35 , 36 , 41 , 69 ], purpose-built datasets from previous studies were reused.

In most of the studies, the level of the data (and analysis) was national. The exceptions were six papers that dealt with level-2 regions of the Nomenclature of Territorial Units for Statistics (NUTS2) [ 31 , 62 , 63 , 66 ], otherwise defined areas [ 51 ], or cities [ 56 ], and seven others that used multilevel designs combining country- and region-level data [ 57 ], individual- and city- or country-level data [ 35 ], individual- and country-level data [ 44 , 45 , 48 ], individual- and neighborhood-level data [ 64 ], and city-region- (NUTS3) and country-level data [ 65 ]. The data were predominantly longitudinal, with only a few studies using purely cross-sectional data [ 25 , 33 , 43 , 45 – 48 , 50 , 62 , 67 , 68 , 71 , 72 ]; in four of those [ 43 , 48 , 68 , 72 ], two separate points in time were taken (resulting in a kind of “double cross-section”), while in another the averages across survey waves were used [ 56 ].

In studies using longitudinal data, the length of the covered time periods varied greatly. Although it was almost always less than 40 years, in one study it covered the entire 20th century [ 29 ]. Longitudinal data, typically in the form of annual records, were sometimes transformed before use. For example, some researchers considered data points at 5-year [ 34 , 36 , 49 ] or 10-year [ 27 , 29 , 35 ] intervals instead of the usual 1-year interval, or took averages over 3-year periods [ 42 , 53 , 73 ]. In one study concerned with the effect of the Great Recession, all data were expressed as a “recession minus expansion change in trends” [ 57 ]. Furthermore, there were a few instances where two different time periods were compared with each other [ 42 , 53 ] or where data were divided into 2 to 4 (possibly overlapping) periods, which were then analyzed separately [ 24 , 26 , 28 , 29 , 31 , 65 ]. Lastly, owing to data availability issues, discrepancies between the time points or periods of data on the different variables were occasionally observed [ 22 , 35 , 42 , 53 – 55 , 63 ].
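As a sketch of the temporal transformations described above, the following Python assumes contiguous annual records stored as a `{year: value}` dictionary. The helper names are hypothetical, and anchoring the averaging blocks at the earliest year (dropping an incomplete trailing block) is our assumption, not a rule taken from the cited studies.

```python
from statistics import mean


def every_nth_year(records, step, start_year):
    """Keep only the years start_year, start_year + step, ... from {year: value}."""
    return {y: v for y, v in sorted(records.items())
            if y >= start_year and (y - start_year) % step == 0}


def block_averages(records, width):
    """Average consecutive, non-overlapping blocks of `width` years.

    Assumes contiguous annual records. Blocks start at the earliest year and an
    incomplete trailing block is dropped. Returns {first_year_of_block: mean}.
    """
    years = sorted(records)
    out = {}
    for i in range(0, len(years) - width + 1, width):
        block = years[i:i + width]
        out[block[0]] = mean(records[y] for y in block)
    return out
```

With ten annual values for 2000 to 2009, 5-year sampling keeps 2000 and 2005, while 3-year averaging yields blocks starting in 2000, 2003, and 2006 and discards 2009.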

Health determinants

Together with other essential details, Table 1 lists the health correlates considered in the selected studies. Several general categories of correlates can be discerned, including health care, political stability, socio-economics, demographics, psychology, environment, fertility, life-style, culture, and labor. All of these have been recognized, directly or implicitly, as important for population health by existing theoretical models of (social) determinants of health [ 74 – 77 ].


https://doi.org/10.1371/journal.pone.0239031.t001

It is worth noting that in a few studies just a single aggregate-level covariate was investigated in relation to a health outcome of interest to us: life satisfaction [ 44 ], welfare system typology [ 45 ], gender inequality [ 33 ], austerity level [ 70 , 78 ], or deprivation [ 51 ]. Most often, though, attention went exclusively to GDP [ 27 , 29 , 46 , 57 , 65 , 71 ]. Research also often had a more particular focus; among others, minimum wages [ 79 ], hospital payment schemes [ 23 ], cigarette prices [ 63 ], social expenditure [ 20 ], residents’ dissatisfaction [ 56 ], income inequality [ 30 , 69 ], and work leave [ 41 , 58 ] took center stage. Whenever variables outside of these specific areas were also included, they were usually identified as confounders or controls, moderators, or mediators.

We visualized the combinations in which the different determinants have been studied in Fig 2 , which was obtained via multidimensional scaling and a subsequent cluster analysis (details outlined in S2 Appendix ). It depicts the spatial positioning of each determinant relative to all others, based on the number of times the effects of each pair of determinants have been studied simultaneously. When interpreting Fig 2 , one should keep in mind that determinants marked with an asterisk represent, in fact, collectives of variables.
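The input to Fig 2 is a count, for every pair of determinants, of the studies that examined both. A minimal Python sketch of that counting step follows, run on made-up study lists rather than the review's data. The conversion from counts to dissimilarities (largest pair count minus the pair's own count) is only one plausible choice, assumed here for illustration; the actual multidimensional scaling and cluster-analysis settings are described in S2 Appendix and are not reproduced.

```python
from collections import Counter
from itertools import combinations


def pair_counts(studies):
    """Number of studies examining each unordered pair of determinants.

    `studies` is a list of collections of determinant labels, one per study.
    Keys are (a, b) tuples with a < b in sort order.
    """
    counts = Counter()
    for determinants in studies:
        for a, b in combinations(sorted(set(determinants)), 2):
            counts[(a, b)] += 1
    return counts


def dissimilarity_matrix(labels, counts):
    """Square matrix in which frequently co-studied pairs are close (assumed rule).

    Dissimilarity = largest pair count minus the pair's own count, so pairs never
    studied together are the farthest apart; the diagonal is 0.
    """
    top = max(counts.values())
    return [[0 if a == b else top - counts[tuple(sorted((a, b)))] for b in labels]
            for a in labels]
```

A matrix of this kind is what a multidimensional scaling routine would take as input; any monotone decreasing mapping from counts to distances would serve the same purpose.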


Groups of determinants are marked by asterisks (see S1 Table in S1 Appendix). Diminishing color intensity reflects a decrease in the total number of “connections” for a given determinant. Noteworthy pairwise “connections” are emphasized via lines (solid, dashed, and dotted lines indicate decreasing frequency). Grey contour lines encircle groups of variables that were identified via cluster analysis. Abbreviations: age = population age distribution, associations = membership in associations, AT-index = atherogenic-thrombogenic index, BR = birth rate, CAPB = Cyclically Adjusted Primary Balance, civilian-labor = civilian labor force, C-section = Cesarean delivery rate, credit-info = depth of credit information, dissatisf = residents’ dissatisfaction, distrib.orient = distributional orientation, EDU = education, eHealth = eHealth index at GP-level, exch.rate = exchange rate, fat = fat consumption, GDP = gross domestic product, GFCF = Gross Fixed Capital Formation/Creation, GH-gas = greenhouse gas, GII = gender inequality index, gov = governance index, gov.revenue = government revenues, HC-coverage = healthcare coverage, HE = health(care) expenditure, HHconsump = household consumption, hosp.beds = hospital beds, hosp.payment = hospital payment scheme, hosp.stay = length of hospital stay, IDI = ICT development index, inc.ineq = income inequality, industry-labor = industrial labor force, infant-sex = infant sex ratio, labor-product = labor productivity, LBW = low birth weight, leave = work leave, life-satisf = life satisfaction, M-age = maternal age, marginal-tax = marginal tax rate, MDs = physicians, mult.preg = multiple pregnancy, NHS = National Health System, NO = nitrous oxide emissions, PM10 = particulate matter (PM10) emissions, pop = population size, pop.density = population density, pre-term = pre-term birth rate, prison = prison population, researchE = research and development expenditure, school.ref = compulsory schooling reform, smoke-free = smoke-free places, SO = sulfur oxide emissions, soc.E = social expenditure, soc.workers = social workers, sugar = sugar consumption, terror = terrorism, union = union density, UR = unemployment rate, urban = urbanization, veg-fr = vegetable-and-fruit consumption, welfare = welfare regime, Wwater = wastewater treatment.

https://doi.org/10.1371/journal.pone.0239031.g002

Distances between determinants in Fig 2 are indicative of the determinants’ “connectedness” with each other. While the statistical procedure called for a higher-dimensional model, for demonstration purposes we show a two-dimensional solution here. This simplification unfortunately comes with a caveat. Taking smoking as an example, it would appear to stand at a much greater distance from GDP than from alcohol. In reality, however, smoking was considered together with alcohol consumption [ 21 , 25 , 26 , 52 , 68 ] in just as many studies as with GDP [ 21 , 25 , 26 , 52 , 59 ], namely five. To mitigate this shortcoming, we have emphasized the strongest pairwise links. Solid lines connect GDP with health expenditure (HE), unemployment rate (UR), and education (EDU), indicating that the effect of GDP on health was evaluated jointly with each of these three determinants in 12 to 16 of the 60 studies included in this review. Tracing the dashed lines, we can also tell that GDP appeared jointly with income inequality, and HE together with either EDU or UR, in 8 to 10 of our selected studies. Finally, some weaker but still noteworthy “connections” between variables are displayed via dotted lines.

The fact that all notable pairwise “connections” are concentrated within a relatively small region of the plot may be interpreted as low overall “connectedness” among the health indicators studied. GDP is the most widely investigated determinant in relation to general population health. Its total number of “connections” (159) is disproportionately high compared to that of its runner-up, HE (113), followed by EDU (90) and UR (86). In fact, all of these determinants could be thought of as outliers, given that none of the remaining factors has a total count of pairings above 52. This decrease in individual determinants’ overall “connectedness” can be tracked on the graph via the change of color intensity as we move outwards from the symbolic center of GDP and its closest “co-determinants” to the other extreme: the ten indicators (welfare regime, household consumption, compulsory school reform, life satisfaction, government revenues, literacy, research expenditure, multiple pregnancy, Cyclically Adjusted Primary Balance, and residents’ dissatisfaction; shown in white) whose effects on health were only studied in isolation.
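A determinant's total number of “connections” can be obtained without enumerating pairs: in a study that examines k determinants, each of them gains k - 1 connections, which equals the sum of its pairwise co-occurrence counts. The Python sketch below applies this rule to hypothetical study lists (not the review's data) and flags determinants whose total is zero, i.e., those only ever studied in isolation. Function names are our own.

```python
from collections import Counter


def total_connections(studies):
    """Total pairwise "connections" per determinant across all studies.

    A determinant examined alongside k - 1 others in a study gains k - 1
    connections from that study; a lone determinant gains 0.
    """
    totals = Counter()
    for determinants in studies:
        unique = set(determinants)
        for d in unique:
            totals[d] += len(unique) - 1
    return totals


def studied_in_isolation(studies):
    """Determinants that never appeared together with any other determinant."""
    totals = total_connections(studies)
    return sorted(d for d, n in totals.items() if n == 0)
```

For the lists `[{"GDP", "HE", "UR"}, {"GDP", "EDU"}, {"life-satisf"}]`, GDP collects 3 connections, HE and UR 2 each, EDU 1, and life satisfaction 0, so only the last is reported as studied in isolation.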

Lastly, we point to the few small but stable clusters of covariates encircled by grey contour lines in Fig 2. These groups of determinants were identified as “close” by both statistical procedures used to produce the graph (see details in S2 Appendix).

Statistical methodology

There was great variation in the level of statistical detail reported. Some authors described their analytical approach too vaguely, so this section necessarily involves some inference on our part.

The issue of missing data is a challenging reality in this field of research, but few of the studies under review (12/60) explain how they dealt with it. Among the ones that do, three general approaches to handling missingness can be identified, listed in increasing level of sophistication: case-wise deletion, i.e., removal of countries from the sample [ 20 , 45 , 48 , 58 , 59 ], (linear) interpolation [ 28 , 30 , 34 , 58 , 59 , 63 ], and multiple imputation [ 26 , 41 , 52 ].
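The two simpler strategies named above can be sketched in a few lines of Python, assuming each country's series is a list of equally spaced annual values with `None` marking missing years. The function names are hypothetical. Multiple imputation, the third strategy, requires an explicit imputation model and repeated analyses, so it is not shown here.

```python
def drop_incomplete_countries(panel):
    """Case-wise deletion: keep only countries whose series has no missing (None) value."""
    return {country: series for country, series in panel.items()
            if all(v is not None for v in series)}


def interpolate_linear(series):
    """Fill interior gaps (None) by straight lines between the nearest known values.

    Assumes equally spaced observations (e.g. consecutive years). Leading and
    trailing gaps stay None, because there is no known value on both sides.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled
```

Case-wise deletion discards a country even if only one year is missing, whereas interpolation keeps it but silently assumes a linear trend across the gap; neither reflects the uncertainty of the filled values, which is the main argument for multiple imputation.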

Correlation coefficients (Pearson, Spearman, or unspecified) were the only technique applied with respect to the health outcomes of interest in eight analyses [ 33 , 42 – 44 , 46 , 53 , 57 , 61 ]. Among the more advanced statistical methods, the family of regression models proved to be, by and large, predominant. Before examining this more closely, we note the techniques that were, in a way, “unique” within this selection of studies: meta-analyses (random and fixed effects, respectively) were performed on the reduced-form and two-sample two-stage least squares (2SLS) estimates obtained within countries [ 39 ]; difference-in-differences (DiD) analysis was applied in one case [ 23 ]; dynamic time-series methods, including co-integration, impulse-response functions (IRF), and panel vector autoregressive (VAR) modeling, were used in one study [ 80 ]; longitudinal generalized estimating equation (GEE) models were developed on two occasions [ 70 , 78 ]; and hierarchical Bayesian spatial models [ 51 ] and spatial autoregressive regression [ 62 ] were also implemented.
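For readers less familiar with the two correlation coefficients mentioned above, this Python sketch computes Pearson's r directly and Spearman's rho as Pearson's r on ranks, with tied values given the average of their ranks. It is a plain illustration, not code from any reviewed study, and assumes equal-length inputs with non-zero variance.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotone but non-linear relationship such as y = x² on x = 1, 2, 3, 4, Spearman's rho is exactly 1 while Pearson's r is slightly below 1, since only the latter measures linearity.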

Purely cross-sectional data analyses were performed in eight studies [ 25 , 45 , 47 , 50 , 55 , 56 , 67 , 71 ]. These consisted of linear regression (presumably ordinary least squares, OLS), generalized least squares (GLS) regression, and multilevel analyses. In addition, six other studies that used longitudinal data in fact had a cross-sectional design, in that they applied regression separately at multiple time points [ 27 , 29 , 36 , 48 , 68 , 72 ].

Apart from these “multi-point cross-sectional studies”, some other simplistic approaches to longitudinal data analysis were found, involving calculating and regressing 3-year averages of both the response and the predictor variables [ 54 ], taking the average of a few data-points (i.e., survey waves) [ 56 ] or using difference scores over 10-year [ 19 , 29 ] or unspecified time intervals [ 40 , 55 ].

Moving toward more appropriate uses of longitudinal data, we turn to the methods widely known among (health) economists as “panel data analysis” or “panel regression”. Most often seen were models with fixed effects for country/region and sometimes also time point (occasionally including a country-specific trend as well), with robust standard errors for the parameter estimates to take into account correlations among clustered observations [ 20 , 21 , 24 , 28 , 30 , 32 , 34 , 37 , 38 , 41 , 52 , 59 , 60 , 63 , 66 , 69 , 73 , 79 , 81 , 82 ]. The Hausman test [ 83 ] was sometimes mentioned as the tool used to decide between fixed and random effects [ 26 , 49 , 63 , 66 , 73 , 82 ]. A few studies considered the latter more appropriate for their particular analyses, with some further specifying that (feasible) GLS estimation was employed [ 26 , 34 , 49 , 58 , 60 , 73 ]. Apart from these two types of models, the first-differences method was encountered once as well [ 31 ]. Across these models, the error terms were sometimes assumed to follow a first-order autoregressive process (AR(1)), i.e., they were allowed to be serially correlated [ 20 , 30 , 38 , 58 – 60 , 73 ], and lags of (typically) predictor variables were included in the model specification, too [ 20 , 21 , 37 , 38 , 48 , 69 , 81 ]. Lastly, a somewhat different approach to longitudinal data analysis was taken in four studies [ 22 , 35 , 48 , 65 ], in which multilevel (linear or Poisson) models were developed.
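The core idea of the country fixed-effects models described above is the “within” transformation: subtracting each country's own means removes any time-invariant country characteristic before the slope is estimated. The following Python sketch, with hypothetical function names and toy data, computes that point estimate for a single covariate and contrasts it with a pooled OLS slope. It deliberately omits time effects, robust or clustered standard errors, AR(1) errors, and the Hausman test, all of which the reviewed studies used in various combinations.

```python
def within_estimator(panel):
    """Slope of a one-way (country) fixed-effects regression with one covariate.

    `panel` maps country -> list of (x, y) observations over time. Each country's
    own means are subtracted, then the slope is OLS on the demeaned data.
    """
    sxy = sxx = 0.0
    for obs in panel.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


def pooled_ols_slope(panel):
    """OLS slope that ignores which country each observation belongs to."""
    xs = [x for obs in panel.values() for x, _ in obs]
    ys = [y for obs in panel.values() for _, y in obs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

In a toy panel where two countries share a slope of 2 but differ in level (A: x = 1, 2, 3 with y = 12, 14, 16; B: x = 4, 5, 6 with y = 58, 60, 62), the within estimator recovers 2 exactly, whereas the pooled slope (about 12.3) mostly reflects the difference between the two countries.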

Regardless of the exact techniques used, most studies included in this review presented multiple model applications within their main analysis. None attempted to formally compare models in order to identify the “best”, even if goodness-of-fit statistics were occasionally reported. As indicated above, many studies investigated women’s and men’s health separately [ 19 , 21 , 22 , 27 – 29 , 31 , 33 , 35 , 36 , 38 , 39 , 45 , 50 , 51 , 64 , 65 , 69 , 82 ], and covariates were often tested one at a time, including other covariates only incrementally [ 20 , 25 , 28 , 36 , 40 , 50 , 55 , 67 , 73 ]. Furthermore, there were a few instances where analyses within countries were performed as well [ 32 , 39 , 51 ] or where the full time period of interest was divided into a few sub-periods [ 24 , 26 , 28 , 31 ]. There were also cases where different statistical techniques were applied in parallel [ 29 , 55 , 60 , 66 , 69 , 73 , 82 ], sometimes as a form of sensitivity analysis [ 24 , 26 , 30 , 58 , 73 ]. However, the most common approach to sensitivity analysis was to re-run models with somewhat different samples [ 39 , 50 , 59 , 67 , 69 , 80 , 82 ]. Other strategies included different categorization of variables or adding (more/other) controls [ 21 , 23 , 25 , 28 , 37 , 50 , 63 , 69 ], using an alternative main covariate measure [ 59 , 82 ], including lags for predictors or outcomes [ 28 , 30 , 58 , 63 , 65 , 79 ], using weights [ 24 , 67 ] or alternative data sources [ 37 , 69 ], or using non-imputed data [ 41 ].
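Re-running a model on slightly different samples, the most common sensitivity check noted above, can be sketched as a leave-one-country-out loop. In this Python sketch the estimator is a simple OLS slope on (x, y) pairs, chosen only for illustration; any model-fitting function that takes a list of observations could be substituted. Names and data are hypothetical.

```python
def ols_slope(pairs):
    """OLS slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def leave_one_country_out(data, estimator):
    """Re-estimate after dropping each country in turn.

    `data` maps country -> (x, y); returns {dropped_country: estimate}.
    """
    results = {}
    for dropped in data:
        subsample = [xy for country, xy in data.items() if country != dropped]
        results[dropped] = estimator(subsample)
    return results
```

With four hypothetical countries A to D at x = 1, 2, 3, 4 and y = 2, 4, 6, 20, dropping D gives a slope of 2 while dropping A gives 8; a spread of that size signals that the result depends heavily on which countries are sampled.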

As the methods and not the findings are the main focus of the current review, and because generic checklists cannot discern the underlying quality in this application field (see also below), we opted to pool all reported findings together, regardless of individual study characteristics or particular outcome(s) used, and speak generally of positive and negative effects on health. For this summary we have adopted the 0.05 significance level and only considered results from multivariate analyses. Strictly birth-related factors are omitted, since these potentially relate only to the group of infant mortality indicators and not to any of the other general population health measures.
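The pooling rule described above amounts to a simple vote count. The Python sketch below encodes it under stated assumptions: each reported result is a hypothetical `Finding` record whose direction is already expressed relative to health (so that, for example, a lower mortality rate counts as better health), "significant" is taken to mean p < 0.05, and only multivariate results are counted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    determinant: str
    better_health: bool  # True if the association favoured better population health
    p_value: float
    multivariate: bool   # result came from a model with several covariates


def tally(findings, alpha=0.05):
    """Count significant multivariate findings per (determinant, direction)."""
    counts = Counter()
    for f in findings:
        if f.multivariate and f.p_value < alpha:
            label = "beneficial" if f.better_health else "detrimental"
            counts[(f.determinant, label)] += 1
    return counts
```

Because significant results in both directions are kept separately, a tally like this makes conflicting findings for the same determinant visible rather than netting them out.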

Starting with the determinants most often studied, higher GDP levels [ 21 , 26 , 27 , 29 , 30 , 32 , 43 , 48 , 52 , 58 , 60 , 66 , 67 , 73 , 79 , 81 , 82 ], higher health [ 21 , 37 , 47 , 49 , 52 , 58 , 59 , 68 , 72 , 82 ] and social [ 20 , 21 , 26 , 38 , 79 ] expenditures, higher education [ 26 , 39 , 52 , 62 , 72 , 73 ], lower unemployment [ 60 , 61 , 66 ], and lower income inequality [ 30 , 42 , 53 , 55 , 73 ] were found to be significantly associated with better population health on a number of occasions. In addition, there was some evidence that democracy [ 36 ] and freedom [ 50 ], higher work compensation [ 43 , 79 ], distributional orientation [ 54 ], cigarette prices [ 63 ], gross national income [ 22 , 72 ], labor productivity [ 26 ], exchange rates [ 32 ], marginal tax rates [ 79 ], vaccination rates [ 52 ], total fertility [ 59 , 66 ], fruit and vegetable [ 68 ], fat [ 52 ], and sugar consumption [ 52 ], as well as greater depth of credit information [ 22 ], a higher percentage of civilian labor force [ 79 ], longer work leaves [ 41 , 58 ], more physicians [ 37 , 52 , 72 ], nurses [ 72 ], and hospital beds [ 79 , 82 ], and also membership in associations, perceived corruption, and societal trust [ 48 ] were beneficial to health. Higher nitrous oxide (NO) levels [ 52 ], longer average hospital stay [ 48 ], deprivation [ 51 ], dissatisfaction with healthcare and the social environment [ 56 ], corruption [ 40 , 50 ], smoking [ 19 , 26 , 52 , 68 ], alcohol consumption [ 26 , 52 , 68 ] and illegal drug use [ 68 ], poverty [ 64 ], a higher percentage of industrial workers [ 26 ], gross fixed capital formation [ 66 ] and an older population [ 38 , 66 , 79 ], gender inequality [ 22 ], and fertility [ 26 , 66 ] were detrimental.

It is important to point out that the above-mentioned effects could not be considered stable either across or within studies. Very often, the statistical significance of a given covariate fluctuated between the different model specifications tried within the same study [ 20 , 49 , 59 , 66 , 68 , 69 , 73 , 80 , 82 ], testifying to the importance of control variables and of multivariate research (i.e., analyzing multiple independent variables simultaneously) in general. Furthermore, conflicting results were observed even with regard to the “core” determinants given special attention throughout this text. Thus, some studies reported negative effects of health expenditure [ 32 , 82 ], social expenditure [ 58 ], GDP [ 49 , 66 ], and education [ 82 ], and positive effects of income inequality [ 82 ] and unemployment [ 24 , 31 , 32 , 52 , 66 , 68 ]. Interestingly, one study [ 34 ] differentiated between temporary and long-term effects of GDP and unemployment, alluding to a possibly much more complex association with health. It is also worth noting that some gender differences were found, with determinants being more influential for males than for females, or only having statistically significant effects on male health [ 19 , 21 , 28 , 34 , 36 , 37 , 39 , 64 , 65 , 69 ].

The purpose of this scoping review was to examine recent quantitative work on the topic of multi-country analyses of determinants of population health in high-income countries.

Measuring population health via relatively simple mortality-based indicators still seems to be the state of the art. What is more, these indicators are routinely considered one at a time, instead of, for example, employing existing statistical procedures to devise a more general, composite, index of population health, or using some of the established indices, such as disability-adjusted life expectancy (DALE) or quality-adjusted life expectancy (QALE). Although strong arguments for their wider use were already voiced decades ago [ 84 ], such summary measures surface only rarely in this research field.

On a related note, the greater data availability and accessibility that we enjoy today do not automatically equate to data quality. Nonetheless, data quality is routinely assumed in aggregate-level studies, and we almost never encountered a discussion of the topic. The far from trivial issue of missing data, too, goes largely underappreciated. Given the many recent methodological advances in this area [ 85 – 88 ], there is no excuse for ignoring it; and still, too few of the reviewed studies tackled the matter in any adequate fashion.

Much optimism can be drawn from the abundance of different determinants that have attracted researchers’ attention in relation to population health. We took a visual approach to these determinants and presented a graph that links spatial distances between determinants with the frequency of their being studied together. To facilitate interpretation, we grouped some variables, which resulted in some loss of finer detail. Nevertheless, the graph helps to illustrate how many effects continue to be studied in a very limited context, if any. Since in reality no factor acts in isolation, this oversimplification threatens to render the whole exercise meaningless from the outset. The importance of multivariate analysis cannot be stressed enough. While there is no “best method” to be recommended, and appropriate techniques vary according to the specifics of the research question and the characteristics of the data at hand [ 89 – 93 ], in the future, in addition to abandoning simplistic univariate approaches, we hope to see a shift from the currently dominant fixed effects models to the more flexible random/mixed effects models [ 94 ], as well as wider application of more sophisticated methods, such as principal component regression, partial least squares, covariance structure models (e.g., structural equations), canonical correlations, time-series, and generalized estimating equations.

Finally, there are some limitations of the current scoping review. We searched the two main databases for published research in medical and non-medical sciences (PubMed and Web of Science) from 2013 onward, thus potentially excluding publications and reports that are not indexed in these databases, as well as older indexed publications. These choices were guided by our interest in the most recent (i.e., current state-of-the-art) and arguably highest-quality research (i.e., peer-reviewed articles, primarily in indexed non-predatory journals). Furthermore, despite holding a critical stance with regard to some aspects of how determinants-of-health research is currently conducted, we opted not to formally assess the quality of the individual studies included. The reason for that is twofold. On the one hand, we are unaware of any formal, standard tool for quality assessment of ecological designs. On the other, we consider trying to score the quality of these diverse studies (in terms of regional setting, specific topic, outcome indices, and methodology) undesirable and misleading, particularly since we would sometimes have been rating the quality of only a (small) part of the original studies, namely the part that was relevant to our review’s goal.

Our aim was to investigate the current state of research on the very broad and general topic of population health, specifically, the way it has been examined in a multi-country context. We learned that data treatment and analytical approach were, in the majority of these recent studies, ill-equipped or insufficiently transparent to provide clarity regarding the underlying mechanisms of population health in high-income countries. Whether due to methodological shortcomings or the inherent complexity of the topic, research so far fails to provide any definitive answers. It is our sincere belief that with the application of more advanced analytical techniques this continuous quest could come to fruition sooner.

Supporting information

S1 Checklist. Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist.

https://doi.org/10.1371/journal.pone.0239031.s001

S1 Appendix.

https://doi.org/10.1371/journal.pone.0239031.s002

S2 Appendix.

https://doi.org/10.1371/journal.pone.0239031.s003

  • 75. Dahlgren G, Whitehead M. Policies and Strategies to Promote Equity in Health. Stockholm, Sweden: Institute for Future Studies; 1991.
  • 76. Brunner E, Marmot M. Social Organization, Stress, and Health. In: Marmot M, Wilkinson RG, editors. Social Determinants of Health. Oxford, England: Oxford University Press; 1999.
  • 77. Najman JM. A General Model of the Social Origins of Health and Well-being. In: Eckersley R, Dixon J, Douglas B, editors. The Social Origins of Health and Well-being. Cambridge, England: Cambridge University Press; 2001.
  • 85. Carpenter JR, Kenward MG. Multiple Imputation and its Application. New York: John Wiley & Sons; 2013.
  • 86. Molenberghs G, Fitzmaurice G, Kenward MG, Verbeke G, Tsiatis AA. Handbook of Missing Data Methodology. Boca Raton: Chapman & Hall/CRC; 2014.
  • 87. van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Boca Raton: Chapman & Hall/CRC; 2018.
  • 88. Enders CK. Applied Missing Data Analysis. New York: Guilford; 2010.
  • 89. Searle SR, Casella G, McCulloch CE. Variance Components. John Wiley & Sons; 1992.
  • 90. Agresti A. Foundations of Linear and Generalized Linear Models. Hoboken, New Jersey: John Wiley & Sons Inc.; 2015.
  • 91. Leyland AH, Goldstein H, editors. Multilevel Modelling of Health Statistics. John Wiley & Sons; 2001.
  • 92. Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal Data Analysis. New York: Chapman & Hall/CRC; 2008.
  • 93. Härdle WK, Simar L. Applied Multivariate Statistical Analysis. Berlin, Heidelberg: Springer; 2015.


Handbook of Research Methods in Health Social Sciences, pp 27–49

Quantitative Research

  • Leigh A. Wilson 2 , 3  
  • Reference work entry
  • First Online: 13 January 2019


Quantitative research methods are concerned with the planning, design, and implementation of strategies to collect and analyze data. Descartes, the seventeenth-century philosopher, suggested that how the results are achieved is often more important than the results themselves, as the journey taken along the research path is a journey of discovery. High-quality quantitative research is characterized by the attention given to the methods and the reliability of the tools used to collect the data. The ability to critique research in a systematic way is an essential component of a health professional’s role in order to deliver high quality, evidence-based healthcare. This chapter is intended to provide a simple overview of the way new researchers and health practitioners can understand and employ quantitative methods. The chapter offers practical, realistic guidance in a learner-friendly way and uses a logical sequence to understand the process of hypothesis development, study design, data collection and handling, and finally data analysis and interpretation.

  • Quantitative
  • Epidemiology
  • Data analysis
  • Methodology
  • Interpretation



Author information

Authors and affiliations.

School of Science and Health, Western Sydney University, Penrith, NSW, Australia

Leigh A. Wilson

Faculty of Health Science, Discipline of Behavioural and Social Sciences in Health, University of Sydney, Lidcombe, NSW, Australia


Corresponding author

Correspondence to Leigh A. Wilson .

Editor information

Editors and affiliations.

Pranee Liamputtong


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry.

Wilson, L.A. (2019). Quantitative Research. In: Liamputtong, P. (eds) Handbook of Research Methods in Health Social Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-10-5251-4_54


DOI : https://doi.org/10.1007/978-981-10-5251-4_54

Published : 13 January 2019

Publisher Name : Springer, Singapore

Print ISBN : 978-981-10-5250-7

Online ISBN : 978-981-10-5251-4

eBook Packages : Social Sciences Reference Module Humanities and Social Sciences Reference Module Business, Economics and Social Sciences


  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research
Systematic review
Open access
Published: 19 June 2020

Quantitative measures of health policy implementation determinants and outcomes: a systematic review

Peg Allen¹ (ORCID: orcid.org/0000-0001-7000-796X), Meagan Pilar¹, Callie Walsh-Bailey¹, Cole Hooley², Stephanie Mazzucca¹, Cara C. Lewis³, Kayne D. Mettert³, Caitlin N. Dorsey³, Jonathan Purtle⁴, Maura M. Kepper¹, Ana A. Baumann⁵ and Ross C. Brownson¹,⁶

Implementation Science, volume 15, Article number: 47 (2020)


Background

Public policy has tremendous impacts on population health. While policy development has been extensively studied, policy implementation research is newer and relies largely on qualitative methods. Quantitative measures are needed to disentangle differential impacts of policy implementation determinants (i.e., barriers and facilitators) and outcomes to ensure intended benefits are realized. Implementation outcomes include acceptability, adoption, appropriateness, compliance/fidelity, feasibility, penetration, sustainability, and costs. This systematic review identified quantitative measures that are used to assess health policy implementation determinants and outcomes and evaluated the quality of these measures.

Methods

Three frameworks guided the review: Implementation Outcomes Framework (Proctor et al.), Consolidated Framework for Implementation Research (Damschroder et al.), and Policy Implementation Determinants Framework (Bullock et al.). Six databases were searched: Medline, CINAHL Plus, PsycInfo, PAIS, ERIC, and Worldwide Political. Searches were limited to English language, peer-reviewed journal articles published January 1995 to April 2019. Search terms addressed four levels: health, public policy, implementation, and measurement. Empirical studies of public policies addressing physical or behavioral health with quantitative self-report or archival measures of policy implementation with at least two items assessing implementation outcomes or determinants were included. Consensus scoring of the Psychometric and Pragmatic Evidence Rating Scale assessed the quality of measures.

Results

Database searches yielded 8417 non-duplicate studies, with 870 (10.3%) undergoing full-text screening, yielding 66 studies. From the included studies, 70 unique measures were identified to quantitatively assess implementation outcomes and/or determinants. Acceptability, feasibility, appropriateness, and compliance were the most commonly measured implementation outcomes. Common determinants in the identified measures were organizational culture, implementation climate, and readiness for implementation, each an aspect of the internal setting. Pragmatic quality ranged from adequate to good, with most measures freely available, brief, and at a high school reading level. Few psychometric properties were reported.

Conclusions

Well-tested quantitative measures of implementation internal settings were under-utilized in policy studies. Further development and testing of external context measures are warranted. This review is intended to stimulate measure development and high-quality assessment of health policy implementation outcomes and determinants to help practitioners and researchers spread evidence-informed policies to improve population health.

Registration

Not registered


Contributions to the literature

This systematic review identified 70 quantitative measures of implementation outcomes or determinants in health policy studies.

Readiness to implement and organizational climate and culture were commonly assessed determinants, but fewer studies assessed policy actor relationships or implementation outcomes of acceptability, fidelity/compliance, appropriateness, feasibility, or implementation costs.

Study team members rated most identified measures’ pragmatic properties as good, meaning they are straightforward to use, but few studies documented pilot or psychometric testing of measures.

Further development and dissemination of valid and reliable measures of policy implementation outcomes and determinants can facilitate identification, use, and spread of effective policy implementation strategies.

Background

Despite major impacts of policy on population health [ 1 , 2 , 3 , 4 , 5 , 6 , 7 ], there have been relatively few policy studies in dissemination and implementation (D&I) science to inform implementation strategies and evaluate implementation efforts [ 8 ]. While health outcomes of policies are commonly studied, fewer policy studies assess implementation processes and outcomes. Of 146 D&I studies funded by the National Institutes of Health (NIH) through D&I funding announcements from 2007 to 2014, 12 (8.2%) were policy studies that assessed policy content, policy development processes, or health outcomes of policies, representing 10.5% of NIH D&I funding [ 8 ]. Eight of the 12 studies (66.7%) assessed health outcomes, while only five (41.7%) assessed implementation [ 8 ].

Our ability to explore the differential impact of policy implementation determinants and outcomes and to disentangle these from health benefits and other societal outcomes requires high-quality quantitative measures [ 9 ]. While systematic reviews of measures of implementation of evidence-based interventions (in clinical and community settings) have been conducted in recent years [ 10 , 11 , 12 , 13 ], to our knowledge, no reviews have explored the quality of quantitative measures of determinants and outcomes of policy implementation.

Policy implementation research in political science and the social sciences has been active since at least the 1970s and has much to contribute to the newer field of D&I research [ 1 , 14 ]. Historically, theoretical frameworks and policy research largely emphasized policy development or analysis of the content of policy documents themselves [ 15 ]. For example, Kingdon’s Multiple Streams Framework and its expansions have been widely used in political science and the social sciences more broadly to describe how factors related to sociopolitical climate, attributes of a proposed policy, and policy actors (e.g., organizations, sectors, individuals) contribute to policy change [ 16 , 17 , 18 ]. Policy frameworks can also inform implementation planning and evaluation in D&I research. Although authors have named policy stages since the 1950s [ 19 , 20 ], Sabatier and Mazmanian’s Policy Implementation Process Framework was one of the first such frameworks to gain widespread use in policy implementation research [ 21 ] and later in health promotion [ 22 ]. Yet available implementation frameworks are not often used to guide implementation strategies or to explain why a policy worked in one setting but not another [ 23 ]. Without an explicit focus on implementation, the intended benefits of health policies may go unrealized, and the field may lose the opportunity to advance its understanding of policy implementation (i.e., our collective knowledge building is dampened) [ 24 ].

Differences in perspectives and terminology between D&I research and policy research in political science are important for interpreting the present review. For example, Proctor et al. use the term implementation outcomes for what policy researchers call policy outputs [ 14 , 20 , 25 ]. To non-D&I policy researchers, policy implementation outcomes refer to the health outcomes in the target population [ 20 ]. D&I science uses the term fidelity [ 26 ], whereas policy researchers write about compliance [ 20 ]. While D&I science uses the terms outer setting, outer context, or external context to refer to influences outside the implementing organization [ 26 , 27 , 28 ], non-D&I policy research refers to policy fields [ 24 ], which are networks of agencies that carry out policies and programs.

Valid and reliable quantitative measures of health policy implementation processes are needed to advance from classifying constructs to understanding causality in policy implementation research [ 29 ]. Given limited resources, policy implementers also need to know which aspects of implementation are key to improving policy acceptance, compliance, and sustainability and thereby reaping the intended health benefits [ 30 ]. Measures that are both pragmatic and psychometrically sound are needed to accomplish these objectives [ 10 , 11 , 31 , 32 ], so that the field can explore the influence of nuanced determinants and generate reliable and valid findings.

To fill this void in the literature, this systematic review of health policy implementation measures aimed to (1) identify quantitative measures used to assess health policy implementation outcomes (IOF outcomes commonly called policy outputs in policy research) and inner and outer setting determinants, (2) describe and assess pragmatic quality of policy implementation measures, (3) describe and assess the quality of psychometric properties of identified instruments, and (4) elucidate health policy implementation measurement gaps.

Methods

The study team used systematic review procedures developed by Lewis and colleagues for reviews of D&I research measures and received detailed guidance from the Lewis team coauthors for each step [ 10 , 11 ]. We followed the PRISMA reporting guidelines as shown in the checklist (Supplemental Table 1 ). We have also provided a publicly available website of measures identified in this review ( https://www.health-policy-measures.org/ ).

For the purposes of this review, policy and policy implementation are defined as follows. We deemed public policy to include legislation at the federal, state/province/regional unit, or local levels; and governmental regulations, whether mandated by national, state/province, or local level governmental agencies or boards of elected officials (e.g., state boards of education in the USA) [ 4 , 20 ]. Here, public policy implementation is defined as the carrying out of a governmental mandate by public or private organizations and groups of organizations [ 20 ].

Two widely used frameworks from the D&I field, along with a third, recently developed framework that bridges policy and D&I research, guide the present review. In the Implementation Outcomes Framework (IOF), Proctor and colleagues identify and define eight implementation outcomes that are differentiated from health outcomes: acceptability, adoption, appropriateness, cost, feasibility, fidelity, penetration, and sustainability [ 25 ]. In the Consolidated Framework for Implementation Research (CFIR), Damschroder and colleagues articulate determinants of implementation including the domains of intervention characteristics, outer setting, inner setting of an organization, characteristics of individuals within organizations, and process [ 33 ]. Finally, Bullock developed the Policy Implementation Determinants Framework to present a balanced framework that emphasizes both internal setting constructs and external setting constructs, including policy actor relationships and networks, political will for implementation, and visibility of policy actors [ 34 ]. The constructs identified in these frameworks were used to guide our list of implementation determinants and outcomes.

Through EBSCO, we searched MEDLINE, PsycInfo, and CINAHL Plus. Through ProQuest, we searched PAIS, Worldwide Political, and ERIC. Due to limited time and staff in the 12-month study, we did not search the grey literature. We used multiple search terms in each of four required levels: health, public policy, implementation, and measurement. Table 1 lists the search terms for each level, and Supplemental Tables 2 and 3 show the final search syntax applied in EBSCO and ProQuest.

The authors developed the search strings and terms based on policy implementation framework reviews [ 34 , 35 ], additional policy implementation frameworks [ 21 , 22 ], labels and definitions of the eight implementation outcomes identified by Proctor et al. [ 25 ], CFIR construct labels and definitions [ 9 , 33 ], and additional D&I research and search term sources [ 28 , 36 , 37 , 38 ] (Table 1 ). The full study team provided three rounds of feedback on draft terms, and a library scientist provided additional synonyms and search terms. For each test search, we calculated the percentage of 18 benchmark articles that the search captured. We determined a priori that capturing 80% of the benchmark articles was an acceptable level of recall (sensitivity).
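The benchmark check described above amounts to computing the share of known-relevant articles that a test search retrieves; in information-retrieval terms this is the search's recall. A minimal Python sketch, assuming each article is identified by a hypothetical ID string (the IDs below are placeholders, not the review's 18 benchmark articles):

```python
def benchmark_recall(retrieved_ids, benchmark_ids):
    """Percentage of benchmark articles that a test search retrieved."""
    benchmark = set(benchmark_ids)
    if not benchmark:
        raise ValueError("benchmark set must not be empty")
    # Set intersection, so duplicate hits in the results cannot inflate the score.
    captured = benchmark & set(retrieved_ids)
    return 100.0 * len(captured) / len(benchmark)


def search_is_acceptable(retrieved_ids, benchmark_ids, threshold=80.0):
    """Apply the a priori 80% acceptance threshold to a test search."""
    return benchmark_recall(retrieved_ids, benchmark_ids) >= threshold


# Hypothetical example: 18 placeholder benchmark IDs, 15 of them retrieved.
benchmark = [f"bench-{i:02d}" for i in range(1, 19)]
retrieved = benchmark[:15] + ["other-001", "other-002"]
print(round(benchmark_recall(retrieved, benchmark), 1))  # 83.3
print(search_is_acceptable(retrieved, benchmark))        # True
```

Non-benchmark hits ("other-…") do not affect the score; judging how many irrelevant records a search returns would require a separate precision estimate.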

Inclusion and exclusion criteria

This review addressed only measures of implementation by organizations mandated to act by governmental units or legislation. Measures of behavior changes by individuals in target populations as a result of legislation or governmental regulations and health status changes were outside the realm of this review.

There were several inclusion criteria: (1) empirical studies of the implementation of public policies already passed or approved that addressed physical or behavioral health, (2) quantitative self-report or archival measurement methods utilized, (3) published in peer-reviewed journals from January 1995 through April 2019, (4) published in the English language, (5) public policy implementation studies from any continent or international governing body, and (6) at least two transferable quantitative self-report or archival items that assessed implementation determinants [ 33 , 34 ] and/or IOF implementation outcomes [ 25 ]. This study sought to identify transferable measures that could be used to assess multiple policies and contexts. Here, a transferable item is defined as one that needed no wording changes or only a change in the referent (e.g., policy title or topic such as tobacco or malaria) to make the item applicable to other policies or settings [ 11 ]. The year 1995 was chosen as the starting year because web-based quantitative surveying began at about that time [ 39 ]. Table 2 provides definitions of the IOF implementation outcomes and the selected determinants of implementation. Broader constructs, such as readiness for implementation, contained multiple categories.

Exclusion criteria in the searches included (1) non-empirical health policy journal articles (e.g., conceptual articles, editorials); (2) narrative and systematic reviews; (3) studies with only qualitative assessment of health policy implementation; (4) empirical studies reported in theses and books; (5) health policy studies that only assessed health outcomes (i.e., target population changes in health behavior or status); (6) bill analyses, stakeholder perceptions assessed to inform policy development, and policy content analyses without implementation assessment; (7) studies of changes made in a private business not encouraged by public policy; and (8) studies from countries with authoritarian regimes. We electronically programmed the searches to exclude policy implementation studies from countries that are not democratically governed due to vast differences in policy environments and implementation factors.

Screening procedures

Citations were downloaded into EndNote version 7.8 and de-duplicated electronically. We conducted dual independent screening of titles and abstracts after two group pilot screening sessions in which we clarified inclusion and exclusion criteria and screening procedures. Abstract screeners used Covidence systematic review software [ 40 ] to code inclusion as yes or no. Articles were included in full-text review if at least one screener coded them as meeting the inclusion criteria. Full-text screening was also conducted by dual independent screeners and coded in Covidence [ 40 ], with weekly meetings to reach consensus on inclusion/exclusion discrepancies. Screeners also coded one of the pre-identified reasons for exclusion.
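The two screening stages use different decision rules: at the title/abstract stage a single "yes" is enough to advance an article, whereas at the full-text stage disagreements are resolved by consensus. A minimal sketch of that logic, assuming votes are recorded as booleans (the function names are hypothetical and do not reflect Covidence's own data model):

```python
from typing import Optional


def advances_to_full_text(screener_a: bool, screener_b: bool) -> bool:
    """Title/abstract stage: advance if at least one screener votes to include."""
    return screener_a or screener_b


def full_text_decision(screener_a: bool, screener_b: bool) -> Optional[bool]:
    """Full-text stage: agreement decides; disagreement returns None."""
    if screener_a == screener_b:
        return screener_a
    return None  # flag for discussion at the weekly consensus meeting


votes = [(True, False), (False, False), (True, True)]
print([advances_to_full_text(a, b) for a, b in votes])  # [True, False, True]
print(full_text_decision(True, False))                  # None
```

The liberal first-stage rule trades extra full-text work for a lower chance of discarding a relevant article on the basis of its abstract alone.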

Data extraction strategy

Extraction elements included information about (1) measure meta-data (e.g., measure name, total number of items, number of transferable items) and studies (e.g., policy topic, country, setting), (2) development and testing of the measure, (3) implementation outcomes and determinants assessed (Table 2 ), (4) pragmatic characteristics, and (5) psychometric properties. Where needed, authors were emailed to obtain the full measure and measure development information. Two coauthors (MP, CWB) reached consensus on extraction elements. For each included measure, a primary extractor conducted initial entries and coding, and a secondary extractor checked the entries, noting any discrepancies for discussion in consensus meetings. Multiple measures in a study were extracted separately. Due to time and staff limitations in the 12-month study, we did not search for each empirical use of the measure.

Quality assessment of measures

To assess the quality of measures, we applied the Psychometric and Pragmatic Evidence Rating Scales (PAPERS) developed by Lewis et al. [ 10 , 11 , 41 , 42 ]. PAPERS includes assessment of five pragmatic instrument characteristics that affect how easy or difficult an instrument is to use: brevity (number of items), simplicity of language (readability level), cost (whether it is freely available), training burden (extent of data collection training needed), and analysis burden (ease or difficulty of interpreting scoring and results). Lewis and colleagues developed the pragmatic domains and rating scales with input from stakeholders and D&I researchers [ 11 , 41 , 42 ] and developed the psychometric rating scales in collaboration with D&I researchers [ 10 , 11 , 43 ]. The psychometric rating scale has nine properties (Table 3 ): internal consistency; norms; responsiveness; convergent, discriminant, and known-groups construct validity; predictive and concurrent criterion validity; and structural validity. In both the pragmatic and psychometric scales, the reported evidence for each domain is scored as poor (−1), none/not reported (0), minimal/emerging (1), adequate (2), good (3), or excellent (4). Higher values indicate more desirable pragmatic characteristics (e.g., fewer items, free availability, and provision of scoring instructions and interpretations) and stronger evidence of psychometric properties (e.g., adequate to excellent reliability and validity) (Supplemental Tables 4 and 5 ).
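Because both PAPERS scales share the same six-level rating, scoring a measure reduces to a lookup from rating label to score for each domain. A minimal sketch, assuming hypothetical label strings and domain identifiers that paraphrase the text above (the official per-domain rating criteria are in Supplemental Tables 4 and 5 and are not reproduced here), and assuming an unrated domain counts as not reported (0):

```python
# PAPERS evidence ratings shared by the pragmatic and psychometric scales.
PAPERS_SCALE = {
    "poor": -1,
    "none/not reported": 0,
    "minimal/emerging": 1,
    "adequate": 2,
    "good": 3,
    "excellent": 4,
}

# Illustrative domain identifiers paraphrasing the five pragmatic and
# nine psychometric properties named in the text.
PRAGMATIC_DOMAINS = ("brevity", "language", "cost", "training", "analysis")
PSYCHOMETRIC_DOMAINS = (
    "internal_consistency", "norms", "responsiveness",
    "convergent", "discriminant", "known_groups",
    "predictive", "concurrent", "structural",
)


def score_measure(ratings, domains):
    """Map each domain's rating label to its score; return (scores, total).

    Assumption: a domain missing from `ratings` is scored as not reported (0).
    """
    unexpected = set(ratings) - set(domains)
    if unexpected:
        raise ValueError(f"unexpected domains: {sorted(unexpected)}")
    scores = {}
    for domain in domains:
        label = ratings.get(domain, "none/not reported")
        if label not in PAPERS_SCALE:
            raise ValueError(f"unknown rating {label!r} for {domain}")
        scores[domain] = PAPERS_SCALE[label]
    return scores, sum(scores.values())


# Hypothetical measure: free, brief, readable, minimal scoring guidance,
# and no information on training (left unrated).
pragmatic, total = score_measure(
    {"cost": "excellent", "brevity": "good", "language": "good",
     "analysis": "minimal/emerging"},
    PRAGMATIC_DOMAINS,
)
print(total)  # 11
```

Rejecting unknown labels and domains keeps coding slips from silently changing a total score.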

Data synthesis and presentation

This section describes the synthesis of measure transferability, empirical-use study settings and policy topics, and PAPERS scoring. Two coauthors (MP, CWB) consensus-coded measures into three categories of item transferability based on quartiles of the percentage of transferable items: mostly transferable (≥ 75% of items deemed transferable), partially transferable (25–74% of items deemed transferable), and setting-specific (< 25% of items deemed transferable). Items were deemed transferable if no wording changes, or only a change in the referent (e.g., policy title or topic), were needed to make the item applicable to the implementation of other policies or in other settings. Abstractors coded study settings into one of the following categories: hospital or outpatient clinics; mental or behavioral health facilities; healthcare cost, access, or quality; schools; community; and multiple. Abstractors also coded policy topics as healthcare cost, access, or quality; mental or behavioral health; infectious or chronic diseases; or other, while retaining documentation of subtopics such as tobacco, physical activity, and nutrition. Pragmatic scores were totaled across the five properties, for possible total scores of −5 to 20, with higher values indicating greater ease of use. Psychometric total scores across the nine properties were also calculated, for possible scores of −9 to 36, with higher values indicating stronger reported evidence across more types of reliability and validity.
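The transferability bands and the possible total-score ranges follow directly from the rules above. A minimal sketch with illustrative function names; note that the stated bands (≥ 75%, 25–74%, < 25%) leave fractional percentages between 74% and 75% unassigned, so the sketch assumes anything from 25% up to, but not including, 75% is partially transferable:

```python
def transferability_category(n_transferable, n_items):
    """Classify a measure by the share of its items deemed transferable."""
    if n_items <= 0 or not 0 <= n_transferable <= n_items:
        raise ValueError("need n_items > 0 and 0 <= n_transferable <= n_items")
    pct = 100.0 * n_transferable / n_items
    if pct >= 75:
        return "mostly transferable"
    if pct >= 25:  # assumption: covers 25% up to (not including) 75%
        return "partially transferable"
    return "setting-specific"


def total_score_range(n_domains, low=-1, high=4):
    """Possible total PAPERS score when each domain is scored from low to high."""
    return n_domains * low, n_domains * high


print(transferability_category(9, 12))  # mostly transferable (75%)
print(transferability_category(3, 10))  # partially transferable (30%)
print(transferability_category(1, 8))   # setting-specific (12.5%)
print(total_score_range(5))             # (-5, 20), pragmatic scale
print(total_score_range(9))             # (-9, 36), psychometric scale
```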

Results

The database searches yielded 11,684 articles, of which 3267 were duplicates (Fig. 1 ). Titles and abstracts of the 8417 articles were independently screened by two team members; 870 (10.3%) were selected for full-text screening by at least one screener. Of the 870 studies, 804 were excluded at full-text screening or during extraction attempts with the consensus of two coauthors; 66 studies were included. Two coauthors (MP, CWB) reached consensus on extraction and coding of information on 70 unique quantitative eligible measures identified in the 66 included studies plus measure development articles where obtained. Nine measures were used in more than one included study. Detailed information on identified measures is publicly available at https://www.health-policy-measures.org/ .
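The screening counts reported above are internally consistent, which a few lines of arithmetic can confirm. A minimal Python sketch using only the numbers stated in this paragraph (the function name is illustrative):

```python
def prisma_counts(records_found, duplicates, full_text_screened, excluded_at_full_text):
    """Derive screened, % sent to full text, and included counts from reported totals."""
    screened = records_found - duplicates
    share_full_text = 100 * full_text_screened / screened
    included = full_text_screened - excluded_at_full_text
    return screened, round(share_full_text, 1), included


print(prisma_counts(11_684, 3_267, 870, 804))  # (8417, 10.3, 66)
```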

Figure 1. PRISMA flow diagram

The most common exclusion reason was lack of transferable items in quantitative measures of policy implementation (n = 597) (Fig. 1 ). While this review focused on transferable measures across any health issue or setting, researchers addressing specific health policies or settings may find the excluded studies of interest. The frequencies of the remaining exclusion reasons are listed in Fig. 1 .

A variety of health policy topics and settings from over two dozen countries were found in the database searches. For example, the searches identified quantitative and mixed methods implementation studies of legislation (such as tobacco smoking bans), regulations (such as food/menu labeling requirements), governmental policies that mandated specific clinical practices (such as vaccination or access to HIV antiretroviral treatment), school-based interventions (such as government-mandated nutritional content and physical activity), and other public policies.

Among the 70 unique quantitative implementation measures, 15 measures were deemed mostly transferable (at least 75% transferable, Table 4 ). Twenty-three measures were categorized as partially transferable (25 to 74% of items deemed transferable, Table 5 ); 32 measures were setting-specific (< 25% of items deemed transferable, data not shown).

Implementation outcomes

Among the 70 measures, the most commonly assessed implementation outcomes were fidelity/compliance of the policy implementation to the government mandate (26%), acceptability of the policy to implementers (24%), perceived appropriateness of the policy (17%), and feasibility of implementation (17%) (Table 2 ). Fidelity/compliance was sometimes assessed by asking implementers the extent to which they had modified a mandated practice [ 45 ]. Sometimes, detailed checklists were used to assess the extent of compliance with the many mandated policy components, such as school nutrition policies [ 83 ]. Acceptability was assessed by asking staff or healthcare providers in implementing agencies to rate their level of agreement with statements about the policy mandate on Likert scales. Only eight (11%) of the included measures used multiple transferable items to assess adoption, and only eight (11%) assessed penetration.

Twenty-six measures of implementation costs were found during full-text screening (10 in included studies and 14 in excluded studies, data not shown). The cost time horizon varied from 12 months to 21 years, with most cost measures assessed at multiple time points. Ten of the 26 measures addressed direct implementation costs. Nine studies reported cost modeling findings. The implementation cost survey developed by Vogler et al. was extensive [ 53 ]. It asked implementing organizations to note policy impacts in medication pricing, margins, reimbursement rates, and insurance co-pays.

Determinants of implementation

Within the 70 included measures, the most commonly assessed implementation determinants were readiness for implementation (61% assessed any readiness component) and the general organizational culture and climate (39%), followed by the specific policy implementation climate within the implementation organization/s (23%), actor relationships and networks (17%), political will for policy implementation (11%), and visibility of the policy role and policy actors (10%) (Table 2 ). Each component of readiness for implementation was commonly assessed: communication of the policy (31%, 22 of 70 measures), policy awareness and knowledge (26%), resources for policy implementation (non-training resources 27%, training 20%), and leadership commitment to implement the policy (19%).

Only two studies assessed organizational structure as a determinant of health policy implementation. Lavinghouze and colleagues assessed the stability of the organization (defined as whether re-organization happens often) within a set of 9-point Likert items on multiple implementation determinants designed for use with state-level public health practitioners. They also assessed whether public health departments were stand-alone agencies or embedded within agencies addressing additional services, such as social services [ 69 ]. Schneider and colleagues assessed coalition structure as an implementation determinant, including items on the number of organizations and individuals on the coalition roster, the number that regularly attend coalition meetings, and so forth [ 72 ].

Tables of measures

Tables 4 and 5 present the 38 measures of implementation outcomes and/or determinants identified out of the 70 included measures with at least 25% of items transferable (usable in other studies without wording changes or by changing only the policy name or other referent). Table 4 shows 15 mostly transferable measures (at least 75% transferable). Table 5 shows 23 partially transferable measures (25–74% of items deemed transferable). Separate measure development articles were found for 20 of the 38 measures; the remaining measures appeared to have been developed for one-time, study-specific use by the empirical study authors cited in the tables. Studies listed in Tables 4 and 5 were conducted most commonly in the USA (n = 19) or Europe (n = 11). A few measures were used elsewhere: Africa (n = 3), Australia (n = 1), Canada (n = 1), Middle East (n = 1), Southeast Asia (n = 1), or across multiple continents (n = 1).

Quality of identified measures

Figure 2 shows the median pragmatic quality ratings across the 38 measures with at least 25% transferable items shown in Tables 4 and 5 . Higher scores are desirable and indicate the measures are easier to use (Table 3 ). Overall, the measures were freely available in the public domain (median score = 4), brief, with a median of 11–50 items (median score = 3), and had good readability, with a median reading level between 8th and 12th grade (median score = 3). However, instructions on how to score and interpret item scores were lacking, with a median score of 1, indicating the measures did not include suggestions for interpreting score ranges, clear cutoff scores, or instructions for handling missing data. In general, information on training requirements or availability of self-training manuals on how to use the measures was not reported in the included study or measure development article/s (median score = 0, not reported). Total pragmatic rating scores among the 38 measures with at least 25% of items transferable ranged from 7 to 17 (Tables 4 and 5 ), with a median total score of 12 out of a possible total score of 20. The median score for each pragmatic characteristic was the same across all measures as across the 38 mostly or partially transferable measures; the median total score across all measures was 11.

Figure 2. Pragmatic rating scale results across identified measures. Footnote: pragmatic criteria scores from Psychometric and Pragmatic Evidence Rating Scale (PAPERS) (Lewis et al. [ 11 ], Stanick et al. [ 42 ]). Total possible score = 20, total median score across 38 measures = 11. Scores ranged from 0 to 18. Rating scales for each domain are provided in Supplemental Table 4

Few psychometric properties were reported, and the study team also found few reports of pilot testing and measure refinement. Among the 38 measures with at least 25% transferable items, the psychometric properties from the PAPERS rating scale total scores ranged from − 1 to 17 (Tables 4 and 5 ), with a median total score of 5 out of a possible total score of 36. Higher scores indicate more types of validity and reliability were reported with high quality. The 32 measures with calculable norms had a median norms PAPERS score of 3 (good), indicating appropriate sample size and distribution. The nine measures with reported internal consistency mostly showed Cronbach’s alphas in the adequate (0.70 to 0.79) to excellent (≥ 0.90) range, with a median of 0.78 (PAPERS score of 2, adequate), indicating adequate internal consistency. The five measures with reported structural validity had a median PAPERS score of 2, adequate (range 1 to 3, minimal/emerging to good), indicating the sample size was sufficient and the factor analysis goodness of fit was reasonable. Among the 38 measures, no reports were found for responsiveness, convergent validity, discriminant validity, known-groups construct validity, or predictive or concurrent criterion validity.
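Cronbach's alpha, the internal-consistency statistic summarized above, compares the sum of the item variances with the variance of respondents' total scores: for k items, alpha = k/(k − 1) × (1 − sum of item variances / total-score variance). A minimal standard-library sketch with a small made-up response matrix (not data from any included study). The band labels follow the ranges named in this paragraph; treating 0.80 to 0.89 as "good" is an assumption, and the sketch does not reproduce the full PAPERS internal-consistency criteria in Supplemental Table 5:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix of numeric scores."""
    n_items = len(responses[0])
    if n_items < 2 or any(len(row) != n_items for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def alpha_band(alpha):
    """Label alpha using the ranges named in the text.

    Assumption: 0.80-0.89 is labeled "good"; values below 0.70 are not
    classified further here.
    """
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "adequate"
    return "below adequate"


# Made-up 4-respondent, 3-item Likert data.
responses = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
]
alpha = cronbach_alpha(responses)
print(f"{alpha:.3f}", alpha_band(alpha))  # 0.975 excellent
print(alpha_band(0.78))                   # adequate
```

Using sample variances for both the items and the totals keeps the ratio consistent; mixing sample and population variances would bias the result.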

Discussion

In this systematic review, we sought to identify quantitative measures used to assess health policy implementation outcomes and determinants, rate the pragmatic and psychometric quality of identified measures, and point to future directions to address measurement gaps. In general, the identified measures are easy to use and freely available, but we found little data on validity and reliability. We found more quantitative measures of intra-organizational determinants of policy implementation than measures of the relationships and interactions between organizations that influence policy implementation. We found a limited number of measures, applicable to other policies or settings, that had been developed for or used to assess any of the eight IOF implementation outcomes; this may reflect differences in the terms used by policy researchers and D&I researchers more than differences in conceptualizations of policy implementation. Authors used a variety of terms and rarely provided definitions of the constructs the items assessed. Input from experts in policy implementation is needed to better understand and define policy implementation constructs for use across multiple fields involved in policy-related research.

We found that several researchers had used well-tested measures of implementation determinants from D&I research or from the organizational behavior and management literature (Tables 4 and 5 ). For the internal setting of implementing organizations, whether or not implementation is mandated through public policy, well-developed and tested measures are available. However, a number of authors crafted their own items, with or without pilot testing, and used a variety of terms to describe what the items assessed. Further dissemination of the availability of well-tested measures to policy researchers is warranted [ 9 , 13 ].

What appears to be a larger gap involves the availability of well-developed and tested quantitative measures of the external context affecting policy implementation that can be used across multiple policy settings and topics [ 9 ]. Lack of attention to how a policy initiative fits with the external implementation context during policymaking and lack of policymaker commitment of adequate resources for implementation contribute to this gap [ 23 , 93 ]. Recent calls and initiatives to integrate health policies during policymaking and implementation planning will bring more attention to external contexts affecting not only policy development but implementation as well [ 93 , 94 , 95 , 96 , 97 , 98 , 99 ]. At the present time, it is not well-known which internal and external determinants are most essential to guide and achieve sustainable policy implementation [ 100 ]. Identification and dissemination of measures that assess factors that facilitate the spread of evidence-informed policy implementation (e.g., relative advantage, flexibility) will also help move policy implementation research forward [ 1 , 9 ].

Given the high potential population health impact of evidence-informed policies, much more attention to policy implementation is needed in D&I research. Few studies by non-D&I researchers reported procedures for developing policy implementation measures, pilot testing, scoring and interpretation, training of data collectors, or data analysis. Policy implementation research could benefit from the rigor of D&I quantitative research methods. Conversely, D&I researchers have much to learn about the contexts and practical aspects of policy implementation, and can draw on the rich information in qualitative and mixed-methods studies from other fields to inform quantitative measure development and testing [101, 102, 103].

Limitations

This systematic review has several limitations. First, because of search engine limitations, the four levels of the search string, and the multiple search terms within each level, were applied only to titles, abstracts, and subject headings, so we likely missed some pertinent studies. Second, a systematic approach with stakeholder input is needed to expand the definitions of IOF implementation outcomes for policy implementation. Third, although the authors value intra-organizational policymaking and implementation, the study team restricted the search to governmental policies because of limited time and staffing in the 12-month study. Fourth, by excluding tools with only policy-specific implementation measures, we excluded some well-developed and tested instruments during abstract and full-text screening. Because only 12 measures had 100% transferable items, researchers may need to pilot test wording modifications of other items. Finally, because of limited time and staffing, we searched only online for measures and measure development articles and may have missed separately developed pragmatic information, such as training and scoring materials not reported in a manuscript.

Despite these limitations, several recommendations for measure development follow from the findings and related literature [1, 11, 20, 35, 41, 104], including the need to (1) conduct systematic, mixed-methods procedures (concept mapping, expert panels) to refine policy implementation outcomes, (2) expand and more fully specify external context domains for policy implementation research and evaluation, (3) identify and disseminate well-developed measures for specific policy topics and settings, (4) ensure that policy implementation improves equity rather than exacerbating disparities [105], and (5) develop evidence-informed policy implementation guidelines.

Easy-to-use, reliable, and valid quantitative measures of policy implementation can further our understanding of policy implementation processes, determinants, and outcomes. Given the wide array of health policy topics and implementation settings, sound quantitative measures that can be applied across topics and settings will help speed learning from individual studies and aid the transfer from research to practice. Quantitative measures can inform the implementation of evidence-informed policies, furthering their spread and effective implementation and ultimately yielding greater population health benefits. This systematic review of measures is intended to stimulate measure development and high-quality assessment of health policy implementation outcomes and predictors, helping practitioners and researchers spread evidence-informed policies that improve population health and reduce inequities.

Availability of data and materials

A compendium of identified measures is available for dissemination at https://www.health-policy-measures.org/. A link will be provided on the website of the Prevention Research Center, Brown School, Washington University in St. Louis, at https://prcstl.wustl.edu/. The authors invite interested organizations to provide a link to the compendium. Citations and abstracts of excluded policy-specific measures are available on request.

Abbreviations

CFIR: Consolidated Framework for Implementation Research

CINAHL: Cumulative Index of Nursing and Allied Health Literature

D&I: Dissemination and implementation science

EBSCO: Elton B. Stephens Company

ERIC: Education Resources Information Center

IOF: Implementation Outcomes Framework

PAPERS: Psychometric and Pragmatic Evidence Rating Scale

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

References

1. Purtle J, Dodson EA, Brownson RC. Policy dissemination research. In: Brownson RC, Colditz GA, Proctor EK, editors. Dissemination and Implementation Research in Health: Translating Science to Practice, Second Edition. New York: Oxford University Press; 2018.

2. Brownson RC, Baker EA, Deshpande AD, Gillespie KN. Evidence-based public health. Third ed. New York, NY: Oxford University Press; 2018.

3. Guide to Community Preventive Services. About the Community Guide. Community Preventive Services Task Force; 2020 [updated October 03, 2019; cited 2020]. Available from: https://www.thecommunityguide.org/.

4. Eyler AA, Chriqui JF, Moreland-Russell S, Brownson RC, editors. Prevention, policy, and public health, first edition. New York, NY: Oxford University Press; 2016.

5. Andre FE, Booy R, Bock HL, Clemens J, Datta SK, John TJ, et al. Vaccination greatly reduces disease, disability, death, and inequity worldwide. Geneva, Switzerland: World Health Organization; February 2008. Contract No.: 07-040089.

6. Cheng JJ, Schuster-Wallace CJ, Watt S, Newbold BK, Mente A. An ecological quantification of the relationships between water, sanitation and infant, child, and maternal mortality. Environ Health. 2012;11:4.

7. Levy DT, Li Y, Yuan Z. Impact of nations meeting the MPOWER targets between 2014 and 2016: an update. Tob Control. 2019.

8. Purtle J, Peters R, Brownson RC. A review of policy dissemination and implementation research funded by the National Institutes of Health, 2007–2014. Implement Sci. 2016;11:1.

9. Lewis CC, Proctor EK, Brownson RC. Measurement issues in dissemination and implementation research. In: Brownson RC, Colditz GA, Proctor EK, editors. Dissemination and Implementation Research in Health: Translating Science to Practice, Second Edition. New York: Oxford University Press; 2018.

10. Lewis CC, Fischer S, Weiner BJ, Stanick C, Kim M, Martinez RG. Outcomes for implementation science: an enhanced systematic review of instruments using evidence-based rating criteria. Implement Sci. 2015;10:155.

11. Lewis CC, Mettert KD, Dorsey CN, Martinez RG, Weiner BJ, Nolen E, et al. An updated protocol for a systematic review of implementation-related measures. Syst Rev. 2018;7(1):66.

12. Chaudoir SR, Dugan AG, Barr CH. Measuring factors affecting implementation of health innovations: a systematic review of structural, organizational, provider, patient, and innovation level measures. Implement Sci. 2013;8:22.

13. Rabin BA, Lewis CC, Norton WE, Neta G, Chambers D, Tobin JN, et al. Measurement resources for dissemination and implementation research in health. Implement Sci. 2016;11:42.

14. Nilsen P, Stahl C, Roback K, Cairney P. Never the twain shall meet? A comparison of implementation science and policy implementation research. Implement Sci. 2013;8:63.

15. Sabatier PA, editor. Theories of the Policy Process. New York, NY: Routledge; 2019.

16. Kingdon J. Agendas, alternatives, and public policies. Second ed. New York: Longman; 1995.

17. Jones MD, Peterson HL, Pierce JJ, Herweg N, Bernal A, Lamberta Raney H, et al. A river runs through it: a multiple streams meta-review. Policy Stud J. 2016;44(1):13–36.

18. Fowler L. Using the multiple streams framework to connect policy adoption to implementation. Policy Studies Journal. 2020 (11 Feb).

19. Howlett M, Mukherjee I, Woo JJ. From tools to toolkits in policy design studies: the new design orientation towards policy formulation research. Policy Polit. 2015;43(2):291–311.

20. Natesan SD, Marathe RR. Literature review of public policy implementation. Int J Public Policy. 2015;11(4):219–38.

21. Sabatier PA, Mazmanian D. Implementation of public policy: a framework of analysis. Policy Studies Journal. 1980 (January).

22. Sabatier PA. Theories of the Policy Process. Westview; 2007.

23. Tomm-Bonde L, Schreiber RS, Allan DE, MacDonald M, Pauly B, Hancock T, et al. Fading vision: knowledge translation in the implementation of a public health policy intervention. Implement Sci. 2013;8:59.

24. Roll S, Moulton S, Sandfort J. A comparative analysis of two streams of implementation research. Journal of Public and Nonprofit Affairs. 2017;3(1):3–22.

25. Proctor E, Silmere H, Raghavan R, Hovmand P, Aarons G, Bunger A, et al. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Admin Pol Ment Health. 2011;38(2):65–76.

26. Brownson RC, Colditz GA, Proctor EK, editors. Dissemination and implementation research in health: translating science to practice, second edition. New York: Oxford University Press; 2018.

27. Tabak RG, Khoong EC, Chambers DA, Brownson RC. Bridging research and practice: models for dissemination and implementation research. Am J Prev Med. 2012;43(3):337–50.

28. Rabin BA, Brownson RC, Haire-Joshu D, Kreuter MW, Weaver NL. A glossary for dissemination and implementation research in health. J Public Health Manag Pract. 2008;14(2):117–23.

29. Lewis CC, Klasnja P, Powell BJ, Lyon AR, Tuzzio L, Jones S, et al. From classification to causality: advancing understanding of mechanisms of change in implementation science. Front Public Health. 2018;6:136.

30. Boyd MR, Powell BJ, Endicott D, Lewis CC. A method for tracking implementation strategies: an exemplar implementing measurement-based care in community behavioral health clinics. Behav Ther. 2018;49(4):525–37.

31. Glasgow RE. What does it mean to be pragmatic? Pragmatic methods, measures, and models to facilitate research translation. Health Educ Behav. 2013;40(3):257–65.

32. Glasgow RE, Riley WT. Pragmatic measures: what they are and why we need them. Am J Prev Med. 2013;45(2):237–43.

33. Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implement Sci. 2009;4:50.

34. Bullock HL. Understanding the implementation of evidence-informed policies and practices from a policy perspective: a critical interpretive synthesis. In: How do systems achieve their goals? The role of implementation in mental health systems improvement [Dissertation]. Hamilton, Ontario: McMaster University; 2019.

35. Watson DP, Adams EL, Shue S, Coates H, McGuire A, Chesher J, et al. Defining the external implementation context: an integrative systematic literature review. BMC Health Serv Res. 2018;18(1):209.

36. McKibbon KA, Lokker C, Wilczynski NL, Ciliska D, Dobbins M, Davis DA, et al. A cross-sectional study of the number and frequency of terms used to refer to knowledge translation in a body of health literature in 2006: a Tower of Babel? Implement Sci. 2010;5:16.

37. Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18(8):1115–23.

38. Egan M, Maclean A, Sweeting H, Hunt K. Comparing the effectiveness of using generic and specific search terms in electronic databases to identify health outcomes for a systematic review: a prospective comparative study of literature search method. BMJ Open. 2012;2:3.

39. Dillman DA, Smyth JD, Christian LM. Internet, mail, and mixed-mode surveys: the tailored design method. Hoboken, NJ: John Wiley & Sons; 2009.

40. Covidence systematic review software. Melbourne, Australia: Veritas Health Innovation. https://www.covidence.org. Accessed Mar 2019.

41. Powell BJ, Stanick CF, Halko HM, Dorsey CN, Weiner BJ, Barwick MA, et al. Toward criteria for pragmatic measurement in implementation research and practice: a stakeholder-driven approach using concept mapping. Implement Sci. 2017;12(1):118.

42. Stanick CF, Halko HM, Nolen EA, Powell BJ, Dorsey CN, Mettert KD, et al. Pragmatic measures for implementation research: development of the Psychometric and Pragmatic Evidence Rating Scale (PAPERS). Transl Behav Med. 2019.

43. Henrikson NB, Blasi PR, Dorsey CN, Mettert KD, Nguyen MB, Walsh-Bailey C, et al. Psychometric and pragmatic properties of social risk screening tools: a systematic review. Am J Prev Med. 2019;57(6S1):S13–24.

44. Stirman SW, Miller CJ, Toder K, Calloway A. Development of a framework and coding system for modifications and adaptations of evidence-based interventions. Implement Sci. 2013;8:65.

45. Lau AS, Brookman-Frazee L. The 4KEEPS study: identifying predictors of sustainment of multiple practices fiscally mandated in children’s mental health services. Implement Sci. 2016;11:1–8.

46. Ekvall G. Organizational climate for creativity and innovation. European J Work Organizational Psychology. 1996;5(1):105–23.

47. Lövgren G, Eriksson S, Sandman PO. Effects of an implemented care policy on patient and personnel experiences of care. Scand J Caring Sci. 2002;16(1):3–11.

48. Dwyer DJ, Ganster DC. The effects of job demands and control on employee attendance and satisfaction. J Organ Behav. 1991;12:595–608.

49. Condon-Paoloni D, Yeatman HR, Grigonis-Deane E. Health-related claims on food labels in Australia: understanding environmental health officers’ roles and implications for policy. Public Health Nutr. 2015;18(1):81–8.

50. Patterson MG, West MA, Shackleton VJ, Dawson JF, Lawthom R, Maitlis S, et al. Validating the organizational climate measure: links to managerial practices, productivity and innovation. J Organ Behav. 2005;26:379–408.

51. Glisson C, Green P, Williams NJ. Assessing the Organizational Social Context (OSC) of child welfare systems: implications for research and practice. Child Abuse Negl. 2012;36(9):621–32.

52. Beidas RS, Aarons G, Barg F, Evans A, Hadley T, Hoagwood K, et al. Policy to implementation: evidence-based practice in community mental health: study protocol. Implement Sci. 2013;8(1):38.

53. Eisenberger R, Cummings J, Armeli S, Lynch P. Perceived organizational support, discretionary treatment, and job satisfaction. J Appl Psychol. 1997;82:812–20.

54. Eby L, George K, Brown BL. Going tobacco-free: predictors of clinician reactions and outcomes of the NY state office of alcoholism and substance abuse services tobacco-free regulation. J Subst Abus Treat. 2013;44(3):280–7.

55. Vogler S, Zimmermann N, de Joncheere K. Policy interventions related to medicines: survey of measures taken in European countries during 2010–2015. Health Policy. 2016;120(12):1363–77.

56. Wanberg CR, Banas JT. Predictors and outcomes of openness to change in a reorganizing workplace. J Applied Psychology. 2000;85:132–42.

57. Hardy LJ, Wertheim P, Bohan K, Quezada JC, Henley E. A model for evaluating the activities of a coalition-based policy action group: the case of Hermosa Vida. Health Promot Pract. 2013;14(4):514–23.

58. Gavriilidis G, Östergren P-O. Evaluating a traditional medicine policy in South Africa: phase 1 development of a policy assessment tool. Glob Health Action. 2012;5:17271.

59. Hongoro C, Rutebemberwa E, Twalo T, Mwendera C, Douglas M, Mukuru M, et al. Analysis of selected policies towards universal health coverage in Uganda: the policy implementation barometer protocol. Archives Public Health. 2018;76:12.

60. Roeseler A, Solomon M, Beatty C, Sipler AM. The tobacco control network’s policy readiness and stage of change assessment: what the results suggest for moving tobacco control efforts forward at the state and territorial levels. J Public Health Manag Pract. 2016;22(1):9–19.

61. Brämberg EB, Klinga C, Jensen I, Busch H, Bergström G, Brommels M, et al. Implementation of evidence-based rehabilitation for non-specific back pain and common mental health problems: a process evaluation of a nationwide initiative. BMC Health Serv Res. 2015;15(1):79.

62. Rütten A, Lüschen G, von Lengerke T, Abel T, Kannas L, Rodríguez Diaz JA, et al. Determinants of health policy impact: comparative results of a European policymaker study. Sozial- und Präventivmedizin. 2003;48(6):379–91.

63. Smith SN, Lai Z, Almirall D, Goodrich DE, Abraham KM, Nord KM, et al. Implementing effective policy in a national mental health reengagement program for veterans. J Nerv Ment Dis. 2017;205(2):161–70.

64. Carasso BS, Lagarde M, Cheelo C, Chansa C, Palmer N. Health worker perspectives on user fee removal in Zambia. Hum Resour Health. 2012;10:40.

65. Goldsmith RE, Hofacker CF. Measuring consumer innovativeness. J Acad Mark Sci. 1991;19(3):209–21.

66. Webster CA, Caputi P, Perreault M, Doan R, Doutis P, Weaver RG. Elementary classroom teachers’ adoption of physical activity promotion in the context of a statewide policy: an innovation diffusion and socio-ecologic perspective. J Teach Phys Educ. 2013;32(4):419–40.

67. Aarons GA, Glisson C, Hoagwood K, Kelleher K, Landsverk J, Cafri G. Psychometric properties and U.S. National norms of the Evidence-Based Practice Attitude Scale (EBPAS). Psychol Assess. 2010;22(2):356–65.

68. Gill KJ, Campbell E, Gauthier G, Xenocostas S, Charney D, Macaulay AC. From policy to practice: implementing frontline community health services for substance dependence: study protocol. Implement Sci. 2014;9:108.

69. Lavinghouze SR, Price AW, Parsons B. The environmental assessment instrument: harnessing the environment for programmatic success. Health Promot Pract. 2009;10(2):176–85.

70. Bull FC, Milton K, Kahlmeier S. National policy on physical activity: the development of a policy audit tool. J Phys Act Health. 2014;11(2):233–40.

71. Bull F, Milton K, Kahlmeier S, Arlotti A, Juričan AB, Belander O, et al. Turning the tide: national policy approaches to increasing physical activity in seven European countries. British J Sports Med. 2015;49(11):749–56.

72. Schneider EC, Smith ML, Ory MG, Altpeter M, Beattie BL, Scheirer MA, et al. State fall prevention coalitions as systems change agents: an emphasis on policy. Health Promot Pract. 2016;17(2):244–53.

73. Helfrich CD, Savitz LA, Swiger KD, Weiner BJ. Adoption and implementation of mandated diabetes registries by community health centers. Am J Prev Med. 2007;33(1 Suppl):S50–65.

74. Donchin M, Shemesh AA, Horowitz P, Daoud N. Implementation of the Healthy Cities’ principles and strategies: an evaluation of the Israel Healthy Cities network. Health Promot Int. 2006;21(4):266–73.

75. Were MC, Emenyonu N, Achieng M, Shen C, Ssali J, Masaba JP, et al. Evaluating a scalable model for implementing electronic health records in resource-limited settings. J Am Med Inform Assoc. 2010;17(3):237–44.

76. Konduri N, Sawyer K, Nizova N. User experience analysis of e-TB Manager, a nationwide electronic tuberculosis recording and reporting system in Ukraine. ERJ Open Research. 2017;3:2.

77. McDonnell E, Probart C. School wellness policies: employee participation in the development process and perceptions of the policies. J Child Nutr Manag. 2008;32:1.

78. Mersini E, Hyska J, Burazeri G. Evaluation of national food and nutrition policy in Albania. Zdravstveno Varstvo. 2017;56(2):115–23.

79. Cavagnero E, Daelmans B, Gupta N, Scherpbier R, Shankar A. Assessment of the health system and policy environment as a critical complement to tracking intervention coverage for maternal, newborn, and child health. Lancet. 2008;371(9620):1284–93.

80. Lehman WE, Greener JM, Simpson DD. Assessing organizational readiness for change. J Subst Abus Treat. 2002;22(4):197–209.

81. Pankratz M, Hallfors D, Cho H. Measuring perceptions of innovation adoption: the diffusion of a federal drug prevention policy. Health Educ Res. 2002;17(3):315–26.

82. Cook JM, Thompson R, Schnurr PP. Perceived characteristics of intervention scale: development and psychometric properties. Assessment. 2015;22(6):704–14.

83. Probart C, McDonnell ET, Jomaa L, Fekete V. Lessons from Pennsylvania’s mixed response to federal school wellness law. Health Aff. 2010;29(3):447–53.

84. Probart C, McDonnell E, Weirich JE, Schilling L, Fekete V. Statewide assessment of local wellness policies in Pennsylvania public school districts. J Am Diet Assoc. 2008;108(9):1497–502.

85. Rakic S, Novakovic B, Stevic S, Niskanovic J. Introduction of safety and quality standards for private health care providers: a case-study from the Republic of Srpska, Bosnia and Herzegovina. Int J Equity Health. 2018;17(1):92.

86. Rozema AD, Mathijssen JJP, Jansen MWJ, van Oers JAM. Sustainability of outdoor school ground smoking bans at secondary schools: a mixed-method study. Eur J Pub Health. 2018;28(1):43–9.

87. Barbero C, Moreland-Russell S, Bach LE, Cyr J. An evaluation of public school district tobacco policies in St. Louis County, Missouri. J Sch Health. 2013;83(8):525–32.

88. Williams KM, Kirsh S, Aron D, Au D, Helfrich C, Lambert-Kerzner A, et al. Evaluation of the Veterans Health Administration’s specialty care transformational initiatives to promote patient-centered delivery of specialty care: a mixed-methods approach. Telemed J E-Health. 2017;23(7):577–89.

89. Spencer E, Walshe K. National quality improvement policies and strategies in European healthcare systems. Quality Safety Health Care. 2009;18(Suppl 1):i22–7.

90. Assunta M, Dorotheo EU. SEATCA Tobacco Industry Interference Index: a tool for measuring implementation of WHO Framework Convention on Tobacco Control Article 5.3. Tob Control. 2016;25(3):313–8.

91. Tummers L. Policy alienation of public professionals: the construct and its measurement. Public Adm Rev. 2012;72(4):516–25.

92. Tummers L, Bekkers V. Policy implementation, street-level bureaucracy, and the importance of discretion. Public Manag Rev. 2014;16(4):527–47.

93. Raghavan R, Bright CL, Shadoin AL. Toward a policy ecology of implementation of evidence-based practices in public mental health settings. Implement Sci. 2008;3:26.

94. Peters D, Harting J, van Oers H, Schuit J, de Vries N, Stronks K. Manifestations of integrated public health policy in Dutch municipalities. Health Promot Int. 2016;31(2):290–302.

95. Tosun J, Lang A. Policy integration: mapping the different concepts. Policy Studies. 2017;38(6):553–70.

96. Tubbing L, Harting J, Stronks K. Unravelling the concept of integrated public health policy: concept mapping with Dutch experts from science, policy, and practice. Health Policy. 2015;119(6):749–59.

97. Donkin A, Goldblatt P, Allen J, Nathanson V, Marmot M. Global action on the social determinants of health. BMJ Glob Health. 2017;3(Suppl 1):e000603.

98. Baum F, Friel S. Politics, policies and processes: a multidisciplinary and multimethods research programme on policies on the social determinants of health inequity in Australia. BMJ Open. 2017;7(12):e017772.

99. Delany T, Lawless A, Baum F, Popay J, Jones L, McDermott D, et al. Health in All Policies in South Australia: what has supported early implementation? Health Promot Int. 2016;31(4):888–98.

100. Valaitis R, MacDonald M, Kothari A, O'Mara L, Regan S, Garcia J, et al. Moving towards a new vision: implementation of a public health policy intervention. BMC Public Health. 2016;16:412.

101. Bennett LM, Gadlin H, Marchand C. Collaboration and team science: a field guide. Bethesda, MD: National Cancer Institute, National Institutes of Health; 2018. NIH Publication No. 18-7660.

102. Mazumdar M, Messinger S, Finkelstein DM, Goldberg JD, Lindsell CJ, Morton SC, et al. Evaluating academic scientists collaborating in team-based research: a proposed framework. Acad Med. 2015;90(10):1302–8.

103. Brownson RC, Fielding JE, Green LW. Building capacity for evidence-based public health: reconciling the pulls of practice and the push of research. Annu Rev Public Health. 2018;39:27–53.

104. Brownson RC, Colditz GA, Proctor EK. Future issues in dissemination and implementation research. In: Brownson RC, Colditz GA, Proctor EK, editors. Dissemination and Implementation Research in Health: Translating Science to Practice. Second Edition. New York: Oxford University Press; 2018.

105. Thomson K, Hillier-Brown F, Todd A, McNamara C, Huijts T, Bambra C. The effects of public health policies on health inequalities in high-income countries: an umbrella review. BMC Public Health. 2018;18(1):869.

Acknowledgements

The authors are grateful for the policy expertise and guidance of Alexandra Morshed and the administrative support of Mary Adams, Linda Dix, and Cheryl Valko at the Prevention Research Center, Brown School, Washington University in St. Louis. We thank Lori Siegel, librarian, Brown School, Washington University in St. Louis, for assistance with search terms and procedures. We appreciate the D&I contributions of Enola Proctor and Byron Powell at the Brown School, Washington University in St. Louis, that informed this review. We thank Russell Glasgow, University of Colorado Denver, for guidance on the overall review and pragmatic measure criteria.

This project was funded March 2019 through February 2020 by the Foundation for Barnes-Jewish Hospital, with support from the Washington University in St. Louis Institute of Clinical and Translational Science Pilot Program, NIH/National Center for Advancing Translational Sciences (NCATS) grant UL1 TR002345. The project was also supported by the National Cancer Institute P50CA244431, Cooperative Agreement number U48DP006395-01-00 from the Centers for Disease Control and Prevention, R01MH106510 from the National Institute of Mental Health, and the National Institute of Diabetes and Digestive and Kidney Diseases award number P30DK020579. The findings and conclusions in this paper are those of the authors and do not necessarily represent the official positions of the Foundation for Barnes-Jewish Hospital, Washington University in St. Louis Institute of Clinical and Translational Science, National Institutes of Health, or the Centers for Disease Control and Prevention.

Author information

Authors and affiliations

Prevention Research Center, Brown School, Washington University in St. Louis, One Brookings Drive, Campus Box 1196, St. Louis, MO, 63130, USA

Peg Allen, Meagan Pilar, Callie Walsh-Bailey, Stephanie Mazzucca, Maura M. Kepper & Ross C. Brownson

School of Social Work, Brigham Young University, 2190 FJSB, Provo, UT, 84602, USA

Cole Hooley

Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave, Seattle, WA, 98101, USA

Cara C. Lewis, Kayne D. Mettert & Caitlin N. Dorsey

Department of Health Management & Policy, Drexel University Dornsife School of Public Health, Nesbitt Hall, 3215 Market St, Philadelphia, PA, 19104, USA

Jonathan Purtle

Brown School, Washington University in St. Louis, One Brookings Drive, Campus Box 1196, St. Louis, MO, 63130, USA

Ana A. Baumann

Department of Surgery (Division of Public Health Sciences) and Alvin J. Siteman Cancer Center, Washington University School of Medicine, 4921 Parkview Place, Saint Louis, MO, 63110, USA

Ross C. Brownson

Contributions

Review methodology and quality assessment scale: CCL, KDM, CND. Eligibility criteria: PA, RCB, CND, KDM, SM, MP, JP. Search strings and terms: CH, PA, MP, with review by AB, RCB, CND, CCL, MMK, SM, KDM. Framework selection: PA, AB, CH, MP. Abstract screening: PA, CH, MMK, SM, MP. Full-text screening: PA, CH, MP. Pilot extraction: PA, CND, CH, KDM, SM, MP. Data extraction: MP, CWB. Data aggregation: MP, CWB. Writing: PA, RCB, JP. Editing: RCB, JP, SM, AB, CND, CH, MMK, CCL, KDM, MP, CWB. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Peg Allen.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Competing interests

The authors declare they have no conflicting interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1. PRISMA checklist. Table S2. Electronic search terms for databases searched through EBSCO. Table S3. Electronic search terms for searches conducted through ProQuest. Table S4. PAPERS pragmatic rating scales. Table S5. PAPERS psychometric rating scales.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Allen, P., Pilar, M., Walsh-Bailey, C. et al. Quantitative measures of health policy implementation determinants and outcomes: a systematic review. Implementation Sci 15, 47 (2020). https://doi.org/10.1186/s13012-020-01007-w

Received: 24 March 2020

Accepted: 05 June 2020

Published: 19 June 2020

DOI: https://doi.org/10.1186/s13012-020-01007-w

  • Implementation science
  • Health policy
  • Policy implementation
  • Implementation
  • Public policy
  • Psychometric

Implementation Science

ISSN: 1748-5908

  • Open access
  • Published: 03 October 2022

Quantitative data collection approaches in subject-reported oral health research: a scoping review

  • Carl A. Maida 1,
  • Di Xiong 1,2,
  • Marvin Marcus 1,
  • Linyu Zhou 1,2,
  • Yilan Huang 1,2,
  • Yuetong Lyu 1,2,
  • Jie Shen 1,
  • Antonia Osuna-Garcia 3 &
  • Honghu Liu 1,2,4

BMC Oral Health volume 22, Article number: 435 (2022)

This scoping review reports on studies that use quantitative survey research to measure self-reported oral health status outcomes. The objective of this review is to categorize the measures used to evaluate self-reported oral health status and oral health-related quality of life in surveys of general populations.

The review is guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR), with searches of four online bibliographic databases. The inclusion criteria are (1) peer-reviewed articles, (2) papers published between 2011 and 2021, (3) studies using only quantitative methods, and (4) outcome measures of self-assessed oral health status and/or oral health-related quality of life. All survey data collection methods are assessed, and papers whose methods employ newer technological approaches are also identified.

Of the 2981 unduplicated papers, 239 meet the eligibility criteria. Half of the papers use impact scores such as the OHIP-14; 10% use functional measures such as the GOHAI; 26% use two or more measures; and 8% use rating scales of oral health status. The review identifies four data collection methods: in-person, mail-in, Internet-based, and telephone surveys. Most studies (86%) employ in-person surveys; 39% are conducted in Asia-Pacific and Middle East countries, and 8% in North America. Sixty-six percent of the studies recruit participants directly from the clinics and schools where the surveys are carried out. The top three sampling methods are convenience sampling (52%), simple random sampling (12%), and stratified sampling (12%). Among the four data collection methods, in-person surveys have the highest response rate (91%) and Internet-based surveys the lowest (37%). Telephone surveys are used to cover a wider population than the other data collection methods. Two approaches are noteworthy: (1) sample selection, in which researchers employ different platforms to access subjects, and (2) mode of interaction with subjects, in which computers are used to collect self-reported data.

The study provides an assessment of oral health outcome measures, including subject-reported oral health status, and notes newly emerging computer-based technological approaches recently used in surveys of general populations. These newer applications, though rarely used, hold promise for both researchers and the various populations that use or need oral health care.

A fundamentally different approach is needed to address the oral health of populations worldwide, one that considers the perspective of patients and populations and not only the views of dental professionals [ 1 ]. It seems increasingly necessary to integrate self-reported perceptions of oral health, as they can complement or even replace clinical measures of dental status in population surveys. Such subjective measures are easy to use in large populations and can provide a broader health perspective than clinically determined measures of dental status alone [ 2 , 3 ]. Because the topic is broad, this scoping review sets out to identify the methods employed in population surveys that address self-reported perceptions of oral health, and the extent to which new computer-oriented technological approaches are being incorporated into these research methods.

Existing scoping and systematic reviews on oral health and dentistry typically focus on populations defined by disease or clinical condition, treatment, or political or social status, and typically do not explore oral health status outcome measures [ 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 ]. These reviews only occasionally provide perspectives on general populations. A review by Mittal et al. identifies dental Patient-Reported Outcomes (dPROs) and dental Patient-Reported Outcome Measures (dPROMs) related to oral function, orofacial pain, orofacial appearance, and psychosocial impact [ 16 ]. That study offers a valuable and extensive review of self-reported oral health and quality of life measures, many of which also appear in this paper. This scoping review therefore examines the approaches used in subject-reported surveys, including those of general populations, which may broaden the perspective on oral health outcome measures.

The objective of this review is to categorize the measures of self-reported oral health status and oral health-related quality of life used in surveys of general populations.

This work follows the framework for scoping reviews [ 17 , 18 , 19 ] and is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) [ 20 ]; the completed checklist is provided in Additional file 1 : Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist. Additional file 5 : Glossary of Terms defines the key terms used throughout the paper.

Search strategy and data sources

A health science librarian assisted in developing a search strategy to identify papers concerning subject-reported oral health status surveys. The search terms fell into three broad categories: survey methods, subject-reported outcomes, and oral health and disease (see Additional file 2 : Search Terms for the full list of search strings). The search covered peer-reviewed journal articles, conference proceedings, and reviews containing at least one keyword from each of the three categories. Four online databases were searched: Ovid Medline, Embase, Web of Science, and Cochrane Reviews and Trials. In addition, a manual search with similar keywords covered the gray literature archived on MedRxiv. The search focused on peer-reviewed papers written in English and published between 2011 and September 2021. Publications from the last decade were reviewed to investigate the extent to which different methods were being used and the trends during this period. The final search was completed on September 29, 2021. The most recent decade was chosen because it is a period of considerable interest in non-clinical oral health status outcome measures and offers the potential to examine technological innovation. All references were imported for review and appraisal. Duplicates were identified using Mendeley (Mendeley, London, UK) and manually verified. After duplicates were removed, data were tabulated in Microsoft Excel (Microsoft, Redmond, WA, USA) to record screening results and chart the data.
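Duplicate removal here was done in Mendeley with manual verification. As a rough sketch of what such a pass does, and not a description of Mendeley's own matching, the Python below treats a record as a probable duplicate when its DOI or its normalized title has already been seen. The record fields (`title`, `doi`, `source`) and the example records are hypothetical.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def dedupe(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen_dois, seen_titles = set(), set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title = normalize_title(rec.get("title") or "")
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # probable duplicate; a reviewer would still verify it by hand
        unique.append(rec)
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
    return unique


# Hypothetical exports from several databases.
records = [
    {"title": "Oral health-related quality of life in older adults", "doi": "10.1000/abc", "source": "Medline"},
    {"title": "Oral Health Related Quality of Life in Older Adults.", "doi": "", "source": "Embase"},
    {"title": "Oral health of older adults (conference version)", "doi": "10.1000/ABC", "source": "Web of Science"},
    {"title": "Self-rated oral health among students", "doi": None, "source": "Cochrane"},
]
unique = dedupe(records)
print([r["source"] for r in unique])  # ['Medline', 'Cochrane']
```

Matching on either key catches the common case where one database exports a DOI and another does not; anything flagged this way would still go to a human check, as in the review.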

Study inclusion criteria

Studies that did not meet the research objective were excluded using a screening tool (Additional file 3 : Search Tool). First, the titles and abstracts of publications were screened to determine whether studies conducted quantitative surveys and whether self-reported and/or proxy-reported oral health status (OHS) was a primary objective. Only surveys with more than three questions related to OHS were considered. Studies based on secondary analysis were excluded because their data collection methods were typically developed previously rather than as part of the reported research. Papers whose sole purpose was to validate well-known measures of oral health were also rejected, since their intent was not to assess the OHS of a population. Literature reviews were likewise excluded, as were papers describing results from focus groups and other qualitative studies. Papers whose objectives were to validate measures of, or predict, specific oral disease entities such as caries or gingival bleeding, rather than overall OHS, were also excluded. Studies primarily focusing on general health status or other systemic diseases instead of OHS were eliminated. Randomized controlled trials (RCTs) and quasi-RCTs that tested an active agent (e.g., therapy, experiment, or medicine) were excluded because their main purpose was to compare treatments rather than to assess subject-reported OHS.

The research team performed the secondary screening through full-text review. Papers whose full text was missing or not in English were dropped. The available full texts were then screened against the inclusion criteria described above, and papers without information about data collection methods were further excluded.

Selection strategies

Figure 1 outlines the review process using the PRISMA-ScR framework. The title-and-abstract screening was completed by one researcher (D.X.) against the inclusion criteria using a screening tool (Additional file 3 : Search Tool). To check reliability and consistency, a second researcher (L.Z.) independently screened a random 10% of the articles, and the inclusion decisions were compared. After title-and-abstract screening, two researchers (L.Z. and Y.H.) independently verified the eligibility of the remaining articles through full-text review. Inclusion discrepancies were resolved by a third researcher (D.X.).
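The 10% reliability check can be drawn reproducibly from the deduplicated set. The sketch below assumes records carry integer IDs and uses a fixed, hypothetical seed; the review does not report how its random subset was generated.

```python
import math
import random


def reliability_sample(record_ids, fraction=0.10, seed=2021):
    """Draw a reproducible simple random sample of records for double screening."""
    k = math.ceil(len(record_ids) * fraction)  # round up so the subset covers at least the fraction
    rng = random.Random(seed)  # fixed seed so the same subset can be re-drawn and audited
    return sorted(rng.sample(record_ids, k))


record_ids = list(range(1, 2982))  # hypothetical IDs for the 2981 unduplicated records
subset = reliability_sample(record_ids)
print(len(subset))  # 299
```

Fixing the seed means a second screener, or an auditor, can regenerate exactly the same subset later.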

Figure 1. PRISMA framework with additional examples

Data extraction

The data charting form (Additional file 4 : Data Charting Form) contains quantitative and qualitative variables describing the data collection methods and their characteristics, such as outcome measures, use of assistive devices/tools or data sources, and report type. The form was pre-tested by two project staff members (C.M. and M.M.) before use. Two researchers (Y.H. and L.Z.) extracted data using the form. The two project staff members (C.M. and M.M.) then jointly reviewed the charted study characteristics, and discrepancies were resolved through discussion.

Data synthesis and analysis

The scoping review synthesizes the findings according to the dimensions and attributes of the major oral health survey data collection methods, using descriptive and content analyses. It provides an overview of the data collection methods in the recent literature, meaning the quantitative methods used to collect information from a pool of respondents, and of trends in the use of newer technological approaches, which involve computerized modes, Internet-supported devices, and interactive web technologies. The review identifies four major types of data collection methods: in-person, Internet-based, telephone-based, and mail-in approaches.

Screening and study selection

After duplicates were removed, the initial search yielded 2981 articles from the four online databases for title-and-abstract screening, of which 2503 were excluded against the inclusion criteria. Interrater reliability for title-and-abstract screening, measured by kappa, was 0.94 (95% confidence interval [0.89, 0.99]), indicating almost perfect agreement [ 21 ]. Full-text review of the remaining 478 articles excluded a further 239, and we summarized and categorized the remaining 239 studies using the pre-tested data charting form. Among these, we identified 12 studies that used various technological approaches to data collection. Figure 1 presents the PRISMA framework used for this scoping review.
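For readers who want to see where an agreement figure like this comes from, the sketch below computes kappa for two screeners' include/exclude decisions, assuming Cohen's kappa for two raters, with a simple large-sample 95% interval and the Landis and Koch verbal bands [ 21 ]. The 2x2 counts are hypothetical and are not the review's data, and the standard-error formula is one common approximation; the review does not state which variance estimator it used.

```python
import math


def cohens_kappa(a, b, c, d, z=1.96):
    """Cohen's kappa for two raters making include/exclude decisions.

    a: both include, b: rater 1 include / rater 2 exclude,
    c: rater 1 exclude / rater 2 include, d: both exclude.
    Returns (kappa, lower, upper) using a simple large-sample standard error.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement
    # Chance agreement from each rater's marginal include/exclude rates.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / n) / (1 - p_e)
    return kappa, kappa - z * se, min(1.0, kappa + z * se)


def landis_koch(kappa):
    """Verbal agreement band from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts for a double-screened subset (not the review's data).
kappa, lower, upper = cohens_kappa(a=40, b=2, c=1, d=255)
print(f"kappa = {kappa:.3f} [{lower:.3f}, {upper:.3f}] -> {landis_koch(kappa)}")
```

Because exclusions dominate in title-and-abstract screening, chance agreement is high, which is why kappa rather than raw percent agreement is the usual summary.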

General characteristics of included studies

Table 1 presents characteristics of the 239 included articles, published from 2011 to September 2021. Fifty-six percent of the papers are published in dental journals. About 40% are published in journals from the Asia-Pacific and Middle East region (APAC), and only 8.4% are from North America (NA). The majority of studies (69%) focus on the general population. Most (88.6%) of the studies use in-person surveys. Around two-thirds of the studies invite and recruit participants at the study sites, e.g., schools, clinics, and hospitals. Other studies recruit participants through community visits by the research team (16%) or by sampling directly from a database (13%). Across studies, participants are selected using probability and/or non-probability sampling methods, most commonly convenience sampling (52%), simple random sampling (12%), and stratified sampling (12%). Most studies (193, or 80.8%) investigate self-reported outcomes. Dental examinations accompany the survey in 54% of the studies, while 32% of studies use no clinical exam or records. The data charting details are listed in Additional file 4 : Data Charting Form.
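The three most common sampling designs differ in how units are drawn from a sampling frame. The sketch below contrasts them under simplified assumptions: convenience sampling is caricatured as taking the first units available, simple random sampling gives every unit an equal chance, and proportionate stratified sampling draws a random sample within each stratum. The urban/rural school frame is hypothetical.

```python
import random
from collections import defaultdict


def convenience_sample(frame, n):
    """Take whoever is easiest to reach, here simply the first n units in the frame."""
    return frame[:n]


def simple_random_sample(frame, n, seed=0):
    """Every unit in the frame has the same chance of selection."""
    return random.Random(seed).sample(frame, n)


def stratified_sample(frame, stratum_of, n, seed=0):
    """Proportionate stratified sampling: a simple random sample within each
    stratum, sized by that stratum's share of the frame. Rounding can make the
    total differ slightly from n when shares are not whole numbers."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        k = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, k))
    return sample


# Hypothetical frame: 600 pupils in urban schools and 400 in rural schools.
frame = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
strat = stratified_sample(frame, stratum_of=lambda unit: unit[0], n=50)
print(sum(u[0] == "urban" for u in strat), sum(u[0] == "rural" for u in strat))  # 30 20
```

Stratification guarantees each subgroup its proportional share, whereas a convenience sample drawn from one site can miss a stratum entirely.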

Characteristics of data collection methods

The four main data collection methods include in-person (N = 206, 86.2%), mail-in (N = 15, 6.3%), Internet-based (N = 6, 2.5%), and telephone-based (N = 3, 1.3%) surveys. The characteristics of the various data collection methods are summarized in Table 2 .
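The percentages in this paragraph are shares of all 239 included studies. A quick Python check reproduces them from the counts and shows that the four methods together account for 230 studies, leaving 9 that the paragraph does not assign to a single method.

```python
# Counts reported for the four data collection methods among the 239 included studies.
counts = {"In-person": 206, "Mail-in": 15, "Internet-based": 6, "Telephone": 3}
TOTAL = 239

shares = {method: round(100 * n / TOTAL, 1) for method, n in counts.items()}
not_in_four = TOTAL - sum(counts.values())

print(shares)       # {'In-person': 86.2, 'Mail-in': 6.3, 'Internet-based': 2.5, 'Telephone': 1.3}
print(not_in_four)  # 9
```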

Most studies using in-person surveys achieve high response rates, averaging 90.6%. Among the in-person studies, 55.8% employ face-to-face interviews, while 35.4% use a paper-and-pencil approach. Participants for 58.7% of the studies are recruited directly from clinics [ 22 ], hospitals [ 23 ], and community care centers [ 24 ]. At sites with electronic records, additional data sources are linked directly to the survey, for example clinical dental exams with visual components (e.g., X-rays [ 25 ] and pictures [ 26 ]) and medical records [ 23 , 27 , 28 ]. Moreover, various qualitative assessments (e.g., the Malocclusion Assessment [ 22 ] and the Masticatory Performance Test [ 24 , 29 ]) are captured in patient progress notes.

Fifteen studies use the mail-in survey method, which may be more cost-effective than in-person delivery. These surveys are delivered through two main channels: by post (80%) and by carriers (20%). Mail-in surveys have a relatively high average response rate of 72%, especially when children or other respondents bring surveys home to complete. Like in-person surveys, mail-in surveys can incorporate additional resources, such as photographs and explanations of clinical conditions and treatments [ 30 , 31 ].

Only six studies use an Internet-based survey, mainly through computer-assisted web interviews (4 studies) and email (2 studies). Three of these papers recruit participants directly, and the other three recruit through websites and databases. This method has the lowest average response rate (36.7%) and small samples, with a median of 259 participants.
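The average response rates reported for each method can be computed in two ways, and the text does not say which one applies. The sketch below, with hypothetical study counts, contrasts the unweighted mean of per-study rates with the pooled rate; they diverge when small studies have high rates.

```python
from statistics import mean, median

# Hypothetical studies: (surveys distributed, completed surveys returned).
studies = [(1200, 300), (400, 180), (150, 90)]

rates = [completed / invited for invited, completed in studies]
unweighted_mean = mean(rates)  # each study counts equally
pooled = sum(c for _, c in studies) / sum(i for i, _ in studies)  # each invitee counts equally
median_sample = median(c for _, c in studies)  # sample size taken as completed surveys

print(f"{unweighted_mean:.1%} vs {pooled:.1%}; median sample = {median_sample}")
# 43.3% vs 32.6%; median sample = 180
```

Stating which definition is used makes response rates comparable across methods and reviews.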

Three studies use telephone surveys, which cover larger populations and yield more respondents on average than the other survey methods. Two of these studies recruit participants through an existing database, and all use interviewers. Computer-Assisted Telephone Interviews (CATI) [ 32 ] and Interactive Voice Response Systems [ 33 ], which are commonly used in industry, are not found in these studies.

In addition to the data collection methods, we further categorize the measures found in the 239 articles. Table 3 presents the frequencies and percentages of the various self-reported outcome measures. The three basic approaches are oral health impact measures [ 34 ], functional measures [ 34 ], and self- or proxy-ratings of OHS, with the terms defined in Additional file 5 : Glossary of Terms. These are used as single measures or in combination. The Oral Health Impact Profile-14 (OHIP-14) is the most prevalent single measure, used in 69 papers (29% overall); of these, 25 papers concern child impact, representing 10% of all selected papers. The Geriatric Oral Health Assessment Index (GOHAI), a functional measure, is second, with 21 papers (9% overall), and is the most frequently used measure in studies of older adults. Two adolescent papers account for 9% of the functional category. Self- or proxy-rating of OHS is the single measure in 18 papers (8% of all articles). Of these, 12 (about two-thirds) use children's measures, representing 5% of all selected papers.

A total of 63 papers use more than one type of measure, either combining functional and impact measures (36 papers, 15%) or combining self-rated OHS with one or more of the other measures (27 papers, 11%). Single impact measures account for 50% of all papers, and impact measures also appear in studies that combine two or more measures. The GOHAI, a function-based measure, accounts for only 9% of papers as a single measure but also plays a role in combination with other measures. Finally, self-reported OHS as a single measure represents 8% of the studies; its role is mainly in combination with other measures, which represent another 15% of the articles. In total, children's oral health measures form a considerable portion of the self-reported oral health outcome research papers, representing 16% of all studies. In additional studies, children's measures are used in combination with adult measures.
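The grouping used here counts measure types, not instruments: a paper is "single" if all its instruments share one type, "functional + impact" if it mixes those two, and "self-rating + other" if it adds a self-rating to anything else. The sketch below encodes that rule; the instrument-to-type lookup is a hypothetical, illustrative subset, not the review's coding sheet.

```python
from collections import Counter

# Hypothetical lookup from instrument to measure type (an illustrative subset only).
MEASURE_TYPE = {
    "OHIP-14": "impact",
    "ECOHIS": "impact",
    "GOHAI": "functional",
    "Self-rated OHS": "rating",
}


def categorize(instruments):
    """Assign a paper to a category by the set of measure *types* it uses."""
    types = {MEASURE_TYPE[name] for name in instruments}
    if len(types) == 1:
        return "single " + next(iter(types))
    if "rating" in types:
        return "self-rating + other"
    return "functional + impact"  # the only two-type mix left without a rating


papers = [["OHIP-14"], ["ECOHIS"], ["GOHAI"], ["OHIP-14", "GOHAI"], ["Self-rated OHS", "OHIP-14"]]
tally = Counter(categorize(p) for p in papers)
print(tally)
```

Under this rule a paper using two impact instruments still counts as a single impact measure, which matches the paragraph's focus on "more than one type of measure".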

Technological approaches have recently emerged in survey research to improve the quality and quantity of data collection. After reviewing and charting all 239 qualifying articles, we summarize the twelve studies that employ technological approaches in Table 4 .

This scoping review provides an overview of the data collection methods used in subject-reported surveys to measure oral health outcomes. Studies are characterized by four survey methods (in-person, mail-in, Internet-based, and telephone) and by the dimensions and attributes of data collection for each method, such as technological approaches, survey population, and sampling methods. Studies typically employ in-person surveys, and more studies were conducted in Asia-Pacific and Middle East countries than in any other world region. Most studies recruit participants directly from study sites. The most common sampling methods, spanning both probability and non-probability designs, are convenience sampling, simple random sampling, and stratified sampling. In-person surveys achieve the highest average response rate, while Internet-based surveys have the lowest. Telephone surveys are used to cover a wider population than other data collection methods. Many studies, especially those using in-person and mail-in data collection, incorporate supplemental data types and technological approaches. Outcome measures are frequently used to evaluate impacts caused by functional limitations related to physical, psychological, and social factors.

The most frequently used self-reported oral health status and OHRQoL measures are the OHIP-14, an impact measure, and the GOHAI, a functional measure. Children's oral health outcome measures form a considerable portion of the self-reported oral health outcome research papers. Although the OHIP-14 is the most used single measure, many other papers use only portions of it while adding other outcome measures, such as dental care needs, satisfaction, and oral health status. The validity of these modified instruments may therefore be compromised, and they cannot show the degree to which the studies measure self-reported oral health status or quality of life [ 4 ]. Other measures rate an individual's oral health status on a simple self-rating scale, from very poor to excellent. This approach relates more directly to a person's oral conditions, so their perceptions and behavior tend to be more consistent with the rating [ 56 , 57 ]. These self-rating measures capture the overall dimension of perceived oral health status. Unlike the measures discussed above, these simple ratings do not delineate the psychological, social, and physical dimensions of oral health. Nevertheless, such measures can enable researchers to identify hidden dimensions by analyzing the independent variables that account for the respondent's perception.
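The concern about using only parts of the OHIP-14 can be made concrete. Under the standard OHIP-14 coding, each of the 14 items is answered on a five-point frequency scale coded 0 (never) to 4 (very often); the additive score sums these codes (range 0 to 56), and the simple-count score counts items reported fairly often or very often. The sketch below assumes that coding; the respondent is hypothetical. A sum over an arbitrary subset of items has a different range and can miss impacts entirely, so it is not comparable with the full score.

```python
# OHIP-14 response coding assumed here: each of the 14 items is scored 0-4.
RESPONSE_CODES = {"never": 0, "hardly ever": 1, "occasionally": 2, "fairly often": 3, "very often": 4}
N_ITEMS = 14


def _check_length(responses):
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")


def ohip14_additive(responses):
    """Additive score: sum of the 14 item codes, range 0-56 (higher = greater impact)."""
    _check_length(responses)
    return sum(RESPONSE_CODES[r] for r in responses)


def ohip14_simple_count(responses):
    """Simple count: number of items reported 'fairly often' or 'very often'."""
    _check_length(responses)
    return sum(RESPONSE_CODES[r] >= 3 for r in responses)


def partial_sum(responses):
    """Raw sum over an arbitrary subset of items, with its maximum possible value."""
    return sum(RESPONSE_CODES[r] for r in responses), 4 * len(responses)


# Hypothetical respondent.
answers = ["never"] * 9 + ["occasionally"] * 3 + ["very often"] * 2
print(ohip14_additive(answers), ohip14_simple_count(answers))  # 14 2
print(partial_sum(answers[:7]))  # (0, 28): the first seven items miss every reported impact
```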

This review identifies much research that employs conventional methods. Face-to-face interviews and the paper-and-pencil format are used in many studies, along with a clinical dental exam. Although they offer unique flexibility and easier administration, in-person approaches are more labor-intensive and normally take more time than other methods. Some countries, such as Brazil, have relied on these techniques for years to develop national epidemiological oral health surveys [ 28 , 58 , 59 ]. Although these surveys are well organized and established throughout the country, this review finds no evidence that newer technological approaches have been introduced into them. There may be little incentive to change, because the existing methods are well understood and adopting more technological approaches may be costly.

Internet-based surveys are increasingly common in the medical field. Although they may yield lower response rates, this approach is normally more cost-effective [ 60 ]. Internet-based surveys have many notable advantages, including easy administration, fast data collection, lower cost, wider population coverage, and better data quality, with fewer data errors and fewer missing items [ 61 , 62 , 63 ]. However, this data collection method is constrained by sample bias, topic salience, data security concerns, and low digital literacy, which may affect response rates [ 62 ]. In settings where Internet-based surveys are not practical, longstanding and effective conventional data collection methods will likely continue to be used in oral health research. This review shows that the use of computerized technological approaches is limited: although such approaches can improve the quality and quantity of survey data collection, only twelve studies in this review employ them. The most widely used are Computer-Assisted Personal Interviewing (CAPI) and online survey platforms (e.g., Google Forms and SurveyMonkey).

Two noteworthy approaches to survey research methodology emerge from this review: (1) sample selection and (2) mode of interaction with research subjects. North American researchers have found different platforms to access subjects. Canadian studies use random digit dialing to recruit participants and conduct computer-assisted interviews [ 54 ]. In the United States, researchers access existing polling populations or use Amazon’s MTurk platform, whose “workers” are paid small amounts for each survey they complete [ 64 ]. The second approach is the use of computers to collect self-reported data. The most basic technique is CAPI, in which interviewers enter responses directly into a database. In Computer-Assisted Telephone Interviewing (CATI), the interviewer follows a scripted interview guided by a questionnaire displayed on screen. A third, Internet-based technique, Computer-Assisted Web Interviewing (CAWI), requires no live interviewer; instead, the respondent follows a script built in a web-interview design program, which may include images, audio and video clips, and web-based information.
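The scripted, computer-guided interview common to CATI and CAWI can be sketched as a small routing table: the software decides which question comes next from the previous answer, and the interviewer (CATI) or respondent (CAWI) only supplies answers. This is a minimal illustration, not the API of any survey platform; the questions and routing rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Question:
    qid: str
    text: str
    next_qid: Callable[[str], Optional[str]]  # routing rule: answer -> next question id (None ends)


# Hypothetical three-question script with one skip rule.
SCRIPT = {
    "q1": Question("q1", "Have you had a toothache in the past 12 months? (yes/no)",
                   lambda answer: "q2" if answer == "yes" else "q3"),
    "q2": Question("q2", "How often did it interrupt your sleep? (never/sometimes/often)",
                   lambda answer: "q3"),
    "q3": Question("q3", "How would you rate your oral health? (poor/fair/good/very good/excellent)",
                   lambda answer: None),
}


def run_interview(get_answer: Callable[[Question], str], start: str = "q1") -> Dict[str, str]:
    """Present questions in scripted order, following the routing rules.

    get_answer stands in for the interface: an interviewer keying in a phone
    response (CATI) or the respondent filling in a web form (CAWI)."""
    answers: Dict[str, str] = {}
    qid: Optional[str] = start
    while qid is not None:
        question = SCRIPT[qid]
        answers[qid] = get_answer(question)
        qid = question.next_qid(answers[qid])
    return answers


# Simulated respondent with no toothache: q2 is skipped automatically.
respondent = {"q1": "no", "q3": "good"}
print(run_interview(lambda q: respondent[q.qid]))  # {'q1': 'no', 'q3': 'good'}
```

Putting the routing in software is what removes skip-pattern errors that paper questionnaires are prone to.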

An innovative technological approach worth noting is OralCam, which uses a smartphone camera for oral self-examination [ 65 ]. The study builds on medical research that detects liver problems and other diseases from facial photos [ 66 ]. The paper describes how images from a smartphone camera are analyzed by diagnostic algorithms based on a deep convolutional neural network with a multitask learning approach. Trained on over three thousand intraoral photos, the system learns to analyze teeth and gingiva. The user takes a picture with the smartphone camera while using a mouth opener. The algorithms then analyze the captured picture, together with survey data, to identify several dental conditions, including caries, chronic gingival inflammation, and dental calculus. This use of multitask learning, combined with the wide availability of cell phones, may transform oral health research and care.
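The source describes OralCam as using a deep convolutional neural network with a multitask learning approach. The sketch below is not OralCam's model; it only illustrates the multitask idea in miniature (hard parameter sharing) with hand-written layers: one shared representation feeds a separate output for each condition named in the text, and the per-condition losses are summed into one objective, so that minimizing it would shape the shared layer for all conditions at once. Layer sizes, weights, and labels are hypothetical.

```python
import math

CONDITIONS = ("caries", "gingival_inflammation", "calculus")


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shared_features(pixels, w_shared):
    """Shared layer (a stand-in for a CNN backbone): one ReLU unit per weight row."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels))) for row in w_shared]


def multitask_loss(pixels, labels, w_shared, heads):
    """Run every condition-specific head on the same shared features and sum
    their binary cross-entropy losses into one training objective."""
    h = shared_features(pixels, w_shared)
    total, probs = 0.0, {}
    for cond in CONDITIONS:
        p = sigmoid(sum(w * x for w, x in zip(heads[cond], h)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        probs[cond] = p
        y = labels[cond]
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total, probs


# Toy input: three "pixel" values, two shared units, untrained (zero) heads.
pixels = [0.2, 0.8, 0.5]
w_shared = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
heads = {cond: [0.0, 0.0] for cond in CONDITIONS}
labels = {"caries": 1, "gingival_inflammation": 0, "calculus": 1}
loss, probs = multitask_loss(pixels, labels, w_shared, heads)
print(round(loss, 4), probs)  # zero heads predict 0.5 everywhere, so loss = 3 ln 2
```

In a real system the shared layer would be a deep convolutional backbone and the weights would be learned by gradient descent on labeled intraoral photos.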

This scoping review is limited to oral health survey-based studies published in English between 2011 and 2021 in peer-reviewed journals and MedRxiv. A further limitation is that many of the reviewed papers do not adequately describe their data collection methods. Publications using secondary data from national studies are excluded because their authors did not design the methods or conduct the data collection. Such publications often refer to the original study to describe the method used, and the original data collection may have occurred before the time frame of this review. The fifteen papers using secondary data published during this study’s time frame represent only about six percent of the reviewed papers, so the overall impact of this exclusion on this scoping review’s results is minimal.

Conclusions

This scoping review provides an assessment of oral health outcome measures, including subject-reported oral health status, and notes newly emerging computer technological approaches recently used in surveys of general populations. Such technological approaches, although rarely used in the reviewed studies, hold promise for both researchers and the various populations that use or need oral health care. Future studies that use more developed computer applications to boost recruitment and participation of subjects from wide and diverse backgrounds across broad geographic areas could provide a broader perspective on oral health survey methods and outcomes.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews

APAC: Asia-Pacific (including the Middle East)

Latin America

NA: North America

CAPI: Computer-Assisted Personal Interviewing

CAWI: Computer-Assisted Web Interviewing

CATI: Computer-Assisted Telephone Interview

OHRQoL: Oral Health-Related Quality of Life

OHIP-14: Oral Health Impact Profile-14

GOHAI: Geriatric Oral Health Assessment Index

OHS: Oral Health Status

Watt RG, Daly B, Allison P, Macpherson LMD, Venturelli R, Listl S, et al. Ending the neglect of global oral health: time for radical action. Lancet. 2019;394:261–72. https://doi.org/10.1016/S0140-6736(19)31133-X .

Liu H, Hays R, Wang Y, Marcus M, Maida C, Shen J, et al. Short form development for oral health patient-reported outcome evaluation in children and adolescents. Qual Life Res. 2018;27:1599–611. https://doi.org/10.1007/S11136-018-1820-9 .

Wang Y, Hays R, Marcus M, Maida C, Shen J, Xiong D, et al. Development of a parents’ short form survey of their children’s oral health. Int J Pediatr Dent. 2019;29:332–44. https://doi.org/10.1111/ipd.12453 .

Yang C, Crystal YO, Ruff RR, Veitz-Keenan A, McGowan RC, Niederman R. Quality appraisal of child oral health-related quality of life measures: a scoping review. JDR Clin Transl Res. 2020;5:109–17. https://doi.org/10.1177/2380084419855636 .

Gupta M, Bosma H, Angeli F, Kaur M, Chakrapani V, Rana M, et al. A mixed methods study on evaluating the performance of a multi-strategy national health program to reduce maternal and child health disparities in Haryana, India. BMC Public Health. 2017;17:698. https://doi.org/10.1186/s12889-017-4706-9 .

Keboa MT, Hiles N, Macdonald ME. The oral health of refugees and asylum seekers: a scoping review. Glob Health. 2016;12:1–11. https://doi.org/10.1186/S12992-016-0200-X .

Wilson NJ, Lin Z, Villarosa A, Lewis P, Philip P, Sumar B, et al. Countering the poor oral health of people with intellectual and developmental disability: a scoping literature review. BMC Public Health. 2019;19:1–16. https://doi.org/10.1186/S12889-019-7863-1 .

Ajwani S, Jayanti S, Burkolter N, Anderson C, Bhole S, Itaoui R, et al. Integrated oral health care for stroke patients—a scoping review. J Clin Nurs. 2017;26:891–901. https://doi.org/10.1111/JOCN.13520 .

Shrestha AD, Vedsted P, Kallestrup P, Neupane D. Prevalence and incidence of oral cancer in low- and middle-income countries: a scoping review. Eur J Cancer Care. 2020;29:66. https://doi.org/10.1111/ECC.13207 .

Patterson-Norrie T, Ramjan L, Sousa MS, Sank L, George A. Eating disorders and oral health: a scoping review on the role of dietitians. J Eat Disord. 2020;8:1–21. https://doi.org/10.1186/S40337-020-00325-0 .

Lansdown K, Smithers-Sheedy H, Mathieu Coulton K, Irving M. Oral health outcomes for people with cerebral palsy: a scoping review protocol. JBI Database System Rev Implement Rep. 2019;17:2551–8. https://doi.org/10.11124/JBISRIR-2017-004037 .

Beaton L, Humphris G, Rodriguez A, Freeman R. Community-based oral health interventions for people experiencing homelessness: a scoping review. Community Dent Health. 2020;37:150–60. https://doi.org/10.1922/CDH_00014BEATON11 .

Marquillier T, Lombrail P, Azogui-Lévy S. Social inequalities in oral health and early childhood caries: How can they be effectively prevented? A scoping review of disease predictors. Rev Epidemiol Sante Publique. 2020;68:201–14. https://doi.org/10.1016/J.RESPE.2020.06.004 .

Como DH, Duker LIS, Polido JC, Cermak SA. The persistence of oral health disparities for African American children: a scoping review. Int J Environ Res Public Health. 2019;16:66. https://doi.org/10.3390/IJERPH16050710 .

Stein K, Farmer J, Singhal S, Marra F, Sutherland S, Quiñonez C. The use and misuse of antibiotics in dentistry: a scoping review. J Am Dent Assoc. 2018;149:869-884.e5. https://doi.org/10.1016/J.ADAJ.2018.05.034 .

Mittal H, John MT, Sekulić S, Theis-Mahon N, Rener-Sitar K. Patient-reported outcome measures for adult dental patients: a systematic review. J Evid Based Dent Pract. 2019;19:53–70. https://doi.org/10.1016/J.JEBDP.2018.10.005 .

Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Method Theory Pract. 2005;8:19–32. https://doi.org/10.1080/1364557032000119616 .

Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement Sci. 2010;5:69. https://doi.org/10.1186/1748-5908-5-69 .

Paré G, Trudel M-C, Jaana M, Kitsiou S. Synthesizing information systems knowledge: a typology of literature reviews. Inf Manag. 2015. https://doi.org/10.1016/j.im.2014.08.008 .

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169:467–73. https://doi.org/10.7326/M18-0850 .

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159. https://doi.org/10.2307/2529310 .

Masood M, Masood Y, Newton T, Lahti S. Development of a conceptual model of oral health for malocclusion patients. Angle Orthod. 2015;85:1057–63. https://doi.org/10.2319/081514-575.1 .

Massarente DB, Domaneschi C, Marques HHS, Andrade SB, Goursand D, Antunes JLF. Oral health-related quality of life of paediatric patients with AIDS. BMC Oral Health. 2011;11:2. https://doi.org/10.1186/1472-6831-11-2 .

Lu TY, Chen JH, Du JK, Lin YC, Ho PS, Lee CH, et al. Dysphagia and masticatory performance as a mediator of the xerostomia to quality of life relation in the older population. BMC Geriatr. 2020;20:66. https://doi.org/10.1186/S12877-020-01901-4 .

Strömberg E, Holmèn A, Hagman-Gustafsson ML, Gabre P, Wardh I. Oral health-related quality-of-life in homebound elderly dependent on moderate and substantial supportive care for daily living. Acta Odontol Scand. 2013;71:771–7. https://doi.org/10.3109/00016357.2012.734398 .

Morgan JP, Isyagi M, Ntaganira J, Gatarayiha A, Pagni SE, Roomian TC, et al. Building oral health research infrastructure: the first national oral health survey of Rwanda. Glob Health Act. 2018;11:66. https://doi.org/10.1080/16549716.2018.1477249 .

Preciado A, del Río J, Suárez-García MJ, Montero J, Lynch CD, Castillo-Oyagüe R. Differences in impact of patient and prosthetic characteristics on oral health-related quality of life among implant-retained overdenture wearers. J Dent. 2012;40:857–65. https://doi.org/10.1016/J.JDENT.2012.07.006 .

de Quadros Coelho M, Cordeiro JM, Vargas AMD, de Barros Lima Martins AME, de Almeida Santa Rosa TT, Senna MIB, et al. Functional and psychosocial impact of oral disorders and quality of life of people living with HIV/AIDS. Qual Life Res. 2015;24:503–11. https://doi.org/10.1007/S11136-014-0778-5 .

Said M, Otomaru T, Aimaijiang Y, Li N, Taniguchi H. Association between masticatory function and oral health-related quality of life in partial maxillectomy patients. Int J Prosthodont. 2016;29:561–4. https://doi.org/10.11607/IJP.4852 .

Owens J, Jones K, Marshman Z. The oral health of people with learning disabilities—a user–friendly questionnaire survey. Community Dent Health. 2017;34:4–7. https://doi.org/10.1922/CDH_3867OWENS04 .

Abuzar MA, Kahwagi E, Yamakawa T. Investigating oral health-related quality of life and self-perceived satisfaction with partial dentures. J Investig Clin Dent. 2012;3:109–17. https://doi.org/10.1111/J.2041-1626.2012.00111.X .

Wilson D, Taylor A, Chittleborough C. The second Computer Assisted Telephone Interview (CATI) Forum: the state of play of CATI survey methods in Australia. Aust NZ J Public Health. 2001;25:272–4. https://doi.org/10.1111/J.1467-842X.2001.TB00576.X .

Lee H, Friedman ME, Cukor P, Ahern D. Interactive voice response system (IVRS) in health care services. Nurs Outlook. 2003;51:277–83. https://doi.org/10.1016/S0029-6554(03)00161-1 .

Campos JADB, Zucoloto ML, Bonafé FSS, Maroco J. General Oral Health Assessment Index: a new evaluation proposal. Gerodontology. 2017;34:334–42. https://doi.org/10.1111/GER.12270 .

Slade GD. Derivation and validation of a short-form oral health impact profile. Community Dent Oral Epidemiol. 1997;25:284–90. https://doi.org/10.1111/J.1600-0528.1997.TB00941.X .

Adulyanon S, Vourapukjaru J, Sheiham A. Oral impacts affecting daily performance in a low dental disease Thai population. Community Dent Oral Epidemiol. 1996;24:385–9. https://doi.org/10.1111/J.1600-0528.1996.TB00884.X .

Pahel BT, Rozier RG, Slade GD. Parental perceptions of children’s oral health: the Early Childhood Oral Health Impact Scale (ECOHIS). Health Qual Life Outcomes. 2007;5:6. https://doi.org/10.1186/1477-7525-5-6 .

Gherunpong S, Tsakos G, Sheiham A. Developing and evaluating an oral health-related quality of life index for children: the CHILD-OIDP. Community Dent Health. 2004;21:161–9.

Slade GD, Spencer AJ. Development and evaluation of the Oral Health Impact Profile. Community Dent Health. 1994;11:3–11.

Broder HL, Wilson-Genderson M. Reliability and convergent and discriminant validity of the Child Oral Health Impact Profile (COHIP Child’s version). Community Dent Oral Epidemiol. 2007;35(Suppl 1):20–31. https://doi.org/10.1111/J.1600-0528.2007.0002.X .

Atieh MA. Arabic version of the geriatric oral health assessment Index. Gerodontology. 2008;25:34–41. https://doi.org/10.1111/j.1741-2358.2007.00195.x .

Wright WG, Spiro A, Jones JA, Rich SE, Garcia RI. Development of the teen oral health-related quality of life instrument. J Public Health Dent. 2017;77:115–24. https://doi.org/10.1111/JPHD.12181 .

Jokovic A, Locker D, Tompson B, Guyatt G. Questionnaire for measuring oral health-related quality of life in eight- to ten-year-old children. Pediatr Dent. 2004;26:512–8.

Jokovic A, Locker D, Stephens M, Kenny D, Tompson B, Guyatt G. Validity and reliability of a questionnaire for measuring child oral-health-related quality of life. J Dent Res. 2002;81:459–63. https://doi.org/10.1177/154405910208100705 .

Broughton JR, TeH Maipi J, Person M, Randall A, Thomson WM. Self-reported oral health and dental service-use of rangatahi within the rohe of Tainui. NZ Dent J. 2012;108:90–4.

Monaghan N, Karki A, Playle R, Johnson I, Morgan M. Measuring oral health impact among care home residents in Wales. Community Dent Health. 2017;34:14–8. https://doi.org/10.1922/CDH_3950MORGAN05 .

Echeverria MS, Silva AER, Agostini BA, Schuch HS, Demarco FF. Regular use of dental services among university students in southern Brazil. Rev Saude Publica. 2020;54:85. https://doi.org/10.11606/S1518-8787.2020054001935 .

Mohamad Fuad MA, Yacob H, Mohamed N, Wong NI. Association of sociodemographic factors and self-perception of health status on oral health-related quality of life among the older persons in Malaysia. Geriatr Gerontol Int. 2020;20(Suppl 2):57–62. https://doi.org/10.1111/GGI.13969 .

Hanisch M, Wiemann S, Bohner L, Kleinheinz J, Susanne SJ. Association between oral health-related quality of life in people with rare diseases and their satisfaction with dental care in the health system of the Federal Republic of Germany. Int J Environ Res Public Health. 2018. https://doi.org/10.3390/IJERPH15081732 .

Nam SH, Kiml HY, IlChun D. Influential factors on the quality of life and dental health of university students in a specific area. Biomed Res. 2017;28:12.

Mortimer-Jones S, Stomski N, Cope V, Maurice L, Théroux J. Association between temporomandibular symptoms, anxiety and quality of life among nursing students. Collegian. 2019;26:373–7. https://doi.org/10.1016/J.COLEGN.2018.10.003 .

Liu C, Zhang S, Zhang C, Tai B, Jiang H, Du M. The impact of coronavirus lockdown on oral healthcare and its associated issues of pre-schoolers in China: an online cross-sectional survey. BMC Oral Health. 2021;21:66. https://doi.org/10.1186/s12903-021-01410-9 .

Makizodila BAM, van de Wijdeven JHE, de Soet JJ, van Selms MKA, Volgenant CMC. Oral hygiene in patients with motor neuron disease requires attention: a cross-sectional survey study. Spec Care Dent. 2021. https://doi.org/10.1111/SCD.12636 .

Kotzer RD, Lawrence HP, Clovis JB, Matthews DC. Oral health-related quality of life in an aging Canadian population. Health Qual Life Outcomes. 2012;10:50. https://doi.org/10.1186/1477-7525-10-50 .

Hakeberg M, Wide U. General and oral health problems among adults with focus on dentally anxious individuals. Int Dent J. 2018;68:405–10. https://doi.org/10.1111/IDJ.12400 .

Lawal FB, Olawole WO, Sigbeku OF. Self rating of oral health status by student dental surgeon assistants in Ibadan, Nigeria--a pilot survey. Ann Ibadan Postgrad Med. 2013;11:12.

Locker D, Wexler E, Jokovic A. What do older adults’ global self-ratings of oral health measure? J Public Health Dent. 2005;65:146–52. https://doi.org/10.1111/J.1752-7325.2005.TB02804.X .

Saintrain MVDL, de Souza EHA. Impact of tooth loss on the quality of life. Gerodontology. 2012;29:66. https://doi.org/10.1111/J.1741-2358.2011.00535.X .

Grando LJ, Mello ALSF, Salvato L, Brancher AP, del Moral JAG, Steffenello-Durigon G. Impact of leukemia and lymphoma chemotherapy on oral cavity and quality of life. Spec Care Dent. 2015;35:236–42. https://doi.org/10.1111/SCD.12113 .

Ebert JF, Huibers L, Christensen B, Christensen MB. Paper- or Web-Based Questionnaire Invitations as a method for data collection: cross-sectional comparative study of differences in response rate, completeness of data, and financial cost. J Med Internet Res. 2018;20:66. https://doi.org/10.2196/JMIR.8353 .

Hohwü L, Lyshol H, Gissler M, Jonsson SH, Petzold M, Obel C. Web-based versus traditional paper questionnaires: a mixed-mode survey with a Nordic perspective. J Med Internet Res. 2013;15:66. https://doi.org/10.2196/JMIR.2595 .

Maymone MBC, Venkatesh S, Secemsky E, Reddy K, Vashi NA. Research techniques made simple: Web-Based Survey Research in Dermatology: conduct and applications. J Invest Dermatol. 2018;138:1456–62. https://doi.org/10.1016/J.JID.2018.02.032 .

Weigold A, Weigold IK, Natera SN. Response rates for surveys completed with paper-and-pencil and computers: using meta-analysis to assess equivalence. Soc Sci Comput Rev. 2018;37:649–68. https://doi.org/10.1177/0894439318783435 .

Burnham MJ, Le YK, Piedmont RL. Who is Mturk? Personal characteristics and sample consistency of these online workers. Ment Health Relig Cult. 2018;21:934–44. https://doi.org/10.1080/13674676.2018.1486394 .

Liang Y, Fan HW, Fang Z, Miao L, Li W, Zhang X, et al. OralCam: enabling self-examination and awareness of oral health using a smartphone camera. In: Conference on human factors in computing systems—proceedings, vol 20. New York: Association for Computing Machinery; 2020. p. 1–13. https://doi.org/10.1145/3313831.3376238 .

Ding X, Jiang Y, Qin X, Chen Y, Zhang W, Qi L. Reading face, reading health: Exploring face reading technologies for everyday health. In: Conference on human factors in computing systems—proceedings. New York: Association for Computing Machinery; 2019. p. 1–13. https://doi.org/10.1145/3290605.3300435 .

Acknowledgements

Not applicable.

Funding

This research was supported by an NIDCR/NIH grant to the University of California, Los Angeles (UCLA) (U01DE029491). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and affiliations.

Division of Oral and Systemic Health Sciences, School of Dentistry, University of California, Los Angeles, 10833 Le Conte Ave, Los Angeles, CA, USA

Carl A. Maida, Di Xiong, Marvin Marcus, Linyu Zhou, Yilan Huang, Yuetong Lyu, Jie Shen & Honghu Liu

Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, 650 Charles E Young Drive South, Los Angeles, CA, USA

Di Xiong, Linyu Zhou, Yilan Huang, Yuetong Lyu & Honghu Liu

Louise M. Darling Biomedical Library, University of California, Los Angeles, 12-077 Center for Health Sciences, Los Angeles, CA, USA

Antonia Osuna-Garcia

Division of General Internal Medicine and Health Services Research, Geffen School of Medicine, University of California, Los Angeles, 10833 Le Conte Ave, Los Angeles, CA, USA

Contributions

C.M., D.X., M.M. and H.L. conceptualized the study, designed the data collection form, and established the data analysis plan. A.O. developed search strategies and carried out searches of multiple databases. D.X., Y.L., Y.H., J.S. and Y.L. performed additional searching and tested the data charting form. D.X., Y.L., and Y.H. helped to screen studies for relevance and to chart data. C.M. and M.M. reviewed full-text papers and verified the data charting results. C.M., D.X., and M.M. drafted the original manuscript. D.X. and L.Z. prepared Tables 1 and 2 and Fig. 1. C.M., D.X., M.M., and L.Z. prepared Tables 3 and 4. All authors read and provided substantial comments/edits on the manuscript and approved the final version.

Corresponding author

Correspondence to Honghu Liu .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist.

Additional file 2.

Search Terms.

Additional file 3.

Search Tool.

Additional file 4.

Data Charting Form.

Additional file 5.

Glossary of Terms.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Maida, C.A., Xiong, D., Marcus, M. et al. Quantitative data collection approaches in subject-reported oral health research: a scoping review. BMC Oral Health 22 , 435 (2022). https://doi.org/10.1186/s12903-022-02399-5

Received : 31 December 2021

Accepted : 17 August 2022

Published : 03 October 2022

DOI : https://doi.org/10.1186/s12903-022-02399-5

  • Quantitative research
  • Patient-reported outcomes
  • Dental disease experience
  • Oral health-related quality of life
  • Data collection

BMC Oral Health

ISSN: 1472-6831

What Is Quantitative Research? | Definition, Uses & Methods

Published on June 12, 2020 by Pritha Bhandari . Revised on June 22, 2023.

Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations.

Quantitative research is usually contrasted with qualitative research, which involves collecting and analyzing non-numerical data (e.g., text, video, or audio).

Quantitative research is widely used in the natural and social sciences: biology, chemistry, psychology, economics, sociology, marketing, etc.

Examples of quantitative research questions include:

  • What is the demographic makeup of Singapore in 2020?
  • How has the average temperature changed globally over the last century?
  • Does environmental pollution affect the prevalence of honey bees?
  • Does working from home increase productivity for people with long commutes?

You can use quantitative research methods for descriptive, correlational or experimental research.

  • In descriptive research, you simply seek an overall summary of your study variables.
  • In correlational research, you investigate relationships between your study variables.
  • In experimental research, you systematically examine whether there is a cause-and-effect relationship between variables.

Correlational and experimental research can both be used to formally test hypotheses, or predictions, using statistics. How far the results can be generalized to broader populations depends on the sampling method used.

To collect quantitative data, you will often need to use operational definitions that translate abstract concepts (e.g., mood) into observable and quantifiable measures (e.g., self-ratings of feelings and energy levels).

Note that quantitative research is at risk of certain research biases, including information bias, omitted variable bias, sampling bias, and selection bias. Stay alert to these potential biases while you collect and analyze your data so that you can limit their impact on your results.

Once data is collected, you may need to process it before it can be analyzed. For example, survey and test data may need to be transformed from words to numbers. Then, you can use statistical analysis to answer your research questions.
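As a minimal sketch of turning words into numbers, the Python snippet below codes text answers from a hypothetical five-point Likert item. The labels, the 1 to 5 codes, and the `code_responses` helper are illustrative assumptions; a real instrument defines its own coding scheme.

```python
# Hypothetical five-point Likert coding; labels and codes are illustrative only.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def code_responses(responses):
    """Convert text answers to numeric codes; unrecognized answers become None (missing)."""
    coded = []
    for answer in responses:
        # Normalize case and surrounding whitespace before looking up the code.
        coded.append(LIKERT_CODES.get(answer.strip().lower()))
    return coded

raw = ["Agree", "strongly agree ", "Neutral", "no answer"]
print(code_responses(raw))  # [4, 5, 3, None]
```

Mapping unrecognized answers to `None` keeps them visible as missing data instead of silently assigning them a score.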

Descriptive statistics will give you a summary of your data and include measures of averages and variability. You can also use graphs, scatter plots and frequency tables to visualize your data and check for any trends or outliers.
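A descriptive summary like the one described above can be sketched with Python's standard `statistics` and `collections` modules. The ratings are hypothetical, and the `describe` helper is an illustrative name, not a standard function.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical ratings on a 1-5 scale, used only for illustration.
ratings = [3, 4, 4, 5, 2, 4, 3, 5, 1, 4]

def describe(values):
    """Return a small descriptive summary: central tendency, spread, and frequencies."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "range": (min(values), max(values)),
        "frequencies": dict(sorted(Counter(values).items())),  # a simple frequency table
    }

summary = describe(ratings)
for key, value in summary.items():
    print(key, value)
```

The frequency table is also a quick way to spot unexpected values, such as a code outside the valid scale, before any further analysis.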

Using inferential statistics, you can make predictions or generalizations based on your data. You can test your hypothesis or use your sample data to estimate the population parameter.
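One common way to estimate a population parameter is a confidence interval for the mean. The sketch below uses a normal (z) approximation from Python's `statistics.NormalDist`, which assumes a reasonably large random sample. For small samples, a t critical value would be more appropriate, but the standard library does not provide one. The population and sample here are simulated and purely hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate CI for a population mean using the normal (z) approximation."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / n ** 0.5              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return m - z * se, m + z * se

# Hypothetical population and simple random sample, for illustration only.
random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
sample = random.sample(population, 200)

low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
print(f"population mean = {mean(population):.2f}")
```

Comparing the interval with the simulated population mean shows what the interval is meant to capture. In real research, the population value is unknown.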

For example, suppose you compare self-rated procrastination between two groups of participants. First, you use descriptive statistics to get a summary of the data: you find the mean (average) and the mode (most frequent rating) of procrastination in each group, and plot the data to see if there are any outliers.
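The group-by-group summary described above can be sketched in a few lines of Python. The two groups and their 1 to 7 ratings are hypothetical values chosen for illustration.

```python
from statistics import mean, multimode

# Hypothetical procrastination self-ratings (1 = never, 7 = always) for two groups.
groups = {
    "group_a": [4, 5, 5, 6, 3, 5, 4, 7],
    "group_b": [2, 3, 3, 4, 2, 3, 5, 1],
}

for name, ratings in groups.items():
    # multimode returns every most-frequent value, so ties are not hidden.
    print(name, "mean:", mean(ratings), "mode(s):", multimode(ratings))
```

`multimode` is used instead of `mode` because rating scales often produce ties, and reporting only one of several equally common values would be misleading.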

You can also assess the reliability and validity of your data collection methods to indicate how consistently and accurately your methods actually measured what you wanted them to.

Quantitative research is often used to standardize data collection and generalize findings. Strengths of this approach include:

  • Replication

Repeating the study is possible because of standardized data collection protocols and tangible definitions of abstract concepts.

  • Direct comparisons of results

The study can be reproduced in other cultural settings, times or with different groups of participants. Results can be compared statistically.

  • Large samples

Data from large samples can be processed and analyzed using reliable and consistent procedures through quantitative data analysis.

  • Hypothesis testing

Using formalized and established hypothesis testing procedures means that you have to carefully consider and report your research variables, predictions, data collection and testing methods before coming to a conclusion.

Despite the benefits of quantitative research, it is sometimes inadequate in explaining complex research topics. Its limitations include:

  • Superficiality

Using precise and restrictive operational definitions may inadequately represent complex concepts. For example, the concept of mood may be represented with just a number in quantitative research, but explained with elaboration in qualitative research.

  • Narrow focus

Predetermined variables and measurement procedures can mean that you ignore other relevant observations.

  • Structural bias

Despite standardized procedures, structural biases can still affect quantitative research. Missing data, imprecise measurements, or inappropriate sampling methods are biases that can lead to the wrong conclusions.

  • Lack of context

Quantitative research often uses unnatural settings like laboratories or fails to consider historical and cultural contexts that may affect data collection and results.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data, it’s important to consider how you will operationalize the variables that you want to measure.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
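The idea of "how likely a pattern could have arisen by chance" can be made concrete with a permutation test, one of several possible tests. This sketch shuffles group labels to simulate the null hypothesis that group membership does not matter. The `permutation_test` helper, the number of shuffles, and the data are illustrative assumptions.

```python
import random

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference between two group means.

    Shuffling the pooled scores simulates the null hypothesis that group
    membership does not matter; the p-value estimates how often a shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    k = len(a)
    pooled = list(a) + list(b)

    def mean_difference(values):
        # Absolute difference between the mean of the first k values and the rest.
        return abs(sum(values[:k]) / k - sum(values[k:]) / (len(values) - k))

    observed = mean_difference(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            extreme += 1
    # The +1 terms count the observed arrangement itself and keep p above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical scores for two groups, for illustration only.
treatment = [24, 27, 31, 29, 26, 30, 28]
control = [22, 23, 25, 21, 24, 26, 22]
print(f"p = {permutation_test(treatment, control):.4f}")
```

A small p-value, conventionally below 0.05, suggests that the observed difference would be unusual if chance alone were at work. Parametric tests such as the t-test address the same question under additional distributional assumptions.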

Cite this Scribbr article

Bhandari, P. (2023, June 22). What Is Quantitative Research? | Definition, Uses & Methods. Scribbr. Retrieved April 5, 2024, from https://www.scribbr.com/methodology/quantitative-research/

University of Portland Clark Library

Nursing & Health Innovations: Peer-reviewed Quantitative Research

What is Quantitative Research?

Typical attributes of Quantitative Research:

  • The basic element of analysis: numbers, statistical analyses (p values, chi square, t-test)
  • Methods: counting, measuring, quantifying (e.g. Likert scale)
  • Tests a theory
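To show what a chi-square analysis computes, the sketch below implements Pearson's chi-square test of independence for a 2x2 table of counts, without a continuity correction. The p-value relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so it can be computed with `math.erfc`. The table values and the `chi_square_2x2` name are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table; returns (statistic, p_value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows = intervention / usual care, columns = improved / not improved.
counts = [[30, 20],
          [18, 32]]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

Statistical packages usually also offer a Yates continuity correction for 2x2 tables. It is omitted here for clarity.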

How to Find Peer-reviewed Quantitative Research Articles

In CINAHL and MEDLINE, to find peer-reviewed quantitative research articles, add several of the following subject terms to your search:

CINAHL terms:

  • Quantitative Studies
  • Analysis of Variance 
  • Chi Square Test

MEDLINE terms:

  • Evaluation Studies
  • Analysis of Variance
  • Chi Square Distribution 

Identifying Quantitative Research Articles

For example, an article indexed in the CINAHL database may list several quantitative research terms as Minor Subjects, such as Chi Square Test, T-Tests, Two-Way Analysis of Variance, and P-Value.

  • Last Updated: Mar 28, 2024 10:08 AM
  • URL: https://libguides.up.edu/nursing

Quantifying possible bias in clinical and epidemiological studies with quantitative bias analysis: common approaches and limitations

  • Open access
  • Published: 26 March 2024

Barriers and enablers to the implementation of patient-reported outcome and experience measures (PROMs/PREMs): protocol for an umbrella review

  • Guillaume Fontaine   ORCID: orcid.org/0000-0002-7806-814X 1 , 2 ,
  • Marie-Eve Poitras 3 , 4 ,
  • Maxime Sasseville 5 , 6 ,
  • Marie-Pascale Pomey 7 , 8 ,
  • Jérôme Ouellet 9 ,
  • Lydia Ould Brahim 1 ,
  • Sydney Wasserman 1 ,
  • Frédéric Bergeron 10 &
  • Sylvie D. Lambert 1 , 11  

Systematic Reviews volume 13, Article number: 96 (2024)

Patient-reported outcome and experience measures (PROMs and PREMs, respectively) are evidence-based, standardized questionnaires that can be used to capture patients’ perspectives of their health and health care. While substantial investments have been made in the implementation of PROMs and PREMs, their use remains fragmented and limited in many settings. Analysis of multi-level barriers and enablers to the implementation of PROMs and PREMs has been hampered by the lack of use of state-of-the-art implementation science frameworks. This umbrella review aims to consolidate available evidence from existing quantitative, qualitative, and mixed-methods systematic and scoping reviews covering factors that influence the implementation of PROMs and PREMs in healthcare settings.

An umbrella review of systematic and scoping reviews will be conducted following the guidelines of the Joanna Briggs Institute (JBI). Qualitative, quantitative, and mixed methods reviews of studies focusing on the implementation of PROMs and/or PREMs in all healthcare settings will be considered for inclusion. Eight bibliographical databases will be searched. All review steps will be conducted by two reviewers independently. Included reviews will be appraised and data will be extracted in four steps: (1) assessing the methodological quality of reviews using the JBI Critical Appraisal Checklist; (2) extracting data from included reviews; (3) theory-based coding of barriers and enablers using the Consolidated Framework for Implementation Research (CFIR) 2.0; and (4) identifying the barriers and enablers best supported by reviews using the Grading of Recommendations Assessment, Development and Evaluation-Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual) approach. Findings will be presented in diagrammatic and tabular forms in a manner that aligns with the objective and scope of this umbrella review, along with a narrative summary.

This umbrella review of quantitative, qualitative, and mixed-methods systematic and scoping reviews will inform policymakers, researchers, managers, and clinicians regarding which factors hamper or enable the adoption and sustained use of PROMs and PREMs in healthcare settings, and the level of confidence in the evidence supporting these factors. Findings will orient the selection and adaptation of implementation strategies tailored to the factors identified.

Systematic review registration

PROSPERO CRD42023421845.

Capturing patients’ perspectives of their health and healthcare needs using standardized patient-reported outcome and experience measures (referred to herein as PROMs and PREMs, respectively) has been the focus of over 40 years of research [ 1 , 2 ]. PROMs/PREMs are standardized, validated questionnaires (generic or disease-specific); PROMs are completed by patients about their health, functioning, and quality of life, whereas PREMs are focused on patients’ experiences whilst receiving care [ 1 ]. PROMs/PREMs are associated with a robust evidence-base across multiple illnesses; they can increase charting of patients’ needs [ 3 ], and improve patient-clinician communication [ 3 , 4 , 5 ], which in turn can lead to improved symptom management [ 4 , 5 , 6 ], thereby improving patients’ quality of life, reducing health care utilization [ 5 ], and increasing survival rates [ 7 ].

Multipurpose applications of PROMs/PREMs have led to substantial investments in their implementation. In the USA, PROMs are part of payer mandates; in the United Kingdom, they are used for benchmarking and included in a national registry; and Denmark has embedded them across healthcare sectors [ 8 , 9 , 10 , 11 ]. In Canada, the Canadian Institute for Health Information (CIHI) has advocated for a standardized core set of PROMs [ 12 ], and the Canadian Partnership Against Cancer (CPAC) recently spearheaded PROM implementation in oncology in 10 provinces/territories. In 2017, the Organisation for Economic Co-operation and Development (OECD) launched the Patient-Reported Indicators Surveys (PaRIS) to build international capacity for PROMs/PREMs in primary care [ 13 ]. Yet, in many countries across the globe, their use remains fragmented, characterized by broad swaths of pre-implementation, pilots, and full implementation in narrow domains [ 12 , 14 , 15 ]. PROM/PREM implementation remains driven by silos of local healthcare networks [ 16 ].

Barriers and enablers to the implementation of PROMs/PREMs exist at the patient level (e.g., low health literacy) [ 17 ], clinician level (e.g., obtaining PROM/PREM results from external digital platforms) [ 17 , 18 , 19 ], service level (e.g., lack of integration in clinics’ workflow) [ 17 , 20 ], and organizational/system level (e.g., organizational policies conflicting with PROM implementation goals) [ 21 ]. Foster and colleagues [ 22 ] conducted an umbrella review on the barriers and facilitators to implementing PROMs in healthcare settings. The umbrella review identified a number of bidirectional factors arising at different stages that can impact the implementation of PROMs; these factors were related to the implementation process, the organization, and healthcare providers [ 22 ]. However, the umbrella review focused solely on PROMs, excluding PREMs, and the theory-based analysis of implementation factors was limited. Another ongoing umbrella review is restricted to investigating barriers and enablers at the healthcare provider level, omitting the multilevel changes required for successful PROM/PREM implementation [ 23 ].

State-of-the-art approaches from implementation science can support the identification of multilevel factors influencing the implementation of PROMs and PREMs in different healthcare settings [ 24 , 25 , 26 ]. The second version of the Consolidated Framework for Implementation Research (CFIR 2.0) can guide the exploration of determinants influencing the implementation of PROMs and PREMs [ 27 ]. The CFIR is a meta-theoretical framework providing a repository of standardized implementation-related constructs at the individual, organizational, and external levels that can be applied across the spectrum of implementation research [ 27 ]. CFIR 2.0 includes five domains pertaining to the characteristics of the innovation targeted for implementation, the implementation process, the individuals involved in the implementation, the inner setting, and the outer setting [ 27 ]. Using an implementation framework to identify the multilevel factors influencing the implementation of PROMs/PREMs is critical for selecting and tailoring implementation strategies to address barriers [ 28 , 29 , 30 , 31 ]. Implementation strategies are the “how”: the specific means or methods for promoting the adoption of evidence-based innovations (e.g., revising professional roles, auditing and providing feedback) [ 32 ]. Selecting and adapting implementation strategies to facilitate the implementation of PROMs/PREMs can be time-consuming, as there are more than 73 implementation strategies to choose from [ 33 ]. Thus, a detailed understanding of the barriers to PROM/PREM implementation can inform and streamline the selection and adaptation of implementation strategies, saving financial, human, and material resources [ 24 , 25 , 26 , 32 , 34 ].

Review objective and questions

In this umbrella review, we aim to consolidate available evidence from existing quantitative, qualitative, and mixed-methods systematic and scoping reviews covering factors that influence the implementation of PROMs and PREMs in healthcare settings.

We will address the following questions:

What are the factors that hinder or enable the implementation of PROMs and PREMs in healthcare settings, and what is the level of confidence in the evidence supporting these factors?

What are the similarities and differences in barriers and enablers across settings and geographical regions?

What are the similarities and differences in the perceptions of barriers and enablers between patients, clinicians, managers, and decision-makers?

What are the implementation theories, models, and frameworks that have been used to guide research in this field?

Review design and registration

An umbrella review of systematic and scoping reviews will be conducted following the guidelines of the Joanna Briggs Institute (JBI) [ 35 , 36 ]. The umbrella review is a form of evidence synthesis that aims to address the challenge of collating, assessing, and synthesizing evidence from multiple reviews on a specific topic [ 35 ]. This protocol was registered on PROSPERO (CRD42023421845) and is presented according to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) guidelines (see Supplementary material  1 ) [ 37 ]. We will use the Preferred Reporting Items for Overviews of Reviews (PRIOR) guidelines [ 38 ] and the PRISMA guidelines [ 39 ] to report results (e.g., flowchart, search process).

Eligibility criteria

The eligibility criteria were developed following discussions among the project team, which included researchers with experience in implementation science and in the implementation of PROMs and PREMs in different fields (e.g., cancer care, primary care). These criteria were refined after being piloted on a set of studies. The final eligibility criteria for the review are detailed in Table  1 . We will consider for inclusion all qualitative, quantitative, and mixed methods reviews of studies focusing on the implementation of PROMs or PREMs in any healthcare setting.

Information sources

Searches will be conducted in eight databases: CINAHL, via EBSCOhost (1980 to present); Cochrane Database of Systematic Reviews; Evidence-Based Medicine Reviews; EMBASE, via Ovid SP (1947 to present); ERIC, via Ovid SP (1966 to present); PsycINFO, via APA PsycNet (1967 to present); PubMed (including MEDLINE), via NCBI (1946 to present); and Web of Science, via Clarivate Analytics (1900 to present). CINAHL is a leading database for nursing and allied health literature. The Cochrane Database of Systematic Reviews and Evidence-Based Medicine Reviews are essential for accessing high-quality systematic reviews and meta-analyses. EMBASE is a biomedical and pharmacological database offering extensive coverage of drug research, pharmacology, and medical devices, complementing PubMed. ERIC provides access to educational research, which is relevant given the intersection of healthcare and education in the use of PROMs and PREMs. PsycINFO is crucial for accessing research on the psychological aspects of PROMs and PREMs. PubMed, encompassing MEDLINE, is a primary resource for biomedical literature. Web of Science offers broad, interdisciplinary coverage of the scientific literature. We will use additional strategies to complement our exploration, including examining the references cited in eligible articles, searching for authors who have published extensively in the field, and conducting backward and forward citation searches of related systematic reviews and influential articles.

Search strategy

A comprehensive search strategy was developed iteratively by the review team in collaboration with an experienced librarian holding a Master of Science in Information (FB). First, an initial limited search of MEDLINE and CINAHL will be undertaken to identify reviews on PROM/PREM implementation. The text words contained in the titles and abstracts of these reviews, and the index terms used to describe them, will be analyzed and used to refine the search strategy as needed. We adapted elements from the search strategies of two recent reviews in the field of PROM/PREM implementation [ 22 , 23 ] to fit our objectives. The search strategy for PubMed is presented in Supplementary material 2 . It will be tailored to each information source, and the complete search strategy for each database will be reported in the final manuscript for transparency and reproducibility.
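As a minimal illustration of how concept blocks are usually combined in a database search (synonyms joined with OR within a concept, concepts joined with AND), the Python sketch below assembles a Boolean string. The concept blocks and terms are hypothetical placeholders, not the team's PubMed strategy (which is given in Supplementary material 2), and field tags, truncation rules, and subject headings would still need to be tailored to each database's syntax.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Join concept blocks with AND so that every concept must be matched."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Hypothetical concept blocks for illustration only (not the review's actual strategy).
concepts = [
    ["patient-reported outcome measures", "PROM*",
     "patient-reported experience measures", "PREM*"],
    ["implementation", "barrier*", "facilitator*", "enabler*"],
    ["systematic review", "scoping review", "umbrella review"],
]
print(build_query(concepts))
```

Keeping each concept in its own list makes it easy to add synonyms found during the initial limited search without touching the AND structure.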

Selection process

All identified citations will be collated and uploaded into the Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia), and duplicates will be removed. Following training on 50 titles, titles will be screened against the inclusion criteria by two independent reviewers; multiple rounds of calibration may be needed. Once titles have been screened, retained abstracts will be reviewed, preferably by the same two reviewers, with inter-rater reliability re-established on 50 abstracts to re-calibrate as needed. Lastly, the full texts of retained abstracts will be retrieved and assessed in detail against the inclusion criteria by two independent reviewers. Reasons for excluding articles at the full-text stage onwards will be recorded in the PRIOR flow diagram (a PRISMA-like flowchart) [ 38 ]. Any disagreements that arise between the reviewers at each stage of the selection process will be resolved through discussion or with an additional reviewer. In addition, weekly team meetings held throughout the project will provide an opportunity to discuss and resolve any disagreement arising at the different stages, from study selection to data extraction.
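The protocol does not name the inter-rater reliability statistic used during calibration. Assuming Cohen's kappa, a common choice for two reviewers making include/exclude decisions, a calibration round could be checked with a sketch like the one below. The function names and the "I"/"E" decision codes are hypothetical, and the decisions shown are made-up examples.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters made the same decision
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        return float("nan")  # both raters used one identical category; kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


def disagreements(item_ids, rater_a, rater_b):
    """Items to bring to discussion or to an additional reviewer."""
    return [i for i, a, b in zip(item_ids, rater_a, rater_b) if a != b]


# Hypothetical decisions on ten titles: "I" = include, "E" = exclude
ids = [f"t{k:02d}" for k in range(1, 11)]
a = list("IIEEEEEEIE")
b = list("IEEEEEEEIE")
print(round(cohens_kappa(a, b), 3))  # 0.737
print(disagreements(ids, a, b))      # ['t02']
```

Kappa corrects raw agreement (here 90%) for the agreement expected by chance, which matters when most titles are excluded by both reviewers.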

Quality appraisal and data extraction

As presented in Fig.  1 , included reviews will be appraised, and data will be extracted and analyzed, in four steps using validated tools and methodologies [ 27 , 36 , 40 ]. All four steps will be conducted by two reviewers independently, with a third reviewer involved in case of disagreement. More reviewers may be needed depending on the number of reviews included.

figure 1

Tools/methodology applied in each phase of the umbrella review. Figure adapted from Boudewijns and colleagues [ 41 ] with permission. CFIR 2.0 = Consolidated Framework for Implementation Research, version 2 [ 27 ]. GRADE–CERQual = Grading of Recommendations Assessment, Development and Evaluation–Confidence in the Evidence from Reviews of Qualitative Research [ 42 ]. JBI = Joanna Briggs Institute [ 36 ]

Step 1—assessing the quality of included reviews

In the first step, two reviewers will independently assess the methodological quality of the reviews using the JBI Critical Appraisal Checklist for Systematic Reviews and Research Syntheses, presented in Supplementary material  3 . We have selected this checklist for its comprehensiveness, applicability to different types of knowledge syntheses, and ease of use, requiring minimal training for reviewers to apply it. The checklist consists of 11 questions. It evaluates whether the review question is clearly and explicitly stated, whether the inclusion criteria were appropriate for that question, and whether the search strategy and sources were suitable and adequate for capturing relevant studies. It also assesses the appropriateness of the criteria used for appraising studies, as well as whether the critical appraisal was conducted independently by two or more reviewers. The checklist further examines whether there were methods in place to minimize errors during data extraction, whether the methods used to combine studies were appropriate, and whether the likelihood of publication bias was assessed. Additionally, it verifies whether the recommendations for policy and/or practice are supported by the reported data and whether the directives for new research are appropriate. Each question is answered “yes”, “no”, or “unclear”; “not applicable” (NA) is also available as an option and may be appropriate in rare instances. The results of the quality appraisal will provide the basis for assessing confidence in the evidence in step four. Any disagreements that arise between the reviewers will be resolved through discussion, with a third reviewer, or at team meetings.
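The checklist answers lend themselves to a simple record per review. The sketch below is a hypothetical helper, not part of the JBI tool: it validates one reviewer's 11 answers, counts the "yes" answers among applicable items, and lists the items on which two independent appraisals differ so they can be resolved. JBI does not prescribe a summary score, so the tally is assumed to be only a descriptive aid; the lowercase answer codes are also an assumption.

```python
JBI_ITEMS = 11  # number of questions in the checklist
VALID_ANSWERS = {"yes", "no", "unclear", "na"}  # hypothetical lowercase answer codes


def summarize_appraisal(answers):
    """Return (number of 'yes' answers, number of applicable items) for one review."""
    if len(answers) != JBI_ITEMS:
        raise ValueError(f"expected {JBI_ITEMS} answers, got {len(answers)}")
    invalid = set(answers) - VALID_ANSWERS
    if invalid:
        raise ValueError(f"invalid answers: {sorted(invalid)}")
    applicable = [a for a in answers if a != "na"]
    return sum(a == "yes" for a in applicable), len(applicable)


def items_to_reconcile(reviewer_1, reviewer_2):
    """1-based checklist items on which two independent appraisals differ."""
    return [i for i, (x, y) in enumerate(zip(reviewer_1, reviewer_2), start=1) if x != y]


# Hypothetical appraisals of one review by two reviewers
reviewer_1 = ["yes"] * 9 + ["unclear", "na"]
reviewer_2 = ["yes"] * 8 + ["no", "unclear", "na"]
print(summarize_appraisal(reviewer_1))             # (9, 10)
print(items_to_reconcile(reviewer_1, reviewer_2))  # [9]
```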

Step 2—extracting data from included reviews

For the second step, we have developed a modified version of the JBI Data Extraction Form for Umbrella Reviews, presented in Supplementary material  3 . We will pilot our data extraction form on two of the included reviews and revise it for clarity as needed. Two reviewers will then extract data from each review independently. We will collect the following data: (a) authors and date; (b) country; (c) review aims and objectives; (d) focus of the review; (e) context; (f) population; (g) eligibility criteria; (h) review type and methodology; (i) data sources; (j) dates of search; (k) number of included studies; (l) characteristics of included studies (including study type and critical appraisal score); (m) implementation framework guiding analysis; (n) implementation strategies discussed; (o) results and significance; and (p) conclusions. Barriers and enablers will be extracted separately in step 3. Any disagreements that arise between the reviewers will be resolved through discussion, with a third reviewer, or at team meetings.
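A structured record can make the double extraction easier to compare. The dataclass below is a hypothetical mirror of the data items listed above (the field names are our own, not those of the JBI form); it reports fields that are still empty and the fields on which two reviewers' records differ.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """Hypothetical record mirroring the data items of the modified JBI form."""
    authors_and_date: str = ""
    country: str = ""
    aims_and_objectives: str = ""
    focus: str = ""
    context: str = ""
    population: str = ""
    eligibility_criteria: str = ""
    review_type_and_methodology: str = ""
    data_sources: str = ""
    search_dates: str = ""
    n_included_studies: Optional[int] = None
    included_study_characteristics: str = ""
    implementation_framework: str = ""
    implementation_strategies: List[str] = field(default_factory=list)
    results_and_significance: str = ""
    conclusions: str = ""

    def missing_items(self):
        """Names of fields still empty, to complete before the records are compared."""
        return [f.name for f in fields(self) if getattr(self, f.name) in ("", None, [])]


def fields_to_reconcile(record_1, record_2):
    """Fields on which two reviewers' independent extractions disagree."""
    return [f.name for f in fields(record_1)
            if getattr(record_1, f.name) != getattr(record_2, f.name)]


# Hypothetical partial extractions of the same review by two reviewers
r1 = ExtractionRecord(authors_and_date="Review A (2020)", n_included_studies=12)
r2 = ExtractionRecord(authors_and_date="Review A (2020)", n_included_studies=14)
print(fields_to_reconcile(r1, r2))  # ['n_included_studies']
```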

Step 3—theory-based coding of barriers and enablers

In the third step, we will use the second version of the Consolidated Framework for Implementation Research (CFIR 2.0) [ 27 ] to guide our exploration of determinants influencing the implementation of PROMs and PREMs (see Fig.  2 ). The CFIR is a meta-theoretical framework providing a repository of standardized implementation-related constructs at the individual, organizational, and external levels that can be applied across the spectrum of implementation research. CFIR 2.0 contains 48 constructs and 19 subconstructs representing determinants of implementation across five domains: Innovation (i.e., PROMs and PREMs), Outer Setting (e.g., national policy context), Inner Setting (e.g., work infrastructure), Individuals (e.g., healthcare professional motivation), and Implementation Process (e.g., assessing context) [ 27 ]. To ensure that coding remains grounded in the chosen theoretical framework, we have developed a codebook based on CFIR 2.0, presented in Supplementary material 3 . Furthermore, an initial training session and regular check-ins will be held to discuss coding procedures among the team members involved.

figure 2

The second version of the Consolidated Framework for Implementation Research and its five domains: innovation, outer setting, inner setting, individuals, and implementation process [ 27 , 43 ]

To code factors influencing the implementation of PROMs and PREMs using the CFIR, we will upload the PDFs of all included reviews and their appendices into the NVivo qualitative data analysis software (QSR International, Burlington, USA). All reviews will be coded independently by two reviewers. Any disagreements that arise between the reviewers will be resolved through discussion or with a third reviewer.
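Once coding is complete, the codes could be exported from the qualitative software and tallied by CFIR 2.0 domain. The sketch below assumes a hypothetical export format with one record per review, construct, and direction (barrier or enabler). The construct names come from the examples given above for each domain, the review identifiers are placeholders, and duplicate codes from the same review are counted once.

```python
from collections import Counter
from dataclasses import dataclass

CFIR_DOMAINS = ("Innovation", "Outer Setting", "Inner Setting",
                "Individuals", "Implementation Process")


@dataclass(frozen=True)
class CodedFactor:
    """Hypothetical coded segment: a review, a CFIR 2.0 domain and construct, a direction."""
    review_id: str
    domain: str
    construct: str
    direction: str  # "barrier" or "enabler"

    def __post_init__(self):
        if self.domain not in CFIR_DOMAINS:
            raise ValueError(f"unknown CFIR 2.0 domain: {self.domain!r}")
        if self.direction not in ("barrier", "enabler"):
            raise ValueError("direction must be 'barrier' or 'enabler'")


def tally_by_domain(codes):
    """Count distinct codes per (domain, direction); repeats within a review count once."""
    return Counter((c.domain, c.direction) for c in set(codes))


# Placeholder review IDs; constructs taken from the domain examples in the text
codes = [
    CodedFactor("R1", "Inner Setting", "Work Infrastructure", "barrier"),
    CodedFactor("R2", "Inner Setting", "Work Infrastructure", "barrier"),
    CodedFactor("R1", "Individuals", "Motivation", "enabler"),
    CodedFactor("R1", "Inner Setting", "Work Infrastructure", "barrier"),  # duplicate
]
print(tally_by_domain(codes))
```

Validating the domain at construction time catches codes that drift outside the codebook before they reach the synthesis tables.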

Step 4—identifying the barriers and enablers best supported by the reviews

In the fourth and final step, we will use the Grading of Recommendations Assessment, Development, and Evaluation-Confidence in the Evidence from Reviews of Qualitative Research (GRADE-CERQual) approach to assess the level of confidence in the barriers and enablers to PROM/PREM implementation identified in step 3 (see Supplementary material  3 ). This process will identify which barriers and enablers are best supported by the evidence in the included reviews. GRADE-CERQual includes four domains: (a) methodological limitations, (b) coherence, (c) adequacy of data, and (d) relevance (see Table  2 ). For each review finding, we will assign a score per domain ranging from one point (substantial concerns) to four points (no or very minor concerns). The score for methodological limitations will be assigned based on the JBI Critical Appraisal (step 1). The score for coherence will be assigned based on the presence of contradictory findings, as well as ambiguous or incomplete data, for that finding in the umbrella review. The score for adequacy of data will be assigned based on the richness of the data supporting the umbrella review finding. Finally, the score for relevance will be assigned based on how applicable the included reviews supporting a specific barrier or enabler to the implementation of PROMs/PREMs are to the umbrella review context. This will allow us to identify which factors are supported by the evidence with the highest level of confidence. A calibration exercise on three systematic reviews will be conducted with the team members involved in this stage of the umbrella review, and adjustments to procedures will be discussed in team meetings.
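The per-domain scores can be kept alongside each finding. In GRADE-CERQual, the overall confidence rating (high, moderate, low, or very low) is a reviewer judgement rather than a formula, so the sketch below deliberately does not compute it. It only validates the four domain scores on the protocol's 1-to-4 scale and lists the domains with concerns, worst first, as a prompt for the explanations in the Summary of Qualitative Findings table. The class name, domain keys, and example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

CERQUAL_DOMAINS = ("methodological_limitations", "coherence", "adequacy", "relevance")
MIN_SCORE, MAX_SCORE = 1, 4  # 1 = substantial concerns, 4 = no or very minor concerns


@dataclass
class FindingAssessment:
    """Hypothetical per-domain scores for one review finding (a barrier or an enabler)."""
    finding: str
    scores: Dict[str, int]

    def __post_init__(self):
        if set(self.scores) != set(CERQUAL_DOMAINS):
            raise ValueError(f"scores must cover exactly: {CERQUAL_DOMAINS}")
        for domain, score in self.scores.items():
            if not MIN_SCORE <= score <= MAX_SCORE:
                raise ValueError(f"{domain}: score {score} outside {MIN_SCORE}-{MAX_SCORE}")

    def domains_with_concerns(self):
        """Domains scored below the maximum, lowest score first (ties alphabetical)."""
        flagged = sorted((score, domain) for domain, score in self.scores.items()
                         if score < MAX_SCORE)
        return [domain for _, domain in flagged]


# Hypothetical scores for illustration only
example = FindingAssessment(
    "Lack of integration in clinics' workflow",
    {"methodological_limitations": 3, "coherence": 4, "adequacy": 2, "relevance": 4},
)
print(example.domains_with_concerns())  # ['adequacy', 'methodological_limitations']
```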

The data synthesis plan for the umbrella review is designed to present extracted data in a format that is both informative and accessible, to support decision-making and provide a clear overview of the synthesized evidence.

Data extracted from the included systematic reviews will be organized into diagrams and tables, ensuring the presentation is closely aligned with our objectives and scope. These will categorize the distribution of reviews in several ways: by the year or period of publication, country of origin, target population, context, type of review, and various implementation factors. This stratification will allow for an at-a-glance understanding of the breadth and focus of the existing literature. To further assist in the application of the findings, a Summary of Qualitative Findings (SoQF) table will be constructed. This table will list each barrier and enabler identified within the systematic reviews and provide an overall confidence assessment for each finding. The confidence assessment will be based on the methodological soundness and relevance of the evidence supporting each identified barrier or enabler. Importantly, the SoQF table will include explanations for these assessments, making the basis for each judgement transparent [ 42 ]. Additionally, a CERQual Evidence Profile will be prepared, offering a detailed look at the reviewers’ judgements concerning each component of the CERQual approach. These components contribute to the overall confidence in the evidence for each identified barrier or enabler. The CERQual Evidence Profile will serve as a comprehensive record of the quality and applicability of the evidence [ 42 ].
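The distribution tables described here amount to counting reviews per category. The following is a minimal sketch, assuming each included review is stored as a dictionary with hypothetical keys such as "year" and "type", and assuming five-year publication periods (the protocol does not fix the period length).

```python
from collections import Counter


def period(year, width=5, start=2000):
    """Map a publication year to a period label, e.g. 2018 -> '2015-2019'."""
    low = start + (year - start) // width * width
    return f"{low}-{low + width - 1}"


def distribution(reviews, key):
    """(category, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter(key(r) for r in reviews)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical included reviews
reviews = [
    {"year": 2013, "type": "systematic"},
    {"year": 2018, "type": "systematic"},
    {"year": 2021, "type": "scoping"},
    {"year": 2022, "type": "systematic"},
]
print(distribution(reviews, lambda r: period(r["year"])))
print(distribution(reviews, lambda r: r["type"]))
```

Passing the grouping as a function keeps one routine for every stratification (country, population, context, review type, implementation factor).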

Finally, we will conduct a narrative synthesis accompanying the tabular and diagrammatic presentations, summarizing the findings and discussing their implications concerning the review’s objectives and questions. This narrative will interpret the significance of the barriers and enablers identified, explaining how the synthesized evidence fits into the existing knowledge base and pointing out potential directions for future research or policy formulation.

This protocol outlines an umbrella review aiming to consolidate available evidence on the implementation of PROMs and PREMs in healthcare settings. Through our synthesis of quantitative, qualitative, and mixed-methods systematic and scoping reviews, we will answer two key questions: which factors hinder or enable the adoption and sustained use of PROMs and PREMs in healthcare settings, and what is the level of confidence in the evidence supporting these factors? Our findings will indicate which factors can influence the adoption of PROMs and PREMs, including clinician buy-in, patient engagement, and organizational support. Furthermore, our review will provide key insights regarding how barriers and enablers to PROM/PREM implementation differ across settings and how perceptions around their implementation differ between patients, clinicians, managers, and decision-makers. The consideration of different healthcare settings and the inclusion of studies from different geographical regions and healthcare systems will provide a global perspective, essential for understanding how context-specific factors might influence the generalizability of findings.

Strengths of this umbrella review include the use of a state-of-the-art implementation framework (CFIR 2.0) to identify, categorize, and synthesize multilevel factors influencing the implementation of PROMs/PREMs, and the use of the GRADE-CERQual approach to identify the level of confidence in the evidence supporting these factors. Using CFIR 2.0 will address a key limitation of current research in the field, since reviews and primary research often focus on provider- and patient-level barriers and enablers, omitting organizational- and system-level factors affecting PROM/PREM implementation. This umbrella review will expose knowledge gaps to orient further research to improve our understanding of the complex factors at play in the adoption and sustained use of PROMs and PREMs in healthcare settings. Importantly, using CFIR 2.0 will allow the mapping of barriers and enablers identified to relevant implementation strategy taxonomies, such as the Expert Recommendations for Implementing Change (ERIC) Taxonomy [ 34 ]. This is crucial for designing tailored implementation strategies, as it can help ensure that the chosen approaches to support implementation are directly aligned with the specific barriers and enablers to the uptake of PROMs and PREMs.

Umbrella reviews also have limitations. First, they are limited to systematic reviews and other knowledge syntheses, while additional primary studies are likely to have been published since. These additional empirical studies will not be captured, but we will minimize this risk by updating the search at least once before the completion of the umbrella review. A second key challenge in umbrella reviews is overlap between primary studies, as many studies will have been included in several systematic reviews on the same topic. To address this issue, we will prepare a matrix mapping the primary studies to the systematic reviews that include them, to gain insight into double counting of primary studies.
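One common way to summarize such a matrix, not specified in the protocol, is the corrected covered area (CCA) proposed by Pieper and colleagues: CCA = (N - r) / (r * c - r), where N is the total number of inclusions of primary studies across reviews (counting duplicates), r is the number of unique primary studies (rows), and c is the number of reviews (columns). The sketch below builds the matrix from a hypothetical mapping of review identifiers to primary-study identifiers and computes the CCA.

```python
def citation_matrix(reviews):
    """Rows: unique primary studies; columns: reviews (sorted); 1 = study included."""
    review_ids = sorted(reviews)
    studies = sorted(set().union(*reviews.values()))
    rows = {s: [int(s in reviews[r]) for r in review_ids] for s in studies}
    return review_ids, rows


def corrected_covered_area(reviews):
    """CCA = (N - r) / (r * c - r), with N inclusions, r unique studies, c reviews."""
    c = len(reviews)
    n_inclusions = sum(len(studies) for studies in reviews.values())
    r = len(set().union(*reviews.values()))
    if c < 2 or r == 0:
        raise ValueError("need at least two reviews and one primary study")
    return (n_inclusions - r) / (r * c - r)


# Hypothetical mapping of review IDs to the primary studies they include
reviews = {"R1": {"s1", "s2", "s3"}, "R2": {"s2", "s3", "s4"}, "R3": {"s3", "s5"}}
columns, rows = citation_matrix(reviews)
for study, flags in rows.items():
    print(study, flags)
print(f"CCA = {corrected_covered_area(reviews):.2f}")  # CCA = 0.30
```

Here N = 8, r = 5, and c = 3, so CCA = 3 / 10. Rows with several 1s (such as s3) are the primary studies at risk of being double counted.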

We will maintain an audit trail to document amendments to this umbrella review protocol and report these in both the PROSPERO register and subsequent publications. Findings will be disseminated through publications in peer-reviewed journals in the fields of implementation science, medicine, and health services and policy research. We will also disseminate results through relevant conferences and social media using different strategies (e.g., graphical abstract). Furthermore, we will leverage existing connections between SDL and decision-makers at the provincial and national levels in Canada to disseminate the findings of the review to a wider audience (e.g., the Director of the Quebec Cancerology Program, the Canadian Association of Psychosocial Oncology).

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed for the purposes of this publication.

Abbreviations

CERQual: Confidence in the evidence from reviews of qualitative research

CFIR: Consolidated framework for implementation research

CIHI: Canadian Institute for Health Information

CPAC: Canadian Partnership Against Cancer

ERIC: Expert recommendations for implementing change

GRADE: Grading of recommendations assessment, development and evaluation

JBI: Joanna Briggs Institute

OECD: Organisation for economic co-operation and development

PaRIS: Patient-reported indicator surveys

PREM: Patient-reported experience measure

PRIOR: Preferred reporting items for overviews of reviews

PRISMA: Preferred reporting items for systematic review and meta-analysis

PRISMA-P: Preferred reporting items for systematic review and meta-analysis protocols

PROM: Patient-reported outcome measure

Kingsley C, Patel S. Patient-reported outcome measures and patient-reported experience measures. BJA Education. 2017;17(4):137–44.


Jamieson Gilmore K CI, Coletta L, Allin S. The uses of patient reported experience measures in health systems: a systematic narrative review. Health Policy. 2022.

Gibbons CPI, Gonçalves-Bradley DC, et al. Routine provision of feedback from patient-reported outcome measurements to healthcare providers and patients in clinical practice. Cochrane Database Syst Rev. 2021;10:Cd011589.


Howell DMS, Wilkinson K, et al. Patient-reported outcomes in routine cancer clinical practice: a scoping review of use, impact on health outcomes, and implementation factors. Ann Oncol. 2015;26:1846–58.


Kotronoulas GKN, Maguire R, et al. What is the value of the routine use of patient-reported outcome measures toward improvement of patient outcomes, processes of care, and health service outcomes in cancer care? A systematic review of controlled trials. J Clin Oncol. 2014;32:1480–501.


Chen J OL, Hollis SJ. A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting. BMC Health Serv Res. 2013;13:211.

Basch E. Symptom monitoring With patient-reported outcomes during routine cancer treatment: A randomized controlled trial. J Clin Oncol. 2016;34:2198–2198.

Forcino RCMM, Engel JA, O’Malley AJ, Elwyn G. Routine patient-reported experience measurement of shared decision-making in the USA: a qualitative study of the current state according to frontrunners. BMJ Open. 2020;10: e037087.


Timmins N. NHS goes to the PROMS. BMJ. 2008;336:1464–5.

Mjåset C. Value-based health care in four different health care systems. NEJM Catalyst. 2020.

Sekretariatet P. PRO – patient reported outcome. https://pro-danmark.dk/da/proenglish .

Terner MLK, Chow C, Webster G. Advancing PROMs for health system use in Canada and beyond. J Patient Rep Outcomes. 2021;5:94.

Slawomirski L, van den Berg M, Karmakar-Hore S. Patient-Reported indicator survey (PaRIS): aligning practice and policy for better health outcomes. World Med J. 2018;64(3):8–14.


Ahmed SBL, Bartlett SJ, et al. A catalyst for transforming health systems and person-centred care: Canadian national position statement on patient-reported outcomes. Curr Oncol. 2020;27:90–9.


Pross C, Geissler A, Busse R. Measuring, reporting, and rewarding quality of care in 5 nations: 5 policy levers to enhance hospital quality accountability. Milbank Q. 2017;95(1):136–83.

Ernst SCK, Steinbeck V, Busse R, Pross C. Toward system-wide implementation of patient-reported outcome measures: a framework for countries, states, and regions. Value in Health. 2022;25(9):1539–47.

Nguyen HBP, Dhillon H, Sundaresan P. A review of the barriers to using Patient-Reported Outcomes (PROs) and Patient-Reported Outcome Measures (PROMs) in routine cancer care. J Med Radiation Sci. 2021;68:186–95.

Davis SAM, Smith M, et al. Paving the way for electronic patient-centered measurement in team-based primary care: integrated knowledge translation approach. JMIR Form Res. 2022;6: e33584.

Bull CTH, Watson D, Callander EJ. Selecting and implementing patient-reported outcome and experience measures to assess health system performance. JAMA Health Forum. 2022;3: e220326.

Schepers SAHL, Zadeh S, Grootenhuis MA, Wiener L. Healthcare professionals’ preferences and perceived barriers for routine assessment of patient-reported outcomes in pediatric oncology practice: moving toward international processes of change. Pediatr Blood Cancer. 2016;63:2181–8.

Glenwright BG, Simmich J, Cottrell M, et al. Facilitators and barriers to implementing electronic patient-reported outcome and experience measures in a health care setting: a systematic review. J Patient Rep Outcomes. 2023;7:13. https://doi.org/10.1186/s41687-023-00554-2

Foster A, Croot L, Brazier J, Harris J, O’Cathain A. The facilitators and barriers to implementing patient reported outcome measures in organisations delivering health related services: a systematic review of reviews. J Patient Rep Outcomes. 2018;2(1):1–16.

Wolff AC, Dresselhuis A, Hejazi S, et al. Healthcare provider characteristics that influence the implementation of individual-level patient-centered outcome measure (PROM) and patient-reported experience measure (PREM) data across practice settings: a protocol for a mixed methods systematic review with a narrative synthesis. Syst Rev. 2021;10:169. https://doi.org/10.1186/s13643-021-01725-2

Grimshaw JM, Eccles MP, Lavis JN, Hill SJ, Squires JE. Knowledge translation of research findings. Implement Sci. 2012;7(1):50. https://doi.org/10.1186/1748-5908-7-50 .

French SD, Green SE, O’Connor DA, et al. Developing theory-informed behaviour change interventions to implement evidence into practice: a systematic approach using the Theoretical Domains Framework. Implement Sci. 2012;7(1):38. https://doi.org/10.1186/1748-5908-7-38 .

Wolfenden L, Foy R, Presseau J, Grimshaw JM, Ivers NM, et al. Designing and undertaking randomised implementation trials: guide for researchers. BMJ. 2021;372. https://doi.org/10.1136/bmj.m3721

Damschroder LJ, Reardon CM, Widerquist MAO, et al. The updated Consolidated Framework for Implementation Research based on user feedback. Implement Sci. 2022;17:75. https://doi.org/10.1186/s13012-022-01245-0

Bradshaw ASM, Mulderrig M, et al. Implementing person-centred outcome measures in palliative care: An exploratory qualitative study using Normalisation Process Theory to understand processes and context. Palliat Med. 2021;35:397–407.

Stover AM, Haverman L, van Oers HA, Greenhalgh J, Potter CM. Using an implementation science approach to implement and evaluate patient-reported outcome measures (PROM) initiatives in routine care settings. Qual Life Res. 2021;30:3015–33.

Manalili K, Santana MJ. Using implementation science to inform the integration of electronic patient-reported experience measures (ePREMs) into healthcare quality improvement: description of a theory-based application in primary care. Qual Life Res. 2021;30:3073–84.

Patey AM, Fontaine G, Francis JJ, McCleary N, Presseau J, Grimshaw JM. Healthcare professional behaviour: health impact, prevalence of evidence-based behaviours, correlates and interventions. Psychol Health. 2022:766–794. https://doi.org/10.1080/08870446.2022.2100887

Proctor EK, Powell BJ, McMillen JC. Implementation strategies: recommendations for specifying and reporting. Implement Sci. 2013;8(1):1–11. https://doi.org/10.1186/1748-5908-8-139 .

Powell BJ, Waltz TJ, Chinman MJ, et al. A refined compilation of implementation strategies: results from the Expert Recommendations for Implementing Change (ERIC) project. Implement Sci. 2015;10:21. https://doi.org/10.1186/s13012-015-0209-1 .

Waltz TJ, Powell BJ, Matthieu MM, et al. Use of concept mapping to characterize relationships among implementation strategies and assess their feasibility and importance: results from the Expert Recommendations for Implementing Change (ERIC) study. Implement Sci. 2015;10:109. https://doi.org/10.1186/s13012-015-0295-0 .

Aromataris E, Munn Z. Chapter 11: Umbrella Reviews. In: Aromataris E, Munn Z, editors. Joanna Briggs Institute Reviewer's Manual. The Joanna Briggs Institute; 2020.

Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Summarizing systematic reviews: methodological development, conduct and reporting of an Umbrella review approach. Int J Evid Based Healthc. 2015;13(3):132–40.

Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1). https://doi.org/10.1186/2046-4053-4-1

Gates M, Gates A, Pieper D, Fernandes RM, Tricco AC, Moher D, et al. Reporting guideline for overviews of reviews of healthcare interventions: development of the PRIOR statement. BMJ. 2022;378: e070849. https://doi.org/10.1136/bmj-2022-070849 .

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg. 2021;88:105906.

Dixon-Woods M, Agarwal S, Young B, Jones D, Sutton A. Integrative approaches to qualitative and quantitative evidence. Health Development Agency; 2004.

Boudewijns EA, Trucchi M, van der Kleij RM, Vermond D, Hoffman CM, Chavannes NH, et al. Facilitators and barriers to the implementation of improved solid fuel cookstoves and clean fuels in low-income and middle-income countries: an umbrella review. Lancet Planet Health. 2022.

Lewin S, Glenton C, Munthe-Kaas H, et al. Using qualitative evidence in decision making for health and social interventions: an approach to assess confidence in findings from qualitative evidence syntheses (GRADE-CERQual). PLoS Med. 2015;12(10):e1001895. https://doi.org/10.1371/journal.pmed.1001895 .

The Center for Implementation. The Consolidated Framework for Implementation Research (CFIR) 2.0. Adapted from "The updated Consolidated Framework for Implementation Research based on user feedback," by Damschroder, L.J., Reardon, C.M., Widerquist, M.A.O. et al., 2022, Implementation Sci 17, 75. Image copyright 2022 by The Center for Implementation. https://thecenterforimplementation.com/toolbox/cfir

Lewin S, Booth A, Glenton C, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings: introduction to the series. Implementation Sci 2018;13(Suppl 1):2. https://doi.org/10.1186/s13012-017-0688-3 .


Acknowledgements

We wish to acknowledge the involvement of a patient-partner on the RRISIQ grant supporting this project (Lisa Marcovici). LM will provide feedback and guidance on the findings of the umbrella review, orienting the interpretation of findings and the next steps of this project.

We wish to acknowledge funding from the Quebec Network on Nursing Intervention Research/Réseau de recherche en intervention en sciences infirmières du Québec (RRISIQ), a research network funded by the Fonds de recherche du Québec en Santé (FRQ-S). Funders had no role in the development of this protocol.

Author information

Authors and affiliations

Ingram School of Nursing, Faculty of Medicine and Health Sciences, McGill University, 680 Rue Sherbrooke O #1800, Montréal, QC, H3A 2M7, Canada

Guillaume Fontaine, Lydia Ould Brahim, Sydney Wasserman & Sylvie D. Lambert

Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Sir Mortimer B. Davis Jewish General Hospital, CIUSSS West-Central Montreal, 3755 Chem. de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada

Guillaume Fontaine

Department of Family Medicine and Emergency Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, 3001 12 Ave N Building X1, Sherbrooke, QC, J1H 5N4, Canada

Marie-Eve Poitras

Centre Intégré Universitaire de Santé et de Services Sociaux (CIUSSS) du Saguenay-Lac-Saint-Jean du Québec, 930 Rue Jacques-Cartier E, Chicoutimi, QC, G7H 7K9, Canada

Faculty of Nursing, Université Laval, 1050 Av. de la Médecine, Québec, QC, G1V 0A6, Canada

Maxime Sasseville

Centre de Recherche en Santé Durable VITAM, CIUSSS de la Capitale-Nationale, 2480 Chemin de la Canardière, Quebec City, QC, G1J 2G1, Canada

Faculty of Medicine & School of Public Health, Université de Montréal, Pavillon Roger-Gaudry, 2900 Edouard Montpetit Blvd, Montreal, QC, H3T 1J4, Canada

Marie-Pascale Pomey

Centre de Recherche du Centre Hospitalier de l’Université de Montréal (CR-CHUM), 900 Saint Denis St., Montreal, QC, H2X 0A9, Canada

Direction of Nursing, CIUSSS de l’Ouest-de-l’Île-de-Montréal, 3830, Avenue Lacombe, Montreal, QC, H3T 1M5, Canada

Jérôme Ouellet

Université Laval Library, Pavillon Alexandre-Vachon, 1045 Avenue de la Médecine, Québec, QC, G1V 0A6, Canada

Frédéric Bergeron

St. Mary’s Research Centre, CIUSSS de l’Ouest-de-l’Île-de-Montréal, 3777 Jean Brillant St, Montreal, QC, H3T 0A2, Canada

Sylvie D. Lambert


Contributions

GF and SDL conceptualized the study. GF, MEP, MS, MP, and SDL developed the study methods. GF drafted the manuscript, with critical revisions and additions by SDL. All authors provided intellectual content and reviewed and edited the manuscript. GF is the guarantor of this review. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guillaume Fontaine.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1, Supplementary material 2, Supplementary material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Fontaine, G., Poitras, ME., Sasseville, M. et al. Barriers and enablers to the implementation of patient-reported outcome and experience measures (PROMs/PREMs): protocol for an umbrella review. Syst Rev 13, 96 (2024). https://doi.org/10.1186/s13643-024-02512-5


Received: 24 May 2023

Accepted: 13 March 2024

Published: 26 March 2024

DOI: https://doi.org/10.1186/s13643-024-02512-5


  • Patient-reported outcome measures
  • Patient-reported experience measures
  • Implementation science
  • Umbrella review
  • Systematic review
  • Overview of reviews
  • Facilitators

Systematic Reviews

ISSN: 2046-4053



  • Open access
  • Published: 05 April 2024

Exploring differences in the utilization of the emergency department between migrant and non-migrant populations: a systematic review

  • Giulia Acquadro-Pacera 1 ,
  • Martina Valente 1 , 2 ,
  • Giulia Facci 1 , 3 ,
  • Bereket Molla Kiros 4 ,
  • Francesco Della Corte 1 , 3 ,
  • Francesco Barone-Adesi 1 , 3 ,
  • Luca Ragazzoni 1 , 2 &
  • Monica Trentin 1 , 3  

BMC Public Health volume 24, Article number: 963 (2024)


Migrants face several barriers when accessing care and tend to rely on emergency services to a greater extent than primary care. Comparing emergency department (ED) utilization by migrants and non-migrants can unveil inequalities affecting the migrant population and pave the way for public health strategies aimed at improving health outcomes. This systematic review aims to investigate differences in ED utilization between migrant and non-migrant populations to ultimately advance research on migrants’ access to care and inform health policies addressing health inequalities.

A systematic literature search was conducted in March 2023 on the PubMed, Scopus, and Web of Science databases. Included studies were limited to those relying on data collected from 2012 onward and written in English or Italian. Extracted data included information on the migrant population and the ED visit, the differences in ED utilization between migrants and non-migrants, and the challenges faced by migrants prior to, during, and after the ED visit. The findings of this systematic review are reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines.

After full-text review, 23 articles met the inclusion criteria. All but one adopted a quantitative methodology. Some studies reported a higher frequency of ED visits among migrants, while others reported a higher frequency among non-migrants. Migrants tend to leave the hospital against medical advice more frequently than the native population and to present at the ED without first consulting a general practitioner (GP). They are also less likely to access the ED via ambulance. Admissions for ambulatory care-sensitive conditions, namely health conditions for which adequate, timely, and effective outpatient care can prevent hospitalization, were higher for migrants, although they were also considerable among non-migrants.

Conclusions

The comparison between migrants’ and non-migrants’ utilization of the ED did not reveal a clear pattern. There is no consensus on whether migrants access EDs more or less often than non-migrants, or on whether they are hospitalized to a greater or lesser extent. However, migrants tend to access EDs for less urgent conditions, more often lack a referral from a GP, and more frequently access the ED as walk-ins. Migrants are also discharged against medical advice more often than non-migrants. The findings of this systematic review suggest that migrants’ access to care is hindered by language barriers, poor insurance coverage, lack of entitlement to a GP, and lack of knowledge of the local healthcare system.


Global migration has been steadily increasing over the past 30 years, with the number of migrants rising from 152 million in 1990 to 280 million in 2020 [ 1 ]. Migrants, namely people who move away from their place of residence, temporarily or permanently, for a variety of reasons such as conflicts, work, or family issues [ 2 , 3 ], may have health needs that differ from those of the general population. In particular, communicable diseases, injuries and trauma, delivery-related complications, and mental health issues can result from the harsh conditions experienced throughout the migratory journey [ 4 , 5 ] and may have a greater impact on those originating from countries affected by wars, conflicts, or disasters [ 6 ]. On the other hand, migrants are often described as healthier than the host population because of the “healthy immigrant effect”, which rests on the assumption that people who manage to migrate are more physically fit, younger, healthier, and wealthier [ 7 , 8 ].

Regardless of their health conditions, migrants, especially those who are undocumented, tend to underutilize healthcare systems compared to the general population [ 9 , 10 ]. Following Andersen’s expanded behavioral model of health service use [ 11 ], the underlying reasons can be clustered into: a) contextual factors, including healthcare organization and the social, economic, and political settings; b) predisposing characteristics, such as demographic attributes; c) enabling factors, such as social and financial resources, which either enable or impede individuals’ use of healthcare services; and d) need factors, namely individuals’ health needs and their need for healthcare. Many of these factors coincide with the social determinants of health (SDH), namely non-medical factors that can influence health outcomes and health equity, such as income and social protection, unemployment and job insecurity, housing, and education [ 12 , 13 ].

At a systemic level, one of the barriers that may prevent migrants from using healthcare systems is the lack of migrant-inclusive health policies [ 14 , 15 ]. Other hindering factors include migrants’ financial constraints, limited health literacy, administrative problems, discriminatory behaviors by healthcare professionals, and poor access to health insurance [ 16 ]. The fear of being reported to the authorities and deported often prevents irregular migrants from seeking care [ 17 ]. Furthermore, language barriers and the lack of professional cultural mediators have been reported as reasons for migrants missing medical appointments [ 18 , 19 , 20 ]. Migrants may also be unaware of their healthcare rights [ 21 , 22 ]. Access to care for migrants is further compromised during disasters and public health emergencies, which tend to affect migrants more than host populations [ 23 , 24 , 25 , 26 ].

The lack of access to primary health care (PHC) is one expression of migrants’ underuse of the healthcare system resulting from the barriers mentioned above. Migrants may either not be entitled to PHC or be unaware that they are entitled to a general practitioner (GP). A short stay in the host country can also prevent registration with a PHC provider. This is particularly problematic because GPs are the entry point to the healthcare system in many countries [ 27 ]. A study conducted in Spain in 2016 showed that visits to primary care doctors and nurses were, respectively, about 50% and 75% less frequent for immigrants than for non-migrants [ 28 ]. Recent data from England (UK) suggest that the number of GPs and GP funding per capita are lower in more deprived neighborhoods, where migrants presumably live at a higher rate, despite higher health needs in these populations [ 29 ]. In the absence of a GP, emergency departments (EDs), which are accessible around the clock, usually less bureaucratic, and free of charge in many countries, may represent the best option for migrants seeking medical advice [ 30 , 31 ]. Even migrants who have access to PHC may find it difficult to visit a doctor during normal working hours, as they are typically employed in informal and inflexible jobs. Because of their low use of primary and preventive care services, migrants are expected to overuse the ED, especially for lower-acuity and non-urgent conditions [ 31 , 32 , 33 ]. EDs constitute a unique healthcare setting, as they are situated at the interface of outpatient and inpatient care [ 34 ]. Studying their utilization is relevant because it reflects the need for urgent care and is an indicator of the accessibility and quality of outpatient and hospital-based care [ 35 ]. In other words, investigating migrants’ use of the ED can provide a glimpse into their relationship with the healthcare system of the host country and into the obstacles they may face.

Studies dealing with the utilization of the ED by migrant populations often lack comparisons with host populations [ 30 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ]. Yet such a comparative approach would capture the relevant inequities between migrants and the general population in terms of health-seeking behavior, barriers to accessing ambulatory care, relationships with healthcare professionals, clinical outcomes, and quality of care received. Existing literature reviews on migrants’ utilization of the ED are either country-specific [ 45 ] or limited to the European context [ 27 , 46 , 47 ]. The review by Mahmoud et al. [ 48 ] considers studies conducted worldwide, but it was published in 2012 and is therefore outdated, as many new articles have been published since then.

The aim of this systematic literature review is to gather and summarize published literature comparing ED utilization between migrant and non-migrant populations in order to identify differences in access to care and in the use of the ED. This systematic review will provide decision-makers with relevant information to support the design of healthcare policies, practices, and interventions addressing the inequities experienced by migrants. This is all the more pressing considering that, over the next 30 years, approximately 143 million people are projected to be displaced by the consequences of climate change [ 49 ], while others are expected to migrate for other reasons, such as non-climate-related disasters, wars, conflicts, environmental degradation, and poverty.

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines were followed for reporting the findings of this systematic review [ 50 ].

Search strategy

A systematic literature search was conducted on 20 March 2023 in the PubMed, Scopus, and Web of Science databases. The search strings (Supplementary material 1) combined two sets of terms: migrant-related terms and ED-related terms. No restrictions or filters were applied to the search. After the removal of duplicates, the titles and abstracts of the remaining articles were manually screened by three investigators (GAP, GF, BMK), and those not meeting the inclusion criteria were excluded. All full-text articles eligible for inclusion were reviewed independently by the same three investigators, and discrepancies were resolved through discussion with the whole group. The reference lists of the selected articles were also screened to identify any other relevant studies.

Eligibility criteria

The study selection process relied on the following inclusion criteria: a) the study included a comparison between migrants and non-migrants regarding the utilization of the ED; b) the study relied on data collected between 2012 and March 2023; c) the study was original research adopting a quantitative, qualitative, or mixed-methods methodology; d) the article was written in English or Italian. Exclusion criteria were: a) the study was not about migrants’ utilization of the ED; b) the study did not include any comparison between migrants and non-migrants regarding the utilization of the ED; c) the study was about migrants’ utilization of pre-hospital emergency medical services; d) the study did not distinguish between data on the use of the ED and data related to other levels of care; e) the study was a clinical case study; f) the study was a review or a commentary; g) the study was not in English or Italian.

Data extraction, analysis, and reporting

A data extraction sheet was developed to extract relevant information from the included studies (Supplementary material 2 ). Data extraction was performed by two investigators (GAP, GF). Extracted data included, among others, information on the article’s main characteristics and the study design, information about the migrant population and the ED visit, the differences in the ED utilization between migrants and non-migrants, and information about the challenges faced by migrants prior to, during, and following the ED visit.

After the demographic characteristics of migrants, the differences in ED utilization between migrant and non-migrant populations are reported under four main themes: i) access to the ED; ii) adequacy of utilization of the ED; iii) reasons for accessing the ED; iv) hospitalization and discharge. The different types of barriers to accessing care and the characteristics of the health systems are reported in Supplementary material 3.

For operational purposes, the term “migrant” is used in its broadest sense to refer to people who move away from their place of usual residence across an international border, temporarily or permanently, for a variety of reasons such as war, family issues or work [ 2 ]. To account for the peculiarities of different migratory experiences, the definition of migrant as reported by the authors of the original studies (e.g., asylum-seeker, refugee, etc.) has been specified when possible.

The search returned a total of 1,798 articles. After removing duplicates, 907 articles were eligible for title and abstract review. Among these, 844 were excluded because they did not meet our inclusion criteria. One article was identified through manual search. In total, 64 articles met the criteria for full-text review. After full-text review, 23 articles were included. Detailed information regarding the selection of articles can be found in the PRISMA diagram (Fig.  1 ), while a comprehensive overview of the main characteristics of the studies is presented in Supplementary material 4 .

Fig. 1

Study selection process
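Simple arithmetic can cross-check the screening counts reported above. The short Python sketch below uses only the numbers stated in the text, and the variable names are illustrative. It assumes, as the text implies, that the single manually identified article entered the pool at the full-text stage. The derived quantities (duplicates removed, articles excluded after full-text review) are computed here rather than reported in the text, and the PRISMA diagram in Fig. 1 remains the reference.

```python
# Minimal consistency check of the study-selection counts reported in the text.
# All inputs are copied from the text; the derived values are computed.
records_identified = 1798       # total records returned by the database search
after_deduplication = 907       # records eligible for title/abstract review
excluded_title_abstract = 844   # excluded at title/abstract screening
added_manual_search = 1         # article identified through manual search
full_text_reviewed = 64         # articles assessed in full text
included = 23                   # articles included in the review

duplicates_removed = records_identified - after_deduplication
passed_title_abstract = after_deduplication - excluded_title_abstract
excluded_full_text = full_text_reviewed - included

# Assumption: the manually identified article joins the pool at the full-text stage.
assert passed_title_abstract + added_manual_search == full_text_reviewed

print(f"Duplicates removed: {duplicates_removed}")
print(f"Passed title/abstract screening: {passed_title_abstract}")
print(f"Excluded after full-text review: {excluded_full_text}")
```

If the counts are transcribed correctly, the script implies that 891 duplicates were removed, 63 database records passed title/abstract screening, and 41 full-text articles were excluded.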

Characteristics of the studies

Among the included studies, 22 adopted a quantitative approach, whereas only one [ 51 ] used a qualitative methodology. Among the quantitative studies, 16 were cross-sectional, 4 were observational, and 2 were cohort studies. The qualitative study adopted a grounded theory approach. Sources of data primarily included hospital medical records, population surveys, and interviews (Table 1). The studies included in this review were conducted in 12 countries: United States (US) (n = 5), Switzerland (n = 4), Germany (n = 3), Italy (n = 2), Spain (n = 2), Australia (n = 1), Canada (n = 1), China (n = 1), France (n = 1), Lebanon (n = 1), Singapore (n = 1), and Türkiye (n = 1).

Demographic characteristics of migrants

The studies included in this review refer to their target population as “immigrants” [ 59 , 63 , 64 , 65 , 66 , 68 ], “migrants” [ 52 , 61 , 62 , 69 , 70 ], “asylum seekers” [ 56 , 57 , 58 , 60 ], “undocumented” [ 51 , 53 , 54 , 55 ], “refugees” [ 71 , 73 ], and “foreign workers” (FWs) [ 72 ]. For studies employing interviews or surveys, migratory status was primarily based on self-reported information. Only four articles reported information about migrants’ length of stay in the host country, whereas one study distinguished between first and second-generation immigrants (Supplementary material 5 ). Migrants’ country of origin was reported in six articles, whereas ten reported the broader area or region. A map illustrating migrants’ home and host countries is reported in Fig.  2 .

Fig. 2

Map illustrating host and home countries according to the information reported in the included studies

Information on migrants’ age was not always reported. Pediatric patients were the focus of three studies [ 56 , 57 , 70 ], while another study included both pediatric patients and their mothers [ 52 ]. One study specifically focused on migrants aged 60 and above [ 69 ]. For further information about participants’ age, see Supplementary material 5 .

Differences in ED utilization between migrants and non-migrants

Access to the ED

The results show varying patterns concerning the frequency and likelihood of ED visits by migrants compared to non-migrants (Table  2 ).

A total of three studies showed either higher utilization of the ED by migrants or a higher probability of migrants accessing the ED compared to the host population. Abdulla et al. [ 52 ] considered a group of immigrant mothers and their preterm infants seeking care at the ED in the US. They found that infants of immigrant mothers were more likely than those of non-immigrant mothers to visit the ED in the first 30 and 90 days after discharge (odds ratio (OR): 1.7; 95%CI: 1.12-2.59). However, among mothers with Medicaid coverage (an insurance program for people with limited income and resources), immigrant status was no longer significantly associated with high ED utilization. This suggests that the higher risk of ED visits for preterm infants may be due to stressors such as poverty. A retrospective analysis comparing ED utilization between immigrants and Italian citizens [ 63 ] found a higher frequency of ED visits among immigrants than among Italians. The authors ascribe this finding to immigrants’ poor familiarity with the host country’s healthcare system, compounded by complex bureaucracy and language barriers. Similarly, Rodriguez-Alvarez et al. [ 66 ] found that immigrants used the ED to a greater extent than their native counterparts (19.3% vs. 9.9%; p-value < 0.001). The authors attribute this trend to factors such as easy accessibility, free-of-charge services, and 24-hour availability.

Conversely, two studies found lower utilization of the ED by migrants than by the host population. Brandenberger et al. [ 57 ] found that the proportion of asylum-seeking pediatric patients visiting the ED in Switzerland was lower than that of non-asylum-seeking patients (19% vs. 32%; p-value < 0.001). It should be noted, however, that nationality was unspecified for the non-asylum-seeking group with regard to ED access. Some non-asylum-seeking migrants (i.e., refugees and undocumented children) may therefore have been grouped together with Swiss nationals. In the only qualitative study included in this review, conducted in the US [ 51 ], migrants’ low utilization of the ED was attributed to their fear of discrimination, denial of services, and law enforcement in the hospital in the years following the 2016 US presidential election.

Last, Henares-Montiel et al. [ 65 ] compared immigrants and the host population in Spain and found very similar percentages of ED utilization across the two groups (24.5% vs. 24.7%), with no statistically significant difference (p-value > 0.05).

When it comes to the type of hospital, Al-Hajj et al. [ 71 ] examined injured patients presenting to the ED and found that almost 90% of Lebanese patients sought care at private hospitals, compared to only 52% of refugees (p-value < 0.001). According to the authors, one reason for this difference is that refugees are frequently unable to pay for medical care. They therefore tend to rely to a greater extent on public hospitals or on other facilities sponsored by local non-governmental organizations or the United Nations High Commissioner for Refugees (UNHCR).
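Several results in this section and those that follow are reported as odds ratios (ORs) with 95% confidence intervals. As a reading aid, the Python sketch below shows the textbook crude OR from a 2×2 table, OR = (a·d)/(b·c), with a Wald interval exp(ln OR ± 1.96·SE), where SE = sqrt(1/a + 1/b + 1/c + 1/d). It also back-calculates the log-scale standard error and an approximate p-value from a reported OR and its CI. This is a sketch under stated assumptions, not the method used by the included studies, and it has three limitations. First, several reported ORs (e.g., Abdulla et al. [ 52 ]) may come from adjusted regression models rather than a crude 2×2 table. Second, the back-calculation assumes the CI is symmetric on the log scale. Third, the helper names are hypothetical and the 2×2 counts in the example are invented.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_wald(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Crude OR and Wald 95% CI from a 2x2 table (hypothetical helper).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - Z_95 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + Z_95 * se_log_or)
    return odds_ratio, lower, upper


def back_calculate(or_reported: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Recover the CI's geometric midpoint, the SE of ln(OR), and an approximate
    two-sided p-value, assuming the CI is symmetric on the log scale."""
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(or_reported) / se_log_or
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return midpoint, se_log_or, p_two_sided


# Invented counts, for illustration only (not data from any included study).
or_, lo, hi = odds_ratio_wald(a=30, b=70, c=20, d=80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")

# Values reported in the text for Abdulla et al. [52]: OR 1.7 (1.12-2.59).
mid, se, p = back_calculate(1.7, 1.12, 2.59)
print(f"CI midpoint = {mid:.2f}, SE(ln OR) = {se:.3f}, approx. p = {p:.3f}")
```

With the reported values, the CI's geometric midpoint falls close to 1.7 and the implied p-value is below 0.05, consistent with the association being reported as significant. A large gap between a reported OR and its CI midpoint would usually point to a transcription error or a non-Wald interval.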

Adequacy of utilization of the ED

In total, three factors have been considered as indicative of the adequacy of ED utilization: a) urgency/appropriateness, b) admissions for ambulatory care-sensitive conditions (ACSC), and c) self-referral and walk-in access (Table 3). The results show varying patterns; however, they suggest a lower adequacy of ED utilization by migrants compared to non-migrants.

Urgency/Appropriateness

All the studies dealing with the urgency and/or appropriateness of the ED visit report that migrants’ visits were less urgent than those of non-migrants (Table 3). Klukowska-Roetzler et al. [ 59 ] found a significant association between triage level and immigration from Southeast Europe, with migrants being assigned lower triage codes, meaning that they were categorized as having less urgent medical needs than native Swiss patients. The same authors also found that a larger share of migrants from Southeast Europe (18.9%) than of Swiss nationals (9.9%) used fast-track services, which are designed for less serious illnesses and injuries. Schwachenwalde et al. [ 62 ] studied migrant women seeking gynecology emergency care in Germany, assessing acculturation with the Frankfurt Acculturation Scale [ 7 ]. They found that low-acculturated migrant women were more likely than non-migrant women to visit the ED for non-urgent care (OR: 1.58; 95%CI: 1.02-2.44). When analyzing the impact of acculturation on overall non-urgent healthcare utilization among migrants, the authors found no significant difference compared to non-migrants. However, low acculturation emerged as a significant positive predictor of system-defined non-urgent visits. These are visits categorized as non-urgent based on health system criteria, such as no ambulance transport, no referral by a physician, and no resulting hospital admission. Conversely, low acculturation was a negative predictor of patient-defined non-urgent visits. These are categorized based on subjective criteria, such as a low level of pain or symptom severity and a low estimation of urgency by the patient. Such findings underline the difficulty of defining urgency. The authors speculate that the inappropriate use of the ED by migrants can be attributed both to patients’ distorted perceptions and to deficiencies in the provision of care (e.g., bias and language barriers). Sauzet et al. [ 61 ] investigated the adequacy of ED use in Germany, considering whether patients were sent by medical professionals, reported severe pain, or had a medical urgency. The authors found that first-generation migrants were significantly less likely than non-migrants to use the ED appropriately. Similarly, Chan et al. [ 72 ] found that FWs living in Singapore were significantly more often triaged as low-acuity patients than the general ED population.

Rodriguez et al. [ 55 ] report a higher fear of accessing the ED among undocumented Latino immigrants (UDLI) compared to non-Latino legal residents/citizens (NLRC) (UDLI 24%, 95% CI 20-28% vs. NLRC 4%, 95% CI 2-6%) after the anti-immigrant statements made during the 2016 US presidential campaign. The authors found that this fear ultimately led migrants to delay care, which could suggest migrants presented with more urgent conditions, contrary to what the other studies have reported.
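The operational definitions of urgency and appropriateness described above can classify the same visit differently, which is part of why urgency is hard to define. The Python sketch below encodes two of them as simple rules: a system-defined non-urgent visit in the sense used by Schwachenwalde et al. [ 62 ], and appropriate ED use in the sense used by Sauzet et al. [ 61 ]. The `EdVisit` type, its field names, and the way criteria are combined are hypothetical. The sketch assumes that all system criteria must hold, that any one adequacy criterion suffices, and that a single referral field stands for both "referral by a physician" and "sent by a medical professional". The original studies may operationalize these criteria differently.

```python
from dataclasses import dataclass


@dataclass
class EdVisit:
    """Hypothetical record of one ED visit (field names are illustrative)."""
    arrived_by_ambulance: bool
    referred_by_physician: bool  # also stands in for "sent by a medical professional"
    admitted: bool
    severe_pain: bool
    medical_urgency: bool


def system_defined_non_urgent(v: EdVisit) -> bool:
    # Assumption: non-urgent only if there was no ambulance transport, no physician
    # referral, and no resulting admission (all three criteria must hold).
    return not (v.arrived_by_ambulance or v.referred_by_physician or v.admitted)


def appropriate_use(v: EdVisit) -> bool:
    # Assumption: appropriate if the patient was referred, reported severe pain,
    # OR had a medical urgency (any one criterion suffices).
    return v.referred_by_physician or v.severe_pain or v.medical_urgency


# A walk-in patient in severe pain who is not admitted:
walk_in = EdVisit(arrived_by_ambulance=False, referred_by_physician=False,
                  admitted=False, severe_pain=True, medical_urgency=False)
print(system_defined_non_urgent(walk_in))  # True
print(appropriate_use(walk_in))            # True
```

Under these assumed rules, the walk-in visit counts as non-urgent by system criteria but as appropriate by the pain-based criterion. This mirrors the tension the authors describe between system-defined and patient-defined urgency.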

Admissions for ACSC

Admissions for ACSC, namely medical conditions for which hospitalization is not needed when primary care is timely and effective, were more frequent among migrant populations than among host populations (Table 3). Brandenberger et al. [ 56 ] found that, in Switzerland, 10.74% of asylum-seeking pediatric patients’ admissions were for ACSC and occurred via the ED, compared with 9.45% for the host population. Similarly, Lichtl et al. [ 60 ] found that asylum-seeking pediatric patients were 4.89 times (95%CI: 4.10-5.85) more likely than the general population in Heidelberg (Germany) to use emergency outpatient services for ACSC. Children up to three years old were the most likely to use the ED for ACSC (OR: 1.19; 95%CI: 1.0-1.42). As a possible explanation, the authors suggest that asylum seekers might have insufficient knowledge of and information on the host country’s health system. This may lead them to use emergency outpatient services even for conditions that could be treated at the primary care level.

Self-referral and walk-in access

When it comes to the mode of referral, the included studies report a trend toward more frequent self-referral and walk-in access among migrants than among non-migrants (Table 3). Klukowska-Roetzler et al. [ 59 ] found a higher percentage of Southeast European migrants than Swiss patients visiting the ED upon self-referral (59.9% vs. 41.2%), whereas Swiss patients were more often referred by ambulance (16.2% vs. 7.7%). Similarly, Mahmoud et al. [ 67 ] compared ED utilization across three groups: non-English-speaking non-native patients (NESB), English-speaking non-native patients (ESB-NBA), and English-speaking native Australian patients (ESB-BA). They found that NESB patients were less likely than ESB-BA patients to contact a GP before seeking care at the ED (OR: 0.6; 95%CI: 0.4-0.8). These findings agree with those of Chan et al. [ 72 ], who found that a significantly lower percentage of FWs than of the native population arrived by ambulance in Singapore (6.1% vs. 13.3%; p-value < 0.001). As noted by Di Napoli et al. [ 63 ] in a study conducted in Italy, the limited working hours of GPs may represent a barrier to accessing primary care services, especially for people in precarious working conditions. Yet the results of Klingberg et al. [ 58 ] point in the opposite direction, as they found a smaller percentage of asylum seekers than Swiss patients visiting the ED without prior consultation with a GP (63.2% vs. 67.6%).

Reasons for accessing the ED

Al-Hajj et al. [ 71 ] found that refugees experienced a higher proportion of occupational injuries than Lebanese nationals (12.4% vs. 4.9%; p-value < 0.001). They explain this difference by noting that the male refugee workforce may be exposed to hazardous conditions at industrial or construction sites, which may increase their likelihood of being injured. The regression analysis also shows that being a refugee increases the odds of sustaining cuts/bites/open wounds (OR: 1.30; 95%CI: 1.07-1.58), concussion (OR: 1.72; 95%CI: 1.15-2.57), gunshot or stab injuries (OR: 3.392; 95%CI: 2.605-4.416), and organ system injury (OR: 1.77; 95%CI: 1.16-2.7). Being a refugee also decreases the odds of presenting with a bruise (OR: 0.74; 95%CI: 0.61-0.90).

Ro et al. [ 54 ] compared ED visits between undocumented migrants and individuals covered by MediCal, an insurance scheme that covers low-income individuals, including both natives and authorized foreign-born individuals. The authors identified higher odds of a COVID-19-related ED visit among young undocumented patients than among young MediCal patients (OR: 1.37; 95%CI: 1.24-1.52). Huynh et al. [ 53 ] further expanded the analysis by comparing ED visits for COVID-19 between undocumented migrants and MediCal patients over time, finding higher percentages of COVID-19-related visits among the former (5.9% vs. 3.7%). The authors reject the hypothesis that undocumented patients were over-reliant on EDs compared to MediCal patients. A sensitivity check showed that undocumented migrants were less likely than MediCal patients to go to the ED for heart failure in the same period (OR: 0.66; 95%CI: 0.55-0.79). Thus, the differences in ED utilization for COVID-19-related needs appear to be attributable to higher rates of COVID-19 infection among undocumented patients. Using MediCal patients as the comparison group means that authorized foreign-born individuals are analyzed together with US citizens, which may complicate the interpretation of results. However, the Public Policy Institute of California reports that applicants may wait several years to become legal permanent residents [ 74 ]. Authorized foreign-born MediCal patients have therefore likely lived in the country for some time, and we assume that a longer stay is a proxy for greater knowledge of how the healthcare system functions. This likely makes their health-seeking behavior more similar to that of US citizens than to that of undocumented immigrants.

Hospitalization and discharge

The results show varying patterns concerning ED contacts resulting in hospitalization, as well as discharge, for migrants compared to non-migrants (Table  4 ).

Klukowska-Roetzler et al. [ 59 ] showed that immigrants from Southeast Europe were hospitalized less often than native Swiss patients (21.0% vs. 34.5%), although those triaged as more urgent had a higher hospitalization rate. Al-Hajj et al. [ 71 ] found lower hospitalization rates for refugees than for local Lebanese patients (7.1% vs. 10.3%; p-value = 0.018). Along the same lines, Zunino et al. [ 70 ] found lower hospitalization rates for migrant children in France than for children from the local population (9% vs. 14.6%). The latter study is subject to a significant selection bias, because migrant children with more serious health conditions were not counted among emergency visits. Nevertheless, these findings are consistent with migrants receiving lower triage codes.

Conversely, Abdulla et al. [ 52 ] found higher hospitalization rates for infants of immigrant mothers than for those of native mothers in the US (13% vs. 8%; p-value = 0.06), although the difference did not reach conventional statistical significance. They proposed illness severity and challenges with communication or discharge planning as possible reasons. Brandenberger et al. [ 56 ] found that the proportion of ED contacts leading to admission was higher among asylum seekers than among non-asylum seekers (25% vs. 10%).

Huynh et al. [53], comparing ED visits for COVID-19 between undocumented migrants and MediCal patients, found that undocumented patients were as likely as MediCal patients to have a visit resulting in admission (OR: 1.05; 95%CI: 0.80-1.38).

Al-Hajj et al. [71] found that a higher percentage of refugees than locals left the hospital against medical advice (AMA) (5.6% vs. 2.8%; p-value < 0.001). The authors attributed this finding to refugees’ limited access to health care and limited resources, which may leave them unable to afford the costs associated with hospital admission. Similarly, Chan et al. [72] found that AMA discharges were more frequent among foreign workers (FWs) visiting the ED in Singapore than among the general population (11.3% vs. 4.3%; p-value < 0.001), with most AMA discharges being for non-trauma-related conditions. These findings may also help explain the lower hospitalization rates among migrant populations.

Regarding length of stay in the ED, Klingberg et al. [58] examined emergency care utilization by asylum seekers in Switzerland and found a slightly shorter median length of stay for asylum seekers than for the host population, although the difference was not statistically significant (3.09h vs. 3.22h; p-value = 0.141). Conversely, Gulacti et al. [73] assessed ED utilization by Syrian refugees in Türkiye and found that the median length of stay in the ED was significantly longer for refugees than for the host population (8.54h vs. 5.95h; p-value < 0.001). Similarly, Zunino et al. [70] found that the average length of stay for migrants was 3.9h, slightly longer than that of other patients (p-value < 0.025). Language and communication barriers, compounded by limited use of interpreters, could substantially prolong the length of stay in the ED [58, 70].

This systematic review gathered and summarized published literature highlighting differences in ED utilization between migrant and non-migrant populations. Our findings did not point to a single pattern in migrants’ access to and use of EDs. Some studies [52, 63, 66] reported a higher frequency and/or likelihood of ED visits among migrants, while others [57, 69] reported a higher frequency and/or likelihood of ED visits among non-migrants. Several studies agree that migrants tend to visit the ED for less urgent conditions than host populations [59, 62, 63, 72]; however, there is disagreement on whether migrants are hospitalized more [52, 57] or less [59, 70, 71] often than non-migrants. Migrants were consistently reported as more likely than the host population to leave the hospital AMA [71, 72]. Findings were also consistent regarding the mode of access: compared to non-migrants, migrants more often seek care at the ED without first consulting a GP [58, 67] and less often arrive at the ED by ambulance [59, 72].

Several factors may explain the lower reliance on ambulances. The first is contextual: in countries where ambulance costs are covered only for urgent conditions, such as Singapore or Switzerland, migrants of low socioeconomic status (SES) may be unwilling to risk being charged. Second, migrants may be deterred from using this service because i) calling an ambulance requires knowing the local emergency number, and migrants often have insufficient knowledge of the health system [60, 64, 67, 75], and ii) contacting the local ambulance service or calling the emergency number may be challenging for those who do not speak the local language.

Higher utilization of the ED may be attributable to poor access to PHC services [52, 63, 66]. Host countries’ healthcare policies may prevent irregular migrants from accessing PHC services, while other groups of migrants may face barriers when trying to register for PHC services or may be unaware of their entitlement to a GP. The higher rates of admissions and hospitalizations for ACSC among migrants compared to non-migrants appear to support the hypothesis that PHC services are not easily accessible to migrants [56, 60]. Yet ACSC rates are also high among non-migrants, suggesting potential structural issues in the use of PHC. Additionally, primary care practitioners usually work by appointment and require booking by phone. This is challenging for migrants who do not speak the local language, have strict working schedules, or are employed under irregular contracts that prevent them from requesting time off.

The findings of this systematic review showing lower ED utilization among migrants than among non-migrants [51, 55, 57, 69] can be interpreted in light of the “healthy immigrant effect”. According to this hypothesis, immigrants have better health outcomes than native-born residents, and their need for healthcare, including ED care, would therefore be lower. This interpretation is consistent with included articles reporting lower triage codes among migrants [59, 70], which may indicate that migrants are generally in better health than natives, regardless of the overall number of ED visits. Similar results were found in a study conducted in a large urban ED in Parma (Italy) [76], which analyzed ED records from 2008 to 2012 and reported a significantly higher rate of low-acuity triage codes among migrants than among non-migrants. According to the authors, this difference was partly attributable to the younger average age of the migrant population, which is less affected by the chronic conditions that characterize the aging local population.

The findings of this review should be interpreted in light of the SDH, which have a major impact on people’s health and well-being and affect migrants’ utilization of healthcare services. In the article by Abdulla et al. [52], immigrant mothers were more likely than non-immigrant mothers to visit the ED in the weeks after discharge, as a result of the combined effect of migrant status and poverty. In another included study, unsafe working conditions were a possible cause of the higher rate of ED visits for injuries among refugee men compared to Lebanese men [71]. ED utilization has also been studied specifically in relation to patients’ SDH. A study of ED use in a Medicaid cohort found that the need for ED care, and the number of visits that could have been treated in a PHC setting, increased as SDH characteristics worsened, for example among patients facing food insecurity, unemployment, and housing instability [77]. Migration is itself an SDH, as it significantly influences health outcomes by exposing people to barriers directly related to migratory status, such as fear of deportation and insecure working conditions [78, 79].

Differences between migrants and non-migrants in access to public versus private hospitals [71] may reflect inequalities within highly privatized health systems, where public hospitals provide inpatient acute care and the private sector specializes in more technologically advanced care, which is typically sought by wealthier people. In such cases, access to public services becomes contested between nationals and refugees, creating tensions, as in Lebanon [80]. A similar trend was identified in a multi-country study [81] conducted by the European Social Policy Network (ESPN), which shows that wealthier patients in countries such as Austria, Spain, and Finland often bypass waiting times in the public sector by consulting a practitioner privately and paying out-of-pocket. As a result, waiting times worsen significantly for economically disadvantaged people.

Although this review did not focus specifically on intragroup differences among migrants, such differences exist, especially between documented and undocumented migrants, and have been reported in several studies of ED access. Ro et al. [82] compared ED utilization between undocumented Latino patients and MediCal-insured Latino patients in Los Angeles, and found a lower rate of ED visits in the former group than in the latter (544.25 vs. 571.08). The same study found that undocumented patients experienced a steeper decline in ED utilization during the COVID-19 pandemic than MediCal-insured patients. A 2018 systematic review of studies conducted in Europe [83] reported lower utilization of healthcare services among undocumented migrants than among documented migrants. This pattern was often attributed to a gap between undocumented migrants’ health entitlements and their actual service utilization, due to barriers such as lack of awareness, fear, and socioeconomic factors.

To summarize, our systematic review identified several barriers (Supplementary material 3) that may drive the inequities experienced by migrants. These can be categorized according to Andersen’s expanded behavioral model of health service use [11].

Contextual, or “systemic”, factors include public charge, fear of discovery [84], safety concerns, low availability of interpreters, long waiting times for a referral, GPs’ working hours, and lack of entitlement to a GP.

When it comes to predisposing characteristics, language was the main hindrance to accessing EDs for migrants across different host countries.

Finally, several enabling factors that can facilitate or impede the utilization of health services (in our case, the ED) were mentioned: low SES, communication issues with providers due to different perceptions of pain and urgency, lack of insurance, lack of knowledge of the local healthcare system, transportation problems, difficulties in obtaining information, lack of family support and loss of previous social networks, and precarious working conditions. Although these barriers are presented here as separate categories, migrants’ inequities tend to arise from multiple barriers that reinforce and influence each other.

Recommendations

All but one of the studies included in this review adopted a quantitative methodology. More qualitative research engaging both migrants and healthcare providers is needed, as it would allow a deeper understanding of migrants’ health-seeking behavior and of their experience when using the ED. We urge authors to present disaggregated data (e.g., age, home country, legal status, SES, and length of stay) in a clear, accurate, and consistent manner to enable the identification of subgroups collectively referred to as “migrants”. To advance research in this field, terms referring to migrants should also be used more consistently. Authors tend to use terms such as “migrant” and “immigrant” interchangeably, or to apply their own criteria to define this population. While terms such as “asylum seeker” and “refugee” are largely agreed upon, the meaning and use of “migrant” and “immigrant” are typically left to the authors’ discretion.

At the institutional level, we recommend that policymakers and health authorities take into account the inequalities affecting migrants and implement specific interventions to facilitate their access to care. There is a pressing need for tailored and sustainable strategies that consider the diverse health needs of migrants and the deficiencies existing within the healthcare systems of the host countries [85]. Possible strategies include developing health literacy programs, involving migrants in the development and implementation of health policies, and extending the availability of interpreters and cultural mediators in health facilities [20, 85].

Strengths and limitations

To the best of our knowledge, this is the first systematic review of migrants’ access to the ED without geographical restrictions, which allows a more comprehensive understanding of the phenomenon. Moreover, by focusing on articles that compare migrant and non-migrant populations, this review provides valuable insights into the inequities faced by migrants in host countries. This review also has some limitations. First, gray literature was not included in the search. Second, the search was restricted to articles written in English or Italian. Third, the included studies adopted different definitions of migrants, which prevented a deeper exploration of the factors influencing ED utilization within specific communities. Fourth, the paucity and heterogeneity of the included studies prevented a quality appraisal. Nevertheless, details on study type and methodological aspects were provided so that readers can see which studies the results came from. Fifth, including countries with different health systems and economic conditions may limit the generalizability of the findings.

This systematic review gathered and summarized published literature comparing ED utilization between migrant and non-migrant populations to identify differences in access to and utilization of the ED. Overall, this review shows that a single pattern of ED utilization by migrants can hardly be identified. There is no consensus on whether migrants access EDs more or less than non-migrants, or on whether they have more or fewer ED contacts resulting in hospitalization. However, compared to non-migrants, migrants tend to access EDs for less urgent conditions, more often lack a referral from a GP, more often arrive as walk-ins, and are more often discharged AMA. Higher ED utilization and walk-ins may be attributable to poor access to PHC services. Lower hospitalization rates may be associated with migrants’ better health outcomes and lower triage levels, or with difficulties in affording hospitalization-related costs. Language barriers, lack of entitlement to GP services, and lack of knowledge of the local healthcare system, among other barriers, are significant hindrances to migrants’ effective access to healthcare services.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

Ambulatory Care Sensitive

Ambulatory Care Sensitive Conditions

Against Medical Advice

Emergency Department

English-Speaking Non-Native

English-Speaking Background Born in Australia

European Social Policy Network

Foreign Worker

General Practitioner

Non-English-Speaking Background

Non-Latino Legal Residents/Citizens 

Primary Health Care

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Socioeconomic Status

Social Determinants of Health

Undocumented Latino Immigrants

United Nations High Commissioner for Refugees

United States

United Nations Population Division. International Migrant Stock. https://www.un.org/development/desa/pd/content/international-migrant-stock . Accessed 16 June 2023.

International Organization for Migration (IOM). About Migration. https://www.iom.int/about-migration . Accessed 16 June 2023.

UNDRR. Disaster. 2007. https://www.undrr.org/terminology/disaster . Accessed 18 June 2023.

World Health Organization. Refugee and migrant health. https://www.who.int/news-room/fact-sheets/detail/refugee-and-migrant-health . Accessed 18 June 2023.

Davies AA, Basten A, Frattini C. Migration: A Social Determinant of the Health of Migrants. International Organization for Migration (IOM). 2009. https://migrationhealthresearch.iom.int/migration-social-determinant-health-migrants . Accessed 20 Mar 2024.

Batalova J. Top Statistics on Global Migration and Migrants. Migration Policy Institute. https://www.migrationpolicy.org/article/top-statistics-global-migration-migrants . Accessed 16 June 2023.

Bongard S, Pogge SF, Arslaner H, Rohrmann S, Hodapp V. Acculturation and cardiovascular reactivity of second-generation Turkish migrants in Germany. J Psychosom Res. 2002;53(3):795–803. https://doi.org/10.1016/S0022-3999(02)00347-1 .

Newbold KB. Chronic Conditions and the Healthy Immigrant Effect: Evidence from Canadian Immigrants. J Ethn Migr Stud. 2006;32(5):765–84. https://doi.org/10.1080/13691830600704149 .

Gimeno-Feliu LA, Pastor-Sanz M, Poblador-Plou B, Calderón-Larrañaga A, Díaz E, Prados-Torres A. Overuse or underuse? Use of healthcare services among irregular migrants in a north-eastern Spanish region. Int J Equity Health. 2021;20(1):41. https://doi.org/10.1186/s12939-020-01373-3 .

Sarría-Santamera A, Hijas-Gómez AI, Carmona R, Gimeno-Feliú LA. A systematic review of the use of health services by immigrants and native populations. Public Health Rev. 2016;37:28. https://doi.org/10.1186/s40985-016-0042-3 .

Andersen RM. Revisiting the Behavioral Model and Access to Medical Care: Does it Matter? J Health Soc Behav. 1995;36(1):1–10. https://doi.org/10.2307/2137284 .

Abubakar I, Aldridge RW, Devakumar D, Orcutt M, Burns R, Barreto ML, et al. The UCL-Lancet Commission on Migration and Health: the health of a world on the move. Lancet. 2018;392(10164):2606–54. https://doi.org/10.1016/S0140-6736(18)32114-7 .

World Health Organization. Social determinants of health. https://www.who.int/health-topics/social-determinants-of-health . Accessed 15 Feb 2024.

Galanis P, Spyros K, Siskou O, Konstantakopoulou O, Angelopoulos G, Kaitelidou D. Healthcare services access, use, and barriers among migrants in Europe: a systematic review. medRxiv. 2022. https://doi.org/10.1101/2022.02.24.22271449 .

World Health Organization Regional Office for Europe. How health systems can address health inequities linked to migration and ethnicity. World Health Organization. Regional Office for Europe; 2010. https://apps.who.int/iris/handle/10665/345463 . Accessed 18 June 2023.

Hacker K, Anies M, Folb BL, Zallman L. Barriers to health care for undocumented immigrants: a literature review. Risk Manag Healthc Policy. 2015;8:175–83. https://doi.org/10.2147/RMHP.S70173 .

Mona H, Andersson LMC, Hjern A, Ascher H. Barriers to accessing health care among undocumented migrants in Sweden - a principal component analysis. BMC Health Serv Res. 2021;21(1):830. https://doi.org/10.1186/s12913-021-06837-y .

Donnelly TT, Hwang JJ, Este D, Ewashen C, Adair C, Clinton M. If I was going to kill myself, I wouldn’t be calling you. I am asking for help: challenges influencing immigrant and refugee women’s mental health. Issues Ment Health Nurs. 2011;32(5):279–290. https://doi.org/10.3109/01612840.2010.550383 .

Iliadi P. Refugee women in Greece: - a qualitative study of their attitudes and experience in antenatal care. Health Sci J. 2008;2(3):173–80.

World Health Organization Regional Office for Europe. Migration and health: enhancing intercultural competence and diversity sensitivity. World Health Organization. Regional Office for Europe; 2020. https://apps.who.int/iris/handle/10665/332186 . Accessed 18 June 2023.

Mangrio E, Sjögren Forss K. Refugees’ experiences of healthcare in the host country: a scoping review. BMC Health Serv Res. 2017;17(1):814. https://doi.org/10.1186/s12913-017-2731-0 .

Abood J, Woodward K, Polonsky M, Green J, Tadjoeddin M, Renzaho A. Understanding immigrant settlement services literacy in the context of settlement service utilisation, settlement outcomes and wellbeing among new migrants: A mixed methods systematic review. Wellbeing Space Soc. 2021;2:100057. https://doi.org/10.1016/j.wss.2021.100057 .

Trentin M, Rubini E, Bahattab A, et al. Vulnerability of migrant women during disasters: a scoping review of the literature. Int J Equity Health. 2023;22:135. https://doi.org/10.1186/s12939-023-01951-1 .

Da Mosto D, Bodini C, Mammana L, Gherardi G, Quargnolo M, Fantini MP. Health equity during COVID-19: A qualitative study on the consequences of the syndemic on refugees’ and asylum seekers’ health in reception centres in Bologna (Italy). J Migr Health. 2021;4:100057. https://doi.org/10.1016/j.jmh.2021.100057 .

Kluge HHP, Jakab Z, Bartovic J, D’Anna V, Severoni S. Refugee and migrant health in the COVID-19 response. Lancet. 2020;395(10232):1237–9. https://doi.org/10.1016/S0140-6736(20)30791-1 .

World Health Organization. Health emergency and disaster risk management framework. Geneva: World Health Organization; 2019. xi, 31 p. https://apps.who.int/iris/handle/10665/326106 . Accessed 18 June 2023.

Credé SH, Such E, Mason S. International migrants’ use of emergency departments in Europe compared with non-migrants’ use: a systematic review. Eur J Public Health. 2018;28(1):61–73. https://doi.org/10.1093/eurpub/ckx057 .

Gimeno-Feliu LA, Calderón-Larrañaga A, Diaz E, Poblador-Plou B, Macipe-Costa R, Prados-Torres A. Global healthcare use by immigrants in Spain according to morbidity burden, area of origin, and length of stay. BMC Public Health. 2016;16(1):450. https://doi.org/10.1186/s12889-016-3127-5 .

Fisher R, Dunn P, Asaria M, Thorlby R. Level or not? The Health Foundation. https://www.health.org.uk/publications/reports/level-or-not . Accessed 25 Feb 2024.

Boutziona I, Papanikolaou D, Sokolakis I, Mytilekas KV, Apostolidis A. Healthcare Access, Quality, and Satisfaction Among Albanian Immigrants Using the Emergency Department in Northern Greece. J Immigr Minor Health. 2020;22(3):512–25. https://doi.org/10.1007/s10903-020-00983-x .

Norredam M, Mygind A, Nielsen AS, Bagger J, Krasnik A. Motivation and relevance of emergency room visits among immigrants and patients of Danish origin. Eur J Public Health. 2007;17(5):497–502. https://doi.org/10.1093/eurpub/ckl268 .

Petersen LA, Burstin HR, O’Neil AC, Orav EJ, Brennan TA. Nonurgent Emergency Department Visits: The Effect of Having a Regular Doctor. Med Care. 1998;36(8):1249.

De Luca G, Ponzo M, Andrés AR. Health care utilization by immigrants in Italy. Int J Health Care Finance Econ. 2013;13(1):1–31. https://doi.org/10.1007/s10754-012-9119-9 .

Morisod K, Luta X, Marti J, Spycher J, Malebranche M, Bodenmann P. Measuring Health Equity in Emergency Care Using Routinely Collected Data: A Systematic Review. Health Equity. 2021;5(1):801–17. https://doi.org/10.1089/heq.2021.0035 .

Trappolini E, Marino C, Agabiti N, Giudici C, Davoli M, Cacciani L. Disparities in emergency department use between Italians and migrants residing in Rome, Italy: the Rome Dynamic Longitudinal Study from 2005 to 2015. BMC Public Health. 2020;20(1):1548. https://doi.org/10.1186/s12889-020-09280-6 .

Leaman AM, Rysdale E, Webber R. Use of the emergency department by Polish migrant workers. Emerg Med J EMJ. 2006;23(12):918–9. https://doi.org/10.1136/emj.2006.035980 .

Lee J, Bruce J, Wang NE. Opportunities for Supporting Latino Immigrants in Emergency and Ambulatory Care Settings. J Community Health. 2021;46(3):494–501. https://doi.org/10.1007/s10900-020-00889-7 .

Russo V, Santarelli S, Magrini L, Moscatelli P, Altomonte F, Cremonesi G, et al. Multicentre Italian analysis on cardiovascular diseases: impact of immigrants’ referral to emergency department. J Cardiovasc Med (Hagerstown, Md). 2017;18(3):136–43. https://doi.org/10.2459/JCM.0000000000000221 .

Nandi A, Galea S, Lopez G, Nandi V, Strongarone S, Ompad DC. Access to and use of health services among undocumented Mexican immigrants in a US urban area. Am J Public Health. 2008;98(11):2011–20. https://doi.org/10.2105/AJPH.2006.096222 .

Watts DJ, Friedman JF, Vivier PM, Tompkins CEA, Alario AJ. Health care utilization of refugee children after resettlement. J Immigr Minor Health. 2012;14(4):583–8. https://doi.org/10.1007/s10903-011-9530-1 .

Müller M, Klingberg K, Srivastava D, Exadaktylos AK. Consultations by Asylum Seekers: Recent Trends in the Emergency Department of a Swiss University Hospital. PLoS ONE. 2016;11(5):e0155423. https://doi.org/10.1371/journal.pone.0155423 .

Deans AK, Boerma CJ, Fordyce J, De Souza M, Palmer DJ, Davis JS. Use of Royal Darwin Hospital emergency department by immigration detainees in 2011. Med J Aust. 2013;199(11):776–8. https://doi.org/10.5694/mja13.10447 .

Reko A, Bech P, Wohlert C, Noerregaard C, Csillag C. Usage of psychiatric emergency services by asylum seekers: Clinical implications based on a descriptive study in Denmark. Nord J Psychiatry. 2015;69(8):587–93. https://doi.org/10.3109/08039488.2015.1019923 .

Chatzidiakou K, Schoretsanitis G, Schruers KR. Acute Psychiatric Problems among Migrants Living in Switzerland- a Retrospective Study from a Swiss University Emergency Department. Emerg Med Open Access. 2016;6(5). https://doi.org/10.4172/2165-7548.1000338 .

Saeki S, Kurosawa Y, Tomiyama K, Tomizawa R, Honda C, Minamitani K. Foreign Patients Visiting the Emergency Department: A Systematic Review of Studies in Japan. JMA J. 2023;6(2):95–103. https://doi.org/10.31662/jmaj.2022-0177 .

Lebano A, Hamed S, Bradby H, Gil-Salmerón A, Durá-Ferrandis E, Garcés-Ferrer J, et al. Migrants’ and refugees’ health status and healthcare in Europe: a scoping literature review. BMC Public Health. 2020;20(1):1039. https://doi.org/10.1186/s12889-020-08749-8 .

Graetz V, Rechel B, Groot W, Norredam M, Pavlova M. Utilization of health care services by migrants in Europe-a systematic literature review. Br Med Bull. 2017;121(1):5–18. https://doi.org/10.1093/bmb/ldw057 .

Mahmoud I, Hou XY. Immigrants and the utilization of hospital emergency departments. World J Emerg Med. 2012;3(4):245–50. https://doi.org/10.5847/wjem.j.issn.1920-8642.2012.04.001 .

Intergovernmental Panel on Climate Change (IPCC), Working Group II. Climate Change 2022: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change: Technical Summary. Cambridge University Press; 2022. https://doi.org/10.1017/9781009325844.002 .

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71 .

Ornelas C, Torres J, Torres J, Alter H, Taira B, Rodriguez R. Anti-immigrant Rhetoric and the Experiences of Latino Immigrants in the Emergency Department. Western J Emerg Med. 2021;22(3). https://doi.org/10.5811/westjem.2021.2.50189 .

Abdulla L, McGowan EC, Tucker RJ, Vohr BR. Disparities in Preterm Infant Emergency Room Utilization and Rehospitalization by Maternal Immigrant Status. J Pediatr. 2020;220:27–33. https://doi.org/10.1016/j.jpeds.2020.01.052 .

Huynh MP, Du S, Hanlon C, Yang H, Young A, Ro A. COVID-19-Related Emergency Department Visits Among Undocumented Patients in Los Angeles County. J Health Care Poor Underserved. 2023;34(1):263–74. https://doi.org/10.1353/hpu.2023.0017 .

Ro A, Huynh MP, Bruckner TA, Du S, Young A. COVID-19 case counts and COVID-19 related Emergency Department visits: differences by immigration status, March-December 2020. BMC Public Health. 2022;22(1):1965. https://doi.org/10.1186/s12889-022-14345-9 .

Rodriguez RM, Torres JR, Sun J, Alter H, Ornelas C, Cruz M, et al. Declared impact of the US President’s statements and campaign statements on Latino populations’ perceptions of safety and emergency care access. PLoS ONE. 2019;14(10):e0222837. https://doi.org/10.1371/journal.pone.0222837 .

Brandenberger J, Bozorgmehr K, Vogt F, Tylleskär T, Ritz N. Preventable admissions and emergency-department-visits in pediatric asylum-seeking and non-asylum-seeking patients. Int J Equity Health. 2020;19(1):58. https://doi.org/10.1186/s12939-020-01172-w .

Brandenberger J, Pohl C, Vogt F, Tylleskär T, Ritz N. Health care provided to recent asylum-seeking and non-asylum-seeking pediatric patients in 2016 and 2017 at a Swiss tertiary hospital - a retrospective study. BMC Public Health. 2021;21(1):81. https://doi.org/10.1186/s12889-020-10082-z .

Klingberg K, Stoller A, Müller M, Jegerlehner S, Brown AD, Exadaktylos A, et al. Asylum Seekers and Swiss Nationals with Low-Acuity Complaints: Disparities in the Perceived level of Urgency, Health Literacy and Ability to Communicate-A Cross-Sectional Survey at a Tertiary Emergency Department. Int J Environ Res Public Health. 2020;17(8):2769. https://doi.org/10.3390/ijerph17082769 .

Klukowska-Röetzler J, Eracleous M, Müller M, Srivastava D, Krummrey G, Keidar O, et al. Increased Urgent Care Center Visits by Southeast European Migrants: A Retrospective, Controlled Trial from Switzerland. Int J Environ Res Public Health. 2018;15(9):1857. https://doi.org/10.3390/ijerph15091857 .

Lichtl C, Lutz T, Szecsenyi J, Bozorgmehr K. Differences in the prevalence of hospitalizations and utilization of emergency outpatient services for ambulatory care sensitive conditions between asylum-seeking children and children of the general population: a cross-sectional medical records study (2015). BMC Health Serv Res. 2017;17(1):731. https://doi.org/10.1186/s12913-017-2672-7 .

Sauzet O, David M, Naghavi B, Borde T, Sehouli J, Razum O. Adequate Utilization of Emergency Services in Germany: Is There a Differential by Migration Background? Front Public Health. 2021;8:613250. https://doi.org/10.3389/fpubh.2020.613250 .

Schwachenwalde S, Sauzet O, Razum O, Sehouli J, David M. The role of acculturation in migrants’ use of gynecologic emergency departments. Int J Gynecol Obstet. 2020;149(1):24–30. https://doi.org/10.1002/ijgo.13099 .

Di Napoli A, Ventura M, Spadea T, Giorgi Rossi P, Bartolini L, Battisti L, et al. Barriers to Accessing Primary Care and Appropriateness of Healthcare Among Immigrants in Italy. Front Public Health. 2022;10:817696. https://doi.org/10.3389/fpubh.2022.817696 .

Di Napoli A, Rossi A, Battisti L, Cacciani L, Caranci N, Cernigliaro A, et al. Valutazione dell’assistenza sanitaria della popolazione immigrata in Italia attraverso alcuni indicatori di un sistema nazionale di monitoraggio. Epidemiol Prev. 2020;44(5-6 Suppl 1):85–93. https://doi.org/10.19191/EP20.5-6.S1.P085.077 .

Henares-Montiel J, Ruiz-Perez I, Mendoza-Garcia O. Health inequalities between male and female immigrants in Spain after the beginning of the economic crisis. Health Soc Care Community. 2018;26(6):891–7. https://doi.org/10.1111/hsc.12613 .

Rodríguez-Álvarez E, Lanborena N, Borrell LN. Health Services Access Inequalities Between Native and Immigrant in a Southern European Region. Int J Health Serv. 2019;49(1):108–26. https://doi.org/10.1177/0020731418809858 .

Mahmoud I, Eley R, Hou XY. Subjective reasons why immigrant patients attend the emergency department. BMC Emerg Med. 2015;15(1):4. https://doi.org/10.1186/s12873-015-0031-8 .

Etowa J, Sano Y, Hyman I, Dabone C, Mbagwu I, Ghose B, et al. Difficulties accessing health care services during the COVID-19 pandemic in Canada: examining the intersectionality between immigrant status and visible minority status. Int J Equity Health. 2021;20(1):255. https://doi.org/10.1186/s12939-021-01593-1 .

Xi S, Song Y, Li X, Li M, Lu Z, Yang Y, et al. Local-Migrant Gaps in Healthcare Utilization Between Older Migrants and Local Residents in China. J Am Geriatr Soc. 2020;68(7):1560–7. https://doi.org/10.1111/jgs.16421 .

Zunino L, Colineaux H, Claudet I, Bréhin C. Description of a migrant pediatric population visiting the Toulouse Children’s Hospital emergency department. Arch Pediatr. 2021;28(7):514–9. https://doi.org/10.1016/j.arcped.2021.08.002 .

Al-Hajj S, Chahrour MA, Nasrallah AA, Hamed L, Pike I. Physical trauma and injury: A multi-center study comparing local residents and refugees in Lebanon. J Global Health. 2021;11:17001. https://doi.org/10.7189/jogh.11.17001 .

Chan JS, Chia DW, Hao Y, Lian SW, Chua MT, Ong ME. Health-seeking behaviour of foreign workers in Singapore: Insights from emergency department visits. Ann Acad Med Singap. 2021;50(4):315–324. https://doi.org/10.47102/annals-acadmedsg.2020484 .

Gulacti U, Lok U, Polat H. Emergency department visits of Syrian refugees and the cost of their healthcare. Pathog Glob Health. 2017;111(5):219–24. https://doi.org/10.1080/20477724.2017.1349061 .

PPIC. The Immigration and Citizenship Process. https://www.ppic.org/publication/the-immigration-and-citizenship-process/ . Accessed 15 Feb 2024.

Abbott S, Riga M. Delivering services to the Bangladeshi community: the views of healthcare professionals in East London. Public Health. 2007;121(12):935–41. https://doi.org/10.1016/j.puhe.2007.04.014 .

Zinelli M, Musetti V, Comelli I, Lippi G, Cervellin G. Emergency department utilization rates and modalities among immigrant population. A 5-year survey in a large Italian urban emergency department. Emergency Care J. 2014;10(1). https://doi.org/10.4081/ecj.2014.1896 .

McCarthy ML, Zheng Z, Wilder ME, Elmi A, Li Y, Zeger SL. The Influence of Social Determinants of Health on Emergency Departments Visits in a Medicaid Sample. Ann Emerg Med. 2021;77(5):511–22. https://doi.org/10.1016/j.annemergmed.2020.11.010 .

Fleischman Y, Willen SS, Davidovitch N, Mor Z. Migration as a social determinant of health for irregular migrants: Israel as case study. Soc Sci Med. 2015;147:89–97. https://doi.org/10.1016/j.socscimed.2015.10.046 .

Viruell-Fuentes EA, Miranda PY, Abdulrahim S. More than culture: Structural racism, intersectionality theory, and immigrant health. Soc Sci Med. 2012;75(12):2099–106. https://doi.org/10.1016/j.socscimed.2011.12.037 .

Blanchet K, Fouad FM, Pherali T. Syrian refugees in Lebanon: the search for universal health coverage. Confl Health. 2016;10(1):12. https://doi.org/10.1186/s13031-016-0079-4 .

Baeten R, Spasova S, Vanhercke B, Coster S. Inequalities in access to healthcare. A study of national policies 2018. 2018. https://doi.org/10.2767/371408 .

Ro A, Bruckner TA, Huynh MP, Du S, Young A. Emergency Department Utilization Among Undocumented Latino Patients During the COVID-19 Pandemic. J Racial Ethnic Health Disparities. 2022. https://doi.org/10.1007/s40615-022-01382-8 .

Winters M, Rechel B, de Jong L, Pavlova M. A systematic review on the use of healthcare services by undocumented migrants in Europe. BMC Health Serv Res. 2018;18(1):30. https://doi.org/10.1186/s12913-018-2838-y .

Omarzu J. A Disclosure Decision Model: Determining How and When Individuals Will Self-Disclose. Pers Soc Psychol Rev. 2000;4(2):174–85. https://doi.org/10.1207/S15327957PSPR0402_05 .

Mladovsky P, Rechel B, Ingleby D, McKee M. Responding to diversity: an exploratory study of migrant health policies in Europe. Health Policy (Amsterdam, Netherlands). 2012;105(1):1–9. https://doi.org/10.1016/j.healthpol.2012.01.007 .


Acknowledgements

This manuscript is the result of a study conducted in the framework of the International PhD in Global Health, Humanitarian Aid, and Disaster Medicine organized by Università del Piemonte Orientale (UPO). The authors are thankful to Dr. Edoardo Mello Rella for his technical support.

Code availability

Not applicable.

This study was supported by Fondazione Cariplo (grant number 2022-1447).

Author information

Authors and affiliations.

CRIMEDIM - Center for Research and Training in Disaster Medicine, Humanitarian Aid and Global Health, Università del Piemonte Orientale, Novara, 28100, Italy

Giulia Acquadro-Pacera, Martina Valente, Giulia Facci, Francesco Della Corte, Francesco Barone-Adesi, Luca Ragazzoni & Monica Trentin

Department for Sustainable Development and Ecological Transition, Università del Piemonte Orientale, Vercelli, 13100, Italy

Martina Valente & Luca Ragazzoni

Department of Translational Medicine, Università del Piemonte Orientale, Novara, 28100, Italy

Giulia Facci, Francesco Della Corte, Francesco Barone-Adesi & Monica Trentin

School of Medicine, Università del Piemonte Orientale, Novara, 28100, Italy

Bereket Molla Kiros


Contributions

M.V. and M.T. conceived the original idea. G.A.P., G.F., and B.M.K. retrieved and analyzed data. G.A.P., M.V., G.F., and M.T. wrote the manuscript. M.V. provided methodological support to the study. M.T. coordinated the study. F.D.C., F.B.A., and L.R. provided senior supervision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Monica Trentin.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1, 2, 3, 4, and 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Acquadro-Pacera, G., Valente, M., Facci, G. et al. Exploring differences in the utilization of the emergency department between migrant and non-migrant populations: a systematic review. BMC Public Health 24 , 963 (2024). https://doi.org/10.1186/s12889-024-18472-3


Received : 16 November 2023

Accepted : 28 March 2024

Published : 05 April 2024

DOI : https://doi.org/10.1186/s12889-024-18472-3


  • Emergency department
  • Access to care
  • Inequalities

BMC Public Health

ISSN: 1471-2458


  • Open access
  • Published: 03 April 2024

A mixed methods systematic literature review of barriers and facilitators to help-seeking among women with stigmatised pelvic health symptoms

  • Clare Jouanny¹ (ORCID: orcid.org/0000-0002-4959-5901),
  • Purva Abhyankar² (ORCID: orcid.org/0000-0002-0779-6588) &
  • Margaret Maxwell³ (ORCID: orcid.org/0000-0003-3318-9500)

BMC Women's Health volume 24, Article number: 217 (2024)


Women’s pelvic health is a globally important subject, included in international and United Kingdom health policies that emphasise the importance of improving information and access to pelvic health services. Consequences of pelvic symptoms are intimate, personal, and varied, often causing embarrassment and shame and affecting women’s quality of life and wellbeing.

To understand the experience of seeking healthcare for stigmatised pelvic health symptoms by synthesising all types of published primary research and mapping the results to behavioural theory, to identify potential targets for intervention.

Systematic search of the MEDLINE, CINAHL, PsycINFO, SocINDEX, and PubMed databases and the CDSR and CENTRAL registers, from inception to May 2023, for all types of research capturing women’s views and experiences of seeking help with stigmatised urogenital and bowel symptoms. Studies only reporting prevalence, predictors of help-seeking, or non-health-related help-seeking, or written in languages other than English, German, French, Spanish and Swedish, were excluded. Reference checking and forward citation searching were performed for all included studies. A results-based synthesis approach was used to integrate quantitative and qualitative data. Themes were mapped to the Common-Sense Model and the Candidacy framework. The Mixed Methods Appraisal Tool was used for critical appraisal, and Grading of Recommendations Assessment, Development and Evaluation - Confidence in Evidence from Reviews of Qualitative research was used to assess the certainty of review findings.

86 studies representing over 20,000 women from 24 high income countries were included. Confidence was high that barriers to help-seeking were similar across all study types and pelvic symptoms: stigma, lack of knowledge, women’s perception that clinicians dismissed their symptoms, and associated normalising and deprioritising of low bother symptoms. Supportive clinicians and increased knowledge were key facilitators.

Conclusions

Using the Common-Sense Model to explore women’s help-seeking behaviour with stigmatised pelvic symptoms reveals problems with the cognitive representation of symptom identity, emotional representations of embarrassment and shame, and a subjective norm in which women believe their symptoms will be trivialised by clinicians. Together, these barriers frustrate women’s identification of their candidacy for healthcare. Addressing these issues through behavioural change interventions for women and clinicians will help to achieve universal access to pelvic healthcare services (United Nations Sustainable Development Goal 3.7).

Systematic Review Registration

PROSPERO CRD42021256956.


Women’s health is finally emerging as a globally important subject. United Nations (UN) Sustainable Development Goals (SDG) 3.7 states we should “by 2030 ensure universal access to sexual and reproductive health care services, including for family planning, information and education, and the integration of reproductive health into national strategies and programmes” [ 1 ]. In the United Kingdom (UK), there is growing emphasis on promoting education on women’s health issues, reducing associated stigma, and increasing access to reliable information about women’s health [ 2 , 3 ].

Many women’s health symptoms are considered difficult to talk about by women, health care professionals (clinicians), and the public in general [ 4 , 5 ]. Stigma surrounding pelvic symptoms (including urogynaecological and bowel symptoms) matters because it stops women from seeking help. Symptoms such as urinary incontinence (UI) and prolapse can be addressed through early detection and timely receipt of conservative therapies such as pelvic floor muscle training [ 6 , 7 ]. Although not life threatening, these pelvic symptoms are common: pelvic floor dysfunction (PFD), including urinary and faecal incontinence, bladder, bowel, and sexual dysfunction, prolapse, and persistent pelvic pain, is prevalent in up to 50% of women [ 8 ] and has a significant impact on women’s quality of life and physical, mental, and social wellbeing [ 9 , 10 ]. The intimate, personal, and varied nature of pelvic symptoms causes significant embarrassment and shame, leading to further psychological distress, reduced functioning, poor body image, and social and occupational difficulties [ 9 , 11 , 12 , 13 , 14 ].

Despite the widespread experience of pelvic symptoms, the number of women who seek healthcare is relatively low, as shown by most prevalence data on healthcare seeking for UI. In a large population from the Nurses’ Health Study I and II, only 34% of 94,692 middle-aged and older women with UI reported discussing their symptoms with a clinician [ 15 ]. Similarly, in a web-based survey of 5,861 Danish women experiencing UI, only 29% had sought professional help [ 16 ]. In the UK, a postal evaluation of 2,414 women registered with a general practice found UI prevalent in 40%, but only 17% sought professional help [ 14 ]. More stigmatised pelvic symptoms were included in an online survey of 376 Australian women: 99% had bladder, bowel, or sexual dysfunction or prolapse, and 51% sought help [ 5 ]. In the United States (US), however, only 29% of 938 women aged 45 years or more with accidental bowel leakage sought care [ 17 ]. Two recent systematic literature reviews exploring experiences of prolapse found that, despite the availability of effective early treatment options, women lack knowledge and awareness about symptoms and available treatments [ 10 , 18 ].

It is important to understand the barriers and facilitators women experience when seeking healthcare for stigmatised pelvic symptoms in order to develop approaches that increase knowledge and awareness among the public and clinicians, encourage women to seek healthcare when necessary, and design or redesign services to meet women’s needs. Literature on barriers and facilitators to help-seeking with pelvic symptoms does exist, but it is spread across different conditions or symptom groups, settings, and populations, and has been generated using different methodologies. To our knowledge, this literature has not been brought together systematically to share learning across different conditions, populations, and methodologies.

This systematic review aimed to identify the barriers and facilitators women in high income countries face in seeking help for stigmatised pelvic symptoms. We used the Common-Sense Model of Self-Regulation of Illness and Behaviour (CSM) [ 19 , 20 , 21 ], a model from health psychology, to synthesise and interpret the review’s findings, as it helps explain how people behave (e.g. whether to seek help or not) in response to potential health threats (e.g. experience of symptoms or receipt of a diagnosis). The model argues that, on being faced with a possible health threat (such as pelvic symptoms), people are triggered to respond, and that this response takes place in three stages. In Stage 1, people interpret or make sense of the threat in relation to previous experiences and their sociocultural environment, forming beliefs about what condition they have, its likely cause, consequences, duration, and cure/controllability (‘interpretation’). These beliefs are also accompanied by emotional responses to the health threat. In Stage 2, they decide how to cope with the threat (‘coping’), which may include going to a doctor, taking medication, or self-care (‘approach coping’), or denial and wishful thinking (‘avoidance coping’). In Stage 3, they assess whether their way of coping was effective in returning them to a normal state of self (‘appraisal’). The model was recently extended to include people’s beliefs about the behaviour and treatment as determinants of coping procedures and illness outcomes, in addition to illness representations [ 22 ].

The review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA) statement [ 23 ]. PRISMA checklists are available (Additional File 4). The PROSPERO protocol registration number is CRD42021256956.

The SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) search structure [ 24 ] was chosen as the conceptual framework to specify the review question, develop selection criteria, and design the search strategy. Although its authors [ 24 ] found that SPIDER was not as sensitive as a traditional PICO [ 25 ], it has been recommended as a systematic and rigorous tool in reviews addressing non-quantitative research questions, offering an optimal balance between sensitivity and specificity in searching [ 26 ] and more easily managed results [ 24 ]. Table 1 shows the framework concepts.

Eligibility criteria

Included pelvic symptoms (Sample) were limited to those likely to affect quality, rather than length, of life. From the literature, pelvic symptoms associated with a degree of stigma in disclosure, and eligible for inclusion in this literature review, were prolapse [ 27 ], urinary and faecal incontinence [ 12 , 28 ], sexual dysfunction, PFD [ 5 ], genital infections such as warts and herpes [ 29 , 30 ], pelvic pain, and abnormal uterine bleeding [ 4 , 31 ]. Some pelvic symptoms arise from issues such as intimate partner violence (IPV), rape, abortion, infertility, female genital mutilation, Human Immunodeficiency Virus/ Acquired Immunodeficiency Syndrome, Human Papilloma Virus, and urogynaecological cancers. These issues were excluded in favour of including the symptoms that may result from them. Table  2 shows the full list.

The Phenomenon of Interest was help-seeking and its alternative terms. Any study design that captured help-seeking views was included. Evaluation included barriers and facilitators that women expressed about seeking help. The ‘research type’ included peer reviewed, published, qualitative, quantitative, or mixed methods primary studies, set in high income countries only. A summary of eligibility criteria is in Table 3.

Information sources

Databases were searched using the platform EBSCOhost: MEDLINE, CINAHL complete, PsycINFO, SocINDEX with Full Text; PubMed, and the Cochrane Database of Systematic Reviews (CDSR), Cochrane Central Register of Controlled Trials (CENTRAL); primary studies included in topic relevant systematic reviews; reference list checking of included studies; forward citation searching of included studies in Scopus. Studies were included from year of inception of databases searched, to May 2023.

Search strategy

Scoping searches, MeSH headings used in known relevant studies, a thesaurus, and the author’s clinical experience were used to identify subject headings and keywords for pelvic symptoms and for barriers and facilitators to seeking healthcare. The Peer Review of Electronic Search Strategies (PRESS) checklist [ 32 ] was applied by a medical information specialist. Ethical approval was not sought because this review synthesised results from already published primary research studies. The final search combined terms related to two main concepts: stigmatised pelvic symptoms (Sample) AND help-seeking (Phenomenon of Interest). Table 4 shows an example of the search strategy used in MEDLINE. The search strategy was translated by hand for the other databases and registers searched.
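The two-concept structure of the search (synonyms combined with OR inside each concept, concepts joined with AND) can be illustrated with a small query builder. This is a minimal sketch under stated assumptions: the helper names `quote_term` and `build_query` are hypothetical, and the term lists are illustrative picks from symptoms named in this review, not the actual MEDLINE strategy in Table 4.

```python
from __future__ import annotations


def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(*concepts: list[str]) -> str:
    """OR the synonyms within each concept, then AND the concept blocks together."""
    blocks = ["(" + " OR ".join(quote_term(t) for t in terms) + ")" for terms in concepts]
    return " AND ".join(blocks)


# Illustrative (hypothetical) term lists; NOT the review's Table 4 strategy.
sample_terms = ["urinary incontinence", "prolapse", "pelvic pain"]        # Sample
phenomenon_terms = ["help-seeking", "healthcare seeking"]                  # Phenomenon of Interest

print(build_query(sample_terms, phenomenon_terms))
# ("urinary incontinence" OR prolapse OR "pelvic pain") AND (help-seeking OR "healthcare seeking")
```

Database-specific syntax such as field tags, truncation, and subject headings is deliberately left out, since the strategy had to be translated by hand for each database in any case.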

Selection process

After removing duplicates, two independent reviewers screened all retrieved studies by title and abstract. Ten percent of full texts were independently dual screened, with substantial agreement (83%; prevalence- and bias-adjusted kappa [PABAK] 0.66). Study authors were contacted by email where information was unclear or appeared missing and were given three weeks to respond, after which studies with unresolved queries were excluded.
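For two reviewers making a binary include/exclude decision, PABAK reduces to 2·Po − 1, where Po is the observed proportion of agreement, so the reported 83% agreement gives 2 × 0.83 − 1 = 0.66. The sketch below reproduces that arithmetic. The helper names and the five example decisions are hypothetical; the general k-category form (k·Po − 1)/(k − 1) is shown for completeness and collapses to 2·Po − 1 when k = 2.

```python
def observed_agreement(rater_a: list, rater_b: list) -> float:
    """Proportion of items on which two reviewers made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty decision lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def pabak(p_observed: float, n_categories: int = 2) -> float:
    """Prevalence- and bias-adjusted kappa: (k * Po - 1) / (k - 1)."""
    return (n_categories * p_observed - 1) / (n_categories - 1)


# Reported agreement of 83% on binary screening decisions:
print(round(pabak(0.83), 2))  # 0.66

# Hypothetical decisions for five full texts, for illustration only.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude"]
po = observed_agreement(reviewer_1, reviewer_2)
print(po, round(pabak(po), 2))  # 0.8 0.6
```

Unlike Cohen's kappa, PABAK depends only on observed agreement, which is why it is often reported when one decision (here, exclusion) is far more common than the other.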

Data extraction

A data extraction form designed in Excel, with data items informed by Noyes, Booth [ 33 ] and NICE [ 34 ], was reviewed and piloted by the research team. Data were extracted by the author, and independently from 33% of included papers by a research assistant. Quantitative data on barriers and facilitators to help-seeking were copied verbatim into the data extraction form and narratively summarised. Qualitative data recording participants’ help-seeking views or experiences, found in results or discussion sections, were copied verbatim into NVivo software for analysis.

Quality assessment

The Mixed Methods Appraisal Tool (MMAT) [ 35 ] was used by the author to appraise the methodological quality of each study, and a research assistant jointly appraised 33% of studies. The MMAT has been pilot tested and tested for interrater reliability, and offers five study design categories (one qualitative, three quantitative, and one mixed methods), each with five core criteria. The problematic areas of each study are reported, rather than summative scores, because this gives more detail.

Data analysis and synthesis

Quantitative data were narratively synthesised, with content analysis of barriers and facilitators, and discussed and agreed with co-authors. Primary qualitative data were extracted and imported into NVivo software, before coding into pre-existing concepts from the analysis of quantitative data, with new concepts added as necessary. Reflecting on patterns and meaning in the data, themes were generated, developed, and reviewed at length through reflective thematic analysis [ 36 ], sense-checked with co-authors, and refined before naming and definition. Quotations from participants were used to illustrate themes. Synthesis of quantitative, qualitative, and mixed methods results drew together themes about barriers and facilitators to healthcare seeking with stigmatised pelvic symptoms, which were mapped to the CSM. Mapping the data to theory helped to explain the relationship of identified themes to help-seeking behaviours and identify potential targets for intervention.

Assessment of confidence in cumulative evidence

Grading of Recommendations Assessment, Development and Evaluation - Confidence in Evidence from Reviews of Qualitative research (GRADE-CERQual) [ 37 ], was used to assess confidence in the findings in terms of methodological limitations, relevance to the review aim, coherence of the review findings in relation to the primary data, and the adequacy of data presented in the primary studies. The Data Richness Scale [ 38 ] was used to assess adequacy of qualitative data.

Reflexivity. The authors have backgrounds in pelvic health physiotherapy, with lived experience (CJ), applied health research (PA, MM), health psychology (PA) and sociology (MM). Before conducting the review, the authors considered their own philosophical positions, context, and life experiences in discussion with each other, to facilitate transparency of relevant preconceptions and beliefs.

Results of search

The electronic search generated 4,527 papers, and reference list checking and forward citation searching found a further 572 papers. After removal of duplicates, 3,963 titles and abstracts were screened, of which 3,569 were excluded, leaving 394 studies. Twenty papers could not be accessed, and 215 papers did not meet the eligibility criteria after full-text screening, leaving 159 papers that met all inclusion criteria (53 quantitative, 101 qualitative, 5 mixed methods). Initially, studies were not excluded based on publication year. However, it became apparent that the publication years of included studies ranged from 1988 to 2023, with 48.3% published between 1988 and 2010. This range encompassed a period of significant technological and cultural change that occurred following the turn of the millennium (e.g., the emergence of the World Wide Web). The authors questioned whether women’s experiences of barriers and facilitators had remained the same or had changed because of developments and cultural changes over this period. To test this, data from all quantitative studies were extracted, and content analysis was used to code healthcare-seeking barriers and facilitators. These were compared across five decades, from the 1980s to the present, and were found to be similar. This suggested that excluding papers published before 2010 was unlikely to miss barriers to healthcare seeking that are currently important to women. Excluding the 73 studies published before 2010 left a total of 86 studies included in this review (33 quantitative, 48 qualitative, and 5 mixed methods). Figure 1 shows the search results in a PRISMA flow diagram.

Figure 1. PRISMA flow diagram
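The counts in the flow from records identified to studies included can be cross-checked by simple subtraction. This sketch uses only the numbers stated in the search results above. The number of duplicates (1,136) is not reported in the text; it is inferred on the assumption that every record removed before title and abstract screening was a duplicate.

```python
# Counts as reported in the search results paragraph.
records_databases = 4527
records_other_sources = 572
screened = 3963
excluded_title_abstract = 3569
not_retrieved = 20
excluded_full_text = 215
excluded_before_2010 = 73

eligible_by_design = {"quantitative": 53, "qualitative": 101, "mixed methods": 5}
included_by_design = {"quantitative": 33, "qualitative": 48, "mixed methods": 5}

# Inferred, not reported: records removed before screening (assumed all duplicates).
duplicates_removed = records_databases + records_other_sources - screened
full_text_sought = screened - excluded_title_abstract
eligible = full_text_sought - not_retrieved - excluded_full_text
included = eligible - excluded_before_2010

# Each stage should agree with the per-design breakdowns given in the text.
assert eligible == sum(eligible_by_design.values())
assert included == sum(included_by_design.values())
assert excluded_before_2010 == sum(
    eligible_by_design[d] - included_by_design[d] for d in eligible_by_design
)

print(duplicates_removed, full_text_sought, eligible, included)  # 1136 394 159 86
```

The per-design difference shows that the date restriction removed 20 quantitative and 53 qualitative studies and no mixed methods studies.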

Overview of studies

The main characteristics of the quantitative, qualitative, and mixed methods studies are available (Additional File 1). Broadly, 36.05% of papers were from Europe, 31.40% from North America, 20.93% from East Asia and Pacific, 6.98% from Middle East and North Africa, 2.33% worldwide, and 1.16% from Latin America. The geographical representation of all included studies is shown in Table 5. Participants in all studies were described as ‘women’ or ‘female’; whilst recognising that not everyone with female anatomy identifies as a woman or female, we have used these terms throughout this paper.
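Each regional percentage above corresponds to a whole number of the 86 included studies. The sketch below back-calculates those counts and checks that each one reproduces the reported percentage to two decimals. The counts are an inference, not figures stated in the text, and Table 5 remains the authoritative breakdown; note that the regions listed sum to 85 studies (98.85%), which suggests that one study falls outside the categories named in this paragraph.

```python
TOTAL_STUDIES = 86

# Regional shares as reported in the text (percent of included studies).
reported_pct = {
    "Europe": 36.05,
    "North America": 31.40,
    "East Asia and Pacific": 20.93,
    "Middle East and North Africa": 6.98,
    "Worldwide": 2.33,
    "Latin America": 1.16,
}

# Back-calculate the whole-number study count each percentage implies (inferred).
implied_counts = {
    region: round(pct / 100 * TOTAL_STUDIES) for region, pct in reported_pct.items()
}

# Check that each implied count reproduces the reported percentage.
for region, count in implied_counts.items():
    assert round(100 * count / TOTAL_STUDIES, 2) == reported_pct[region], region

print(implied_counts)
print(sum(implied_counts.values()))  # 85
```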

Quantitative studies (n = 33) represented 19,185 female participants from nineteen high income countries. All but one study used a cross-sectional survey design with questionnaires, mostly with unvalidated, bespoke questions on healthcare seeking. Due to heterogeneity of methods, meta-analysis was not possible. One study used a discrete choice experiment to investigate whether cost of care and appointment wait time affected healthcare-seeking intentions for urinary tract infection (UTI) symptoms [ 76 ]. Stigmatised pelvic symptoms studied included urinary incontinence (18 papers), PFD (five papers), sexual dysfunction (three papers), prolapse (two papers), and pelvic pain, UTI, urogenital atrophy, constipation, and menstrual dysfunction (one paper each).

Qualitative (n = 48) and mixed methods (n = 5) studies represented the views and experiences of 2,653 women, collected through interviews, focus groups, and 216 blog posts, from fifteen high income countries. Only eight papers stated the theoretical framework on which their study was based. Stigmatised pelvic symptoms studied included urinary incontinence (19 papers), PFD (nine papers), prolapse and pelvic pain (eight papers each), urinary dysfunction and sexual dysfunction (six papers each), anal incontinence (two papers), and mixed urinary and anal incontinence (two papers).

Quality Assessment

Using the MMAT indicated that 19 of 33 quantitative papers lacked information about the representativeness of the sample. Authors were contacted for clarification, with few responses. Ten quantitative papers lacked appropriate measures for the healthcare-seeking element, possibly because healthcare seeking was often a secondary theme. Twenty-eight quantitative papers provided no, or insufficient, information on reasons for non-participation, resulting in an uncertain risk of non-response bias. All MMAT criteria were met in 37 of the 48 qualitative papers. In five papers it could not be established whether the findings were adequately derived from the data, and in nine papers there was not enough information to determine coherence between data sources, collection, analysis, and interpretation. The interpretation of results was not substantially derived from the data in five papers. Data Richness Scale assessments showed that 40 qualitative papers had reasonable to good amounts and depth of data. No papers were excluded based on their data richness score. Each theme was assessed for data ‘adequacy’. Most themes had only minor concerns, meaning that each drew on many studies, some contributing only little or superficial data and others more detailed and specific data. Mixed methods papers met all the qualitative methodological quality criteria, but there were limitations in quantitative methodological quality in all five studies, and in mixed methods methodological quality in all but one paper. Most frequently, this was uncertainty about whether the different components of a study adhered to the quality criteria of each methodological tradition involved. Quality assessment of all studies using the MMAT is accessible (Additional File 2). The CERQual assessment of confidence in the evidence across the key themes was high, with no or minor concerns about methodological limitations, coherence, relevance, and data adequacy.
The results of the quality assessment suggest a need for higher-quality quantitative descriptive research in this field, particularly to facilitate assessment of the risk of non-response bias.

Quantitative studies

The most cited barriers were coded as embarrassment, shame, and taboo (18 papers), closely followed by participants expressing a lack of knowledge about where to seek healthcare and about treatment options, with a low expectation of benefit (18 papers). Some participants did not recognise their symptoms as a significant medical problem, or thought their symptoms were not troublesome enough to seek healthcare and deprioritised them (19 papers). Many thought their symptoms were normal, especially after childbirth or with ageing (15 papers). Participants frequently reported that their clinician, if they asked at all, was embarrassed, was not interested, or would not take their pelvic symptoms seriously (14 papers). Others perceived that their clinician was too busy and did not want to bother them about pelvic symptoms (5 papers). Fear of being examined, and of required investigations and treatment, was a barrier (17 papers), with a few participants fearing that their symptoms indicated more serious disease (3 papers). Waiting times, inconvenience, being too busy to attend, transport issues, religious and cultural factors, language difficulties, and service issues such as appointment delays and cost were all obstacles (21 papers). A less common barrier to seeking healthcare was a desire to cope or self-help (5 papers).

Facilitators for seeking healthcare most often included increased bother from pelvic symptoms (9 papers). Support from family and friends to seek healthcare (4 papers) and knowledge and learning about new treatments (3 papers) encouraged some participants, whilst others sought help only because of stigma, embarrassment, self-blame, guilt, or depression about their pelvic symptoms (3 papers), or because they feared that their symptoms were indicative of serious disease (2 papers). Papers containing the key barriers and facilitators are referenced (Additional File 3).
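The figures in parentheses in this subsection count papers, not individual mentions, so a paper that raises the same barrier several times contributes once to that barrier's count. A minimal sketch of such a paper-level tally follows. The paper identifiers and codes are hypothetical, and this illustrates only the counting step, not the authors' content analysis itself.

```python
from collections import Counter


def papers_per_code(coded_papers: dict) -> Counter:
    """Count how many papers mention each code; repeats within a paper count once."""
    tally = Counter()
    for codes in coded_papers.values():
        tally.update(set(codes))  # de-duplicate codes within each paper
    return tally


# Hypothetical coding of three papers, for illustration only.
coded = {
    "paper_A": ["embarrassment", "lack of knowledge", "embarrassment"],
    "paper_B": ["cost", "embarrassment"],
    "paper_C": ["lack of knowledge"],
}

tally = papers_per_code(coded)
# Sort by descending count, then alphabetically, for a stable listing.
for code, n in sorted(tally.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{code}: {n} papers")
# embarrassment: 2 papers
# lack of knowledge: 2 papers
# cost: 1 papers
```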

Qualitative and mixed methods studies

Four themes encompassed women’s barriers to healthcare-seeking: (1) Stigma, (2) Women’s lack of knowledge (with three sub-themes of normalising, deprioritising, and fear), (3) Trivialising by clinicians, and (4) Inconvenience and cost of seeking healthcare.

Stigma: this theme was a key barrier to help-seeking, encapsulating the frequently used codes ‘embarrassing’ and ‘ashamed’ and, less often, ‘taboo’ (30 papers).

“For me, I was embarrassed to speak to anybody, really, about it, for a long time. But now, I regret that I did that, because I left myself to a bad stage.” [prolapse]; [ 27 ] “You don’t know why, you feel sort of ashamed, you feel embarrassed to talk about it, as if you are somehow a failure, with guilt, you know?” [47 years with UI]; [ 57 ] “Yes. You can talk about almost anything else I think, all kinds of matters considering your genitals and. but not this, this I think is very taboo” [SUI] [ 59 ].

Embarrassment is the emotional impact of stigma, with shame also associated with stigma [ 120 ]. Stigma may be categorised as enacted or felt. Felt stigma may be internalised, perceived, and anticipated [ 121 ]. Internalised stigma was most often described by women seeking help with stigmatised pelvic symptoms, in the way they internalised negative beliefs and perceptions around their symptoms and expressed psychological distress, reduced self-worth, shame, and self-loathing [ 122 ]. Some participants expressed greater embarrassment about talking to a male clinician: “…My GP is a handsome 40-year-old man, and I would not dream of [laughs] talking to him about anything like that!” [sexual dysfunction] [ 51 ], while others blamed themselves for their symptoms: “When I was younger, I took a lot of laxatives, so I did this to myself” [bowel leakage] [ 78 ], or felt self-disgust: “… I feel dirty and disgusted in myself already” [bowel leakage] [ 115 ].

Lack of knowledge about symptoms in general caused many participants uncertainty over whether to seek healthcare (23 papers):

“You feel disoriented, you don’t know if it is normal or not, whether you should worry or not” [45 years with UI] [ 57 ]. “I did not know that happened to women. I did not know anything about it. I was scared because I didn’t know what it was.” [prolapse] [ 98 ]. “How can you talk about something [when] you don’t even know what it is?” [bowel leakage] [ 78 ].

Three sub-themes related to ‘lack of knowledge’: normalising, deprioritising, and fear. ‘Normalising’: participants regarded pelvic symptoms as a normal consequence of being a woman, of childbirth, and of ageing, and so as something they should not seek medical help for (22 papers):

“I simply thought: the urinary incontinence is just part of it. Your whole body is turned inside out after delivery anyway. So I thought it’s just part of the game.” [PFD] [ 12 ]. “I have some good friends, and my daughter. Well, they have the same problem. It’s age. That’s all we boil it down to is the age. Nothing you can do about it.” [urinary dysfunction] [ 85 ].

‘Deprioritising’ was developed from new codes in the qualitative data relating to prioritising other things, avoiding or denying pelvic symptoms, and low bother from symptoms, the last of which was found across all data (19 papers):

“We forget about ourselves a little. Everybody else comes first, and then later, me.” [PFD] [ 84 ]. Participants across a wide range of pelvic symptoms felt low symptom bother did not justify seeking help: “… it’s only a little bit, not like oh I’ve wet my pants” [urinary dysfunction] [ 47 ] and “I just forget about it, because it’s not an every week thing.” [bowel leakage] [ 78 ].

‘Fear’ related to women’s lack of knowledge and information, and included codes about fear of examinations, investigations, and treatments, and inappropriate fear of serious disease, all of which delayed seeking help (8 papers):

“To be exposed, that is something you don’t want to risk, so every time [examination] it is like a mental procedure, the sense of exposure. Well, it’s almost like an abuse, it is something you don’t want to do but you must.” [pelvic pain – endometriosis] [ 61 ]. “I didn’t want to be put on some pill that would make me more constipated. Sometimes the cure is worse than the disease…” [bowel leakage] [ 78 ]. “When your uterus or bladder falls, it is very dangerous. You can get cancer” [PFD] [ 86 ].

‘Trivialising’ was a significant theme that grew around codes involving women’s relationship difficulties with their clinician (25 papers). A new code from the qualitative data included in this theme was women feeling judged by clinicians if they mentioned pelvic symptoms. Women felt they were not taken seriously and not asked about symptoms, and perceived that their clinician was embarrassed to discuss symptoms:

“I told my doctor, I had urine loss all the time…you know what he said? Honestly, I will tell you…‘wear a Kotex’” [PFD] [ 86 ]. “‘You’ve got a rectocele.’ ‘What is it?’ ‘Oh, you don’t need to know.’ Well, hey, if it’s to do with you, you’re the one person who needs to know about it. You shouldn’t be sort of kept like, ‘Oh, you’re a child being a nuisance. Go away. You don’t need to know.’” [prolapse] [ 87 ]. “And then she also said that maybe I should learn to live with it, I thought that was a bit crazy. And ehm, that also made me think I did not feel taken seriously. Because I really thought, well, hello, I’m 20!” (22 yrs) [pelvic pain – vulvodynia] [ 52 ]. “The lack of urgency is real with OBGYNs. Maybe younger doctors are more open, but the attitude of older gynaecologists is to do what they did to me. He just gave me a pat on the butt and told me I could live with it.” [prolapse] [ 90 ].

‘Trivialising’ also included women expressing their perception that their clinicians lacked knowledge or training about pelvic symptoms, found in two quantitative and 12 qualitative and mixed methods studies:

“The GP took me seriously, but in retrospect I think he didn’t have the knowledge…” [pelvic pain – vulvodynia] [ 52 ].

‘Inconvenience and cost of seeking healthcare’ developed as a theme from overlapping codes in which women described, as barriers to help-seeking, a variety of cultural, gender, or religious factors; communication issues with their clinician; long waiting times at appointments, which made them difficult to fit into everyday life; and, for some, the cost of taking time away from paid work or childcare to attend (15 papers). Codes around service issues were also incorporated: the inconvenience women experienced in physically attending appointments or having treatment, delays in receiving an appointment for a particular service, and the cost of care, especially for women without health insurance (12 papers).

Facilitator codes found only in qualitative and mixed methods studies, namely clinicians taking women seriously, being open to uncertainty, asking about symptoms, and offering support, formed the new theme of ‘supportive clinician attitude’ (18 papers). This theme was added to the facilitator themes already found in quantitative studies: worsening symptoms, increasing women’s knowledge, and social support.

Synthesis of all results

Table 6 shows how themes were developed from codes across the data. The data provided high certainty that barriers and facilitators to healthcare seeking were similar across different stigmatised pelvic symptoms, countries, and research designs.

The extended CSM was applied to better explain these results by describing how women’s perceptions and interpretation of their symptoms influence how they cope with those symptoms. Women’s interpretation of symptoms is shaped by the cognitive and emotional representations their symptoms trigger, which may in turn be influenced by previous experiences and sociocultural factors. An accurate cognitive representation of the potential threat from pelvic symptoms requires women to know the identity, cause, consequences, cure/controllability, and likely timeline of their symptoms. Findings from this review suggest that women’s lack of knowledge, reported in 44 studies, and normalising of symptoms, reported in 37 studies, threaten identification of pelvic symptoms. Attribution of cause is threatened by women believing their symptoms are normal. In the early stages, the full consequences of pelvic symptoms may not be appreciated, because symptoms initially cause low bother and are deprioritised and normalised. Conversely, some women delay healthcare seeking because their symptoms make them (usually incorrectly) fear serious disease. Lack of knowledge of treatment options threatens appropriate representation of the timeline and cure/controllability of pelvic symptoms: some women hope for spontaneous resolution, whilst others believe their symptoms are incurable. Women’s ability to make sense of their symptom perceptions (coherence) is impaired by lack of knowledge, which disrupts their cognitive representation of their symptoms.

Women’s main emotional representation of the potential threat from pelvic symptoms is stigma (embarrassment, shame, and taboo), and to a lesser extent fear of examination, investigations, treatment, and serious disease. Cited in 52 studies, stigma was the most frequently reported barrier to healthcare seeking.

Women’s treatment beliefs are affected by a lack of knowledge about treatment options and where to seek healthcare, and by low expectations of treatment benefit, all of which delay healthcare seeking. Women’s beliefs about seeking healthcare are influenced by sociocultural factors (subjective norms, perceived behavioural control) and by their own and others’ attitudes. The attitudes seen in the data from this literature review indicated that women believed seeking help for pelvic symptoms would make them feel stigmatised, that they would be judged and their symptoms trivialised by their clinician, and that clinicians would normalise their symptoms, possibly reflecting a perception that clinicians lacked knowledge and training about pelvic symptoms. Subjective norms define what women believe others would do if they had pelvic symptoms: our data suggest the subjective norms are to normalise and deprioritise one’s own symptoms, to cope, and to feel stigmatised. Women’s perceived behavioural control over pelvic symptoms is reduced by lack of knowledge and by service issues, and is also affected by inappropriate self-help and coping. The key themes from help-seeking barriers mapped to the CSM are shown in Fig. 2.

Fig. 2. Using the extended CSM to explain barriers to healthcare seeking with stigmatised pelvic symptoms

In summary, women’s cognitive and emotional representations, treatment beliefs, and beliefs about help-seeking affect their ability to manage their pelvic symptoms. The data show how coping (CSM Stage 2) is affected by women’s lack of knowledge, which causes (mis)interpretation of their symptoms and leads them to display either ‘approach-oriented coping’ through inappropriate self-help, such as relying on sanitary pads for incontinence, or ‘avoidant-oriented coping’ procedures, such as normalising and deprioritising symptoms, instead of seeking help. Women appear to become stuck in a maladaptive, distressed loop between the interpretation and coping stages of the CSM, because their iterative interpretation of symptom perceptions, together with the social messages they gather about seeking help with pelvic symptoms, reinforces both the stigma of their symptoms and their lack of knowledge about them. For many women, only a worsening impact from symptoms and fear of more serious disease pushed them to seek healthcare. A small number of voices (six papers) expressed the belief that women should assert themselves and take responsibility for asking for professional help, but most women suggested that a supportive attitude from their clinician, especially asking women about pelvic symptoms, would facilitate seeking healthcare for stigmatised pelvic symptoms.

To our knowledge, this is the first review to cover such a wide range of stigmatised pelvic symptoms. The principal findings of this mixed methods systematic literature review are that stigma (embarrassment, shame, and taboo), lack of knowledge, and women feeling ‘trivialised’ by clinicians are definitive barriers to seeking help. Using a health psychology model (the CSM) helped to show how emotional representations (stigma) and cognitive representations (lack of knowledge), together with clinician behaviour, particularly affect identification of pelvic symptoms. Referring again to theory, Dixon-Woods et al. [ 123 ] described the construct of Candidacy to explain how individuals, influenced by their context, other people, and sociocultural issues, negotiate their eligibility for healthcare between themselves and healthcare services in an iterative cycle. When someone seeks healthcare, they assert their candidacy, which is then judged by clinicians (‘Adjudication’), either helping or hindering their healthcare journey. In the case of seeking help with pelvic symptoms, stigma, women’s lack of knowledge, and past experiences that lead them to expect their symptoms to be trivialised combine to make women’s candidacy for healthcare unclear. If clinicians lack knowledge and training about pelvic symptoms, they may trivialise, normalise, or judge symptoms, and so adjudicate against women’s attempts to seek healthcare.

‘Unclear candidacy’ is proposed as the overarching theme for this synthesis. The connection between the Candidacy model and the CSM’s illness representations was demonstrated in a paper exploring access to, and experiences of, healthcare services [ 124 ]. This connection helps to explain the voices of women seeking help in this analysis: stigma, lack of knowledge, and feeling trivialised by clinicians were the key factors affecting women’s identification of themselves as candidates for healthcare. Women both judge their own symptoms and feel judged by clinicians as unsuitable, or unworthy, to seek help for stigmatised pelvic symptoms. Women’s belief that clinicians will not take them seriously if they seek healthcare further frustrates their candidacy. Our data show that women experience both felt stigma and enacted stigma, the latter arising from negative judgements by clinicians, further discriminating against women’s candidacy for healthcare with stigmatised pelvic symptoms. The facilitators that most often prompted women to seek healthcare were more knowledge about pelvic symptoms, worsening symptoms, and feeling that their clinician was supportive, especially by asking specifically about pelvic symptoms. This suggests that women who believe their clinician will have a supportive attitude are more likely to develop a positive emotional representation of their symptoms and to seek healthcare. Increasing women’s knowledge would help them form an appropriate cognitive representation of the threat posed by their symptoms, and so decide whether they can appropriately self-manage their symptoms or need to seek professional help.

The strengths of our review are the inclusion of a wide range of carefully considered, stigmatised pelvic symptoms explored across many high-income countries, the rigorous application of eligibility criteria, and the use of theoretical models to explain the link between barriers and facilitators and help-seeking behaviours, which allows possible targets for intervention to be suggested. Selection bias was reduced by the ability to include studies published in English, German, French, Spanish, and Swedish. Where ethnicity was reported, participants were mostly white but also included Black, Hispanic, and Asian women. The overall CERQual assessments of confidence [ 37 ] were high for the barriers to healthcare seeking found in our review, signifying issues common to women across stigmatised pelvic symptoms. These help-seeking barriers concur with those found in recent systematic literature reviews investigating experiences of individual stigmatised pelvic symptoms, namely abnormal uterine bleeding [ 4 ] and prolapse [ 10 , 125 ], as well as with recently published work exploring women’s experiences of PFD [ 93 , 126 ] and of urogynaecological care for racial and ethnic minority women [ 127 ]. Stigma and lack of knowledge were likewise barriers for those with urinary incontinence [ 128 , 129 ]. In a public survey that formed part of a call for evidence to inform the Women’s Health Strategy for England [ 130 ], published after this review began, 84% of respondents said they had not been listened to by healthcare clinicians; this concurs with our findings, although the survey was not specific to pelvic health. Our finding that women perceived clinicians to lack knowledge and training (cited in 12 qualitative and mixed methods studies) was found in only one recent review, relating to prolapse [ 10 ]. Our finding that women perceive clinicians to normalise their pelvic symptoms (cited in seven qualitative and mixed methods studies) was found in only one review, about abnormal uterine bleeding [ 4 ].
Few facilitators to healthcare seeking were reported in other reviews. Increased knowledge, social support and worsening symptoms were similarly found to encourage women to seek healthcare with PFD [ 10 , 127 , 131 ]. In contrast to others’ results, we found a large volume of qualitative data expressing the importance of a supportive clinician to facilitate women’s healthcare-seeking for pelvic symptoms. This may be due to the large number of women’s voices represented over a wide range of pelvic symptoms. It is likely to be an important consideration in developing future interventions.

We recognise limitations in this review. Although our search included many stigmatised pelvic symptoms, some relevant publications may have been missed, and not all symptoms were represented in the included literature. Grey literature was not investigated, both because we chose to include only peer-reviewed studies to ensure a degree of rigour and because of resource restrictions. Only women living in high-income countries were included, to allow better understanding of barriers and facilitators in countries with economies similar to the UK’s, whilst recognising that the UK National Health Service is unique. The exclusion of studies published before 2010 was mitigated by a thorough content analysis of the barrier and facilitator data in all quantitative studies before exclusion, which confirmed that issues currently concerning women were unlikely to have been missed. Quality appraisal using the MMAT was challenging because non-response bias was unclear in many quantitative studies, ten papers had insufficient focus on healthcare seeking, and few of the authors we contacted responded to requests for clarification. Most included studies captured only the voices of women already seeking healthcare for their symptoms; taking a public health approach to seek the concerns of all women may uncover further barriers and facilitators to seeking help for stigmatised pelvic symptoms not found in this review.

The findings of this review mean that efforts to encourage women to seek healthcare for pelvic symptoms need to target the barriers by reducing stigma, increasing knowledge, and supporting primary care clinicians to routinely discuss stigmatised pelvic symptoms with women. Changing the social norm so that women believe they will be taken seriously if they seek healthcare is likely to empower them to manage their symptoms appropriately. Since this review began, there has been an explosion of interest in and information about menopause, with celebrity endorsement in the UK [ 132 ]; this, along with the first ever UK Government Women’s Health Strategy [ 130 ], may help to normalise discussion of stigmatised pelvic health symptoms and so reduce stigma. Clinicians at all levels, particularly in primary care, need to legitimise women’s candidacy for pelvic healthcare. This may require clinician education and training, so that clinicians better understand the significant effects of pelvic symptoms on women’s quality of life and wellbeing, and can confidently educate women about their anatomy, their symptoms, and how to negotiate the healthcare system. Evidence-informed local pathways of care should be available and widely recognised, enabling women to self-manage symptoms when possible, to know when and where to seek help, and to expect support from clinicians throughout their journey, with timely referral to specialist multidisciplinary services when required.

There are unanswered questions about facilitating early help-seeking in women with stigmatised pelvic symptoms. A few interventions have successfully increased pelvic health knowledge for a short duration [ 133 , 134 , 135 , 136 ], probably by improving cognitive representations of illness identity, but there is a lack of research targeting emotional representations to reduce the stigma of pelvic symptoms. Results from this systematic, mixed methods literature review suggest that changing stigma, knowledge, and beliefs about seeking help for pelvic symptoms will support women to identify their candidacy for healthcare, and will reduce normalising and deprioritising of symptoms, inappropriate self-help, and incorrect adjudication by clinicians who normalise and trivialise women’s pelvic symptoms. Future research needs to explore whether targeting both cognitive and emotional representations of stigmatised pelvic symptoms, and the attitudes and norms women encounter, can encourage women to seek healthcare sooner. A successful intervention to raise awareness, reduce stigma, and encourage women with stigmatised pelvic symptoms to seek timely healthcare could better inform public health policy, reduce unnecessary surgical costs, and contribute towards meeting the United Nations Sustainable Development Goals core target 3.7 by 2030 [ 1 ].

Abbreviations

CDSR: Cochrane Database of Systematic Reviews

CENTRAL: Cochrane Central Register of Controlled Trials

CSM: Common-Sense Model of Self-Regulation of Illness and Behaviour

FGM: Female Genital Mutilation

GRADE-CERQual: Grading of Recommendations Assessment, Development and Evaluation - Confidence in Evidence from Reviews of Qualitative research

HIV/AIDS: Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome

HPV: Human Papilloma Virus

MMAT: Mixed Methods Appraisal Tool

PABAK: Prevalence And Bias Adjusted Kappa

PFD: Pelvic Floor Dysfunction

PRESS: Peer Review of Electronic Search Strategies

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

SDGs: Sustainable Development Goals

SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type, search structure

UI: Urinary Incontinence

UK: United Kingdom

UN: United Nations

US: United States

References

Le Blanc D. Towards integration at last? The Sustainable Development Goals as a network of targets. Sustain Dev. 2015;23(3):176–87.


From a whisper to a roar. Tackling taboos in women’s health [press release]. London, United Kingdom: Wellbeing of Women; 2021.

Royal College of Obstetricians and Gynaecologists. Better for women. Improving the health and wellbeing of girls and women. London, United Kingdom: Royal College of Obstetricians and Gynaecologists; 2019.


Henry C, Ekeroma A, Filoche S. Barriers to seeking consultation for abnormal uterine bleeding: systematic review of qualitative research. BMC Womens Health. 2020;20(1):123.


Tinetti A, Weir N, Tangyotkajohn U, Jacques A, Thompson J, Briffa K. Help-seeking behaviour for pelvic floor dysfunction in women over 55: drivers and barriers. Int Urogynecol J. 2018;29(11):1645–53.


Hagen S, Stark D, Glazener C, Dickson S, Barry S, Elders A, et al. Individualised pelvic floor muscle training in women with pelvic organ prolapse (POPPY): a multicentre randomised controlled trial. Lancet. 2014;383:796–806.

Bø K. Pelvic floor muscle training in treatment of female stress urinary incontinence, pelvic organ prolapse and sexual dysfunction. World J Urol. 2012;30(4):437–43.

NICE. Pelvic floor dysfunction: prevention and non-surgical management (NG210). Guideline. nice.org.uk: NICE. 2021. Contract No.: NG210.

Ghetti C, Skoczylas LC, Oliphant SS, Nikolajski C, Lowder JL. The emotional burden of pelvic organ prolapse in women seeking treatment: a qualitative study. Female Pelvic Med Reconstr Surg. 2015;21(6):332–8.

Toye F, Pearl J, Vincent K, Barker K. A qualitative evidence synthesis using meta-ethnography to understand the experience of living with pelvic organ prolapse. Int Urogynecol J. 2020:1–14.

Dunivan GC, Anger JT, Alas A, Wieslander C, Sevilla C, Chu S, et al. Pelvic organ prolapse: a disease of silence and shame. Female Pelvic Med Reconstr Surg. 2014;20(6):322–7.

Buurman MBR, Lagro-Janssen A. Women’s perception of postpartum pelvic floor dysfunction and their help-seeking behaviour: a qualitative interview study. Scand J Caring Sci. 2013;27(2):406–13.

Sang K, Remnant J, Calvard T, Myhill K. Blood work: managing menstruation, menopause and gynaecological health conditions in the workplace. Int J Environ Res Public Health. 2021;18(4):1951.

Cooper J, Annappa M, Quigley A, Dracocardos D, Bondili A, Mallen C. Prevalence of female urinary incontinence and its impact on quality of life in a cluster population in the United Kingdom (UK): a community survey. Prim Health Care Res Dev. 2015;16(4):377–82.

Lane GI, Hagan K, Erekson E, Minassian VA, Grodstein F, Bynum J. Patient-provider discussions about urinary incontinence among older women. J Gerontol A Biol Sci Med Sci. 2021;76(3):463–9.

Raasthøj I, Elnegaard S, Rosendal M, Jarbøl DE. Urinary incontinence among women—which personal and professional relations are involved? A population-based study. Int Urogynecol J. 2019;30(9):1565–74.

Brown HW, Wexner SD, Lukacz ES. Factors associated with care seeking among women with accidental bowel leakage. Female Pelvic Med Reconstr Surg. 2013;19(2):66–71.

Rada MP, Jones S, Falconi G, Milhem Haddad J, Betschart C, Pergialiotis V, et al. A systematic review and meta-synthesis of qualitative studies on pelvic organ prolapse for the development of core outcome sets. Neurourol Urodyn. 2020;39(3):880–9.

Cameron L, Leventhal EA, Leventhal H. Symptom representations and affect as determinants of care seeking in a community-dwelling, adult sample population. Health Psychol. 1993;12:171–9.


Leventhal H, Brissette I, Leventhal EA. The common sense model of self-regulation of health and illness. In: Cameron LD, Leventhal H, editors. The self-regulation of health and illness behaviour. London: Routledge, Taylor and Francis Group; 2003. pp. 42–60.

Leventhal H, Phillips LA, Burns E. The common-sense model of self-regulation (CSM): a dynamic framework for understanding illness self-management. J Behav Med. 2016;39(6):935–46.

Hagger MS, Orbell S. The common sense model of illness self-regulation: a conceptual review and proposed extended model. Health Psychol Rev. 2021:1–31.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. PLoS Med. 2021;18(3):e1003583.

Cooke A, Smith D, Booth A. Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qual Health Res. 2012;22(10):1435–43.

Munn Z, Stern C, Aromataris E, Lockwood C, Jordan Z. What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences. BMC Med Res Methodol. 2018;18.

Methley A, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res. 2014;14.

Abhyankar P, Uny I, Semple K, Wane S, Hagen S, Wilkinson J, et al. Women’s experiences of receiving care for pelvic organ prolapse: a qualitative study. BMC Womens Health. 2019;19:45–55.

Rasmussen JL, Ringsberg KC. Being involved in an everlasting fight—A life with postnatal faecal incontinence a qualitative study. Scand J Caring Sci. 2010;24(1):108–15.

Fortenberry JD. The effects of stigma on genital herpes care-seeking behaviours. Herpes. 2004;11(1):8–11.


Hussein J, Ferguson L. Eliminating stigma and discrimination in sexual and reproductive health care: a public health imperative. Sex Reproductive Health Matters. 2019;27(3):1–5.

Donaldson RL, Meana M. Early dyspareunia experience in young women: confusion, consequences, and help-seeking barriers. J Sex Med. 2011;8(3):814–23.

McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C. PRESS peer review of electronic search strategies: 2015 Guideline Statement. J Clin Epidemiol. 2016;75:40–6.

Noyes J, Booth A, Flemming K, Garside R, Harden A, Lewin S, et al. Cochrane Qualitative and Implementation Methods Group guidance series -paper 3: methods for assessing methodological limitations, data extraction and synthesis, and confidence in synthesized qualitative findings. J Clin Epidemiol. 2018;97:49–58.

NICE. Methods for the development of NICE public health guidance. Process and methods [PMG4]. United Kingdom: NICE; 2012. 26 September 2012.

Hong Q, Fàbregues S, Bartlett G, Boardman F, Cargo M, Dagenais P, et al. The mixed methods Appraisal Tool (MMAT) version 2018 for information professionals and researchers. Education for Information; 2018.

Braun V, Clarke V. Thematic analysis: a practical guide. First ed. London: Sage; 2021.

Lewin S, Glenton C, Munthe-Kaas H, Carlsen B, Colvin CJ, Gülmezoglu M, et al. Using qualitative evidence in decision making for health and social interventions: an approach to assess confidence in findings from qualitative evidence syntheses (GRADE-CERQual). PLoS Med. 2015;12:e1001895.

Ames H, Glenton C, Lewin S. Purposive sampling in a qualitative evidence synthesis: a worked example from a synthesis on parental perceptions of vaccination communication. BMC Med Res Methodol. 2019;19.

Carroll L, O’Sullivan C, Doody CM, Perrotta C, Fullen BM. Pelvic organ prolapse: the lived experience. PLoS ONE. 2022;17.

Cumming G, Currie P, Moncur HD, Lee R. Web-based survey on the effect of digital story telling on empowering women to seek help for uro-genital atrophy. Menopause Int. 2010;16(2):51–5.

Drennan V, Goodman C, Norton C, Wells A. Incontinence in women prisoners: an exploration of the issues. J Adv Nurs. 2010;66(9):1953–67.

Hinchliff S, Tetley J, Lee D, Nazroo J. Older adults’ experiences of sexual difficulties: qualitative findings from the English Longitudinal Study on Ageing (ELSA). J Sex Res. 2018;55(2):152–63.

Mapp F, Wellings K, Mercer CH, Mitchell K, Tanton C, Clifton S, et al. Help-seeking for genitourinary symptoms: a mixed methods study from Britain’s Third National Survey of sexual attitudes and lifestyles (Natsal-3). BMJ Open. 2019;9(10):e030612.

O’Malley D, Smith V, Higgins A. Sexual health issues postpartum – a mixed methods study of women’s help-seeking behavior after the birth of their first baby. Midwifery. 2021;104:103196.

Tudor KI, Eames S, Haslam C, Chataway J, Liechti MD, Panicker JN. Identifying barriers to help-seeking for sexual dysfunction in multiple sclerosis. J Neurol. 2018;265(12):2789–802.

Vethanayagam N, Orrell A, Dahlberg L, McKee KJ, Orme S, Parker SG, et al. Understanding help-seeking in older people with urinary incontinence: an interview study. Health Soc Care Commun. 2017;25(3):1061–9.

Wagg AR, Kendall S, Bunn F. Women’s experiences, beliefs and knowledge of urinary symptoms in the postpartum period and the perceptions of health professionals: a grounded theory study. Prim Health Care Res Dev. 2017;18(5):448–62.

Hinchliff S, Carvalheira AA, Štulhofer A, Janssen E, Hald GM, Træen B. Seeking help for sexual difficulties: findings from a study with older adults in four European countries. Eur J Ageing. 2020;17(2):185–95.

Jarbøl DE, Haastrup PF, Rasmussen S, Søndergaard J, Balasubramaniam K. Women’s barriers for contacting their general practitioner when bothered by urinary incontinence: a population-based cross-sectional study. BMC Urol. 2021;21(1):99.

Róin Á, Nord C. Urine incontinence in women aged sixty to sixty-five: negotiating meaning and responsibility. Scand J Caring Sci. 2015;29(4):625–32.

Schaller S, Traeen B, Lundin Kvalem I. Barriers and facilitating factors in help-seeking: a qualitative study on how older adults experience talking about sexual issues with healthcare personnel. Int J Sex Health. 2020;32(2):65–80.

Leusink P, Steinmann R, Makker M, Lucassen PL, Teunissen D, Lagro-Janssen AL, et al. Women’s appraisal of the management of vulvodynia by their general practitioner: a qualitative study. Fam Pract. 2019;36(6):791–6.

Moossdorff-Steinhauser HFA, Berghmans BCM, Spaanderman MEA, Bols EMJ. Urinary incontinence during pregnancy: prevalence, experience of bother, beliefs, and help-seeking behavior. Int Urogynecol J. 2021;32(3):695–701.

Moossdorff-Steinhauser HFA, Berghmans BCM, Spaanderman MEA, Bols EMJ. Urinary incontinence 6 weeks to 1 year post-partum: prevalence, experience of bother, beliefs, and help-seeking behavior. Int Urogynecol J. 2021.

Moossdorff-Steinhauser HFA, Houkes I, Berghmans BCM, Spaanderman MEA, Bols EMJ. Experiences of peri-partum urinary incontinence from a women’s and health care perspective: a qualitative study. Matern Child Health J. 2023.

Rutte A, Welschen LM, van Splunter MM, Schalkwijk AA, de Vries L, Snoek FJ, et al. Type 2 diabetes patients’ needs and preferences for care concerning sexual problems: a cross-sectional survey and qualitative interviews. J Sex Marital Ther. 2016;42(4):324–37.

Pintos-Díaz MZ, Alonso-Blanco C, Parás-Bravo P, Fernández-de-Las-Peñas C, Paz-Zulueta M, Fradejas-Sastre V, et al. Living with urinary incontinence: potential risks of women’s health? A qualitative study on the perspectives of female patients seeking care for the first time in a specialized center. Int J Environ Res Public Health. 2019;16:19.

Carsughi A, Santini S, Lamura G. Impact of the lack of integrated care for older people with urinary incontinence and their family caregivers: results from a qualitative pilot study in two large areas of the Marche Region. Ann Ist Super Sanita. 2019;55(1):26–33.

Björk A-B, Sjöström M, Johansson EE, Samuelsson E, Umefjord G. Women’s experiences of internet-based or postal treatment for stress urinary incontinence. Qual Health Res. 2014;24(4):484–93.

Grundström H, Alehagen S, Kjølhede P, Berterö C. The double-edged experience of healthcare encounters among women with endometriosis: a qualitative study. J Clin Nurs. 2018;27(1–2):205–11.

Grundström H, Danell H, Sköld E, Alehagen S. ‘A protracted struggle’ – a qualitative blog study of endometriosis healthcare experiences in Sweden. Australian J Adv Nurs. 2020;37(4).

Mirskaya M, Lindgren E-C, Carlsson I-M. Online reported women’s experiences of symptomatic pelvic organ prolapse after vaginal birth. BMC Womens Health. 2019;19(1):129.

Pakbaz M, Persson M, Löfgren M, Mogren I. ‘A hidden disorder until the pieces fall into place’ – a qualitative study of vaginal prolapse. BMC Womens Health. 2010;10(1):18.

Pakbaz M, Rolfsman E, Mogren I, Löfgren M. Vaginal prolapse–perceptions and healthcare-seeking behavior among women prior to gynecological surgery. Acta Obstet Gynecol Scand. 2011;90(10):1115–20.

Schreiber Pedersen L, Lose G, Hoybye MT, Jurgensen M, Waldmann A, Rudnicki M. Predictors and reasons for help-seeking behavior among women with urinary incontinence. Int Urogynecol J. 2018;29(4):521–30.

Hayder D. The effects of urinary incontinence on sexuality: seeking an intimate partnership. J Wound Ostomy Cont Nurs. 2012;39(5):539–44.

Jurgensen M, Elsner SA, Pedersen LS, Luckert J, Faust EL, Rudnicki M, et al. I really thought nothing could be done: help-seeking behaviour among women with urinary incontinence. Int J Res Med Sci. 2015;3:826–35.

Gore-Gorszewska G. Why not ask the doctor? Barriers in help-seeking for sexual problems among older adults in Poland. Int J Public Health. 2020;65(8):1507–15.

Wójtowicz U, Płaszewska-Zywko L, Stangel-Wójcikiewicz K, Basta A. Barriers in entering treatment among women with urinary incontinence. Ginekologia Polska. 2014;85(5):342–7.

Elbiss HM, Osman N, Hammad FT. Social impact and healthcare-seeking behavior among women with urinary incontinence in the United Arab Emirates. Int J Gynaecol Obstet. 2013;122(2):136–9.

Hammad FT, Elbiss HM, Osman N. The degree of bother and healthcare seeking behaviour in women with symptoms of pelvic organ prolapse from a developing gulf country. BMC Womens Health. 2018;18(1):77–83.

Al-Badr A, Brasha H, Al-Raddadi R, Noorwali F, Ross S. Prevalence of urinary incontinence among Saudi women. Int J Gynaecol Obstet. 2012;117(2):160–3.

Alshammari S, Alyahya MA, Allhidan RS, Assiry GA, AlMuzini HR, AlSalman MA. Effect of urinary incontinence on the quality of life of older adults in Riyadh: medical and sociocultural perspectives. Cureus. 2020;12(11):e11599.

Alshenqeti AM, Almutairi RE, Keram AM. Impact of urinary incontinence on quality of life among women of childbearing age in Al Madinah Al Munawara, Saudi Arabia. Cureus. 2022;14(5):e24886.

Krissi H, Eitan R, Peled Y. The role of primary physicians in the diagnostic delay of lower urinary tract and pelvic organ prolapse symptoms. Eur J Obstet Gynecol Reproductive Biology. 2012;161(1):102–4.

Ahmed A, Fincham JE. Physician office vs retail clinic: patient preferences in care seeking for minor illnesses. Ann Fam Med. 2010;8(2):117–23.

Berger MB, Patel DA, Miller JM, Delancey JO, Fenner DE. Racial differences in self-reported healthcare seeking and treatment for urinary incontinence in community-dwelling women from the EPI Study. Neurourol Urodyn. 2011;30(8):1442–7.

Brown HW, Rogers RG, Wise ME. Barriers to seeking care for accidental bowel leakage: a qualitative study. Int Urogynecol J. 2017;28(4):543–51.

Chen CX, Shieh C, Draucker CB, Carpenter JS. Reasons women do not seek health care for dysmenorrhea. J Clin Nurs. 2018;27(1–2):e301–8.

Devendorf AR, Bradley SE, Barks L, Klanchar A, Orozco T, Cowan L. Stigma among veterans with urinary and fecal incontinence. Stigma Health. 2020.

Doshi AM, Van Den Eeden SK, Morrill MY, Schembri M, Thom DH, Brown JS. Women with diabetes: understanding urinary incontinence and help seeking behavior. J Urol. 2010;184(4):1402–7.

Dunivan GC, Komesu YM, Cichowski SB, Lowery C, Anger JT, Rogers RG. Elder American Indian women’s knowledge of pelvic floor disorders and barriers to seeking care. Female Pelvic Med Reconstr Surg. 2015;21(1):34–8.

Gambrah HA, Hagedorn JC, Dmochowski RR, Johnsen NV. Understanding sexual health concerns in women after traumatic pelvic fracture. Neurourol Urodyn. 2022;41(6):1364–72.

Hatchett L, Hebert-Beirne J, Tenfelde S, Lavender MD, Brubaker L. Knowledge and perceptions of pelvic floor disorders among African American and latina women. Female Pelvic Med Reconstr Surg. 2011;17(4):190–4.

Jackson CB, Botelho EM, Welch LC, Joseph J, Tennstedt SL. Talking with others about stigmatized health conditions: implications for managing symptoms. Qual Health Res. 2012;22(11):1468–75.

Jackson E, Hernandez L, Mallett VT, Montoya TI. Knowledge, perceptions, and attitudes toward pelvic organ prolapse and urinary incontinence in Spanish-speaking Latinas. Female Pelvic Med Reconstr Surg. 2017;23(5):324–8.

Low LK, Tumbarello JA. Falling out: authoritative knowledge and women’s experiences with pelvic organ prolapse. J Midwifery Women’s Health. 2012;57(5):489–94.

Mallett VT, Jezari AM, Carrillo T, Sanchez S, Mulla ZD. Barriers to seeking care for urinary incontinence in Mexican American women. Int Urogynecol J. 2018;29(2):235–41.

Mann J, Shuster J, Moawad N. Attributes and barriers to care of pelvic pain in university women. J Minim Invasive Gynecol. 2013;20(6):811–8.

Muller N. Pelvic organ prolapse: a patient-centred perspective on what women encounter seeking diagnosis and treatment. Australian New Z Cont J. 2010;16(3):70–80.

Siddiqui NY, Ammarell N, Wu JM, Sandoval JS, Bosworth HB. Urinary incontinence and health-seeking behavior among White, Black, and Latina women. Female Pelvic Med Reconstr Surg. 2016;22(5):340–5.

Smith FK, Agu I, Murarka S, Siddiqui G, Orejuela FJ, Muir TW, et al. Barriers to care affecting presentation to urogynecologists in a community setting. Female Pelvic Med Reconstr Surg. 2021;27(2):e368–71.

Vardeman J, Spiers A, Yamasaki J. Things are happening that I don’t understand: a narrative exploration of the chaos of living with pelvic floor disorders. Health Commun. 2022:1–9.

Waetjen LE, Xing G, Johnson WO, Melnikow J, Gold EB. Factors associated with reasons incontinent midlife women report for not seeking urinary incontinence treatment over 9 years across the menopausal transition. Menopause (New York NY). 2018;25(1):29–37.

Washington BB, Raker CA, Mishra K, Sung VW. Variables impacting care-seeking for pelvic floor disorders among African American women. Female Pelvic Med Reconstr Surg. 2013;19(2):98–102.

Welch LC, Botelho EM, Tennstedt SL. Race and ethnic differences in health beliefs about lower urinary tract symptoms. Nurs Res. 2011;60(3):165–72.

Welch LC, Taubenberger S, Tennstedt SL. Patients’ experiences of seeking health care for lower urinary tract symptoms. Res Nurs Health. 2011;34(6):496–507.

Wieslander CK, Alas A, Dunivan GC, Cichowski S, Rogers RG, Sevilla CMS, et al. Misconceptions and miscommunication among spanish-speaking and English-speaking women with pelvic organ prolapse. Int Urogynecol J Pelvic Floor Dysfunct. 2015;26(4):597–604.

Willis-Gray MG, Sandoval JS, Maynor J, Bosworth HB, Siddiqui NY. Barriers to urinary incontinence care seeking in White, Black, and Latina women. Female Pelvic Med Reconstr Surg. 2015;21(2):83–6.

Bascur-Castillo C, Araneda-Gatica V, Castro-Arias H, Carrasco-Portiño M, Ruiz-Cantero MT. Determinants in the process of seeking help for urinary incontinence in the Chilean health system. Int J Gynaecol Obstet. 2019;144(1):103–11.

Po-Ming YU, Chun-Hung YU. Help-seeking behaviour among women with urinary incontinence: a cross-sectional study in two gynaecology clinics. Hong Kong J Gynecol Obstet Midwifery. 2021;21(2):80–5.

Siu JY-m. Communicating under medical patriarchy: gendered doctor-patient communication between female patients with overactive bladder and male urologists in Hong Kong. BMC Womens Health. 2015;15(1):44.

Choi H, Park JY, Yeo JK, Oh MM, Moon DG, Lee JG, et al. Population-based survey on disease insight, quality of life, and health-seeking behavior associated with female urinary incontinence. Int Neurourol J. 2015;19(1):39–46.

Gwee KA, Setia S. Demographics and health care seeking behavior of Singaporean women with chronic constipation: implications for therapeutic management. Int J Gen Med. 2012;5:287–302.

Tanaka E, Momoeda M, Osuga Y, Rossi B, Nomoto K, Hayakawa M, et al. Burden of menstrual symptoms in Japanese women – an analysis of medical care-seeking behavior from a survey-based study. Int J Women’s Health. 2014;6:11–23.

Ng SF, Lok MK, Pang SM, Wun YT. Stress urinary incontinence in younger women in primary care: prevalence and opportunistic intervention. J Womens Health (Larchmt). 2014;23(1):65–8.

Chen H-C, Liu C-Y, Liao C-H, Tsao L-I. Self-perception of symptoms, medical help seeking, and self-help strategies of women with interstitial cystitis/painful bladder syndrome. Lower Urinary Tract Symptoms. 2020;12(3):183–9.

Wang Y-H, Chen S-H, Jou H-J, Tsao L-I. Doing the best to control. The experiences of Taiwanese women with lower urinary tract symptoms. Nurs Res. 2011;60(1).

Beaumont T, Tian E, Kumar S. It’s messing with my physical health. It’s messing with my sex life: Women’s perspectives about, and impact of, pelvic health issues whilst awaiting specialist care. Int Urogynecol J. 2022.

Cross W, Cant R, Manning D, McCarthy S. Addressing information needs of vulnerable communities about incontinence: a survey of ten CALD communities. Collegian. 2014;21(3):209–16.

Fileborn B, Lyons A, Heywood W, Hinchliff S, Malta S, Dow B, et al. Talking to healthcare providers about sex in later life: findings from a qualitative study with older Australian men and women. Australas J Ageing. 2017;36(4):E50–6.

Lamerton TJ, Mielke GI, Brown WJ. Urinary incontinence in young women: risk factors, management strategies, help-seeking behavior, and perceptions about bladder control. Neurourol Urodyn. 2020;39(8):2284–92.

Milroy T, Jacobs S, Frayne J. Impact of pelvic floor dysfunction in Aboriginal and Torres Strait Islander women attending an urban Aboriginal medical service. Aust N Z J Obstet Gynaecol. 2022;62(5):748–54.

Newton D, Bayly C, Fairley CK, Chen M, Keogh L, Temple-Smith M, et al. Women’s experiences of pelvic inflammatory disease: implications for health-care professionals. J Health Psychol. 2013;19(5):618–28.

Tucker J, Murphy EMA, Steen M, Clifton VL. Understanding what impacts on disclosing anal incontinence for women when comparing bowel-screening tools: a phenomenological study. BMC Womens Health. 2019;19(1):142.

Young K, Fisher J, Kirkman M. Partners instead of patients: women negotiating power and knowledge within medical encounters for endometriosis. Feminism Psychol. 2019;30(1):22–41.

TuiSamoa A, Heather M, Kruger J. Urinary incontinence in Pasifika women: a pilot focus group study. Australian New Z Cont J. 2022;28(1):4–8.

Gonzalez G, Vaculik K, Khalil C, Zektser Y, Arnold C, Almario CV, et al. Women’s experience with stress urinary incontinence: insights from social media analytics. J Urol. 2020;203:962–8.

Milner M, Gamble M, Barry-Kinsella C. Covid-19, pelvic health, and women’s voices: a descriptive study. Cont (Amsterdam Netherlands). 2022;1:100012.

Goffman E. Stigma: notes on the management of spoiled identity. New York: Simon and Schuster; 1963.

Rai SS, Syurina EV, Peters RMH, Putri AI, Zweekhorst MBM. Non-communicable diseases-related stigma: a mixed-methods systematic review. Int J Environ Res Public Health. 2020;17:18.

Chaudoir SR, Earnshaw VA, Andel S. Discredited Versus Discreditable: understanding how shared and unique stigma mechanisms affect psychological and physical health disparities. Basic Appl Soc Psych. 2013;35(1):75–87.

Dixon-Woods M, Cavers D, Agarwal S, Annandale E, Arthur A, Harvey J, et al. Conducting a critical interpretive synthesis of the literature on access to healthcare by vulnerable groups. BMC Med Res Methodol. 2006;6(1):35.

Methley A, Campbell S, Cheraghi-Sohi S, Chew-Graham C. The value of the theoretical framework of candidacy in exploring access and experiences of healthcare services. Health Psychol Update. 2016;25(1):1–11.

Robinson D, Prodigalidad LT, Chan S, Serati M, Lozo S, Lowder J, et al. International Urogynaecology Consultation Chap. 1 committee 4: patients’ perception of disease burden of pelvic organ prolapse. Int Urogynecol J. 2022.

Toye F, Dixon S, Izett-Kay M, Keating S, McNiven A. Exploring the experiences of people with urogynaecology conditions in the UK: a reflexive thematic analysis and conceptual model. BMC Womens Health. 2023;23(1):431.

Ackenbom MF, Carter-Brooks CM, Soyemi SA, Everstine CK, Butters MA, Davis EM. Barriers to urogynecologic care for racial and ethnic minority women: a qualitative systematic review. Urogynecology. 2023;29(2).

Yan F, Xiao LD, Zhou K, Li Z, Tang S. Perceptions and help-seeking behaviours among community-dwelling older people with urinary incontinence: a systematic integrative review. J Adv Nurs. 2022;78(6):1574–87.

Vasconcelos CTM, Firmiano MLV, Oriá MOB, Vasconcelos Neto JA, Saboia DM, Bezerra LRPS. Women’s knowledge, attitude and practice related to urinary incontinence: systematic review. Int Urogynecol J. 2019;30(2):171–80.

Secretary of State for Health and Social Care. Women’s Health Strategy for England. CP 710. Health and Social Care. 2022. https://www.gov.uk/government/publications/womens-health-strategy-for-england/womens-health-strategy-for-england#introduction. Accessed 8 Aug 2022.

Mou T, Gonzalez J, Gupta A, O’Shea M, Thibault MD, Gray EL, et al. Barriers and promotors to health service utilization for pelvic floor disorders in the United States: systematic review and meta-analysis of qualitative and quantitative studies. Urogynecology. 2022;28(9):574–81.

Finestripe Productions. Davina McCall: Sex, myths and the menopause. In: Sands L, editor. 2021.

Hebert-Beirne JM, O’Conor R, Ihm JD, Parlier MK, Lavender MD, Brubaker L. A pelvic health curriculum in school settings: the effect on adolescent females’ knowledge. J Pediatr Adolesc Gynecol. 2017;30(2):188–92.

Berzuk K, Shay B. Effect of increasing awareness of pelvic floor muscle function on pelvic floor dysfunction: a randomized controlled trial. Int Urogynecol J. 2015;26(6):837–44.

Hyakutake MT, Han V, Baerg L, Koenig NA, Cundiff GW, Lee T, et al. Pregnancy-associated pelvic floor health knowledge and reduction of symptoms: the PREPARED randomized controlled trial. J Obstet Gynecol Can. 2018;40(4):418–25.

Myers EM, Robinson BL, Geller EJ, Wells E, Matthews CA, Fenderson JL, et al. Randomized trial of a web-based tool for prolapse: impact on patient understanding and provider counseling. Int Urogynecol J. 2014;25(8):1127–32.

Acknowledgements

Simon Alberici, Library and Knowledge Manager, Government of Jersey Health and Community Services, for applying the PRESS checklist; Melanie Dembinsky, Research Fellow, Faculty of Health Sciences and Sport, University of Stirling, for independently screening all titles and abstracts; Carol-Anne Walker, MRes student, for co-reviewing 20% of full texts.

Author information

Authors and affiliations.

Faculty of Health Sciences and Sport, University of Stirling, Stirling, Scotland

Clare Jouanny

Department of Psychology, University of Stirling, Stirling, Scotland

Purva Abhyankar

The Nursing, Midwifery and Allied Health Professions Research Unit, University of Stirling, Stirling, Scotland

Margaret Maxwell

Contributions

C.J. is guarantor of the review and substantially contributed to the concept, design and draft of the review protocol, search strategy, collection, and interpretation of data. P.A. substantially contributed to the concept, design, and draft of the review. M.M. substantially contributed to the design and draft of the review. All authors have approved this submitted version and are personally accountable for their contributions.

Corresponding author

Correspondence to Clare Jouanny.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Jouanny, C., Abhyankar, P. & Maxwell, M. A mixed methods systematic literature review of barriers and facilitators to help-seeking among women with stigmatised pelvic health symptoms. BMC Women's Health 24, 217 (2024). https://doi.org/10.1186/s12905-024-03063-6

Received: 24 October 2023

Accepted: 29 March 2024

Published: 03 April 2024

DOI: https://doi.org/10.1186/s12905-024-03063-6

  • Pelvic symptoms
  • Help-seeking
  • Common-sense model

BMC Women's Health

ISSN: 1472-6874

Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

  • Michiel Schreurs (ORCID: orcid.org/0000-0002-9449-5619); affiliations 1, 2, 3; note na1
  • Supinya Piampongsant; affiliations 1, 2, 3; note na1
  • Miguel Roncoroni (ORCID: orcid.org/0000-0001-7461-1427); affiliations 1, 2, 3; note na1
  • Lloyd Cool (ORCID: orcid.org/0000-0001-9936-3124); affiliations 1, 2, 3, 4
  • Beatriz Herrera-Malaver (ORCID: orcid.org/0000-0002-5096-9974); affiliations 1, 2, 3
  • Christophe Vanderaa (ORCID: orcid.org/0000-0001-7443-5427); affiliation 4
  • Florian A. Theßeling; affiliations 1, 2, 3
  • Łukasz Kreft (ORCID: orcid.org/0000-0001-7620-4657); affiliation 5
  • Alexander Botzki (ORCID: orcid.org/0000-0001-6691-4233); affiliation 5
  • Philippe Malcorps; affiliation 6
  • Luk Daenen; affiliation 6
  • Tom Wenseleers (ORCID: orcid.org/0000-0002-1434-861X); affiliation 4
  • Kevin J. Verstrepen (ORCID: orcid.org/0000-0002-3077-6219); affiliations 1, 2, 3

Nature Communications volume 15, Article number: 2368 (2024)

  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depend on many interacting chemical compounds and external factors, and are therefore challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that predict flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel, and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Dissecting these models allows us to identify specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation for developing novel, tailored foods with superior flavors.

Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physicochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but they require meticulous training and are low-throughput and costly. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers who contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict the flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are ill-suited to studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and handle non-linear and discontinuous relationships poorly 40 .
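To make the point about non-linear and discontinuous relationships concrete, here is a minimal, self-contained sketch on hypothetical data: a "perceived intensity" that jumps once a compound's concentration passes a threshold. It contrasts an ordinary least-squares line with gradient boosting of one-split decision stumps under squared loss, the simplest member of the Gradient Boosting family that the abstract reports as best-performing. This is not the authors' model or pipeline; the data, the function names and the settings (50 rounds, learning rate 0.3) are illustrative assumptions.

```python
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, targets):
    """One-split regression tree: pick the threshold that minimizes squared error."""
    candidates = sorted(set(xs))[:-1]  # splitting at the max value would leave the right side empty
    if not candidates:
        raise ValueError("need at least two distinct x values")
    best = None
    for t in candidates:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = fmean(left), fmean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting under squared loss: each stump is fitted to the current
    residuals, which are the negative gradient of 0.5 * (y - prediction)**2."""
    base = fmean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


if __name__ == "__main__":
    xs = [i / 10 for i in range(40)]            # hypothetical concentrations 0.0 ... 3.9
    ys = [0.0 if x < 2.0 else 1.0 for x in xs]  # hypothetical intensity jumps at 2.0
    print("linear MSE: ", round(mse(fit_linear(xs, ys), xs, ys), 3))
    print("boosted MSE:", round(mse(fit_boosted_stumps(xs, ys), xs, ys), 3))
```

On this step-shaped toy data the straight line leaves a mean squared error of about 0.06, while the boosted stumps drive it to essentially zero, because a stump can place its split exactly at the discontinuity. Real flavor data are far noisier and higher-dimensional, so the example only illustrates the mechanism, not the size of any gain.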

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly well suited for modeling the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, Inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers.
Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .
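Linking chemical measurements to consumer data implies collapsing many reviews into one score per beer and keeping only beers present in both sources. The sketch below shows one simple way to do that, assuming a plain mean per beer and an inner join on a beer identifier. The article does not specify its aggregation at this point, so the identifiers, the data layout and the `min_reviews` filter are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def mean_score_per_beer(reviews):
    """reviews: iterable of (beer_id, score) -> {beer_id: (mean_score, n_reviews)}."""
    grouped = defaultdict(list)
    for beer_id, score in reviews:
        grouped[beer_id].append(score)
    return {beer_id: (fmean(scores), len(scores)) for beer_id, scores in grouped.items()}


def build_training_rows(chem_profiles, reviews, min_reviews=1):
    """Inner join of chemical profiles and review means on beer_id.

    chem_profiles: {beer_id: {compound_name: concentration}} (hypothetical layout).
    Returns [(beer_id, features, mean_score)] sorted by beer_id, keeping only
    beers that have a profile and at least `min_reviews` reviews.
    """
    scores = mean_score_per_beer(reviews)
    rows = []
    for beer_id in sorted(chem_profiles.keys() & scores.keys()):
        mean_score, n_reviews = scores[beer_id]
        if n_reviews >= min_reviews:
            rows.append((beer_id, chem_profiles[beer_id], mean_score))
    return rows


if __name__ == "__main__":
    # Hypothetical toy inputs: beer_c has reviews but no chemical profile, so it is dropped.
    chem = {"beer_a": {"ethyl_acetate": 20.0}, "beer_b": {"ethyl_acetate": 35.0}}
    reviews = [("beer_a", 3.0), ("beer_a", 4.0), ("beer_b", 2.5), ("beer_c", 5.0)]
    for row in build_training_rows(chem, reviews):
        print(row)
```

A minimum-review filter is one way to keep a beer's score from resting on a handful of opinions; whether and where to set it is a modeling choice, not something stated in the text.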

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.
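The ≤ 4.2% ABV rule and the style percentages are simple enough to state as code. A minimal sketch, assuming each beer is described only by its ABV (in %) and a style label; the function names and the toy style list are hypothetical:

```python
from collections import Counter

LOW_NO_ABV_CUTOFF = 4.2  # % ABV; at or below this, a beer counts as low/non-alcoholic


def alcohol_class(abv):
    """Classify a beer by its alcohol by volume (in %)."""
    return "low/non-alcoholic" if abv <= LOW_NO_ABV_CUTOFF else "alcoholic"


def style_shares(styles):
    """Percentage of the dataset that each style label accounts for."""
    counts = Counter(styles)
    total = len(styles)
    return {style: 100 * n / total for style, n in counts.items()}


if __name__ == "__main__":
    print(alcohol_class(0.4), alcohol_class(4.2), alcohol_class(8.5))
    print(style_shares(["Blond", "Blond", "Tripel", "Lager"]))
```

Note that the cut-off is inclusive, so a 4.2% ABV beer falls in the low/non-alcoholic group. For a dataset of 250 beers, a 12.4% share corresponds to 31 beers and an 11.2% share to 28 beers.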

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table S1). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category comprises yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or from other microbes such as non-Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids, total esters, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig. 1, upper panel, Supplementary Data 1 and 2, and Supplementary Fig. S2; for the sake of clarity, only a subset of the measured compounds is shown in Fig. 1). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol, and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate; conversely, late addition of hops preserves aroma but limits bitterness 51 . Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52 . Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol) correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites depend on the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53 .
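Spearman's rho, used throughout this section, is the Pearson correlation computed on ranks, with tied values assigned the average of the ranks they span. The plain-Python sketch below implements that definition and adds a hypothetical helper that lists compound pairs whose rho exceeds 0.5, mirroring the threshold mentioned for yeast-derived esters and alcohols. The compound concentrations in the usage example are made up.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strongly_correlated_pairs(profiles, threshold=0.5):
    """profiles: {compound: [concentration per beer]} -> pairs with rho above threshold."""
    out = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if rho > threshold:
            out.append((a, b, round(rho, 3)))
    return out


if __name__ == "__main__":
    profiles = {  # hypothetical concentrations across five beers
        "ethanol": [1, 2, 3, 4, 5],
        "glycerol": [2, 3, 4, 5, 7],
        "iso_alpha": [5, 1, 4, 2, 3],
    }
    print(strongly_correlated_pairs(profiles))  # -> [('ethanol', 'glycerol', 1.0)]
```

Because rho only looks at ranks, glycerol's non-linear rise still yields a perfect monotonic correlation with ethanol here, which is one reason rank correlations suit concentration data with skewed or non-linear relationships.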

Figure 1

Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data 1; correlations between all chemical compounds are depicted in Supplementary Fig. S2, and correlation values can be found in Supplementary Data 2. See Supplementary Data 4 for sensory panel assessments and Supplementary Data 5 for correlation values between all sensory descriptors.

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).
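The session-consistency check can be illustrated with a one-way ANOVA that treats session as the factor and asks whether the same beer receives different mean scores across sessions. The sketch below is an assumption about the design (the study’s exact ANOVA is reported in Supplementary Table S2 and may include further factors such as panelist), and the scores are hypothetical.

```python
from scipy.stats import f_oneway

# Hypothetical scores (0-5 scale) given by panelists to the SAME repeated beer
# for one attribute in three separate sessions; values are illustrative only.
session_scores = {
    "session_1": [3, 4, 3, 2, 4, 3],
    "session_2": [3, 3, 4, 3, 4, 2],
    "session_3": [4, 3, 3, 3, 4, 3],
}

f_stat, p_value = f_oneway(*session_scores.values())
consistent = p_value > 0.05  # no significant session effect at alpha = 0.05
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, consistent = {consistent}")
```

Repeating this test for every repeated sample and attribute, and counting the fraction of tests with p > 0.05, yields a consistency percentage of the kind reported above.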

Aroma and taste perception reported by the trained panel are often linked (Fig. 1, bottom left panel, and Supplementary Data 4 and 5), with high correlations between hops aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, is identified together with stale hops (Spearman’s rho=0.97) that are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho=−0.48), in line with the hypothesis that they mask each other 59, 60. Beer body is highly correlated with alcohol (Spearman’s rho=0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman’s rho=0.32 and 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig. S4). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most often detected in scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract), score in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig. S3), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aroma or taste, as evaluated by the tasting panel (Fig. 2, Supplementary Fig. S5, Supplementary Data 6). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57, respectively). Likewise, darker color from roasted malts is a good indication of malt perception (Spearman’s rho=0.54).

Figure 2

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data  6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate and overall quality, as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho=0.49, Supplementary Fig. S6), but not for our trained tasting panel (rho=0.19). This suggests that prices affect consumer appreciation, as has been reported for wine 63, whereas blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig. 3, rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences between the two groups of tasters, in appreciation as well as in other aspects. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

Figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data 7). Processing review texts on the RateBeer database yielded results comparable to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig. 3). This is in line with expectations, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, more specific aromas like ester, coriander or diacetyl are underrepresented in the online reviews and correlate more weakly with the panel scores, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.
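The text-analysis tools themselves are described in the Methods, not here. As a hedged illustration of the general idea only, the sketch below scores reviews with a small hypothetical keyword lexicon and reports, per attribute, the fraction of a beer’s reviews that mention it; such per-beer fractions could then be rank-correlated with panel scores.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping sensory attributes to review vocabulary; the
# study's actual text-mining rules are not reproduced here.
LEXICON = {
    "bitterness": {"bitter", "bitterness"},
    "sweetness": {"sweet", "sweetness", "sugary"},
    "acidity": {"sour", "tart", "acidic"},
    "hops": {"hoppy", "hops", "hop"},
}


def attribute_mentions(reviews):
    """Fraction of reviews that mention each attribute at least once."""
    counts = Counter()
    for text in reviews:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for attribute, words in LEXICON.items():
            if tokens & words:  # any lexicon word present in this review
                counts[attribute] += 1
    return {attribute: counts[attribute] / len(reviews) for attribute in LEXICON}


# Hypothetical reviews of a single beer.
reviews = [
    "Very bitter, piney hops and a dry finish.",
    "Sweet caramel malt, a little tart at the end.",
    "Sour and funky, quite acidic.",
]
shares = attribute_mentions(reviews)
print(shares)
```

A plain keyword match ignores negation (“not bitter”) and context, which is one reason why rarely named, specific aromas are harder to recover from free text than basic tastes.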

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can model both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), and partial least squares regression (PLSR)), 5 decision tree-based models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regression (SVR) model, and 1 artificial neural network (ANN) model.

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on the training set, its performance was evaluated by its ability to predict the test set, using multi-output models and the coefficient of determination (see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64. Importantly, both ways of evaluating the models’ performance generally agreed. Performance of the different models varied (Table 1). Notably, all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out over the large number of public reviews on RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R² values, due to severe overfitting (training set R² = 1). Overfitting is a common issue in linear models with many parameters and limited samples, and interaction terms further amplify the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, outcompeting multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, outcompeting the linear models (LR, Lasso, PLSR) commonly used in sensory science 65.
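The overfitting of LR with interaction terms follows from simple parameter counting: p features generate p + p(p−1)/2 regression terms, and once that number reaches the number of training samples, ordinary least squares can fit the training data exactly. The sketch below reproduces this on simulated data (10 features give 55 terms for 40 training samples) and contrasts it with an L1-penalized fit; the Lasso penalty of 0.1 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
n_train, n_test, p = 40, 40, 10  # 10 features -> 10 + 45 = 55 terms > 40 samples
X = rng.normal(size=(n_train + n_test, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_train + n_test)
X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def with_interactions(regressor):
    return make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
        regressor,
    )


results = {}
for name, reg in {"LR": LinearRegression(),
                  "Lasso": Lasso(alpha=0.1, max_iter=50_000)}.items():
    model = with_interactions(reg).fit(X_tr, y_tr)
    results[name] = (r2_score(y_tr, model.predict(X_tr)),   # training R2
                     r2_score(y_te, model.predict(X_te)))   # test R2
    print(f"{name}: train R2 = {results[name][0]:.3f}, test R2 = {results[name][1]:.3f}")
```

The unregularized model reaches a training R² of 1 while its test R² falls far below that of the Lasso, which shrinks most interaction coefficients to zero.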

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R² values up to 0.75 depending on the predicted sensory feature (Supplementary Table S4). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R² of 0.67 compared to 0.09) (Supplementary Tables S3 and S4). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66. The SVR also shows intermediate performance, mostly due to weak predictions of specific attributes that lower its overall performance (Supplementary Table S4).
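The two evaluation schemes mentioned above, averaging R² across descriptors versus ranking models per descriptor and averaging the ranks, can be contrasted on a small table. The R² values below are hypothetical; they mimic a model (here labelled SVR) whose single weak descriptor drags down its mean R² far more than its average rank.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical test-set R2 values (rows: models, columns: sensory descriptors).
models = ["GBR", "RF", "SVR"]
descriptors = ["bitterness", "sweetness", "acidity", "alcohol"]
r2 = np.array([
    [0.60, 0.45, 0.70, 0.72],   # GBR
    [0.55, 0.50, 0.65, 0.70],   # RF
    [0.58, -0.20, 0.40, 0.68],  # SVR: one weak descriptor
])

# Rank models within each descriptor (1 = highest R2), then average over descriptors.
ranks = np.column_stack([rankdata(-r2[:, j]) for j in range(r2.shape[1])])
average_rank = ranks.mean(axis=1)
mean_r2 = r2.mean(axis=1)
for model, rank, score in zip(models, average_rank, mean_r2):
    print(f"{model}: average rank = {rank:.2f}, mean R2 = {score:.3f}")
```

Ranking is insensitive to how badly a model fails on a descriptor, whereas the mean R² is not, so the two schemes can diverge when a model is strong overall but poor on a few attributes.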

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25. Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
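Impurity-based importance (MDI) is available directly on fitted scikit-learn tree ensembles through the `feature_importances_` attribute, normalized to sum to 1. The sketch below, on simulated data with hypothetical feature names, shows how a ranking like Fig. 4A can be read off a GBR model. Per-sample SHAP contributions, as in Fig. 4B, would require the separate `shap` package (for example its `TreeExplainer`) and are left out here to keep the example self-contained.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
# Hypothetical feature names for simulated data (not measured compounds).
names = ["strong", "medium", "nonlinear", "noise_1", "noise_2"]
X = rng.normal(size=(200, len(names)))
y = (2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * np.abs(X[:, 2])
     + rng.normal(scale=0.3, size=200))

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)
# Mean decrease in impurity, normalised so the importances sum to 1.
order = np.argsort(gbr.feature_importances_)[::-1]
for i in order:
    print(f"{names[i]:>10}: {gbr.feature_importances_[i]:.3f}")
```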

Figure 4

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest-ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman Rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71, as key factors contributing to beer appreciation. Although high concentrations of these compounds are undoubtedly considered unpleasant, the positive effects of modest concentrations are not yet known 72, 73.

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig. 4C). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would likely have been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal pattern with respect to appreciation, with sour beers forming a separate cluster, a pattern that the Spearman correlation misses entirely. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74, 75, 76 (Supplementary Fig. S7). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This does not hold in reality, as it is well documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78. The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig. S7). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.
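A one-way partial dependence curve is the model’s average prediction when one feature is fixed to a grid value for every sample. The manual sketch below (simulated data; scikit-learn’s `sklearn.inspection.partial_dependence` offers a library implementation) also reproduces the plateau described above. A tree ensemble can only split at thresholds observed in its training data, so every value beyond the training range follows the same path through every tree and receives exactly the same prediction, even though the simulated true response keeps changing.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction with `feature` fixed to each grid value for all samples."""
    X_mod = X.copy()
    averages = []
    for value in grid:
        X_mod[:, feature] = value
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)


rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(300, 3))  # simulated, scaled "concentrations"
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

gbr = GradientBoostingRegressor(random_state=0).fit(X, y)
grid = np.array([0.25, 0.5, 0.75, 2.0, 5.0, 50.0])  # last three lie outside [0, 1]
pd_values = partial_dependence_1d(gbr, X, feature=0, grid=grid)
print(np.round(pd_values, 3))
```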

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).
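The stability check can be sketched by refitting models with different random seeds and tallying the top-ranked feature. The text does not specify what varied between the 100 iterations, so varying only the seed is an assumption, and 20 seeds are used here for brevity. In the simulated data, feature 5 is a noisy copy of feature 0, mimicking co-correlated predictors. With all features considered at every split, RF tends to prefer the cleaner feature 0, while the random split thresholds of ET can spread importance across both, so the identity of the top feature is expected to vary more for ET.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 6))
X[:, 5] = X[:, 0] + rng.normal(scale=0.1, size=150)  # co-correlates with feature 0
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.3, size=150)


def top_feature(model_cls, seed):
    """Index of the highest-MDI feature for one refit with a given seed."""
    model = model_cls(n_estimators=100, random_state=seed).fit(X, y)
    return int(np.argmax(model.feature_importances_))


tallies = {
    cls.__name__: Counter(top_feature(cls, seed) for seed in range(20))
    for cls in (RandomForestRegressor, ExtraTreesRegressor)
}
for name, tally in tallies.items():
    print(name, dict(tally))
```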

Next, we investigated whether combining the RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both in the native case and when including a dataset identifier (R² = 0.67 for the RateBeer model versus 0.26 and 0.42, respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig. S9), while the remaining feature importances are largely unchanged, with ethyl acetate and ethanol ranking highest, as in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performance and reliability. In addition, it seems reasonable to assume that the two datasets are fundamentally different, as the panel dataset was obtained through blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R² = 0.66 with style information vs R² = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig. S9, Supplementary Tables S5 and S6). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, and the low number of samples belonging to some styles, which makes it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined: some styles overlap in features and some beers are misattributed to a specific style, all of which adds noise to models that use style parameters.

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the most extensive style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data  1 ).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th-percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig. 5A). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig. 5B). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.
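The exact normalization is defined in the Methods. Under one plausible reading, assumed here, each compound’s concentration is divided by the beer’s ethanol content, the 95th percentile of that ratio is taken within the style group, and the ratio is converted back to a target concentration for the base beer. The helper name and all numbers below are hypothetical. This reading does not cover 0.0% ABV base beers, where dividing by ethanol is undefined, and ethanol itself would not be normalized this way.

```python
import numpy as np


def spike_amount(group_conc, group_ethanol, base_conc, base_ethanol, q=95):
    """Amount of a compound to add so that the base beer reaches the q-th
    percentile of ethanol-normalised concentrations within its style group.
    Hypothetical helper reflecting one possible reading of the procedure."""
    normalised = np.asarray(group_conc, dtype=float) / np.asarray(group_ethanol, dtype=float)
    target_ratio = np.percentile(normalised, q)  # linear interpolation by default
    target_conc = target_ratio * base_ethanol
    return max(0.0, target_conc - base_conc)  # never remove a compound


# Hypothetical style group: compound concentrations (mg/L) and ethanol (% v/v).
group_conc = [20.0, 25.0, 30.0, 35.0, 40.0]
group_ethanol = [5.0, 5.0, 5.0, 5.0, 5.0]
amount = spike_amount(group_conc, group_ethanol, base_conc=22.0, base_ethanol=6.0)
print(f"add {amount:.1f} mg/L")
```

Here the ratios are 4 to 8 mg/L per % ethanol, their 95th percentile is 7.8, and a base beer at 6% ethanol therefore targets 46.8 mg/L.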

Figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.
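The significance values in Fig. 5B come from a two-sided binomial test: under the null hypothesis, each panelist is equally likely to pick either sample. The sketch below computes the exact p value for a hypothetical outcome (16 of 20 panelists choosing the spiked beer) from first principles and checks it against SciPy’s `binomtest`. Because the null distribution is symmetric for p = 0.5, the two-sided p value is twice the upper tail.

```python
from math import comb

from scipy.stats import binomtest


def two_sided_binomial_p(k, n):
    """Exact two-sided p value for k of n choices under H0: p = 0.5."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical outcome: 16 of 20 panelists prefer the spiked beer.
k, n = 16, 20
p_manual = two_sided_binomial_p(k, n)
p_scipy = binomtest(k, n, p=0.5, alternative="two-sided").pvalue
print(f"p = {p_manual:.4f} (scipy: {p_scipy:.4f})")
```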

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials and would ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups more efficiently.

A limited set of studies have previously tried, with varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79, 80. Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gaps between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research has gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65, 81, 82, modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best-performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83. Our results are in line with the findings of Colantonio et al., who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26. Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g., bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attribute (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity and SHAP-based methods, like we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho=0.77, rho=0.72 and rho=0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho=0.77 and rho=0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ a lot, which further complicates the analysis of style as a model factor.
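Because importance can be absorbed by one member of a group of co-correlated variables, a simple follow-up to any importance ranking is to list every feature whose absolute Spearman correlation with a top-ranked predictor exceeds a threshold, as was done above for ethyl acetate and ethyl phenylacetate. The sketch below does this on simulated data; the feature names and the 0.6 threshold are hypothetical choices.

```python
import numpy as np
from scipy.stats import spearmanr


def co_correlated(X, names, target, threshold=0.6):
    """Features whose |Spearman rho| with `target` is at least `threshold`."""
    rho, _ = spearmanr(X)  # columns are variables -> (n_features, n_features) matrix
    j = names.index(target)
    hits = [(names[i], float(rho[i, j])) for i in range(len(names))
            if i != j and abs(rho[i, j]) >= threshold]
    return sorted(hits, key=lambda item: -abs(item[1]))  # strongest first


rng = np.random.default_rng(5)
a = rng.normal(size=200)
X = np.column_stack([
    a,                                     # hypothetical top-ranked predictor
    a + rng.normal(scale=0.3, size=200),   # close co-correlate
    -a + rng.normal(scale=0.5, size=200),  # anti-correlated proxy
    rng.normal(size=200),                  # unrelated feature
])
names = ["top_feature", "proxy_1", "proxy_2", "unrelated"]
flagged = co_correlated(X, names, "top_feature")
print(flagged)
```

Flagged features are candidates for the true causal driver and can only be separated from the top-ranked predictor experimentally, for example by spiking them individually.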

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to the inherent limitations of GBR models, there are also limitations associated with studying food aroma. Although our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, which influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Beyond the need for more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that capture external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the ability of the current model to accurately predict products that are very poorly appreciated. Finally, while the models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Because the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate with them, validation experiments are essential to identify the true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenylacetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

A total of 250 commercial Belgian beers were selected to cover the broad diversity of beer styles and the corresponding diversity in chemical composition and aroma. See Supplementary Fig.  S1 .

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) connected to the FPD. N2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (flow rate, 35 cm/s; injection volume, 1000 µL; injection mode, split; Combi PAL autosampler, CTC Analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then raised to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min to 200 °C (held for 3 min) and a final ramp of 4 °C/min to 230 °C (held for 1 min). Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table  S7 ).
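The oven program above implies a total run time that can be checked with simple arithmetic: each ramp contributes (target - start) / rate minutes plus any hold at the target. This sketch assumes no hold at 80 °C (none is stated) and ignores cool-down and re-equilibration time, which the text does not report; the function name is hypothetical.

```python
def oven_program_minutes(initial_temp: float, initial_hold: float, ramps) -> float:
    """Total run time (min) of a temperature-programmed GC oven.

    ramps: sequence of (rate in degC/min, target temperature in degC, hold in min).
    Cool-down and re-equilibration are not included.
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate  # time spent ramping to the target
        total += hold                    # isothermal hold at the target
        temp = target
    return total


# HS-GC-FID/FPD program: 50 degC for 5 min, then 5 degC/min to 80,
# 4 degC/min to 200 (3 min hold), 4 degC/min to 230 (1 min hold)
fid_program = [(5, 80, 0), (4, 200, 3), (4, 230, 1)]
print(oven_program_minutes(50, 5, fid_program))  # → 52.5
```

With the same function, the HS-SPME-GC-MS program described below (30 °C for 3 min, then 7, 2 and 8 °C/min ramps to 80, 125 and 270 °C) gives about 50.8 min, again excluding cool-down.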

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min and then raised to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min to 125 °C and a final ramp of 8 °C/min to a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table  S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0g), in combination with the NIST2017, FFNSC3 and Adams4 libraries, was used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correction for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our list of AMDIS-identified spectra or known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least squares analysis, after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Of the 284 target compounds analyzed, 167 were visually judged to have reliable elution profiles and were used for the final analysis.
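An n-alkane ladder like the one above is typically used to convert retention times into retention indices, which are comparable across runs and against library values. The in-house R script is not described in enough detail to reproduce, so this sketch simply implements the standard linear retention index for temperature-programmed GC (van den Dool and Kratz); the function name and the alkane retention times in the example are hypothetical.

```python
import bisect


def linear_retention_index(rt: float, alkane_rts: dict) -> float:
    """Linear retention index (van den Dool and Kratz) for temperature-programmed GC.

    rt: analyte retention time (min).
    alkane_rts: n-alkane carbon number -> retention time (min), from the same sequence.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("analyte elutes outside the alkane ladder")
    i = bisect.bisect_left(times, rt)  # first alkane eluting at or after the analyte
    if times[i] == rt:
        return 100.0 * carbons[i]
    n, big_n = carbons[i - 1], carbons[i]
    t_n, t_big_n = times[i - 1], times[i]
    # Interpolate linearly between the two bracketing alkanes
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))


ladder = {7: 2.0, 8: 4.0, 9: 6.5}  # hypothetical C7 to C9 retention times (min)
print(linear_retention_index(5.25, ladder))  # → 850.0
```

Because the index is interpolated between alkanes run under the same conditions, it absorbs much of the day-to-day retention time drift that the alkane ladders were used to correct.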

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific™ Gallery™ Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. 2 ml of sample volume was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table  S7 and Supplementary Table  S9 .

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Measurements comprised 50 ml of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
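A pairwise Spearman rank correlation matrix of this kind can be computed in a single call with pandas. The table below is a hypothetical toy example (one row per beer, one column per chemical property), and the section does not state which software was used for this step; for per-pair p-values, `scipy.stats.spearmanr` can be applied instead.

```python
import pandas as pd

# Hypothetical chemical measurements: one row per beer, one column per property
chem = pd.DataFrame({
    "ethanol": [5.2, 8.5, 6.0, 0.4, 9.1],
    "ethyl_acetate": [18.0, 35.0, 22.0, 4.0, 40.0],
    "lactic_acid": [120.0, 90.0, 300.0, 150.0, 80.0],
})

# Spearman rho is the Pearson correlation of the column-wise ranks
rho = chem.corr(method="spearman")
print(rho.round(2))
```

In this toy table, ethanol and ethyl acetate have identical rank orders (rho = 1.0), while ethanol and lactic acid give rho = -0.7.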

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90 . Thirty volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate each attribute’s intensity. The scoring sheet is included as Supplementary Data  3 . Sensory assessments took place between 10 a.m. and noon. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples in different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table  S8 ).
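Triangle tests such as those used to screen the panel are usually evaluated with a one-sided binomial test: a taster who cannot perceive the difference picks the odd sample with probability 1/3. The text does not state how many tests each volunteer completed or which selection threshold was applied, so the counts below are hypothetical and the sketch only shows the standard calculation. `binomtest` requires SciPy 1.7 or later.

```python
from scipy.stats import binomtest


def triangle_test_pvalue(correct: int, trials: int) -> float:
    """One-sided binomial p-value for a series of triangle tests.

    Under the null hypothesis the taster guesses, so each answer is correct
    with probability 1/3; the alternative is a higher success rate.
    """
    return binomtest(correct, trials, p=1 / 3, alternative="greater").pvalue


# Hypothetical screening results for two volunteers (correct answers out of 12)
print(triangle_test_pvalue(10, 12))  # small p-value: unlikely to be guessing
print(triangle_test_pvalue(4, 12))   # consistent with guessing
```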

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6; for package information, see Supplementary Table  S8 ) was used to collect 232,288 online reviews (mean=922, min=6, max=5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.
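The per-rater scaling described above (and the analogous per-taster scaling of the trained-panel scores) can be sketched with a pandas group-by. The column names `rater`, `beer` and `overall` are hypothetical, and setting raters with zero or undefined spread to missing is an assumption of this sketch; the text does not say how such raters were handled.

```python
import pandas as pd


def beer_scores(reviews: pd.DataFrame, score_col: str = "overall") -> pd.Series:
    """Z-score each rating within its rater, then average the z-scores per beer."""
    by_rater = reviews.groupby("rater")[score_col]
    mean = by_rater.transform("mean")
    # Raters with a single rating or identical ratings have no usable spread
    std = by_rater.transform("std").replace(0, float("nan"))
    z = (reviews[score_col] - mean) / std
    return z.groupby(reviews["beer"]).mean()  # NaN z-scores are skipped in the mean


reviews = pd.DataFrame({
    "rater": ["A", "A", "B", "B", "C"],
    "beer": ["b1", "b2", "b1", "b2", "b1"],
    "overall": [4.0, 2.0, 5.0, 3.0, 3.5],
})
print(beer_scores(reviews))
```

Centering per rater removes the fact that rater B scores everything one point higher than rater A, so both raters contribute the same z-scores to beers b1 and b2.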

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. After filtering, 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms (for example, ‘floral’ and ‘flower’) was created, and the terms in each group were collapsed into a single term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.
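The filtering and normalization steps above can be sketched in plain Python. Here the two language detectors are passed in as functions so that the example runs without extra packages; with the packages named in the text, they could be wrapped as `lambda t: langdetect.detect(t)` and `lambda t: langid.classify(t)[0]`. The synonym dictionary and reviewer totals are hypothetical, slang correction and nltk stemming/lemmatization are omitted, and the simple `[a-z]` tokenizer drops accented characters.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical excerpt of the synonym dictionary: variant -> canonical term
SYNONYMS = {"flower": "floral", "flowers": "floral", "flowery": "floral"}


def filter_reviews(
    reviews: Iterable[Tuple[str, str]],
    total_reviews: Dict[str, int],
    detectors: List[Callable[[str], str]],
    min_reviews: int = 100,
) -> List[Tuple[str, str]]:
    """Keep (reviewer, text) pairs from active reviewers that every detector calls English."""
    kept = []
    for reviewer, text in reviews:
        if total_reviews.get(reviewer, 0) < min_reviews:
            continue  # too few entries on the site overall
        if all(detect(text) == "en" for detect in detectors):
            kept.append((reviewer, text))
    return kept


def normalize(text: str) -> List[str]:
    """Lowercase, drop numbers and punctuation, and collapse synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(tok, tok) for tok in tokens]


always_en = lambda text: "en"  # stand-in detector for the example
reviews = [("r1", "Flowery nose, 2 notes of banana!"), ("r2", "Nice hops")]
print(filter_reviews(reviews, {"r1": 250, "r2": 40}, [always_en, always_en]))
print(normalize("Flowery nose, 2 notes of banana!"))
```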

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality—not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer contained fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91 . Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TFIDF) was implemented to calculate enrichment scores for sensorial words per beer.

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.
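The price normalization and correlation above amount to a unit conversion followed by a rank correlation. The exchange rates, prices, bottle volumes and appreciation scores below are hypothetical placeholders; the text does not give the conversion rates or dates used.

```python
from scipy.stats import spearmanr

# Hypothetical conversion factors to EUR; real values depend on the collection date
TO_EUR = {"EUR": 1.0, "USD": 0.9, "GBP": 1.15}


def price_per_liter_eur(price: float, currency: str, volume_ml: float) -> float:
    """Convert a shelf price to euros per liter."""
    return price * TO_EUR[currency] / (volume_ml / 1000.0)


# (price, currency, bottle volume in ml) and mean appreciation z-score per beer
offers = [(2.0, "EUR", 330), (4.5, "USD", 750), (3.2, "GBP", 330), (1.5, "EUR", 250)]
appreciation = [0.1, 0.4, 0.9, -0.3]

prices = [price_per_liter_eur(*offer) for offer in offers]
rho, p_value = spearmanr(prices, appreciation)
print([round(x, 2) for x in prices], round(float(rho), 2))
```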

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten model types were trained: three linear regression-based models, namely linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR); five decision tree models, namely AdaBoost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R 2 ) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
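The preprocessing and tuning steps above can be sketched with scikit-learn for a single attribute and the GBR model. Everything in this example is an assumption for illustration: the data are synthetic (100 hypothetical beers, 8 properties, 5 styles), the hyperparameter grid is a small made-up one, and mean imputation is omitted because the toy data have no missing values. It shows the order of operations (log transform, stratified 70/30 split, scaling with training-set statistics only, five-fold grid search scored by R²), not the authors' actual configuration.

```python
import numpy as np
from scipy.stats import shapiro
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_beers, n_chem = 100, 8
X_raw = rng.lognormal(size=(n_beers, n_chem))   # hypothetical chemical profiles
style = np.repeat(np.arange(5), n_beers // 5)   # hypothetical style labels
y = np.log(X_raw[:, 0]) - 0.5 * np.log(X_raw[:, 1]) + rng.normal(scale=0.3, size=n_beers)

# Log-transform properties whose Shapiro-Wilk test rejects normality (p < 0.05)
X = X_raw.copy()
for j in range(n_chem):
    if shapiro(X[:, j])[1] < 0.05:
        X[:, j] = np.log(X[:, j])

# 70/30 split, stratified by style
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=style, random_state=0)

# Scale with the training-set mean and standard deviation only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Five-fold cross-validated grid search, scored by R^2
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [2, 3],
                "learning_rate": [0.05, 0.1]},
    scoring="r2", cv=5)
grid.fit(X_train, y_train)
test_r2 = grid.score(X_test, y_test)  # R^2 on the held-out 30%
print(grid.best_params_, round(test_r2, 3))
```

Fitting the scaler on the training split only keeps test-set statistics out of model building, which is why the scaling step comes after the split.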

Model dissection

GBR was found to outperform other methods, resulting in models with the highest average R 2 values in both trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74 , 75 .
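A partial dependence curve is the model's average prediction when one predictor is fixed at a given value for every sample. scikit-learn provides this through `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay`; the hand-rolled version below is a sketch that makes the definition explicit and pairs it with the impurity-based ranking from `feature_importances_`. The data and the single strong predictor are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def partial_dependence_1d(model, X: np.ndarray, feature: int, grid: np.ndarray) -> np.ndarray:
    """Average prediction with one feature fixed at each grid value (brute-force PDP)."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # set the feature to the same value for every sample
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                         # hypothetical standardized predictors
y = 3.0 * X[:, 0] + rng.normal(scale=0.2, size=200)   # only predictor 0 matters here
gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

ranking = np.argsort(gbr.feature_importances_)[::-1]  # impurity-based, most important first
grid = np.linspace(-2, 2, 9)
pd_curve = partial_dependence_1d(gbr, X, int(ranking[0]), grid)
print(ranking, pd_curve.round(2))
```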

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92 ) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
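In a paired directional difference or preference test, each taster's choice is a Bernoulli trial with probability 0.5 under the null hypothesis of no perceptible difference, so the two-sided binomial test above can be computed as follows. The counts are hypothetical; `binomtest` requires SciPy 1.7 or later.

```python
from scipy.stats import binomtest


def directional_difference_pvalue(chose_spiked: int, n_tasters: int) -> float:
    """Two-sided binomial test of a paired choice against the 50% guessing rate."""
    return binomtest(chose_spiked, n_tasters, p=0.5, alternative="two-sided").pvalue


# Hypothetical outcome: 13 of 16 tasters picked the spiked glass as more intense
print(directional_difference_pvalue(13, 16))
```

The same function applies to the preference question, with the count being the number of tasters who preferred the spiked glass.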

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93 . The RateBeer scores data are under restricted access; they are not publicly available because they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, M. & Verstrepen, K. J. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-alcoholic beer production – an overview. Pol. J. Chem. Technol. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. “A spoonful of sugar helps the medicine go down”: Bitter masking by sucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Ares, G. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcohol-mediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Civille, G. V. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton, 2014). https://doi.org/10.1201/b16452 .

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Acknowledgements

We thank all lab members for their discussions and all tasting panel members for their contributions. Special thanks go to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship; L.C. acknowledges financial support from KU Leuven (C16/17/006); F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen .

Ethics declarations

Competing interests

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information: peer review file, description of additional supplementary files, supplementary data 1–7, reporting summary and source data.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received: 30 October 2023

Accepted: 21 February 2024

Published: 26 March 2024

DOI: https://doi.org/10.1038/s41467-024-46346-0

quantitative research methods peer reviewed articles

COMMENTS

  1. Quantitative and Qualitative Approaches to Generalization and Replication–A Representationalist View

    Hence, mixed methods methodology does not provide a conceptual unification of the two approaches. Lacking a common methodological background, qualitative and quantitative research methodologies have developed rather distinct standards with regard to the aims and scope of empirical science (Freeman et al., 2007). These different standards affect ...

  2. Recent quantitative research on determinants of health in high ...

    Background: Identifying determinants of health and understanding their role in health production constitutes an important research theme. We aimed to document the state of recent multi-country research on this theme in the literature. Methods: We followed the PRISMA-ScR guidelines to systematically identify, triage and review literature (January 2013–July 2019). We searched for studies that ...

  3. Distinguishing Between Quantitative and Qualitative Research: A

    Maxwell J. A., Chmiel M, Rogers S (2015). Designing integration in mixed method and multi-method research. In Hesse-Biber S., Johnson R. B. (Eds.), Oxford handbook of multimethod and mixed methods research inquiry (pp. 223-239). Oxford, UK: Oxford University Press.

  4. Review Article Synthesizing Quantitative Evidence for Evidence-based

    The purpose of this paper is to introduce an overview of the fundamental knowledge, principles and processes in SR. The focus of this paper is on SR especially for the synthesis of quantitative data from primary research studies that examines the effectiveness of healthcare interventions. To activate evidence-based nursing care in various ...

  5. Reviewing the research methods literature: principles and strategies

    The conventional focus of rigorous literature reviews (i.e., review types for which systematic methods have been codified, including the various approaches to quantitative systematic reviews [2-4], and the numerous forms of qualitative and mixed methods literature synthesis [5-10]) is to synthesize empirical research findings from multiple ...

  6. Paradigmatic Compatibility Matters: A Critical Review ...

    Mixed methods research was initially defined as research designs that involved "at least one quantitative method (designed to collect numbers) and one qualitative method (designed to collect words), where neither type of method is inherently linked to any particular inquiry paradigm" (Greene et al., 1989, p. 256). During the 1990s, advocates of mixed methods research argued that this type ...

  7. Quantitative Research

    Quantitative research methods are concerned with the planning, design, and implementation of strategies to collect and analyze data. Descartes, the seventeenth-century philosopher, suggested that how the results are achieved is often more important than the results themselves, as the journey taken along the research path is a journey of discovery. High-quality quantitative research is ...

  8. The Methodological Underdog: A Review of Quantitative Research in the

    Differences in methodological strengths and weaknesses between quantitative and qualitative research are discussed, followed by a data mining exercise on 1,089 journal articles published in Adult Education Quarterly, Studies in Continuing Education, and International Journal of Lifelong Learning. A categorization of quantitative adult education ...

  9. Quantitative research artifacts as qualitative data collection

    This sequential explanatory mixed methods research study, as defined by Creswell and Plano Clark (2017), had two data strands: Phase 1 - a quantitative data strand and Phase 2 - a qualitative data strand. While the data collected and the order in which it was collected aligns with Creswell and Plano Clark's sequential explanatory classification of mixed methods research, our mixed methods ...

  10. Quantitative measures of health policy implementation determinants and

    There were several inclusion criteria: (1) empirical studies of the implementation of public policies already passed or approved that addressed physical or behavioral health, (2) quantitative self-report or archival measurement methods utilized, (3) published in peer-reviewed journals from January 1995 through April 2019, (4) published in the ...

  11. 35388 PDFs

    A bibliometric review of coach leadership studies. Explore the latest full-text research PDFs, articles, conference papers, preprints and more on QUANTITATIVE RESEARCH METHODS. Find methods ...

  12. Quantitative data collection approaches in subject-reported oral health

    This scoping review reports on studies that collect survey data using quantitative research to measure self-reported oral health status outcome measures. ... peer-reviewed articles, (2) papers published between 2011 and 2021, (3) only studies using quantitative methods, and (4) containing outcome measures of self-assessed oral health status ...

  13. (PDF) Quantitative Research Methods : A Synopsis Approach

    Abstract. The aim of this study is to explicate the quantitative methodology. The study established that quantitative research deals with quantifying and analyzing variables in order to get ...

  14. Quantitative measures used in empirical evaluations of mental health

    The inclusion criteria were: (1) empirical study (including study protocols) of the implementation of public mental health policies (i.e., "Big P" policies) already passed or approved; (2) use of a quantitative measurement tool (i.e., measure, questionnaire, survey, scale); (3) inclusion of at least one implementation determinant or outcome ...

  15. What Is Quantitative Research?

    Revised on June 22, 2023. Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations. Quantitative research is the opposite of qualitative research, which involves collecting and analyzing ...

  16. Peer-reviewed Quantitative Research

    Methods: counting, measuring, quantifying (e.g., Likert scale); objective; tests a theory. How to find peer-reviewed quantitative research articles: in CINAHL and MEDLINE, add several of the following subject terms to your search. CINAHL terms: ...

  17. Quantifying possible bias in clinical and epidemiological studies with

    Examples are given to describe and illustrate methods of quantitative bias analysis. ... Please note that it may take up to 5 days for the peer review documents to appear. For research papers The BMJ has fully open peer review. This means that accepted research papers published from early 2015 onwards usually have their prepublication history ...

  18. Barriers and enablers to the implementation of patient-reported outcome

    Findings will be disseminated through publications in peer-reviewed journals in the fields of implementation, medicine, as well as health services, and policy research. We will also disseminate results through relevant conferences and social media using different strategies (e.g., graphical abstract).

  19. Clarification of research design, research methods, and research

    Although the existence of multiple approaches is a powerful source in the development of a research design, new public administration (PA) researchers and students may see it as a source of confusion because there is a lack of clarity in the literature about the approaches to research design, research methods, and research methodology in the ...

  20. Exploring differences in the utilization of the emergency department

    The study selection process relied on the following inclusion criteria: a) the study included a comparison between migrants and non-migrants regarding the utilization of the ED; b) the study relied on data collected over the period 2012–March 2023; c) the study was original research, adopting either a quantitative, qualitative or a mixed ...

  21. A mixed methods systematic literature review of barriers and

    The 'research type' included peer reviewed, published, qualitative, quantitative, or mixed methods primary studies, set in high-income countries only. ... (one qualitative, three quantitative and one mixed methods) with five core criteria. Information about which areas of a study were problematic are reported, rather than summative scores ...

  22. Predicting and improving complex beer flavor through machine ...

    GBR was found to outperform other methods, resulting in models with the highest average R² values in both trained panel and public review data sets. Impurity-based rankings of the most important ...

  23. Quantitative Data Analysis—In the Graduate Curriculum

    Teaching quantitative data analysis is not teaching number crunching, but teaching a way of critical thinking for how to analyze the data. The goal of data analysis is to reveal the underlying patterns, trends, and relationships of a study's contextual situation. Learning data analysis is not learning how to use statistical tests to crunch ...

  24. Behind the Numbers: Questioning Questionnaires

    In quantitative organizational research, perhaps more so than in qualitative research, the method itself is a widely accepted guarantor of its own quality. Whether criteria such as reliability and internal, external, and construct validity are really met in the practice of conducting research, however, may remain debatable.
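Item 22 above reports that gradient boosting regression (GBR) produced the highest *average* R² across the predicted targets. As a minimal sketch, assuming plain Python and hypothetical toy data (the target names `bitterness`/`sweetness`, the model names and all values are illustrative, not from the study), the block below shows how R² = 1 − SS_res/SS_tot is computed and how competing models can be ranked by their mean R² over several targets. It does not reproduce the study's GBR models, cross-validation or impurity-based feature rankings.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    y_bar = mean(y_true)
    # Residual sum of squares: squared errors of the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the observations from their mean.
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def best_by_mean_r2(predictions, observed):
    """Return (model_name, mean_r2) for the model whose R^2, averaged
    over all targets, is highest.

    predictions: {model_name: {target_name: [predicted values]}}
    observed:    {target_name: [observed values]}
    """
    scores = {
        model: mean(r_squared(observed[t], preds[t]) for t in observed)
        for model, preds in predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: target names and values are illustrative only.
observed = {"bitterness": [1.0, 2.0, 3.0, 4.0], "sweetness": [2.0, 2.5, 3.5, 4.0]}
predictions = {
    "model_a": {"bitterness": [1.1, 1.9, 3.2, 3.8], "sweetness": [2.1, 2.4, 3.4, 4.1]},
    # model_b always predicts each target's mean, so its R^2 is 0 for every target.
    "model_b": {"bitterness": [2.5] * 4, "sweetness": [3.0] * 4},
}

if __name__ == "__main__":
    name, score = best_by_mean_r2(predictions, observed)
    print(name, round(score, 3))  # prints: model_a 0.982
```

Averaging per-target R² weights every attribute equally, regardless of its variance. Note also that R² can be negative when a model predicts worse than the constant mean, so a mean R² near zero does not necessarily mean every individual target was predicted poorly.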