
Evidence-Based Practice: Research Guide

The 5 Steps of EBP



  • Ask: Convert the need for information into an answerable question.
  • Find: Track down the best evidence with which to answer that question.
  • Appraise: Critically appraise that evidence for its validity and applicability.
  • Apply: Integrate the critical appraisal with clinical expertise and with the patient's unique biology, values, and circumstances.
  • Evaluate: Evaluate the effectiveness and efficiency in executing steps 1-4 and seek ways to improve them both for next time.

1. ASK: Using PICO

Formulating a strong clinical question is the first step in the research process. PICO is a framework for building clinical research questions that allows you to focus your research and create a query that better matches most medical databases.

  • Patient – Describe your patient or population.  What are the most important characteristics?  Include information on age, race, gender, medical conditions, etc.
  • Intervention – What is the main intervention or therapy you are considering? This can be as general as treatment or observation, or as specific as a particular test or therapy.
  • Comparison Intervention – An alternative intervention or therapy you wish to compare to the first.
  • Outcome – What are you trying to do for the patient?  What is the clinical outcome?  What are the relevant outcomes?

Example: In a (describe patient) can (intervention A) affect (outcome) compared with (intervention B)?

  • Patient - 50 yr old man with diabetes
  • Intervention - weight loss and exercise
  • Comparison - medication
  • Outcome - maintaining blood sugar levels
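For readers who like to keep the question elements organized while searching, the PICO elements of the example above can be captured in a small data structure. The Python sketch below is purely illustrative; the class and field names are assumptions made for this guide, not part of any standard EBP tool.

from dataclasses import dataclass

@dataclass
class PicoQuestion:
    """A clinical question broken into its PICO elements."""
    patient: str       # P: the patient or population
    intervention: str  # I: the main intervention or therapy
    comparison: str    # C: the alternative being compared
    outcome: str       # O: the clinical outcome of interest

    def as_sentence(self) -> str:
        """Render the question using the template from this guide."""
        return (f"In {self.patient}, can {self.intervention} affect "
                f"{self.outcome} compared with {self.comparison}?")

# The worked example from this page:
question = PicoQuestion(
    patient="a 50-year-old man with diabetes",
    intervention="weight loss and exercise",
    comparison="medication",
    outcome="maintaining blood sugar levels",
)
print(question.as_sentence())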

2. FIND: Formulate a Search Strategy

Think about the keywords for each of the PICO parts of the clinical question.

Sample Question: Is prophylactic physical therapy for patients undergoing upper abdominal surgery effective in preventing post-operative pulmonary complications?

The PICO parts with keywords for this question might look like this:

  • Patient/Population – patients undergoing upper abdominal surgery
  • Intervention – prophylactic physical therapy
  • Comparison – no prophylactic physical therapy (usual care)
  • Outcome – post-operative pulmonary complications

You might also see PICO with an added T. The T often stands for either “time” or “type of study.” Time helps you consider the timeframe of an intervention or outcome, while type of study is a way to define the types or levels of evidence that you will need in order to answer your question. 
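One practical payoff of breaking a question into PICO(T) parts is that each part becomes a group of search terms that can be combined with Boolean operators. The Python sketch below illustrates the idea using keywords drawn from the sample question above; the added synonyms are illustrative assumptions, real strategies would also use controlled vocabulary (e.g., MeSH terms), and exact syntax varies by database.

# Combine PICO keyword groups into a Boolean search string:
# synonyms are OR'd within a group, and the groups are AND'd together.
# The synonym choices below are illustrative, not a vetted search strategy.

pico_keywords = {
    "P": ["upper abdominal surgery", "abdominal surgical procedures"],
    "I": ["prophylactic physical therapy", "chest physiotherapy"],
    "O": ["postoperative pulmonary complications", "pneumonia", "atelectasis"],
}

def build_query(keywords):
    groups = []
    for terms in keywords.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)

print(build_query(pico_keywords))
# ("upper abdominal surgery" OR "abdominal surgical procedures") AND
# ("prophylactic physical therapy" OR "chest physiotherapy") AND
# ("postoperative pulmonary complications" OR "pneumonia" OR "atelectasis")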

Databases for EBP Research

3. APPRAISE: Evidence & Evaluation

Different types of information provide different standards or levels of evidence. These levels depend on things like a study's design, objectives, and review process. You may be familiar with a pyramid diagram showing a hierarchy of types of evidence. Often included in pyramids of evidence are the following types of information: 

[Figure: pyramid chart displaying different types of evidence]

  • Clinical practice guidelines—recommendations for applying current medical knowledge (or evidence) to the treatment and care of a patient. 
  • Meta-analyses and systematic reviews—an approach to literature reviews that identifies all studies addressing a given research question based on specific inclusion criteria and analyzes the results of each study to produce a summary result. 
  • Randomized controlled trials (RCTs)—eligible participants are randomly assigned to study groups to test a treatment against a control group. In blinded trials, the participants and researchers do not know which study group participants have been assigned to. 
  • Cohort studies—follow a group of subjects over a period of time to determine the incidence or identify predictors of a certain condition. 
  • Case-control studies—compare two groups of subjects, one with the outcome and one without, to identify predictor variables associated with the outcome. 
  • Case reports/series, expert opinions, and editorials—reports on individual cases with no control groups involved; opinions based on one person's experience and expertise.
  • Animal and laboratory studies—studies that do not involve humans 

The pyramid hierarchy places some types of evidence above others in terms of validity, objectivity, and transferability. It’s important to remember, however, that the best type of evidence to answer your research question depends on the nature of your question and what purpose you have for searching for evidence in the first place. Conducting a literature review, for example, is a very different situation than searching for an answer to a specific question about a particular case, patient, or situation. 
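If it helps to see the ranking at a glance, the bulleted types above can be written as an ordered list from the top of the pyramid to the bottom. The short Python sketch below simply restates that ordering; it is a rough guide rather than a scoring system, and, as noted above, the best type of evidence still depends on your question.

# The evidence pyramid from the list above, strongest (top) to weakest (bottom).
EVIDENCE_PYRAMID = [
    "Clinical practice guidelines",
    "Meta-analyses and systematic reviews",
    "Randomized controlled trials (RCTs)",
    "Cohort studies",
    "Case-control studies",
    "Case reports/series, expert opinions, and editorials",
    "Animal and laboratory studies",
]

def pyramid_tier(study_type):
    """Return 1 for the top of the pyramid, larger numbers for lower tiers."""
    return EVIDENCE_PYRAMID.index(study_type) + 1

print(pyramid_tier("Cohort studies"))  # 4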

Evaluation Criteria:

  • Credibility (Internal Validity)
  • Transferability (External Validity)
  • Dependability (Reliability)
  • Confirmability (Objectivity)

Credibility: looks at truth and quality and asks, "Can you believe the results?"

Some questions you might ask are: Were patients randomized? Were patients analyzed in the groups to which they were (originally) randomized? Were patients in the treatment and control groups similar with respect to known prognostic factors?

Transferability: looks at external validity of the data and asks, "Can the results be transferred to other situations?"

Some questions you might ask are: Were patients in the treatment and control groups similar with respect to known prognostic factors? Was there a blind comparison with an independent gold standard? Were objective and unbiased outcome criteria used? Are the results of this study valid?

Dependability: looks at consistency of results and asks, "Would the results be similar if the study was repeated with the same subjects in a similar context?"

Some questions you might ask are: Aside from the experimental intervention, were the groups treated equally? Was follow-up complete? Was the sample of patients representative? Were the patients sufficiently homogeneous with respect to prognostic factors?

Confirmability: looks at neutrality and asks, "Was there an attempt to enhance objectivity by reducing research bias?"

Some questions you might ask are: Were the five important groups (patients, caregivers, collectors of outcome data, adjudicators of outcome, and data analysts) kept unaware of group allocation? Was randomization concealed?
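The four criteria and their guiding questions can be kept as a simple checklist while appraising a study. The Python sketch below just gathers the questions listed above into one structure; it is an organizational aid, not a validated appraisal instrument.

# An appraisal checklist assembled from the four criteria above.
# Answers of "no" flag potential weak points; this is not a scoring tool.

APPRAISAL_CHECKLIST = {
    "Credibility (internal validity)": [
        "Were patients randomized?",
        "Were patients analyzed in the groups to which they were originally randomized?",
        "Were treatment and control groups similar with respect to known prognostic factors?",
    ],
    "Transferability (external validity)": [
        "Can the results be transferred to other situations?",
        "Was there a blind comparison with an independent gold standard?",
        "Were objective and unbiased outcome criteria used?",
    ],
    "Dependability (reliability)": [
        "Aside from the experimental intervention, were the groups treated equally?",
        "Was follow-up complete?",
        "Was the sample of patients representative?",
    ],
    "Confirmability (objectivity)": [
        "Was randomization concealed?",
        "Were patients, caregivers, outcome-data collectors, outcome adjudicators, "
        "and data analysts kept unaware of group allocation?",
    ],
}

def weak_points(answers):
    """Given {question: True/False} answers, return the questions answered 'no'."""
    return [question for question, ok in answers.items() if not ok]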

4. APPLY: Use Evidence in Clinical Practice


Other good resources for both appraisal and applying evidence in clinical practice can be found on these two websites:

  • KT Clearinghouse/Centre for Evidence-Based Medicine, Toronto
  • Centre for Evidence Based Medicine, University of Oxford

5. EVALUATE: Look at Your Performance

Ask yourself:

  • Did you ask an answerable clinical question?
  • Did you find the best external evidence?
  • Did you critically appraise the evidence and evaluate it for its validity and potential usefulness?
  • Did you integrate critical appraisal of the best available external evidence from systematic research with individual clinical expertise in personal daily clinical practice?
  • What were the outcomes of your application of the best evidence for your patient(s)?

East Carolina University Libraries


Evidence-Based Practice for Nursing: Evaluating the Evidence


Evaluating Evidence: Questions to Ask When Reading a Research Article or Report

For guidance on the process of reading a research book or article, see Paul N. Edwards's paper How to Read a Book (2014). When reading an article, report, or other summary of a research study, there are two principal questions to keep in mind:

1. Is this relevant to my patient or the problem?

  • Once you begin reading an article, you may find that the study population isn't representative of the patient or problem you are treating or addressing. Research abstracts alone do not always make this apparent.
  • You may also find that while a study population or problem matches that of your patient, the study did not focus on an aspect of the problem you are interested in. E.g. You may find that a study looks at oral administration of an antibiotic before a surgical procedure, but doesn't address the timing of the administration of the antibiotic.
  • The question of relevance is primary when assessing an article--if the article or report is not relevant, then the validity of the article won't matter (Slawson & Shaughnessy, 1997).

2. Is the evidence in this study valid?

  • Validity is the extent to which the methods and conclusions of a study accurately reflect or represent the truth. Validity in a research article or report has two parts: 1) internal validity--i.e., do the results of the study mean what they are presented as meaning? (e.g., were bias and/or confounding factors present?); and 2) external validity--i.e., are the study results generalizable? (e.g., can the results be applied outside of the study setting and population(s)?)
  • Determining validity can be a complex and nuanced task, but there are a few criteria and questions that can be used to assist in determining research validity. The set of questions, as well as an overview of levels of evidence, are below.

For a checklist that can help you evaluate a research article or report, use our checklist for Critically Evaluating a Research Article

  • How to Critically Evaluate a Research Article

How to Read a Paper--Assessing the Value of Medical Research

Evaluating the evidence from medical studies can be a complex process, involving an understanding of study methodologies, reliability, and validity, as well as how these apply to specific study types. While this can seem daunting, a series of BMJ articles by Trisha Greenhalgh introduces the methods of evaluating evidence from medical studies in language that is understandable even for non-experts. Although these articles date from 1997, the methods the author describes remain relevant. Use the links below to access the articles.

  • How to read a paper: Getting your bearings (deciding what the paper is about). Not all published research is worth considering; this article outlines how to decide whether you should consider a research paper. Greenhalgh, T. (1997b). How to read a paper. Getting your bearings (deciding what the paper is about). BMJ (Clinical Research Ed.), 315(7102), 243–246.
  • Assessing the methodological quality of published papers. This article discusses how to assess the methodological validity of recent research, using five questions that should be addressed before applying recent research findings to your practice. Greenhalgh, T. (1997a). Assessing the methodological quality of published papers. BMJ (Clinical Research Ed.), 315(7103), 305–308.
  • How to read a paper. Statistics for the non-statistician. I: Different types of data need different statistical tests. This article and the next present the basics for assessing the statistical validity of medical research; both are intended for readers who struggle with statistics. Greenhalgh, T. (1997f). How to read a paper. Statistics for the non-statistician. I: Different types of data need different statistical tests. BMJ (Clinical Research Ed.), 315(7104), 364–366.
  • How to read a paper: Statistics for the non-statistician. II: "Significant" relations and their pitfalls. The second article on evaluating the statistical validity of a research article. Greenhalgh, T. (1997). How to read a paper: Statistics for the non-statistician. II: "Significant" relations and their pitfalls. BMJ (Clinical Research Ed.), 315(7105), 422–425. doi: 10.1136/bmj.315.7105.422
  • How to read a paper. Papers that report drug trials. Greenhalgh, T. (1997d). How to read a paper. Papers that report drug trials. BMJ (Clinical Research Ed.), 315(7106), 480–483.
  • How to read a paper. Papers that report diagnostic or screening tests. Greenhalgh, T. (1997c). How to read a paper. Papers that report diagnostic or screening tests. BMJ (Clinical Research Ed.), 315(7107), 540–543.
  • How to read a paper. Papers that tell you what things cost (economic analyses). Greenhalgh, T. (1997e). How to read a paper. Papers that tell you what things cost (economic analyses). BMJ (Clinical Research Ed.), 315(7108), 596–599.
  • Papers that summarise other papers (systematic reviews and meta-analyses). Greenhalgh, T. (1997i). Papers that summarise other papers (systematic reviews and meta-analyses). BMJ (Clinical Research Ed.), 315(7109), 672–675.
  • How to read a paper: Papers that go beyond numbers (qualitative research). A set of questions that can be used to analyze the validity of qualitative research. Greenhalgh, T., & Taylor, R. (1997). Papers that go beyond numbers (qualitative research). BMJ (Clinical Research Ed.), 315(7110), 740–743.

Levels of Evidence

In some journals, you will see a 'level of evidence' assigned to a research article. Levels of evidence are assigned to studies based on the methodological quality of their design, validity, and applicability to patient care. The combination of these attributes gives the level of evidence for a study. Many systems for assigning levels of evidence exist. A frequently used system in medicine is from the Oxford Centre for Evidence-Based Medicine. In nursing, the system for assigning levels of evidence often comes from Melnyk & Fineout-Overholt's 2011 book, Evidence-Based Practice in Nursing and Healthcare: A Guide to Best Practice. The Levels of Evidence below are adapted from Melnyk & Fineout-Overholt's (2011) model.

[Figure: Melnyk & Fineout-Overholt's Levels of Evidence model]
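Because the graphic may not display here, the scale is also sketched below in plain text. The wording of each level is a common paraphrase of Melnyk & Fineout-Overholt's (2011) seven-level model, not a verbatim reproduction; consult the book or the graphic for the authoritative wording.

# A paraphrased sketch of the seven-level scale attributed to
# Melnyk & Fineout-Overholt (2011); wording is approximate, not verbatim.

MELNYK_LEVELS = {
    "I":   "Systematic review or meta-analysis of randomized controlled trials (RCTs)",
    "II":  "Well-designed randomized controlled trial",
    "III": "Controlled trial without randomization",
    "IV":  "Case-control or cohort study",
    "V":   "Systematic review of descriptive and qualitative studies",
    "VI":  "Single descriptive or qualitative study",
    "VII": "Opinion of authorities and/or reports of expert committees",
}

for level, description in MELNYK_LEVELS.items():
    print(f"Level {level}: {description}")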

Uses of Levels of Evidence : Levels of evidence from one or more studies provide the "grade (or strength) of recommendation" for a particular treatment, test, or practice. Levels of evidence are reported for studies published in some medical and nursing journals. Levels of Evidence are most visible in Practice Guidelines, where the level of evidence is used to indicate how strong a recommendation for a particular practice is. This allows health care professionals to quickly ascertain the weight or importance of the recommendation in any given guideline. In some cases, levels of evidence in guidelines are accompanied by a Strength of Recommendation.

About Levels of Evidence and the Hierarchy of Evidence: While levels of evidence correlate roughly with the hierarchy of evidence (discussed elsewhere on this page), they don't always match the categories from the hierarchy, reflecting the fact that study design alone doesn't guarantee good evidence. For example, systematic reviews or meta-analyses of randomized controlled trials (RCTs) sit at the top of the evidence pyramid and are typically assigned the highest level of evidence because the study design reduces the probability of bias (Melnyk, 2011), whereas the weakest level of evidence is the opinion of authorities and/or reports of expert committees. However, a systematic review may report very weak evidence for a particular practice, and therefore the level of evidence behind a recommendation may be lower than the position of the study type on the Pyramid/Hierarchy of Evidence.

About Levels of Evidence and Strength of Recommendation : The fact that a study is located lower on the Hierarchy of Evidence does not necessarily mean that the strength of recommendation made from that and other studies is low--if evidence is consistent across studies on a topic and/or very compelling, strong recommendations can be made from evidence found in studies with lower levels of evidence, and study types located at the bottom of the Hierarchy of Evidence. In other words, strong recommendations can be made from lower levels of evidence.

For example, a 1961 case series in which two physicians noted a high incidence (approximately 20%) of birth defects among children born to mothers taking thalidomide resulted in very strong recommendations against the prescription of thalidomide and, eventually, against its manufacture and marketing. In other words, as a result of the case series, a strong recommendation was made from a study type in one of the lowest positions on the hierarchy of evidence.

Hierarchy of Evidence for Quantitative Questions

The pyramid below represents the hierarchy of evidence, which illustrates the strength of study types; the higher the study type on the pyramid, the more likely it is that the research is valid. The pyramid is meant to assist researchers in prioritizing studies they have located to answer a clinical or practice question. 

For clinical questions, you should try to find articles with the highest quality of evidence. Systematic Reviews and Meta-Analyses are considered the highest quality of evidence for clinical decision-making and should be used above other study types, whenever available, provided the Systematic Review or Meta-Analysis is fairly recent. 

As you move up the pyramid, fewer studies are available, because the study designs become increasingly expensive for researchers to perform. It is important to recognize that high levels of evidence may not exist for your clinical question, due to both the costs of the research and the type of question you have. If the highest levels of study design from the evidence pyramid are unavailable for your question, you'll need to move down the pyramid.

While the pyramid of evidence can be helpful, individual studies--no matter the study type--must be assessed to determine their validity.

Hierarchy of Evidence for Qualitative Studies

Qualitative studies are not included in the Hierarchy of Evidence above. Since qualitative studies provide valuable evidence about patients' experiences and values, qualitative studies are important--even critically necessary--for Evidence-Based Nursing. Just like quantitative studies, qualitative studies are not all created equal. The pyramid below  shows a hierarchy of evidence for qualitative studies.

[Figure: hierarchy of evidence pyramid for qualitative studies]

Adapted from Daly et al. (2007)

Help with Research Terms & Study Types: Cut through the Jargon!

  • CEBM Glossary
  • Centre for Evidence-Based Medicine|Toronto
  • Cochrane Collaboration Glossary
  • Qualitative Research Terms (NHS Trust)

National Academies Press: OpenBook

Bridging the Evidence Gap in Obesity Prevention: A Framework to Inform Decision Making (2010)

Chapter 6: Evaluating Evidence

The previous chapter describes an expanded perspective on the types of evidence that can be used in decision making for interventions addressing obesity and other complex, systems-level population health problems. It presents a detailed typology of evidence that goes beyond the traditional simple evidence hierarchies that have been used in clinical practice and less complex public health interventions. This chapter focuses on the question of how one judges the quality of different types of evidence in making decisions about what interventions to undertake. The question is an important one not only because many of the interventions required to address obesity are complex, but also because the available evidence for such interventions comes from studies and program evaluations that often are purposely excluded from systematic reviews and practice guidelines, in which studies are selected on the basis of the conventional hierarchies.

In the L.E.A.D. framework ( Figure 6-1 ), one begins with a practical question to be answered rather than a theory to be tested or a particular study design (Green and Kreuter, 2005; Sackett and Wennberg, 1997). A decision maker, say, a busy health department director or staff member, will have recognized a certain problem or opportunity and asked, “What should I do?” or “What is our status on this issue?” Either of these questions may be of interest only to this decision maker for the particular social, cultural, political, economic, and physical context in which he/she works, and the answer may have limited generalizability. This lack of generalizability may lead some in the academic community to value such evidence less than that from randomized controlled trials (RCTs). However, data that are contextually relevant to one setting are often more, not less, relevant and useful to decision makers in other settings than highly controlled trial data drawn from unrepresentative samples of unrepresentative populations, with highly trained personnel conducting the interventions under tightly supervised protocols (see Chapter 3 for further discussion).

FIGURE 6-1 The Locate Evidence, Evaluate Evidence, Assemble Evidence, Inform Decisions (L.E.A.D.) framework for obesity prevention decision making.

NOTE: The element of the framework addressed in this chapter is highlighted.

The types of evidence that are used in local decision making, including the policy process, extend beyond research to encompass politics, economics, stakeholder ideas and interests, and general knowledge and information (see Chapter 3 ), and the decision maker needs to take a practical approach to incorporating this evidence into real-life challenges. Working from this expanded view of what constitutes relevant evidence and where to find it ( Chapter 5 ), this chapter describes an approach for evaluating these different types of evidence that is dependent on the question being asked and the context in which it arises.

Before proceeding, it is worth emphasizing that the L.E.A.D. framework is useful not only for decision makers and their intermediaries but also for those who generate evidence (e.g., scientists, researchers, funders, publishers), a point captured by the phrase “opportunities to generate evidence” surrounding the steps in the framework ( Figure 6-1 ). In fact, a key premise of the L.E.A.D. framework is that research generators need to give higher priority to the needs of decision makers in their research designs and data collection efforts. To this end, the use of the framework and the evaluation of evidence in the appropriate context will identify gaps in knowledge that require further investigation and research.

This chapter begins by reviewing several key aspects of the evaluation of evidence: the importance of the user perspective, the need to identify appropriate outcomes, and the essential role of generalizability and contextual considerations. After summarizing existing approaches to evaluating the quality of evidence, the chapter describes the general approach proposed by the committee. Finally, the chapter addresses the issue of the trade-offs that have to be made when the available evidence has limitations for answering the question(s) at hand—a particular concern for those who must make decisions about complex, multilevel public health interventions such as obesity prevention.

A USER’S PERSPECTIVE

The approach of “horses for courses” (Petticrew and Roberts, 2003) emphasizes that what constitutes best evidence varies with the question being addressed and that there is no value in forcing the same type of evidence to fit all uses. Once the question being asked is clear, users of the L.E.A.D. framework must either search for or generate (see Chapter 8 ) the kinds of evidence that will be helpful in answering that question. The next chapter describes how to assemble the evidence to inform decisions. For situations in which the evidence is inadequate, incomplete, and/or inconsistent, this chapter suggests ways to blend the best available evidence with less formal sources that can bring tacit knowledge and the experience of professionals and other stakeholders to bear.

A large number of individual questions can, of course, be raised by those undertaking efforts to address obesity or other complex public health challenges. Petticrew and Roberts (2003) place such questions into eight broad categories: effectiveness (Does this work?), process of delivery (How does it work?), salience (Does it matter?), safety (Will it do more good than harm?), acceptability (Will people be willing to use the intervention?), cost-effectiveness (Is it worth buying this service?), appropriateness (Is this the right service/intervention for this group?), and satisfaction (Are stakeholders satisfied with the service?). To this categorization the committee has added such questions as How many and which people are affected? and What is the seriousness of the problem? In Chapter 5, the committee adopts this approach but places these questions in the broad categories of "Why," "What," and "How" and gives a number of examples for each category (Tables 5-1 through 5-3).

Certain types of evidence derived from various study designs could be used to answer some of these questions but not others (Flay et al., 2005). For example, to ascertain the prevalence and severity of a condition and thus the population burden, one needs survey or other surveillance data, not an RCT. To ascertain efficacy, effectiveness, or cost-effectiveness, an RCT may be the best design. To understand how an intervention works, qualitative designs may be the most valuable and appropriate (MacKinnon, 2008). To assess the organizational adoption and practitioner implementation and maintenance of a practice, longitudinal studies of organizational policies and their implementation and enforcement (i.e., studies of quality improvement) may be needed.
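The pairings in the preceding paragraph can be restated as a simple lookup from question type to the study design most likely to answer it. The Python sketch below is only a summary of those examples, not an exhaustive or prescriptive mapping.

# A restatement of the examples above: which evidence tends to answer which question.
QUESTION_TO_DESIGN = {
    "prevalence, severity, population burden": "survey or other surveillance data",
    "efficacy, effectiveness, cost-effectiveness": "randomized controlled trial (RCT)",
    "how an intervention works": "qualitative designs",
    "organizational adoption, implementation, maintenance":
        "longitudinal studies of organizational policies (quality improvement)",
}

def suggested_design(question_type):
    return QUESTION_TO_DESIGN.get(
        question_type, "no single preferred design; match the design to the question"
    )

print(suggested_design("how an intervention works"))  # qualitative designs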

As discussed in previous chapters, to assess interventions designed to control obesity at the community level or in real-world settings, RCTs may not be feasible or even possible, and other types of evidence are more appropriate (Mercer et al., 2007; Sanson-Fisher et al., 2007; Swinburn et al., 2005). To apply the terminology adopted for this report ( Chapter 5 ) (Rychetnik et al., 2004), for “Why” (e.g., burden of obesity) or in some cases “How” (e.g., translation of an intervention) questions, RCTs are not the appropriate study design. The same may be true even for some “What” questions (e.g., effectiveness of an intervention) that lend themselves more to formal intervention studies.

Also as discussed in previous chapters, decision makers need to recognize the interrelated nature of factors having an impact on the desired outcome of complex public health interventions. They should view an intervention in the context in which it will be implemented, taking a systems perspective (see Chapter 4). Such a perspective, which evolved from an appreciation of the importance of effectiveness in real-world conditions or natural settings (Flay, 1986), is clearly needed when decision makers evaluate generalizability, as well as level of certainty, in judging the quality of evidence (Green and Glasgow, 2006; Rychetnik et al., 2004; Swinburn et al., 2005).

IDENTIFICATION OF APPROPRIATE OUTCOMES

Appropriate outcomes may be multiple and may be short-term, intermediate, or long-term in nature. Regardless, they should be aligned with user needs and interests. For policy makers, for example, the outcomes of interest may be those for which they will be held accountable, which may or may not be directly related to reductions in obesity. In a political context, policy makers may want to know how voters will react, how parents will react, what the costs will be, or whether the ranking of the city or state on body mass index (BMI) levels will change. Health plan directors may want evidence of comparative effectiveness (i.e., comparing the benefits and harms of a competitive intervention in real-world settings) to make decisions on coverage. In any situation with multiple outcomes, which is the usual case, trade-offs may have to be made between these outcomes. For example, an outcome may be cost-effective but not politically popular or feasible. Further discussion of trade-offs can be found later in the chapter.

Logic models are helpful in defining appropriate evaluation outcomes and providing a framework for evaluation. For a long-term outcome, a logic model is useful in defining the short-term and intermediate steps that will lead to that outcome. Outcomes can be goals related to the health of the population (e.g., reduced mortality from diabetes), structural change (e.g., establishment of a new recreation center), a new policy (e.g., access to fresh fruits and vegetables in a Special Supplemental Nutrition Program for Women, Infants, and Children [WIC] program), or others. A recent report by the Institute of Medicine (IOM) (2007) introduces a general logic model for evaluating obesity prevention interventions (for children) (see Chapter 2, Figure 2-2) and applies it specifically to distinct end users, such as government and industry (see Figures 6-2 and 6-3, respectively). This model takes into account the interconnected factors that influence the potential impact of an intervention. It facilitates the identification of resources (e.g., funding), strategies and actions (e.g., education, programs), outcomes (e.g., environmental, health), and other cross-cutting factors (e.g., age, culture, psychosocial status) that are important to obesity prevention for particular users.

FIGURE 6-2 Evaluation framework for government efforts to support capacity development for preventing childhood obesity. SOURCE: IOM, 2007.

FIGURE 6-3 Evaluation framework for industry efforts to develop low-calorie and nutrient-dense beverages and promote their consumption by children and youth.

GENERALIZABILITY AND CONTEXTUAL CONSIDERATIONS

Existing standards of evidence formulate the issue of generalizability in terms of efficacy, effectiveness, and readiness for dissemination (Flay et al., 2005). From this perspective, among the questions to be answered in evaluating whether studies are more or less useful as a source of evidence are the following: How representative were the setting, population, and circumstances in which the studies were conducted? Can the evidence from a study or group of studies be generalized to the multiple settings, populations, and contexts in which the evidence would be applied? Are the interventions studied affordable and scalable in the wide variety of settings where they might be needed, given the resources and personnel available in those settings? For decision makers, the generalizability of evidence is what they might refer to as "relevance": Is the evidence, they ask, relevant to our population and context? Answering this question requires comparing the generalizability of the studies providing the evidence and the context (setting, population, and circumstances) in which the evidence would be applied.

Glasgow and others have called for criteria with which to judge the generalizability of studies in reporting evidence, similar to the Consolidated Standards of Reporting Trials (CONSORT) reporting criteria for RCTs and the Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) quality rating scales for nonrandomized trials (Glasgow et al., 2006a). Box 6-1 details four dimensions of generalizability (using the term “external validity”) in the reporting of evidence in most efficacy trials and many effectiveness trials and the specific indicators or questions that warrant consideration in judging the quality of the research (Green and Glasgow, 2006).

EXISTING APPROACHES TO EVALUATING EVIDENCE

The most widely acknowledged approach for evaluating evidence—one that underlies much of what is considered evidence of causation in the health sciences—is the classic nine criteria or “considerations” of Bradford Hill (Hill, 1965): strength of association, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy. All but one of these criteria emphasize the level of causality, largely because the phenomena under study were organisms whose biology was relatively uniform within species, so the generalizability of causal relationships could be assumed with relative certainty.

The rating scheme of the Canadian Task Force on the Periodic Health Examination (Canadian Task Force on the Periodic Health Examination, 1979) was adopted in the late 1980s by the U.S. Preventive Services Task Force (USPSTF) (which systematically reviews evidence for effectiveness and develops recommendations for clinical preventive services) (USPSTF, 1989, 1996). These criteria establish a hierarchy for the quality of studies that places professional judgment and cross-sectional observation at the bottom and RCTs at the top. As described by Green and Glasgow (2006), these criteria also concern themselves almost exclusively with the level of certainty. "The greater weight given to evidence based on multiple studies than a single study was the main … [concession] to external validity (or generalizability), … [but] even that was justified more on grounds of replicating the results in similar populations and settings than of representing different populations, settings, and circumstances for the interventions and outcomes" (Green and Glasgow, 2006, p. 128). The Cochrane Collaboration has followed this line of evidence evaluation in its systematic reviews, as has the evidence-based medicine movement (Sackett et al., 1996) more generally in its almost exclusive favoring of RCTs (see Chapter 5). As the Cochrane methods have been extended to nonmedical applications, greater acceptability of other types of evidence has been granted, but reluctantly (see below). More recently, the Campbell Collaboration (see Sweet and Moynihan, 2007) attempted to take a related but necessarily distinctive approach to systematic reviews of more complex interventions addressing social problems beyond health, in the arenas of education, crime and justice, and social welfare. The focus was on improving the usefulness of systematic reviews for researchers, policy makers, the media, interest groups, and the broader community of decision makers. The Society for Prevention Research has extended efforts to establish standards for identifying effective prevention programs and policies by issuing standards for efficacy (level of certainty), effectiveness (generalizability), and dissemination (Flay et al., 2005).

The criteria of the USPSTF mentioned above were adapted by the Community Preventive Services Task Force, with greater concern for generalizability in recognition of the more varied public health circumstances of practice beyond clinical settings (Briss et al., 2000, 2004; Green and Kreuter, 2000). The Community Preventive Services Task Force, which is overseeing systematic reviews of interventions designed to promote population health, is giving increasing attention to generalizability in a standardized section on “applicability.” Numerous textbooks on research quality have tended to concern themselves primarily with designs for efficacy rather than effectiveness studies, although the growing field of evaluation has increasingly focused on issues of practice-based, real-time, ordinary settings (Glasgow et al., 2006b; Green and Lewis, 1986, 1987; Green et al., 1980). Finally, in the field of epidemiology, Rothman and Greenland (2005) offer a widely cited model that describes causality in terms of sufficient causes and their component causes. This model illuminates important principles such as multicausality, the dependence of the strength of component causes on the prevalence of other component causes, and the interactions among component causes.

The foregoing rules or frameworks for evaluating evidence have increasingly been taken up by the social service professions, building not just on biomedical traditions but also on agricultural and educational research in which experimentation predated much of the action research in the social and behavioral sciences. The social service and education fields have increasingly utilized RCTs, but have faced growing resistance to their limitations and the “simplistic distinction between strong and weak evidence [that] hinged on the use of randomized controlled trials …” (Chatterji, 2007, p. 239; see also Hawkins et al., 2007; Mercer et al., 2007; Sanson-Fisher et al., 2007), especially when applied to complex community interventions.

Campbell and Stanley's (1963) widely used set of "threats to internal validity (level of certainty)" for experimental and quasi-experimental designs were accompanied by their seldom referenced "threats to external validity (generalizability)." "The focus on internal validity (level of certainty) was justified on the grounds that without internal validity, external validity or generalizability would be irrelevant or misleading, if not impossible" (Green and Glasgow, 2006, p. 128). These and other issues concerning the level of certainty and generalizability are discussed in greater detail in Chapter 8.

A PROPOSED APPROACH TO EVALUATING THE QUALITY OF SCIENTIFIC EVIDENCE

Scientists have always used criteria or guidelines to organize their thinking about the nature of evidence. Much of what we think we know about the causes of obesity and the current obesity epidemic, for example, is based on the evaluation of evidence using existing criteria. In thinking about the development of a contemporary framework to guide decision making in the complex settings of public health, however, the committee decided to advance a broader view of appropriate evaluation criteria. As described in 2005 in a seminal report from the Institute of Medicine (IOM), these decisions need to be made with the “best available evidence” and cannot wait for the “best possible evidence” or all the desirable evidence to be at hand (IOM, 2005, p. 3). The L.E.A.D. framework should serve the needs of decision makers focused on the obesity epidemic, but can also provide guidance for those making decisions about complex, multifactorial public health challenges more generally.

The starting point for explaining the committee’s approach to evaluating the quality of evidence for obesity prevention is the seven categories of study designs and different sources of evidence presented in Chapter 5 . In Table 6-1 , this typology is linked to criteria for judging the quality of evidence, drawing on the concept of “critical appraisal criteria” of Rychetnik and colleagues (Rychetnik et al., 2002, 2004). Generally speaking, different types of evidence from different types of study designs are evaluated by different criteria, all of which can be found in the literature on evaluating the quality of each type of evidence. In all cases, high-quality evidence avoids bias, confounding, measurement error, and other threats to validity whenever possible; however, other aspects of quality come into play within the broader scope of evidence advanced by the L.E.A.D. framework.

Users of the L.E.A.D. framework can refer to any of the various criteria for high-quality evidence depending on the source of evidence they have located, following the guidance provided in Chapter 5 as well as the references cited in Table 6-1 . This process requires some time and effort by an individual or multidisciplinary group with some expertise in evaluating evidence. Despite the availability of the criteria listed in Table 6-1 , making judgments about the quality of evidence can still be challenging. One recommended approach is the eight-step process advanced by Liddle and colleagues (1996):

"1. Select reviewer(s) and agree on details of the review procedure.
2. Specify the objective of the review of evidence.
3. Identify strategies to locate the full range of evidence, including unpublished results and work in progress.
4. Classify the literature according to general purpose and study type.
5. Retrieve the full version of evidence available.
6. Assess the quality of the evidence.
7. Quantify the strength of the evidence.
8. Express the evidence in a standard way." (pp. 6-7)

TABLE 6-1 A Typology of Study Designs and Quality Criteria

Step 6 includes checklists for assessing the quality of studies depending on their design and purpose (Liddle et al., 1996).
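For readers who want to track progress through the eight steps quoted above, they can be kept as an ordered checklist. The Python sketch below simply restates those steps; the helper function is illustrative and not part of Liddle and colleagues' method.

# The eight review steps quoted above, kept in order so progress can be tracked.
LIDDLE_STEPS = [
    "Select reviewer(s) and agree on details of the review procedure",
    "Specify the objective of the review of evidence",
    "Identify strategies to locate the full range of evidence",
    "Classify the literature according to general purpose and study type",
    "Retrieve the full version of evidence available",
    "Assess the quality of the evidence",
    "Quantify the strength of the evidence",
    "Express the evidence in a standard way",
]

def next_step(steps_completed):
    """Return the next step to perform, or a completion message."""
    if steps_completed >= len(LIDDLE_STEPS):
        return "Review complete"
    return f"Step {steps_completed + 1}: {LIDDLE_STEPS[steps_completed]}"

print(next_step(5))  # Step 6: Assess the quality of the evidence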

Most biomedical researchers are familiar with the quality criteria that have been used for experimental and observational epidemiological research, but less so with those used for qualitative studies. Quality is not addressed for qualitative research in the checklists offered by Liddle and colleagues (1996), but can be assessed using the same broad concepts of validity and relevance used for quantitative research. However, these concepts need to be applied differently to account for the distinctive goals of such research, so defining a single method for evaluation is not suggested (Cohen and Crabtree, 2008; Patton, 1999). Mays and Pope (2000) summarize "relativist" criteria for quality, similar to the criteria of Rychetnik and colleagues (2002) (see Table 6-1), that are common to both qualitative and quantitative studies. Others have since reported on criteria that can be used to assess qualitative research (Cohen and Crabtree, 2008; Popay et al., 1998; Reis et al., 2007). In addition, guidance on the description and implementation of qualitative (and mixed-method) research, along with a checklist, has been provided by the National Institutes of Health (Office of Behavioral and Social Sciences Research, 2000).

Criteria also exist for evaluating the quality of systematic reviews themselves, whether they are of quantitative or qualitative studies (Goldsmith et al., 2007). In addition to the criteria of the Public Health Resource Unit (2006) listed in Table 6-1, a detailed set of criteria has been compiled by the Milbank Memorial Fund and the Centers for Disease Control and Prevention (CDC) (Sweet and Moynihan, 2007).

As noted earlier, expert knowledge is frequently considered to be at the bottom of traditional hierarchies that focus on level of certainty, such as that used by the USPSTF. However, expert knowledge can be of value in evaluating evidence and can also be viewed with certain quality criteria in mind (Garthwaite et al., 2008; Harris et al., 2001; Petitti et al., 2009; Turoff and Hiltz, 1996; World Cancer Research Fund and American Institute for Cancer Research, 2007). The Delphi Method was developed to utilize expert knowledge in a reliable and creative way that is suitable for decision making and has been found to be effective in social policy and public health decision making (Linstone and Turoff, 1975); it is a “structured process for collecting and distilling knowledge from a group of experts” through questionnaires interspersed with controlled feedback (Adler and Ziglio, 1996, p. 3). If these quality criteria are taken into account and conflicts of interest are identified and minimized, decision making can benefit substantially from the considered opinion of experts in a particular field or of practitioners, stakeholders, and policy makers capable of making informed judgments on implementation issues (e.g., doctors, lawyers, scientists, or academics able to interpret the scientific literature or specialized forms of data).

Finally, in addition to the main sources of evidence included in Table 6-1, other sources may be of value in decision making. Many are not independent sources, but closer to a surveillance mechanism or a tool for dissemination of evidence. They include simulation models, health impact assessments, program or policy evaluations, policy scans, and legal opinions. For instance, health impact assessments (described in more detail in Chapter 5, under "What" questions) formally examine the potential health effects of a proposed intervention (Cole and Fielding, 2007). An example is Health Forecasting (University of California–Los Angeles School of Public Health, 2009), which uses a web-based simulation model that allows users to view evidence-based descriptions of populations and subpopulations (disparities) to assess the potential effects of policies and practices on future health outcomes. Another such source, policy evaluations, allows studies of various aspects of a problem to be driven by a clear conceptual model. An example is the International Tobacco Control Policy Evaluation Project, a multidisciplinary, multisite, international endeavor that aims to evaluate and understand the impact of tobacco control policies as they are implemented in countries around the world (Fong et al., 2006). These sources may provide evidence for which there are quality criteria to consider, but are not addressed in detail here.

WHEN SCIENTIFIC EVIDENCE IS NOT A PERFECT FIT: TRADE-OFFS TO CONSIDER

Trade-offs may be involved in considering the quality of various types of evidence available to answer questions about complex, multilevel public health interventions (Mercer et al., 2007). Randomization at the individual level and experimental controls may remain the gold standard, but as pointed out above, these methods are not always possible in population health settings, and they are sometimes counterproductive with respect to the artificial conditions used to implement randomization and control procedures. Therefore, some of the advantages of RCTs may have to be traded off to obtain the best available evidence for decision making. Because no one study is usually sufficient to support decisions on public health interventions, the use of multiple types of evidence (all of good quality for their design) may be the best approach (Mercer et al., 2007), a point further elaborated upon in Chapter 8 .

Adler, M., and E. Ziglio. 1996. Gazing into the oracle: The Delphi method and its application to social policy and public health. London, UK: Jessica Kingsley.

Briss, P. A., S. Zaza, M. Pappaioanou, J. Fielding, L. Wright-De Aguero, B. I. Truman, D. P. Hopkins, P. D. Mullen, R. S. Thompson, S. H. Woolf, V. G. Carande-Kulis, L. Anderson, A. R. Hinman, D. V. McQueen, S. M. Teutsch, and J. R. Harris. 2000. Developing an evidence-based Guide to Community Preventive Services—methods. American Journal of Preventive Medicine 18(1, Supplement 1):35-43.

Briss, P. A., R. C. Brownson, J. E. Fielding, and S. Zaza. 2004. Developing and using the Guide to Community Preventive Services: Lessons learned about evidence-based public health. Annual Review of Public Health 25:281-302.

Campbell, D. T., and J. C. Stanley. 1963. Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Canadian Task Force on the Periodic Health Examination. 1979. The periodic health examination. Canadian Medical Association Journal 121(9):1193-1254.

Chatterji, M. 2007. Grades of evidence: Variability in quality of findings in effectiveness studies of complex field interventions. American Journal of Evaluation 28(3):239-255.

Cohen, D. J., and B. F. Crabtree. 2008. Evaluative criteria for qualitative research in health care: Controversies and recommendations. Annals of Family Medicine 6(4):331-339.

Cole, B. L., and J. E. Fielding. 2007. Health impact assessment: A tool to help policy makers understand health beyond health care. Annual Review of Public Health 28:393-412.

Des Jarlais, D. C., C. Lyles, and N. Crepaz. 2004. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. American Journal of Public Health 94(3):361-366.

Flay, B. R. 1986. Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventive Medicine 15(5):451-474.

Flay, B. R., A. Biglan, R. F. Boruch, F. G. Castro, D. Gottfredson, S. Kellam, E. K. Moscicki, S. Schinke, J. C. Valentine, and P. Ji. 2005. Standards of evidence: Criteria for efficacy, effectiveness and dissemination. Prevention Science 6(3):151-175.

Fong, G. T., K. M. Cummings, R. Borland, G. Hastings, A. Hyland, G. A. Giovino, D. Hammond, and M. E. Thompson. 2006. The conceptual framework of the International Tobacco Control (ITC) Policy Evaluation Project. Tobacco Control 15(Supplement 3): iii1-iii2.

Garthwaite, P. H., J. B. Chilcott, D. J. Jenkinson, and P. Tappenden. 2008. Use of expert knowledge in evaluating costs and benefits of alternative service provisions: A case study. International Journal of Technology Assessment in Health Care 24(3):350-357.

Glasgow, R., L. Green, L. Klesges, D. Abrams, E. Fisher, M. Goldstein, L. Hayman, J. Ockene, and C. Orleans. 2006a. External validity: We need to do more. Annals of Behavioral Medicine 31(2):105-108.

Glasgow, R. E., L. M. Klesges, D. A. Dzewaltowski, P. A. Estabrooks, and T. M. Vogt. 2006b. Evaluating the impact of health promotion programs: Using the RE-AIM framework to form summary measures for decision making involving complex issues. Health Education Research 21(5):688-694.

Goldsmith, M. R., C. R. Bankhead, and J. Austoker. 2007. Synthesising quantitative and qualitative research in evidence-based patient information. Journal of Epidemiology and Community Health 61(3):262-270.

Green, L. W., and R. E. Glasgow. 2006. Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation & the Health Professions 29(1):126-153.

Green, L. W., and M. W. Kreuter. 2000. Commentary on the emerging Guide to Community Preventive Services from a health promotion perspective. American Journal of Preventive Medicine 18(1 Supplement 1):7-9.

Green, L. W., and M. W. Kreuter. 2005. Health program planning: An educational and ecological approach. 4th ed. New York: McGraw-Hill.

Green, L. W., and F. M. Lewis. 1986. Measurement and evaluation in health education and health promotion. Palo Alto, CA: Mayfield Publishing Company.

Green, L., and F. M. Lewis. 1987. Data analysis in evaluation of health education: Towards standardization of procedures and terminology. Health Education Research 2(3):215-221.

Green, L. W., F. M. Lewis, and D. M. Levine. 1980. Balancing statistical data and clinician judgments in the diagnosis of patient educational needs. Journal of Community Health 6(2):79-91.

Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, and D. Atkins. 2001. Current methods of the U.S. Preventive Services Task Force: A review of the process. American Journal of Preventive Medicine 20(3, Supplement):21-35.

Hawkins, N. G., R. W. Sanson-Fisher, A. Shakeshaft, C. D’Este, and L. W. Green. 2007. The multiple baseline design for evaluating population-based research. American Journal of Preventive Medicine 33(2):162-168.

Higgins, J. P. T., and S. Green (editors). 2009. Cochrane handbook for systematic review of interventions, Version 5.0.2 [updated September 2009]. The Cochrane Collaboration, 2008. http://www.cochrane-handbook.org (accessed December 13, 2009).

Hill, A. B. 1965. The environment and disease: Association or causation. Proceedings of the Royal Society of Medicine 58:295-300.

IOM (Institute of Medicine). 2005. Preventing childhood obesity: Health in the balance. Edited by J. Koplan, C. T. Liverman, and V. I. Kraak. Washington, DC: The National Academies Press.

IOM. 2007. Progress in preventing childhood obesity: How do we measure up? Edited by J. Koplan, C. T. Liverman, V. I. Kraak, and S. L. Wisham. Washington, DC: The National Academies Press.

Liddle, J., M. Williamson, and L. Irwig. 1996. Method for evaluating research and guideline evidence. Sydney: NSW Health Department.

Linstone, H. L., and M. Turoff. 1975. The Delphi method: Techniques and applications. Reading, MA: Addison-Wesley.

MacKinnon, D. P. 2008. An introduction to statistical mediation analysis. New York: Lawrence Erlbaum Associates.

Mays, N., and C. Pope. 2000. Qualitative research in health care: Assessing quality in qualitative research. British Medical Journal 320(7226):50-52.



Methodology I: The Best Available Evidence

Francesco Chiappelli

First Online: 19 December 2013

This chapter establishes some of the principal foundational concepts of the field, namely, the pursuit of the best available evidence for translational effectiveness as a science that follows the scientific process. The process commences with, and is driven by, a research question ( i.e ., PICO[TS]) that emerges from the patient–clinician encounter. The question determines the sample ( i.e ., the bibliome) and the assessment tools necessary to establish the level ( i.e ., SORT) and the quality of the evidence ( i.e., R-Wong). The data produced are examined in terms of what evidence should be excluded, lest it be harmful to patients (i.e., acceptable sampling analysis), and of the overarching statistical significance of the collected evidence (i.e., meta-analysis). Taken together, these components of the research synthesis design lead to a consensus inference of the best available evidence, which can be refined by content analysis. The dissemination of the outcomes of research synthesis takes the form of a traditional research report, which opens with an objective statement that presents the PICO[TS] question and related background. The methods section outlines the specifics of the research synthesis process followed in the study, including how the bibliome was obtained (i.e., search process, inclusion–exclusion criteria) and evaluated (i.e., tools for measuring the level and quality of the evidence). The methods section also presents the analyses that were used and the related statistical assumptions and caveats pertaining specifically to the data collected. The report includes a results section and a discussion section where the findings are interpreted in light of the overall consensus of the best available evidence they reveal. Because of the systematic nature of the research synthesis design reported, and because of the nature of the entities studied, viz., the bibliome resulting from a review of the available literature, these reports have come to be known as “systematic reviews.” To be clear, a systematic review is not a traditional narrative review paper; rather, it is a research report in its own right. A research report of a research synthesis of several systematic reviews is called a “complex” systematic review or, more specifically, to emphasize and preserve clinical relevance, a clinically relevant complex systematic review. Although the field strives to evaluate critically all of the available evidence, it is not uncommon for primary reports and systematic reviews to suffer from gaps in evidence and knowledge. Awareness of this caveat has led to the establishment of a subfield, within the research synthesis design, that pertains specifically to the identification and elucidation of gaps in research. This chapter points to current considerations and to future lines of investigation in that domain.


Defined, as noted above, as the bibliome.

It is certainly the case, as noted and developed in Chaps. 2 and, particularly, 10 , that whereas the PICOTS question is derived from a personal patient–clinician interaction, the bibliome usually consists of research papers that utilize aggregate data. The issue has therefore been raised that, since the patient most likely did not belong to any of the groups reported in the bibliome, the resulting synthesized evidence is tangential, at best, to the needs and wants of the specific individual patient. Be that as it may, the utilization of research evidence in clinical decision-making is always subject to that limitation; it is not specific to the utilization of the consensus of the best available evidence in clinical decisions. Moreover, aware of this limitation, the field makes concerted efforts to develop and characterize statistically stringent methodologies for individual patient data collection ( i.e., patient-centered outcomes research) and analysis ( i.e., individual patient data analysis and meta-analysis), which we discuss in greater detail in Chaps. 5 , 6 , and 10 .

Several people play important and distinct roles in the context of the patient’s well-being: for every aspect of EBHC, there are numerous levels of stakeholders with an interest in the best available evidence, from allied clinicians and pharmacists, to family members and caregivers, insurance providers, and others, who each contribute to providing optimal evidence-based health care to individual patients within such communities of practice as patient-centered medical/dental ( i.e ., health care) homes/neighborhoods ( cf. Chaps. 1 and 4 ).

Systematic Reviews is an electronic open-access journal for the fast publication of systematic review protocols (through to publication of the complete review) and of updates (to complete reviews and/or to previous updates). Protocols are registered in PROSPERO ( vide infra ), verified against PRISMA ( vide infra ), and included in DARE (Database of Abstracts of Reviews of Effects).

Fundamentally, and as defined by the National Institutes of Health (NIH), translational research is used to translate the findings in basic research efficiently into clinical practice.

The clinically relevant complex mixed systematic reviews (CRCMSRs) combine ( i.e ., “mix”) the traditional systematic review approach outlined above with the systematic reviews performed on a set of systematic reviews ( i.e., CRCSRs). That mixing of two heterogeneous bodies of research in a single research synthesis process is methodologically problematic, as it engenders significant analytical challenges and interpretative difficulties. Expectations are that concerted work in the next decade will systematically address these caveats and refine the procedural, analytical, and inferential protocols.

In Chaps. 4 and 5 , we discuss the level and the quality of the evidence. The level of the evidence describes the type of study that was performed to obtain the evidence in question. Assessment of the level of the evidence responds to “what study” was done to obtain the evidence. With respect to their perceived immediate utility to inform clinical decision-making, in vitro and animal studies are given a low level of evidence. By contrast, clinical trials are considered to yield evidence with high utility for immediate application to clinical intervention. The contradiction in this rather superficial approach becomes blatant when one considers that Phase 0 and Phase 1 clinical trials, as defined by the National Institutes of Health, are obligatorily studies performed in vitro or with animals as research subjects. The consensus statement of systematic reviews, it is argued, uniformly proffers the highest utility in informing clinical decisions, and this evidence is therefore assigned the highest level. That is to say, the level of evidence of systematic reviews is viewed as optimal.

The quality of the evidence refers to whether or not the study conducted to obtain the said evidence adhered closely to the widely recognized standards of research methodology, design, and data analysis that define and characterize the scientific process. Assessment of the quality of the evidence responds to “how well” the study that yielded the evidence was executed. The quality of the evidence can be scored and quantified with psychometrically validated instruments designed for that purpose. In the context of systematic reviews, one such instrument is the “assessment of multiple systematic reviews” (AMSTAR) (Shea et al. 2007 , 2009 ) and its revision (Kung et al. 2010 ).

cf. Lau et al. 1995 ; Janket et al. 2005 ; and Moles et al. 2005 .

cf . Chap. 6 .

PROSPERO (Booth et al. 2011 , 2013): http://www.crd.york.ac.uk/prospero .

Robert McCloskey, Viking Press, 1941.

26 April 1889–29 April 1951. Indeed, it is widely acknowledged that one prime influence on Wittgenstein’s thought was Augustinian philosophy. In fact, it seems that he might as well have found inspiration in the Franciscan school for his discussion of language. St. Francis is known to have instructed his followers to “evangelize always; use words only if you must.” This concept is remarkably similar to Wittgenstein’s own.

But when that process is undertaken, it results in a systematic evaluation of the evidence, which de facto approaches the pursuit of the best available evidence and is, by its own nature, no longer selective.

Wrong translations perpetuate the pervasive misconception that the two terms convey the same concept and mean the same thing, as if the distinction did not matter (precision in language does in fact matter, as Wittgenstein and others emphatically stated). These wrong translations continue to misinform and miseducate our colleagues abroad.

For lack of a better term, and because of its increasingly widespread use, we continue its usage in the chapters in this volume and in our writings beyond this work. However, it may behoove the field to consider the following: Peer-reviewed scientific publications that report fundamental primary research in molecular biology are often called “molecular biology papers”; similar publications that report, say, novel primary research findings in immunology, are often termed “immunology papers”; publications that do the same in the field of, say, psychology, are recognized as “psychology papers,” etc . They all report new research findings systematically derived through the scientific process, appropriately analyzed statistically, and carefully crafted to integrate the novel knowledge into a review of the pertinent body of existing science. Therefore, it may soon be time to abandon the wanton, misleading, and inappropriate use of the term “systematic review” to refer to the product of a research synthesis research investigation and replace it instead with the more correct and precise term of “research synthesis paper,” for instance.

Critics of EBHC and EBD in particular have argued that it is not a science because it is not hypothesis driven. This criticism reveals a lack of understanding of what EBHC is all about and is as fallacious as stating that physics, biology, or psychology is not a science. EBHC is a science because it follows the scientific process.

The interested reader is advised to get on the mailing list of the Cochrane journal club (cochranejournalclub.com).

Some degree of selection bias is unavoidable because of the very nature of our peer-reviewed system. For example, a certain degree of publication bias cannot be avoided simply because, as a general rule, papers that are statistically significant, whether they demonstrate clinical relevance or not, tend to be preferentially published in the scientific literature, compared to reports that demonstrate clinical relevance but fail to reach statistical significance. The problem of publication bias is inherent to our present system of scientific literature and is an unavoidable issue of the research synthesis process.

The effect on research of the preferential acceptance of articles reporting significant results is critical: bias in favor of studies showing significant results alters the reliability of systematic reviews by reducing the number of included papers with opposing results. Because the validity of this type of publication depends on the representativeness and soundness of the source material, underrepresented evidence will have a disproportionately decreased influence on the outcome. That outcome will be particularly grave when research synthesis is utilized to obtain the best available evidence for the treatment of pathologies in order to perform either evidence-based clinical decisions or comparative effectiveness analysis.
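To make this concrete, here is a minimal simulation sketch, not from the chapter and using entirely made-up numbers, of how preferential publication of statistically significant results inflates the apparent effect:

```python
# Hypothetical illustration of publication bias: simulate many small trials of a
# modest true effect, then "publish" only those that reach p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect, n_per_arm, n_studies = 0.2, 30, 500

all_effects, published_effects = [], []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1.0, n_per_arm)
    control = rng.normal(0.0, 1.0, n_per_arm)
    _, p_value = stats.ttest_ind(treatment, control)
    effect = treatment.mean() - control.mean()
    all_effects.append(effect)
    if p_value < 0.05:  # journals preferentially accept "significant" findings
        published_effects.append(effect)

print(f"true effect:                  {true_effect:.2f}")
print(f"mean effect (all studies):    {np.mean(all_effects):.2f}")
print(f"mean effect (published only): {np.mean(published_effects):.2f} "
      f"({len(published_effects)} of {n_studies} studies)")
```

In this toy setup the "published" subset shows a markedly larger mean effect than the full set of simulated studies, which is exactly the distortion described above.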

It is important to note that inter-rater reliability obtains a correlation coefficient between two raters, and a high correlation implies that the two raters “agreed” on which items to score high or low ( i.e. , a strong positive Pearson correlation coefficient). By contrast, Cohen’s kappa coefficient is a statistical measure of agreement, which assesses whether or not the probability of the raters agreeing is larger than chance alone. The Pearson inter-rater reliability coefficient is distinct from Cohen’s kappa coefficient, although both values establish the degree of agreement between two raters: they are two sides of the same coin. cf . Chap. 7 .
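As a hedged illustration of the distinction drawn above, the following sketch computes both statistics for two hypothetical raters scoring ten reports (the ratings are invented; scipy and scikit-learn are assumed to be available):

```python
# Two raters scoring the same ten reports on a 1-4 scale (hypothetical data).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

rater_a = np.array([4, 3, 4, 2, 1, 3, 4, 2, 3, 1])
rater_b = np.array([4, 3, 3, 2, 1, 3, 4, 2, 2, 1])

r, _ = pearsonr(rater_a, rater_b)            # linear association between the raters
kappa = cohen_kappa_score(rater_a, rater_b)  # agreement corrected for chance
print(f"Pearson r = {r:.2f}, Cohen's kappa = {kappa:.2f}")
```

A high r paired with a lower kappa would signal raters who rank items similarly but do not assign identical scores, which is why the two coefficients are described above as two sides of the same coin.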

The discussion of meta-analysis in Chaps. 5 , 6 , and 10 will demonstrate that one of the principal advantages of meta-analysis lies in the fact that it results in an increased sample size, compared to any individual study in the analysis, and thus proffers greater statistical power ( i.e ., a greater chance of detecting a statistical effect, if there is one to be found).

Omission of the preliminary acceptable sampling analysis will result in the potential inclusion in the meta-analysis of good as well as subpar research reports, which will undoubtedly dilute the statistical power of the meta-analytical step by incorporating extraneous systematic error ( i.e ., variability, variance): the GIGO fallacy alluded to in the previous chapter, “garbage in, garbage out” (cf. Spångberg 2010 ). Similarly, if the homogeneity analysis is omitted, the resulting meta-analysis will compare apples to oranges, yielding, again for the same reason, reduced power.

The Institute of Medicine Committee on Comparative Effectiveness Research Prioritization, as we recall from Chaps. 1 and 2 , defined (2009) comparative effectiveness research and analysis as “the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.”

For example, the interested reader is referred to the 2010 AHRQ report: Creating a framework for “best evidence” approaches in systematic reviews. Review Protocol. Sept 2010. Rockville: Agency for Healthcare Research and Quality. http://www.ahrq.gov/clinic/tp/bestevtp.htm .

We would all prefer that clinical decisions always have a firm scientific foundation, but as a matter of practice, that is still prohibitive with the clinical research protocols and methodologies at our disposal today. A variety of approaches and criteria can be borrowed from related sciences, such as text analysis and text mining, for the purpose of quantifying certain common elements of clinical observations and descriptive comments. Once systematically quantified, these variables can be subjected to standard statistical analysis, such as factor analysis and cluster analysis, as well as multiple and logistic regression, in order to obtain statistically stringent inferences about identified benefits and risks. That is to say, while there may be a range of values that reflect clinically relevant findings depending on the clinical scenario, and while different treatment methods may provide various benefits, text analysis and mining allow the systematic analysis of these variables and ranges within any given clinical scenario, thus lending themselves to statistically sound inferences.

References specific to this chapter are listed here—for general references, public domains, and reports, please refer to the general reference list at the end of this book.


Barrett B, Brown D, Mundt M, Brown R. Sufficiently important difference: expanding the framework of clinical significance. Med Decis Making. 2005;25:250–61.


Booth A, Clarke M, Ghersi D, Moher D, Petticrew M, Stewart L. An international registry of systematic review protocols. Lancet. 2011;377:108–9.

Chalmers I, Hedges LV, Cooper H. A brief history of research synthesis. Eval Health Prof. 2002;25:12–37.

Cochrane AL. Effectiveness and efficiency: random reflections on health services. 2nd ed. London: Nuffield Provincial Hospitals Trust; 1972 (published 1989).

Dousti M, Ramchandani MH, Chiappelli F. Evidence-based clinical significance in health care: toward an inferential analysis of clinical relevance. Dent Hypotheses. 2011;2:165–77.


Hartling L, Hamm M, Milne A, Vandermeer B, Santaguida PL, Ansari M, Tsertsvadze A, Hempel S, Shekelle P, Dryden DM. Validity and inter-rater reliability testing of quality assessment instruments. Rockville: AHRQ; 2012.

Ip S, Kitsios GD, Chung M, Lau J. A process for robust and transparent rating of study quality: phase 1. Methods research report. (AHRQ Publication No. 12-EHC004-EF). Rockville: AHRQ; 2011.

Janket SJ, Moles DR, Lau J, Needleman I, Niederman R. Caveat for a cumulative meta-analysis. J Dent Res. 2005;84:487.

Kung J, Chiappelli F, Cajulis OS, Avezova R, Kossan G, Chew L, Maida CA. From systematic reviews to clinical recommendations for evidence-based health care: Validation of Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) for Grading of Clinical Relevance. Open Dent J. 2010;4:84–91.


Lau J, Schmid CH, Chalmers TC. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. J Clin Epidemiol. 1995;48:45–57.


Littell JH, Corcoran J, Pillai V. Research synthesis reports and meta-analysis. New York: Oxford University Press; 2008.

Moles DR, Needleman IG, Niederman R, Lau J. Introduction to cumulative meta-analysis in dentistry: lessons learned from undertaking a cumulative meta-analysis in periodontology. J Dent Res. 2005;84:345–9.

Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, Porter AC, Tugwell P, Moher D, Bouter LM. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.


Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjansson E, Grimshaw J, Henry DA, Boers M. AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009;62:1013–20.

Spångberg LSW. Systematic reviews in endodontics—examples of GIGO? Oral Surg Oral Med Oral Pathol Oral Radiol Endod. 2007;103:724–5.

Whitlock EP, Lopez SA, Chang S, Helfand M, Eder M, Floyd N. Identifying, selecting, and refining topics. In: Methods guide for comparative effectiveness reviews. Rockville: Agency for Healthcare Research and Quality; 2009.

Author information

Francesco Chiappelli, CHS 63-090, UCLA School of Dentistry, Los Angeles, California, USA


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Chiappelli, F. (2014). Methodology I: The Best Available Evidence. In: Fundamentals of Evidence-Based Health Care and Translational Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41857-0_3

DOI: https://doi.org/10.1007/978-3-642-41857-0_3

Published: 19 December 2013

Publisher Name: Springer, Berlin, Heidelberg

Print ISBN: 978-3-642-41856-3

Online ISBN: 978-3-642-41857-0


Appraising the Evidence

After identifying an article or resource that seems appropriate to your question, you must critically appraise the information you found. Filtered resources such as DynaMed have often pre-appraised the literature they cite, but primary literature (e.g., individual studies found in a database such as PubMed) is not pre-appraised. Some types of filtered information, such as systematic reviews, critically appraise the studies they include in their summary of the literature, but you as the reader will also want to critically appraise the methods of the systematic review itself.

When evaluating the quality of any study, ask yourself the following:

  • Does this study address a clearly focused question?
  • Does the study use valid methods to address this question?
  • Are the valid results of this study important and applicable to my patient, population, or problem?

Just as some fields of study might not be addressed by a systematic review or RCT, some types of questions are better suited to certain study types than others.

Below is a table of types of clinical questions and the suggested research design to answer that question, in the order of highest level of evidence to lowest. For example, if your PICO question is about a therapy or intervention to treat a condition, a systematic review or meta-analysis would be the best level of evidence, but if that doesn’t exist then you would want to look for an RCT. If an RCT doesn’t exist, then you would look for a cohort study, and so on.

Evidence-Based Practice Copyright © by Various Authors - See Each Chapter Attribution is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Evidence-Based Research: Levels of Evidence Pyramid

Introduction

One way to organize the different types of evidence involved in evidence-based practice research is the levels of evidence pyramid. The pyramid includes a variety of evidence types and levels.

  • systematic reviews
  • critically-appraised topics
  • critically-appraised individual articles
  • randomized controlled trials
  • cohort studies
  • case-controlled studies, case series, and case reports
  • background information and expert opinion

Levels of evidence pyramid

The levels of evidence pyramid provides a way to visualize both the quality of evidence and the amount of evidence available. For example, systematic reviews are at the top of the pyramid, meaning they are both the highest level of evidence and the least common. As you go down the pyramid, the amount of evidence will increase as the quality of the evidence decreases.

[Figure: Levels of Evidence Pyramid. EBM Pyramid and EBM Page Generator, copyright 2006 Trustees of Dartmouth College and Yale University. All Rights Reserved. Produced by Jan Glover, David Izzo, Karen Odato and Lei Wang.]

Filtered Resources

Filtered resources appraise the quality of studies and often make recommendations for practice. The main types of filtered resources in evidence-based practice are systematic reviews, critically-appraised topics, and critically-appraised individual articles.

See the Systematic reviews , Critically-appraised topics , and Critically-appraised individual articles sections below for links to resources where you can find each of these types of filtered information.

Systematic reviews

Authors of a systematic review ask a specific clinical question, perform a comprehensive literature review, eliminate the poorly done studies, and attempt to make practice recommendations based on the well-done studies. Systematic reviews include only experimental, or quantitative, studies, and often include only randomized controlled trials.

You can find systematic reviews in these filtered databases :

  • Cochrane Database of Systematic Reviews Cochrane systematic reviews are considered the gold standard for systematic reviews. This database contains both systematic reviews and review protocols. To find only systematic reviews, select Cochrane Reviews in the Document Type box.
  • JBI EBP Database (formerly Joanna Briggs Institute EBP Database) This database includes systematic reviews, evidence summaries, and best practice information sheets. To find only systematic reviews, click on Limits and then select Systematic Reviews in the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page .

Open Access databases provide unrestricted access to and use of peer-reviewed and non-peer-reviewed journal articles, books, dissertations, and more.

You can also find systematic reviews in this unfiltered database:

To learn more about finding systematic reviews, please see our guide:

  • Filtered Resources: Systematic Reviews

Critically-appraised topics

Authors of critically-appraised topics evaluate and synthesize multiple research studies. Critically-appraised topics are like short systematic reviews focused on a particular topic.

You can find critically-appraised topics in these resources:

  • Annual Reviews This collection offers comprehensive, timely collections of critical reviews written by leading scientists. To find reviews on your topic, use the search box in the upper-right corner.
  • Guideline Central This free database offers quick-reference guideline summaries organized by a new non-profit initiative which will aim to fill the gap left by the sudden closure of AHRQ’s National Guideline Clearinghouse (NGC).
  • JBI EBP Database (formerly Joanna Briggs Institute EBP Database) To find critically-appraised topics in JBI, click on Limits and then select Evidence Summaries from the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page .
  • National Institute for Health and Care Excellence (NICE) Evidence-based recommendations for health and care in England.
  • Filtered Resources: Critically-Appraised Topics

Critically-appraised individual articles

Authors of critically-appraised individual articles evaluate and synopsize individual research studies.

You can find critically-appraised individual articles in these resources:

  • EvidenceAlerts Quality articles from over 120 clinical journals are selected by research staff and then rated for clinical relevance and interest by an international group of physicians. Note: You must create a free account to search EvidenceAlerts.
  • ACP Journal Club This journal publishes reviews of research on the care of adults and adolescents. You can either browse this journal or use the Search within this publication feature.
  • Evidence-Based Nursing This journal reviews research studies that are relevant to best nursing practice. You can either browse individual issues or use the search box in the upper-right corner.

To learn more about finding critically-appraised individual articles, please see our guide:

  • Filtered Resources: Critically-Appraised Individual Articles

Unfiltered resources

You may not always be able to find information on your topic in the filtered literature. When this happens, you'll need to search the primary or unfiltered literature. Keep in mind that with unfiltered resources, you take on the role of reviewing what you find to make sure it is valid and reliable.

Note: You can also find systematic reviews and other filtered resources in these unfiltered databases.

The Levels of Evidence Pyramid includes unfiltered study types in this order of evidence, from higher to lower: randomized controlled trials, cohort studies, and case-controlled studies, case series, and case reports.

You can search for each of these types of evidence in the following databases:

TRIP database

Background information & expert opinion

Background information and expert opinions are not necessarily backed by research studies. They include point-of-care resources, textbooks, conference proceedings, etc.

  • Family Physicians Inquiries Network: Clinical Inquiries Provide the ideal answers to clinical questions using a structured search, critical appraisal, authoritative recommendations, clinical perspective, and rigorous peer review. Clinical Inquiries deliver best evidence for point-of-care use.
  • Harrison, T. R., & Fauci, A. S. (2009). Harrison's Manual of Medicine . New York: McGraw-Hill Professional. Contains the clinical portions of Harrison's Principles of Internal Medicine .
  • Lippincott manual of nursing practice (8th ed.). (2006). Philadelphia, PA: Lippincott Williams & Wilkins. Provides background information on clinical nursing practice.
  • Medscape: Drugs & Diseases An open-access, point-of-care medical reference that includes clinical information from top physicians and pharmacists in the United States and worldwide.
  • Virginia Henderson Global Nursing e-Repository An open-access repository that contains works by nurses and is sponsored by Sigma Theta Tau International, the Honor Society of Nursing. Note: This resource contains both expert opinion and evidence-based practice articles.

Evidence Synthesis

This guide walks through twelve steps: (1) develop a research question and apply a framework; (2) select a reporting guideline; (3) select databases; (4) select grey literature sources; (5) write a search strategy; (6) register a protocol; (7) translate search strategies; (8) manage your citations; (9) screen articles; (10) assess the risk of bias; (11) extract the data; and (12) synthesize, map, or describe the results.

1. Develop a Research Question and Apply a Framework

  • Quantitative Studies (PICO)
  • Qualitative Studies (PICo, CHIP)
  • Mixed Methods (SPICE, SPIDER)
  • Scoping Reviews (PCC)

Formulating a research question is key to a systematic review. It will be the foundation upon which the rest of the research is built. At this stage in the process, you will have identified a knowledge gap in your field, and you are aiming to answer a specific question. For example:

If X is prescribed, what happens to Y patients?

or assess an intervention:

How does X affect Y?

or synthesize existing evidence:

What is the nature of X?

Developing a research question takes time. You will likely go through different versions before settling on a final question. Once you've developed your research question, you will use it to create a search strategy.

Frameworks help to break your question into parts so you can clearly see the elements of your topic. Depending on your field of study, the frameworks listed in this guide may not fit the types of questions you're asking. There are dozens of frameworks you can use to formulate your specific and answerable research question. To see other frameworks you might use, visit the  University of Maryland's Systematic Review guide.

The most common framework for systematic reviews is PICO, which is often used within the health sciences for clinical research, or in education. It is commonly used for quantitative studies.

P: Population

I: Intervention/Exposure

C: Comparison

O: Outcome

T: Time (optional; see the example below)

Example: In 11-12 year old children (Population), what is the effect of a school-based multi-media learning program (Intervention) on an increase in real-world problem solving skills (Outcome) compared with an analog-only curriculum (Comparison) within a one-year period (Time)?

Source:  Richardson, W. S., Wilson, M. C., Nishikawa, J., & Hayward, R. S. (1995).  The well-built clinical question: A key to evidence-based decisions .  ACP journal club, 123 (3), A12-A12.
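As an optional, purely illustrative sketch (the class and helper below are hypothetical, not part of the PICO framework itself), the question elements can be recorded in a small data structure so they can later feed keyword lists and search strategies:

```python
# A tiny record of the parts of a PICO(T) question (hypothetical helper).
from dataclasses import dataclass

@dataclass
class PicoT:
    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str = ""

    def as_question(self) -> str:
        q = (f"In {self.population}, what is the effect of {self.intervention} "
             f"on {self.outcome} compared with {self.comparison}")
        return q + (f" within {self.time}?" if self.time else "?")

question = PicoT(
    population="11-12 year old children",
    intervention="a school-based multi-media learning program",
    comparison="an analog-only curriculum",
    outcome="real-world problem solving skills",
    time="a one-year period",
)
print(question.as_question())
```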

P: Population/problem

I: Phenomenon of Interest

Co: Context

Example:  What are the  experiences  (phenomenon of interest) of  caregivers providing home based care to patients with Alzheimer's disease  (population) in  Australia  (context)?

Source:  Methley, A.M., Campbell, S., Chew-Graham, C.  et al.  PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews.  BMC Health Serv Res   14,  579 (2014). https://doi.org/10.1186/s12913-014-0579-0

________________________________________________________________________

Example: 

Source:  Shaw, R. (2010).  Conducting literature reviews . In M. A. Forester (Ed.),  Doing Qualitative Research in Psychology: A Practical  Guide  (pp. 39-52). London, Sage.

S: Setting

P: Perspective

I: Intervention/Exposure/Interest

C: Comparison

E: Evaluation

Example:  What are the  benefits  (evaluation) of a  doula  (intervention) for  low income mothers  (perspective) in the  developed world  (setting) compared to  no support  (comparison)?

Source:  Booth, A. (2006). Clear and present questions: Formulating questions for evidence based practice.  Library Hi Tech, 24 (3), 355-368.   https://doi.org/10.1108/07378830610692127

________________________________________________________

S: Sample

PI: Phenomenon of Interest

D: Design

E: Evaluation

R: Research Type

Example:  What are the  experiences  (evaluation) of  women  (sample) undergoing  IVF treatment  (phenomenon of interest) as assessed?

Design:   questionnaire or survey or interview

Study Type:  qualitative or mixed method

Source:  Cooke, A., Smith, D., & Booth, A. (2012). Beyond PICO: The SPIDER tool for qualitative evidence synthesis.  Qualitative Health Research, 22 (10), 1435-1443.  https://doi.org/10.1177/1049732312452938

Scoping reviews generally have a broader scope than systematic reviews, but it is still helpful to place scoping and mapping reviews within a framework. The Joanna Briggs Institute offers guidance on forming scoping review questions in Chapter 11 of their manual for evidence synthesis . They recommend using the PCC framework:

P: Population

C: Concept

C: Context

Example:  What are the trends (concept) in MOOCs (context) that support the interactions of learners with disabilities (population)?

Source:  Peters MDJ, Godfrey C, McInerney P, Munn Z, Tricco AC, Khalil, H. Chapter 11: Scoping Reviews (2020 version). In: Aromataris E, Munn Z (Editors). JBI Manual for Evidence Synthesis, JBI, 2020. Available from  https://synthesismanual.jbi.global .   https://doi.org/10.46658/JBIMES-20-12

  • MARS Meta-Analysis reporting standards From the American Psychological Association (APA).
  • MECCIR (Methodological Expectations of Campbell Collaboration Intervention Reviews) Links to site to download reporting standards for reviews in the social sciences and education.
  • PRISMA A 27-item checklist. PRISMA guidelines are used primarily by those within the health sciences.
  • PRISMA ScR The PRISMA Scoping Review checklist. Created for the health sciences, but can be used across disciplines.

Librarians can assist you with selecting databases for your systematic review. Each database is different and will require a different search syntax. Some databases have controlled vocabulary and thesauri that you will want to incorporate into your searches. We recommend creating one master search strategy and then translating it for each database. 

To begin browsing databases, visit the A-Z Database List:

  • A-Z Databases A-Z list of databases available through the Northwestern University Libraries.
  • Northwestern Research Guides Created by Northwestern Librarians, research guides are curated lists of databases and resources for each discipline.
  • What is Grey Literature?
  • Why Search Grey Literature?
  • How do I search Grey Literature?
  • Sources for Grey Literature

Grey (or gray) Literature is "A variety of written materials produced by organizations outside of traditional commercial and academic publishing channels, such as annual reports, [theses and dissertations], white papers, or conference proceedings from government agencies, non-governmental organizations, or private companies. Grey literature may be difficult to access because it may not be widely distributed or included in bibliographic databases." 

Your research question and field of study will guide what type of grey literature to include in your systematic review. 

Source: Byrne, D. (2017). Reviewing the literature.  Project Planner . 10.4135/9781526408518.

The purpose of a systematic review is to identify and synthesize all available evidence. There is significant bias in scientific publishing toward publishing studies that show some sort of significant effect. In fact, according to the Campbell Collaboration Guidelines on Information Retrieval , more than 50% of studies reported in conference abstracts never reach full publication. While conference abstracts and other grey literature are not peer-reviewed, it is important to include all available research on the topic you're studying.

Finding grey literature on your topic may require some creativity, and may involve going directly to the source. Here are a few tips:

  • Find a systematic review on a topic similar to yours and see what grey literature sources they used. You can find existing systematic reviews in subject databases, The Campbell Library, and the Cochrane Library. In databases such as PsycINFO, you can use the Methodology search tool to narrow by Systematic Review or Meta-Analysis; otherwise check the thesaurus for controlled vocabulary or use the keyword search to add ("systematic review" OR meta-analysis OR "scoping review") to your search string.
  • Ask colleagues and other experts in the field for sources of grey literature in your discipline.
  • Contact known researchers in the field to learn if there are any unpublished or ongoing studies to be aware of.
  • On the web, search professional associations, research funders, and government websites.
  • ProQuest Dissertations & Theses Global With more than 2 million entries, PQD&T offers comprehensive listings for U.S. doctoral dissertations back to 1861, with extensive coverage of dissertations from many non-U.S. institutions. A number of masters theses are also listed. Thousands of dissertations are available full text, and abstracts are included for dissertations from the mid-1980s forward.
  • Networked Digital Library of Theses and Dissertations (NDLTD) An international organization dedicated to promoting the adoption, creation, use, dissemination, and preservation of electronic theses and dissertations (ETDs).
  • WHO Institutional Repository for Resource Sharing Institutional WHO database of intergovernmental policy documents and technical reports. You can search IRIS by region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific).
  • OCLC PapersFirst OCLC index of papers presented at conferences worldwide.
  • OSF Preprints Center for Open Science Framework's search tool for scholarly preprints in the fields of architecture, arts, business, social and behavioral science, and more.
  • Directory of Open Access Repositories Global Directory of Open Access Repositories. You can search and browse through thousands of registered repositories based on a range of features, such as location, software, or type of material held.
  • Social Science Research Network A service providing scholarly research papers, working papers, and journals in numerous social science disciplines. Includes the following: Accounting Research Network, Cognitive Science Network, Economics Research Network, Entrepreneurship Research & Policy Network, Financial Economics Network, Legal Scholarship Network, Management Research Network.

Use the keywords from your research question and begin to create a core keyword search that can then be translated to fit each database search. Since the goal is to be as comprehensive as possible, you will want to identify all terms that may be used for each of the keywords, and use a combination of natural language and controlled vocabulary when available. Librarians are available to assist with search strategy development and keyword review.

Your core keyword search will likely include some or all of the following syntax:

  • Boolean operators (AND, OR, and NOT) 
  • Proximity operators (NEAR or WITHIN)
  • Synonyms, related terms, and alternate spellings
  • Controlled vocabulary (found within the database thesaurus)
  • Truncation (ex: preg* would find pregnant and pregnancy)

Search filters that are built into databases may also be used, but use them with caution. Database articles within the social sciences tend not to be as consistently or thoroughly indexed as those within the health sciences, so using filters could cause you to miss some relevant results.

Source:  Kugley S, Wade A, Thomas J, Mahood Q, Jørgensen AMK, Hammerstrøm K, Sathe N. Searching for studies: A guide to information retrieval for Campbell Systematic Reviews . Campbell Methods Guides 2016:1 DOI: 10.4073/cmg.2016.1
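As one hedged illustration of how these pieces fit together, the sketch below builds a Boolean string from hypothetical PICO keyword blocks and submits it to PubMed's E-utilities esearch endpoint (the keyword lists are invented, the third-party requests package is assumed, and wildcard and field behavior will differ across databases):

```python
# Assemble a Boolean search string from keyword blocks, then query PubMed.
import requests

population   = ["adolescen*", "teen*", "youth*"]      # hypothetical P terms
intervention = ["mindfulness", '"stress reduction"']  # hypothetical I terms
outcome      = ["anxiety", "depressi*"]               # hypothetical O terms

def or_block(terms):
    # Join synonyms with OR and wrap in parentheses, e.g. (anxiety OR depressi*)
    return "(" + " OR ".join(terms) + ")"

query = " AND ".join(or_block(block) for block in [population, intervention, outcome])
print(query)

response = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 20},
    timeout=30,
)
result = response.json()["esearchresult"]
print("hits:", result["count"], "first PMIDs:", result["idlist"][:5])
```

The same keyword blocks would then be translated into each database's own syntax and controlled vocabulary, as described in the translation step below.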

  • Recording Synonyms Worksheet Template you can use when creating lists of search terms.
  • SnowGlobe A program that assists with literature searching, SnowGlobe takes all known relevant papers and searches through their references (which papers they cite) and citations (which papers cite them).

A protocol is a detailed explanation of your research project that should be written before you begin searching. It will likely include your research question, objectives, and search methodology, but information included within a protocol can vary across disciplines. The protocol will act as a map for you and your team, and will be helpful in the future if you or any other researchers want to replicate your search. Protocol development resources and registries:

  • PRISMA-P A checklist of recommended items for inclusion within a systematic review protocol.
  • Evidence Synthesis Protocol Template Developed by Cornell University Library, the protocol template is a useful tool that can be used to begin writing your protocol.
  • Campbell Collaboration: Submit a Proposal The Campbell Collaboration follows MECCIR reporting standards. If you register with Campbell, you are agreeing to publish the completed review with Campbell first. According to the title registration page, "Co-publication with other journals is possible only after discussing with the Campbell Coordinating Group and Editor in Chief." Disciplines: Business and Management, Crime and Justice, Disability, Education, International Development, Knowledge Translation and Implementation, Methods, Nutrition, and Social Welfare.
  • PROSPERO registry "PROSPERO accepts registrations for systematic reviews, rapid reviews and umbrella reviews. PROSPERO does not accept scoping reviews or literature scans." Disciplines: health sciences and social care.
  • Open Science Framework (OSF) registry If your review doesn't fit into one of the major registries, consider using Open Science Framework. OSF can be used to pre-register a systematic review protocol and to share documents such as a Zotero library, search strategies, and data extraction forms. Disciplines: multidisciplinary.

Each database is different and will require a customized search string. We recommend creating one master keyword list and then translating it for each database by using that database's subject terms and search syntax. Below are some tools to assist with translating search strings from one database to the next.

  • Translating Search Strategies Template Created at Cornell University Library
  • Database Syntax Guide (Cochrane) Includes syntax for Cochrane Library, EBSCO, ProQuest, Ovid, and POPLINE.
  • Systematic Review Search Translator The IEBH SR-Accelerator is a suite of tools to speed up steps in the Systematic Review (SR) process.

When conducting a systematic review, you will likely be exporting hundreds or even thousands of citations from databases. Citation management tools are useful for storing, organizing, and managing your citations. They can also perform de-duplication to remove doubles of any citations you may have. The Libraries provide training and support on EndNote, Zotero, and Mendeley. Visit the links below to get started. You may also reach out directly to  [email protected]  with questions or consultation requests.

  • EndNote Support Guide
  • Mendeley Support Guide
  • Zotero Support Guide
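Citation managers handle de-duplication for you, but as a rough sketch of the idea (hypothetical records; real tools also compare authors, years, and fuzzier title matches):

```python
# Naive de-duplication: prefer a DOI match, fall back to a normalized title.
import re

records = [
    {"title": "Mindfulness for Teen Anxiety: An RCT.", "doi": "10.1000/xyz123"},
    {"title": "Mindfulness for teen anxiety - an RCT",  "doi": "10.1000/XYZ123"},
    {"title": "A cohort study of sleep and mood",       "doi": ""},
]

def normalize_title(title):
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

seen, unique = set(), []
for record in records:
    key = record["doi"].lower() or normalize_title(record["title"])
    if key not in seen:
        seen.add(key)
        unique.append(record)

print(f"{len(records)} records -> {len(unique)} after de-duplication")
```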

During the screening process, you will take all of the articles you exported from your searches and begin to remove studies that are not relevant to your topic. Use the inclusion/exclusion criteria you developed during the protocol-writing stage to screen the title and abstract of the articles you found. Any studies that don't fit the criteria of your review can be deleted. The full text of the remaining studies will need to be screened to confirm that they fit the criteria of your review.

It is highly recommended that two independent reviewers screen all studies, resolving areas of disagreement by consensus or by a third party who is an expert in the field. Listed below are tools that can be used for article screening.

  • Rayyan A tool designed to expedite the screening process for systematic reviews. Create a free account, upload citations, and collaborate with others to screen your articles.
  • Covidence A subscription-based systematic review management tool that provides article screening and quality assessment features. Northwestern does not currently have a subscription, so individual/group pricing applies.
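Screening itself is a human judgment task done in tools like those above, but as a toy sketch of applying written inclusion/exclusion criteria to exported records (the criteria and records below are hypothetical):

```python
# Apply simple, pre-registered criteria to title/abstract records.
records = [
    {"title": "Mindfulness RCT in adolescents", "year": 2021, "language": "English"},
    {"title": "Mindfulness case report",        "year": 2015, "language": "English"},
    {"title": "Achtsamkeit bei Jugendlichen",   "year": 2022, "language": "German"},
]

def meets_criteria(record):
    return (record["year"] >= 2018
            and record["language"] == "English"
            and "case report" not in record["title"].lower())

included = [r for r in records if meets_criteria(r)]
excluded = [r for r in records if not meets_criteria(r)]
print(f"included: {len(included)}, excluded: {len(excluded)}")
```

Decisions and the reasons for exclusion should still be recorded so they can be reported later in the PRISMA flow diagram.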

Bias refers to factors that can systematically affect the observations and conclusions of the study, causing them to be inaccurate. When compiling studies for systematic reviews, it is best practice to assess the risk of bias for each of the studies included, and then include the assessment in your final manuscript. The Cochrane Handbook recommends presenting the assessment as a table or graph.

In general, scoping reviews don't require a risk of bias assessment, but according to the PRISMA Scoping Review checklist , scoping reviews should include a "critical appraisal of individual sources of evidence." In a final manuscript, a critical appraisal could be an explanation of the limitations of the studies included.

Source: Andrea C. Tricco, Erin Lillie, Wasifa Zarin, et al.  PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation . Ann Intern Med.2018;169:467-473. [Epub ahead of print 4 September 2018]. doi: 10.7326/M18-0850

  • Cochrane Training Presentation: Risk of Bias Simple overview of risk of bias assessment, including examples of how to assess and present your conclusions.
  • Critical Appraisal Skills Programme (CASP) CASP has appraisal checklists designed for use with Systematic Reviews, Randomised Controlled Trials, Cohort Studies, Case Control Studies, Economic Evaluations, Diagnostic Studies, Qualitative studies and Clinical Prediction Rule.
  • JBI Critical Appraisal Tools From the Joanna Briggs Institute: "JBI’s critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers."

Once you and your team have screened all of the studies to be included in your review, you will need to extract the data from the studies in order to synthesize the results. You can use Excel or Google Forms to code the results. Additional resources below.

  • Covidence: Data Extraction Covidence is software that manages all aspects of the systematic review process, including data extraction. Northwestern does not currently subscribe to Covidence, so individual subscription rates apply.
  • Data Extraction Form Template (Excel)
  • RevMan Short for "review manager," RevMan is a free software used to manage Cochrane systematic reviews. It can assist with data extraction and analysis, including meta-analysis.
  • SR Toolbox "a web-based catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process."
  • Systematic Review Data Repository "The Systematic Review Data Repository (SRDR) is a powerful and easy-to-use tool for the extraction and management of data for systematic review or meta-analysis."
  • A Practical Guide: Data Extraction for Intervention Systematic Reviews' "This guide provides you with insights from the global systematic review community, including definitions, practical advice, links to the Cochrane Handbook, downloadable templates, and real-world examples." -Covidence Free ebook download (must enter information to download the title for free)
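If you build your extraction form in a spreadsheet, a sketch like the following (hypothetical field names and values) shows one way to generate a consistent CSV template for the whole team:

```python
# Write a bare-bones data extraction template to CSV.
import csv

fields = ["study_id", "authors", "year", "country", "design", "n_participants",
          "intervention", "comparator", "outcome", "effect_estimate", "notes"]

with open("extraction_form.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerow({"study_id": "S001", "authors": "Doe et al.", "year": 2021,
                     "design": "RCT", "n_participants": 120})
```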

In the data synthesis section, you will present the main findings of your evidence synthesis. There are multiple ways you could go about synthesizing the data, and that decision will depend largely on the type of studies you're synthesizing. In any case, it is standard to use the PRISMA flow diagram to map out the number of studies identified, screened, and included in your evidence synthesis project.

Librarians can help write the methods section of your review for publication, to ensure clarity and transparency of the search process. However, we encourage evidence synthesis teams to engage statisticians to carry out their data syntheses.

  • PRISMA Flow Diagram
  • PRISMA Flow Diagram Creator

Meta-Analysis

A quantitative statistical analysis that combines the results of multiple studies. The studies included must all be attempting to answer the same research question and have a similar research design. According to the  Cochrane Handbook,  "meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention."

  • Meta-Analysis Effect Size Calculator "...a web-based effect-size calculator. It is designed to facilitate the computation of effect-sizes for meta-analysis. Four effect-size types can be computed from various input data: the standardized mean difference, the correlation coefficient, the odds-ratio, and the risk-ratio."
  • Meta-Essentials A free tool for meta-analysis that "facilitates the integration and synthesis of effect sizes from different studies. The tool consists of a set of workbooks designed for Microsoft Excel that, based on your input, automatically produces all the required statistics, tables, figures, and more."
  • The metafor Package "a free and open-source add-on for conducting meta-analyses with the statistical software environment R."
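To make the pooling arithmetic concrete, here is a minimal fixed-effect, inverse-variance sketch with made-up study effects and variances; dedicated tools such as RevMan or metafor add random-effects models, heterogeneity diagnostics, and forest plots:

```python
# Fixed-effect inverse-variance pooling of hypothetical study effects.
import numpy as np

effects   = np.array([0.30, 0.45, 0.10, 0.25])  # per-study effect estimates (e.g., SMDs)
variances = np.array([0.02, 0.05, 0.03, 0.04])  # per-study sampling variances

weights = 1.0 / variances                        # more precise studies get more weight
pooled  = np.sum(weights * effects) / np.sum(weights)
se      = np.sqrt(1.0 / np.sum(weights))
ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se

q = np.sum(weights * (effects - pooled) ** 2)    # Cochran's Q, a heterogeneity statistic

print(f"pooled effect = {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), Q = {q:.2f}")
```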

Narrative or Descriptive

If you've included studies that are not similar in research design, then a meta-analysis is not possible. You will then use a narrative or descriptive synthesis to describe the results.



Evaluating Research – Process, Examples and Methods


Definition:

Evaluating Research refers to the process of assessing the quality, credibility, and relevance of a research study or project. This involves examining the methods, data, and results of the research in order to determine its validity, reliability, and usefulness. Evaluating research can be done by both experts and non-experts in the field, and involves critical thinking, analysis, and interpretation of the research findings.

Research Evaluating Process

The process of evaluating research typically involves the following steps:

Identify the Research Question

The first step in evaluating research is to identify the research question or problem that the study is addressing. This will help you to determine whether the study is relevant to your needs.

Assess the Study Design

The study design refers to the methodology used to conduct the research. You should assess whether the study design is appropriate for the research question and whether it is likely to produce reliable and valid results.

Evaluate the Sample

The sample refers to the group of participants or subjects who are included in the study. You should evaluate whether the sample size is adequate and whether the participants are representative of the population under study.

Review the Data Collection Methods

You should review the data collection methods used in the study to ensure that they are valid and reliable. This includes assessing both the measures and the procedures used to collect the data.

Examine the Statistical Analysis

Statistical analysis refers to the methods used to analyze the data. You should examine whether the statistical analysis is appropriate for the research question and whether it is likely to produce valid and reliable results.

Assess the Conclusions

You should evaluate whether the data support the conclusions drawn from the study and whether they are relevant to the research question.

Consider the Limitations

Finally, you should consider the limitations of the study, including any potential biases or confounding factors that may have influenced the results.

Evaluating Research Methods

Evaluating Research Methods are as follows:

  • Peer review: Peer review is a process where experts in the field review a study before it is published. This helps ensure that the study is accurate, valid, and relevant to the field.
  • Critical appraisal: Critical appraisal involves systematically evaluating a study based on specific criteria. This helps assess the quality of the study and the reliability of the findings.
  • Replication: Replication involves repeating a study to test the validity and reliability of the findings. This can help identify any errors or biases in the original study.
  • Meta-analysis: Meta-analysis is a statistical method that combines the results of multiple studies to provide a more comprehensive understanding of a particular topic. This can help identify patterns or inconsistencies across studies.
  • Consultation with experts: Consulting with experts in the field can provide valuable insights into the quality and relevance of a study. Experts can also help identify potential limitations or biases in the study.
  • Review of funding sources: Examining the funding sources of a study can help identify any potential conflicts of interest or biases that may have influenced the study design or interpretation of results.

Example of Evaluating Research

A sample research evaluation for students:

Title of the Study: The Effects of Social Media Use on Mental Health among College Students

Sample Size: 500 college students

Sampling Technique : Convenience sampling

  • Sample Size: The sample size of 500 college students is a moderate sample size, which could be considered representative of the college student population. However, it would be more representative if the sample size was larger, or if a random sampling technique was used.
  • Sampling Technique : Convenience sampling is a non-probability sampling technique, which means that the sample may not be representative of the population. This technique may introduce bias into the study since the participants are self-selected and may not be representative of the entire college student population. Therefore, the results of this study may not be generalizable to other populations.
  • Participant Characteristics: The study does not provide any information about the demographic characteristics of the participants, such as age, gender, race, or socioeconomic status. This information is important because social media use and mental health may vary among different demographic groups.
  • Data Collection Method: The study used a self-administered survey to collect data. Self-administered surveys may be subject to response bias and may not accurately reflect participants’ actual behaviors and experiences.
  • Data Analysis: The study used descriptive statistics and regression analysis. Descriptive statistics summarize the data, while regression analysis examines the relationship between two or more variables. However, the study did not report the statistical significance of the results or the effect sizes (a hypothetical sketch of such reporting follows this list).
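
To make concrete what reporting statistical significance and an effect size could look like, here is a hedged Python sketch using simulated data. The variable names and values are hypothetical and are not drawn from the study described above.

    # Hypothetical illustration only: simulated data standing in for the survey above,
    # showing how statistical significance and an effect size could be reported.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 500                                      # matches the sample size in the example
    social_media_hours = rng.uniform(0, 8, n)    # hypothetical predictor
    # Simulated outcome with a weak negative association plus noise (purely hypothetical).
    mental_health_score = 70 - 1.5 * social_media_hours + rng.normal(0, 10, n)

    r, p_value = stats.pearsonr(social_media_hours, mental_health_score)
    print(f"Pearson r = {r:.2f} (effect size), p = {p_value:.4g}")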

Overall, while the study provides some insights into the relationship between social media use and mental health among college students, the use of a convenience sampling technique and the lack of information about participant characteristics limit the generalizability of the findings. In addition, the use of self-administered surveys may introduce bias into the study, and the lack of information about the statistical significance of the results limits the interpretation of the findings.

Note: The example above is a sample for students only. Do not copy and paste it directly into your assignment; conduct your own research for academic purposes.

Applications of Evaluating Research

Here are some of the applications of evaluating research:

  • Identifying reliable sources : By evaluating research, researchers, students, and other professionals can identify the most reliable sources of information to use in their work. They can determine the quality of research studies, including the methodology, sample size, data analysis, and conclusions.
  • Validating findings: Evaluating research can help to validate findings from previous studies. By examining the methodology and results of a study, researchers can determine if the findings are reliable and if they can be used to inform future research.
  • Identifying knowledge gaps: Evaluating research can also help to identify gaps in current knowledge. By examining the existing literature on a topic, researchers can determine areas where more research is needed, and they can design studies to address these gaps.
  • Improving research quality : Evaluating research can help to improve the quality of future research. By examining the strengths and weaknesses of previous studies, researchers can design better studies and avoid common pitfalls.
  • Informing policy and decision-making : Evaluating research is crucial in informing policy and decision-making in many fields. By examining the evidence base for a particular issue, policymakers can make informed decisions that are supported by the best available evidence.
  • Enhancing education : Evaluating research is essential in enhancing education. Educators can use research findings to improve teaching methods, curriculum development, and student outcomes.

Purpose of Evaluating Research

Here are some of the key purposes of evaluating research:

  • Determine the reliability and validity of research findings : By evaluating research, researchers can determine the quality of the study design, data collection, and analysis. They can determine whether the findings are reliable, valid, and generalizable to other populations.
  • Identify the strengths and weaknesses of research studies: Evaluating research helps to identify the strengths and weaknesses of research studies, including potential biases, confounding factors, and limitations. This information can help researchers to design better studies in the future.
  • Inform evidence-based decision-making: Evaluating research is crucial in informing evidence-based decision-making in many fields, including healthcare, education, and public policy. Policymakers, educators, and clinicians rely on research evidence to make informed decisions.
  • Identify research gaps : By evaluating research, researchers can identify gaps in the existing literature and design studies to address these gaps. This process can help to advance knowledge and improve the quality of research in a particular field.
  • Ensure research ethics and integrity : Evaluating research helps to ensure that research studies are conducted ethically and with integrity. Researchers must adhere to ethical guidelines to protect the welfare and rights of study participants and to maintain the trust of the public.

Characteristics to Evaluate in Research

When evaluating research, assess the following characteristics:

  • Research question/hypothesis: A good research question or hypothesis should be clear, concise, and well-defined. It should address a significant problem or issue in the field and be grounded in relevant theory or prior research.
  • Study design: The research design should be appropriate for answering the research question and be clearly described in the study. The study design should also minimize bias and confounding variables.
  • Sampling : The sample should be representative of the population of interest and the sampling method should be appropriate for the research question and study design.
  • Data collection : The data collection methods should be reliable and valid, and the data should be accurately recorded and analyzed.
  • Results : The results should be presented clearly and accurately, and the statistical analysis should be appropriate for the research question and study design.
  • Interpretation of results : The interpretation of the results should be based on the data and not influenced by personal biases or preconceptions.
  • Generalizability: The study findings should be generalizable to the population of interest and relevant to other settings or contexts.
  • Contribution to the field : The study should make a significant contribution to the field and advance our understanding of the research question or issue.

Advantages of Evaluating Research

Evaluating research has several advantages, including:

  • Ensuring accuracy and validity : By evaluating research, we can ensure that the research is accurate, valid, and reliable. This ensures that the findings are trustworthy and can be used to inform decision-making.
  • Identifying gaps in knowledge : Evaluating research can help identify gaps in knowledge and areas where further research is needed. This can guide future research and help build a stronger evidence base.
  • Promoting critical thinking: Evaluating research requires critical thinking skills, which can be applied in other areas of life. By evaluating research, individuals can develop their critical thinking skills and become more discerning consumers of information.
  • Improving the quality of research : Evaluating research can help improve the quality of research by identifying areas where improvements can be made. This can lead to more rigorous research methods and better-quality research.
  • Informing decision-making: By evaluating research, we can make informed decisions based on the evidence. This is particularly important in fields such as medicine and public health, where decisions can have significant consequences.
  • Advancing the field : Evaluating research can help advance the field by identifying new research questions and areas of inquiry. This can lead to the development of new theories and the refinement of existing ones.

Limitations of Evaluating Research

Limitations of Evaluating Research are as follows:

  • Time-consuming: Evaluating research can be time-consuming, particularly if the study is complex or requires specialized knowledge. This can be a barrier for individuals who are not experts in the field or who have limited time.
  • Subjectivity : Evaluating research can be subjective, as different individuals may have different interpretations of the same study. This can lead to inconsistencies in the evaluation process and make it difficult to compare studies.
  • Limited generalizability: The findings of a study may not be generalizable to other populations or contexts. This limits the usefulness of the study and may make it difficult to apply the findings to other settings.
  • Publication bias: Research that does not find significant results may be less likely to be published, which can create a bias in the published literature. This can limit the amount of information available for evaluation.
  • Lack of transparency: Some studies may not provide enough detail about their methods or results, making it difficult to evaluate their quality or validity.
  • Funding bias : Research funded by particular organizations or industries may be biased towards the interests of the funder. This can influence the study design, methods, and interpretation of results.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Step 3, Critically Appraising Evidence: Quantitative Evidence-Systematic Reviews or Meta-Analyses

  • PMID: 35105795
  • DOI: 10.1891/NN-2021-0001

Critical appraisal of the evidence is the third step in the evidence-based practice process. This column, the second in a multipart series to describe the critical appraisal process, focuses on critical appraisal of systematic reviews or meta-analyses of randomized controlled trials.

Keywords: EBP; evidence-based practice; levels of evidence; meta-analysis; quantitative evidence; systematic reviews.



School of Nursing


Appraising Evidence with Level and Quality

This guide covers evidence levels, the evidence pyramid and other conceptualizations, evidence types, study design, EBP glossaries, how to read a paper, and a tutorial for evidence-based nursing.


Finding and evaluating evidence is the second phase in the Johns Hopkins Evidence-Based Practice Model (JHEBP). Evidence hierarchies help identify the best evidence for decision-making based on the rigor of the methods used (level) and the execution of the study or reporting (quality). Appraisal begins with identifying the level of evidence and then the quality. The combination of level and quality determines the overall strength of the evidence.

When appraising research, keep the following three criteria in mind:

Quality  Trials that are randomised and double blind, to avoid selection and observer bias, and where we know what happened to most of the subjects in the trial.

Validity Trials that mimic clinical practice, or could be used in clinical practice, and with outcomes that make sense. For instance, in chronic disorders we want long-term, not short-term trials. We are [also] ... interested in outcomes that are large, useful, and statistically very significant (p < 0.01, a 1 in 100 chance of being wrong).

Size Trials (or collections of trials) that have large numbers of patients, to avoid being wrong because of the random play of chance. For instance, to be sure that a number needed to treat (NNT) of 2.5 is really between 2 and 3, we need results from about 500 patients. If that NNT is above 5, we need data from thousands of patients.

These are the criteria on which we should judge evidence. For it to be strong evidence, it has to fulfill the requirements of all three criteria.

Source used: Critical Appraisal. Bandolier.
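
As a worked illustration of the size criterion quoted above, the sketch below computes a number needed to treat (NNT = 1 / absolute risk reduction) and an approximate confidence interval from hypothetical trial counts. The numbers are invented, and the interval uses a simple normal approximation.

    # Hypothetical trial counts, invented to illustrate the NNT arithmetic referenced above.
    control_events, control_n = 120, 300   # 40% event rate in the control group
    treated_events, treated_n = 60, 300    # 20% event rate in the treated group

    cer = control_events / control_n       # control event rate
    eer = treated_events / treated_n       # experimental event rate
    arr = cer - eer                        # absolute risk reduction
    nnt = 1 / arr                          # number needed to treat

    # Approximate 95% CI for the ARR (normal approximation), then invert it for the NNT.
    se_arr = (cer * (1 - cer) / control_n + eer * (1 - eer) / treated_n) ** 0.5
    arr_low, arr_high = arr - 1.96 * se_arr, arr + 1.96 * se_arr
    print(f"ARR = {arr:.3f}, NNT = {nnt:.1f} (95% CI roughly {1 / arr_high:.1f} to {1 / arr_low:.1f})")

Even with 600 hypothetical patients, the interval around the NNT is noticeably wide, which is the point the quoted text makes about needing large numbers.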


The JHEBP model uses a five-level evidence hierarchy, which includes research and nonresearch evidence. Below you will find a guide to the evidence levels and types. Still, we recommend consulting the Johns Hopkins Evidence-Based Practice for Nurses and Healthcare Professionals book (cited below) for detailed information about levels and types. If you're unfamiliar with the kinds of evidence noted here, browse the Types of Evidence box on this page.

Table: Dang, D., Dearholt, S., Bissett, K., Ascenzi, J., & Whalen, M. (2022). Johns Hopkins evidence-based practice for nurses and healthcare professionals: Model and guidelines (4th ed.). Sigma Theta Tau International.

An evidence pyramid  is a visual representation of an evidence hierarchy. As you move up the pyramid, the amount of available evidence on a given topic decreases, but the strength of evidence increases. However, you may not always be able to find the highest level of evidence to answer your question. This illustration is helpful because it also notes the information sources associated with each level.

The systematic review or meta-analysis of randomized controlled trials (RCTs) and evidence-based practice guidelines are considered to be the strongest level of evidence on which to guide practice decisions. (Melnyk, 2004) The weakest level of evidence is the opinion from authorities and/or reports of expert committees.

Systematic reviews, meta-analyses, and critically-appraised topics/articles have all gone through an evaluation process: they have been "filtered". Information that has not been critically appraised is considered "unfiltered". As you move up the pyramid, however, fewer studies are available, and high levels of evidence may not exist for your clinical question. If that is the case, move down the pyramid to the next-best available evidence.

Evidence pyramid illustration (image: https://guides.himmelfarb.gwu.edu/ebm/studytypes)

  • Oxford Centre for EBM: Levels of Evidence
  • Essential Evidence Plus: Levels of Evidence

To understand and assess levels of evidence, it's helpful to have an understanding of the basic characteristics of the major evidence types, several of which are defined below. For additional evidence type definitions browse the  Centre for Evidence-Based Medicine Glossary  below.

Systematic Review

The application of strategies that limit bias in the assembly, critical appraisal, and synthesis of all relevant studies on a specific topic. Systematic reviews focus on peer-reviewed publications about a specific health problem and use rigorous, standardized methods for selecting and assessing articles. A systematic review may or may not include a meta-analysis, which is a quantitative summary of the results.

Randomized Controlled Trial

An experiment in which subjects in a population are randomly allocated into groups, usually called study and control groups, to receive or not receive an experimental preventive or therapeutic procedure, maneuver, or intervention. The results are assessed by rigorous comparison of rates of disease, death, recovery, or other appropriate outcomes in the study and control groups.

Cohort Studies

Cohort studies identify a group of patients who are already taking a particular treatment or have an exposure, follow them forward over time, and then compare their outcomes with a similar group that has not been affected by the treatment or exposure being studied. Cohort studies are observational and not as reliable as randomized controlled studies since the two groups may differ in ways other than in the variable under study.

Case-Control Studies

Case-control studies are studies in which patients who already have a specific condition are compared with people who do not have the condition. The researcher looks back to identify factors or exposures that might be associated with the illness. They often rely on medical records and patient recall for data collection. These types of studies are often less reliable than randomized controlled trials and cohort studies because showing a statistical relationship does not mean that one factor necessarily caused the other.
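
To illustrate the kind of comparison a case-control study reports, here is a minimal sketch computing an odds ratio from a hypothetical 2x2 table; the counts are invented, and the result describes association, not causation.

    # Hypothetical case-control counts, invented to illustrate the odds ratio that such
    # studies typically report (a measure of association, not proof of causation).
    cases_exposed, cases_unexposed = 40, 60          # people who have the condition
    controls_exposed, controls_unexposed = 20, 80    # comparison group without the condition

    odds_ratio = (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)
    print(f"Odds ratio = {odds_ratio:.2f}")          # > 1 suggests the exposure is associated with the condition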

Cross-Sectional Studies

Describe the relationship between diseases and other factors at one point in time in a defined population. Cross-sectional studies lack any information on the timing of exposure and outcome relationships and include only prevalent cases. They are often used for comparing diagnostic tests. Studies that show the efficacy of a diagnostic test are also called prospective, blind comparisons to a gold standard study. This is a controlled trial that looks at patients with varying degrees of an illness and administers both diagnostic tests — the test under investigation and the “gold standard” test — to all of the patients in the study group. The sensitivity and specificity of the new test are compared to that of the gold standard to determine potential usefulness.
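
The comparison against a gold standard comes down to sensitivity and specificity. The sketch below computes both from a hypothetical set of counts, invented for illustration.

    # Hypothetical 2x2 results comparing a new diagnostic test against the gold standard.
    true_pos, false_neg = 90, 10     # gold-standard positives: the new test caught 90, missed 10
    true_neg, false_pos = 180, 20    # gold-standard negatives: the new test cleared 180, flagged 20

    sensitivity = true_pos / (true_pos + false_neg)   # share of true disease correctly detected
    specificity = true_neg / (true_neg + false_pos)   # share of non-disease correctly ruled out
    print(f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")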

Case Series and Case Reports

Case series and Case reports consist of collections of reports on the treatment of individual patients or a report on a single patient. Because they are reports of cases and use no control groups to compare outcomes, they have little statistical validity.

Definitions adapted from: https://www.cebm.ox.ac.uk/resources/ebm-tools/glossary

Different types of clinical questions are best answered by different types of research studies.  You might not always find the highest level of evidence to answer your question. When this happens, work your way down to the next highest level of evidence.

Suggested study designs best suited to answer each type of clinical question:

Consult these resources to understand the language of evidence-based practice and terms used in clinical research.

  • National Institute for Health and Care Research (NIHR) UK Glossary
  • Cochrane Glossary of Terms
  • Agency for Healthcare Research and Quality (AHRQ) - Consumer and Patient Page

Dr. Trisha Greenhalgh's clearly written papers (full-text links to each below) discuss how to critically appraise the medical literature. These articles appeared originally in the British Medical Journal (BMJ) and were later produced as a book: How to Read a Paper. How to Read a Paper: The Basics of Evidence-Based Medicine is also available as an eBook from the OHSU Library below.

How to Read a Paper: Assessing the Methodological Quality

Deciding What the Paper is About

Diagnostic or Screening Tests

Drug Trials

Qualitative Research

Statistics for the Non-Statistician I

Statistics for the Non-Statistician II

Systematic Reviews and Meta-Analyses

What Things Cost


  • An Evidence-Based Medicine Tutorial from the Information Services Department of the Library of the Health Sciences-Chicago, University of Illinois at Chicago.

Systematic Review and Evidence Synthesis

Acknowledgements.

This guide is directly informed by and selectively reuses, with permission, content from: 

  • Systematic Reviews, Scoping Reviews, and other Knowledge Syntheses by Genevieve Gore and Jill Boruff, McGill University (CC-BY-NC-SA)
  • A Guide to Evidence Synthesis , Cornell University Library Evidence Synthesis Service

Primary University of Minnesota Libraries authors are: Meghan Lafferty, Scott Marsalis, & Erin Reardon

Last updated: September 2022

Types of evidence synthesis

There are many types of evidence synthesis, and it is important to choose the right type of synthesis for your research questions. 

Types of evidence synthesis include (but are not limited to):

Systematic Review

  • Addresses a specific, answerable question of medical, scientific, policy, or management importance.
  • May be limited to relevant study designs depending on the type of question (e.g., intervention, prognosis, diagnosis).
  • Compares, critically evaluates, and synthesizes evidence.
  • Follows an established protocol and methodology.
  • May or may not include a meta-analysis of findings.
  • The most commonly referred-to type of evidence synthesis.
  • Time-intensive; can take months or longer than a year to complete.
  • At the top of the evidence pyramid.

Meta-Analysis

  • Statistical technique for combining the findings from multiple quantitative studies.
  • Uses statistical methods to objectively evaluate, synthesize, and summarize results.

Scoping Review (or Evidence Map)

  • Addresses the scope of the existing literature on broad, complex, or exploratory research questions.
  • Many different study designs may be applicable.
  • Seeks to identify research gaps and opportunities for evidence synthesis.
  • May critically evaluate existing evidence, but does not attempt to synthesize results like a systematic review would.
  • Time-intensive; can take months or longer than a year to complete.

Rapid Review

  • Applies the methodology of a systematic review within a time-constrained setting.
  • Employs methodological “shortcuts” (e.g., limiting search terms) at the risk of introducing bias.
  • Useful for addressing issues that need a quick decision, such as developing policy recommendations or treatment recommendations for emergent conditions.

Umbrella Review

  • Reviews other systematic reviews on a topic.
  • Often attempts to answer a broader question than a systematic review typically would.
  • Useful when there are competing interventions to consider.

Literature (or “Narrative”) Review

  • A broad term for reviews with a wide scope and non-standardized methodology.
  • Search strategies, comprehensiveness, and time range covered may vary and do not follow an established protocol.

What Review Type?

Dr. Andrea Tricco, a leading evidence synthesis methodologist, and her team developed a web-based tool to assist in selecting the right review type based on your answers to a brief list of questions. Although the tool assumes a health science topic, other disciplines may find it useful as well.

  • Right Review

Main review types characterized by methods

This table summarizes the characteristics of the 14 main review types as laid out in the seminal article on the topic. Please note that methodologies may have evolved since this article was written, so it is recommended that you review the more specific information on the following pages. Librarians can also work with you to determine the best review type for your needs.

Reproduced from: Grant, M. J. and Booth, A. (2009), A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information & Libraries Journal , 26: 91-108.  doi:10.1111/j.1471-1842.2009.00848.x  Table 1.


What type of evaluation evidence do you need?


The intended use of the evaluation information will determine the type of evidence you need.   While research studies that seek to develop generalizable knowledge on effective program models require a high degree of certainty and sophistication (e.g. statistical significance, robust attribution, and causal links based on methods that incorporate control groups), evaluation in support of uses such as program improvement or generating impact information for marketing purposes can draw conclusions from more feasible methods, as long as any areas of uncertainty (including potential sources of bias) are clearly documented.

  • The overall purpose of evaluation is to get useful information .
  • Information is useful when it results in a sufficiently better answer to a question.
  • An answer is sufficiently better when the reduction in uncertainty around the information is enough to make a decision or take action .

The credibility and value of all evaluation information (evidence) can be enhanced by clear communication covering two areas:

  • Explain why the evaluation strategies used were appropriate given available resources (time and money) and program context , including considerations of participant burden and program effectiveness (e.g. giving participants tests of knowledge may discourage participation, which would interfere with the program delivery–and possibly introduce bias)
  • Know what biases might exist as a result of the evaluation process, and therefore what uncertainty might still exist around the program questions.

Below are listed possible evaluation uses , their primary audience, and some thoughts on the level of certainty (i.e. the strength of the evidence) the evaluation needs to provide for each purpose or use.

Program improvement

Audience:  internal

Level of certainty/evidence needed

  • various levels of certainty can provide useful information to shape the program
  • useful information even without statistical significance
  • useful information for internal comparisons (e.g. areas where knowledge gain higher/lower) from reasonably valid questions with similar possible biases (e.g. self report of knowledge level)
  • useful information even with possible nonresponse bias
  • useful information without control group
  • useful information even when it is difficult to establish counterfactuals with confidence

Research/knowledge development

Audience:  external

  • accepted methodologies vary by discipline
  • methods should support high degree of validity
  • often need to establish attribution, causation
  • perhaps more flexible level of certainty in areas where there are big knowledge gaps (i.e. high existing uncertainty)
  • Audience likely will not require high certainty (generally uncritical audience)
  • methods and assumptions should be documented and justified
  • Presentation should not be misleading

Performance management/assessment

Audience:  Internal

  • organizational standards may vary; evidence standards may not be explicit
  • will probably align with information attainable with available resources
  • important to document and justify choices of methods and assumptions including references to feasibility if applicable

External reporting

Audience: External

  • Specified standards may vary; standards may not be explicit
  • May need to have formal evaluation; in some cases may need to be external. If so, the process will generally be led by the external funder.



Systematic review of the use of process evaluations in knowledge translation research

Shannon D. Scott

1 Faculty of Nursing, University of Alberta, Edmonton, Alberta, Canada

Thomas Rotter

2 School of Nursing, Queen’s University, Kingston, Ontario, Canada

Rachel Flynn

Hannah M. Brooks, Tabatha Plesuk, Katherine H. Bannar-Martin, Thane Chambers

3 University of Alberta Libraries, Edmonton, Alberta, Canada

Lisa Hartling

4 Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada

Associated Data

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Experimental designs for evaluating knowledge translation (KT) interventions can provide strong estimates of effectiveness but offer limited insight into how the intervention worked. Consequently, process evaluations have been used to explore the causal mechanisms at work; however, there are limited standards to guide this work. This study synthesizes current evidence of KT process evaluations to provide future methodological recommendations.

Peer-reviewed search strategies were developed by a health research librarian. Studies had to be in English, published since 1996, and were not excluded based on design. Studies had to (1) be a process evaluation of a KT intervention study in primary health, (2) be a primary research study, and (3) include a licensed healthcare professional delivering or receiving the intervention. A two-step, two-person hybrid screening approach was used for study inclusion with inter-rater reliability ranging from 94 to 95%. Data on study design, data collection, theoretical influences, and approaches used to evaluate the KT intervention, analysis, and outcomes were extracted by two reviewers. Methodological quality was assessed with the Mixed Methods Appraisal Tool (MMAT).

Of the 20,968 articles screened, 226 studies fit our inclusion criteria. The majority of process evaluations used qualitative forms of data collection (43.4%) and individual interviews as the predominant data collection method. 72.1% of studies evaluated barriers and/or facilitators to implementation. 59.7% of process evaluations were stand-alone evaluations. The timing of data collection varied widely with post-intervention data collection being the most frequent (46.0%). Only 38.1% of the studies were informed by theory. Furthermore, 38.9% of studies had MMAT scores of 50 or less indicating poor methodological quality.

Conclusions

There is widespread acceptance that the generalizability of quantitative trials of KT interventions would be significantly enhanced through complementary process evaluations. However, this systematic review found that process evaluations are of mixed quality and lack theoretical guidance. Most process evaluation data collection occurred post-intervention undermining the ability to evaluate the process of implementation. Strong science and methodological guidance is needed to underpin and guide the design and execution of process evaluations in KT science.

Registration

This study is not registered with PROSPERO.

The implementation of research into healthcare practice is complex [ 1 ], with multiple levels to consider such as the patient, healthcare provider, multidisciplinary team, healthcare institution, and local and national healthcare systems. The implementation of evidence-based treatments to achieve healthcare system improvement that is robust, efficient, and sustainable is crucially important. However, it is well established that improving the availability of research is not enough for successful implementation [ 2 ]; rather, active knowledge translation (KT) interventions are essential to facilitate the implementation of research to practice. Determining the success of KT interventions and the implementation process itself relies on evaluation studies.

In the KT field, experimental designs such as randomized trials, cluster randomized trials, and stepped wedge designs are widely used for evaluating the effectiveness of KT interventions. Rigorous experimental designs can provide strong estimates of KT intervention effectiveness, but offer limited insight into how the intervention worked or not [ 1 ] as well as how KT interventions are mediated by different facilitators and barriers and how they lead to implementation or not [ 3 – 5 ]. KT interventions contain several interacting components, such as the degree of flexibility or tailoring of the intervention, the number of interacting components within the interventions, and the number and difficulty of behaviors required by those delivering or receiving the intervention [ 3 ]. This complexity makes it particularly challenging to evaluate KT intervention effectiveness [ 3 – 5 ]. The effectiveness of KT interventions is a result of the interactions between many factors such as context and mechanisms of change. A lack of intervention effect may be due to implementation failure rather than the ineffectiveness of the intervention itself. KT interventions pose methodological challenges and require augmentations to the standard experimental designs [ 6 ] to understand how they do or do not work.

As a result of these limitations, researchers have started to conduct process evaluations alongside experimental designs for evaluating KT interventions. The broad purpose of a process evaluation is to explore aspects of the implementation process [ 7 ]. Process evaluations can be used to assess the fidelity, dose, adaptation, reach, and quality of implementation [ 8 , 9 ] and to identify the causal mechanisms [ 10 , 11 ], mechanisms of impact [ 12 ], and contextual factors associated with variation in outcomes across sites [ 6 , 13 ]. Furthermore, process evaluations can assist in interpreting the outcome results [ 7 ], the barriers and facilitators to implementation [ 14 , 15 ] and sustainability [ 16 ], as well as examining the participants’ views [ 17 ] and understandings of components of the intervention [ 18 , 19 ]. Process evaluations are vital in identifying the success or failure of implementation, which is critical in understanding intervention effectiveness.

Notwithstanding the work of Moore and colleagues [ 12 ], there have been scant methodological recommendations to guide KT process evaluations. This deficit has made designing process evaluations in KT research challenging and has hindered the potential for meaningful comparisons across process evaluation studies. In 2000, the Medical Research Council released an evaluation framework for designing and evaluating complex interventions; this report was later revised in 2008 [ 4 , 20 ]. Of note, earlier guidance for evaluating complex interventions focused exclusively on randomized designs with no mention of process evaluations. The revisions mentioned process evaluations and the role that they can have with complex interventions, yet did not provide specific recommendations for evaluation designs, data collection types, time points, and standardized evaluation approaches for complex interventions. This level of specificity is imperative for research comparisons across KT intervention process evaluations and to understand how change is mediated by specific factors.

Recently, the Medical Research Council has commissioned an update of this guidance to be published in 2019 [ 21 , 22 ]. The update re-emphasizes some of the previous messages related to complex intervention development and evaluation; however, it provides a more flexible and less linear model of the process with added emphasis to development, implementation, and evaluation phases as well as providing a variety of successful case examples that employ a range of methods (from natural experiments to clinical trials). Early reports of the update to the MRC framework highlight the importance of process and economic evaluations as good investments and a move away from experimental methods as the only or best option for evaluation.

In 2013, a framework for process evaluations for cluster-randomized trials of complex interventions was proposed by Grant and colleagues [ 20 ]; however, these recommendations were not based upon a comprehensive, systematic review of all approaches used by others. One study found that only 30% of the randomized controlled trials had associated qualitative investigations [ 23 ]. Moreover, a large proportion of those qualitative evaluations were completed before the trial, with smaller numbers of qualitative evaluations completed during the trial or following it. Given the limitations of the process evaluation work to date, it is critical to systematically review all existing process evaluations of KT outcome assessment. Doing so will aid in the development of rigorous methodological guidance for process evaluation research of KT interventions moving forward.

The aim of our systematic review is to synthesize the existing evidence on process evaluation studies assessing KT interventions. The purpose of our review is to make explicit the current state of methodological guidance for process evaluation research with the aim of providing recommendations for multiple end-user groups. This knowledge is critically important for healthcare providers, health quality consultants, decision and policy makers, non-governmental organizations, governmental departments, and health services researchers to evaluate the effectiveness of their KT efforts in order to ensure scarce healthcare resources are effectively utilized and enhanced knowledge is properly generalized to benefit others.

Objectives and key questions

As per our study protocol [ 24 ] available openly via 10.1186/2046-4053-3-149, the objectives for this systematic review were to (1) systematically locate, assess, and report on published studies in healthcare that are a stand-alone process evaluation of a KT intervention or have a process evaluation component, and (2) offer guidance for researchers in terms of the development and design of process evaluations of KT interventions. The key research question guiding this systematic review was: what is the “state-of-the-science” of separate (stand-alone) or integrated process evaluations conducted alongside KT intervention studies?

Search strategy

This systematic review followed a comprehensive methodology using rigorous guidelines to synthesize diverse forms of research evidence [ 25 ], as outlined in our published protocol [ 24 ]. A peer-reviewed literature search was conducted by a health research librarian of English language articles published between 1996 and 2018 in six databases (Ovid MEDLINE/Ovid MEDLINE (R) In-Process & Other Non-Indexed Citations, Ovid EMBASE, Ovid PsycINFO, EBSCOhost CINAHL, ISI Web of Science, and ProQuest Dissertations and Theses). Full search details can be found in Additional file  1 . See Additional file  2 for the completed PRISMA checklist.

Inclusion/exclusion criteria

Studies were not excluded based upon research design and had to comply with three inclusion criteria (Table  1 ). A two-person hybrid approach was used for screening article titles and abstracts with inter-rater reliability ranging from 94 to 95%. Full-text articles were independently screened by two reviewers, and a two-person hybrid approach was used for data extraction.

Process evaluation systematic review inclusion criteria

1 Health is defined according to the WHO (1946) conceptualization of a state of complete physical and mental well-being and not merely the absence of disease or infirmity, including prevention components and mental health but not “social health”

Quality assessment

The methodological quality of all included studies was assessed using the Mixed Methods Appraisal Tool (MMAT) [ 26 , 27 ] for quantitative, qualitative, and mixed methods research designs. The tool results in a methodological rating of 0, 25, 50, 75, and 100 (with 100 being the highest quality) for each study based on the evaluation of study selection bias, study design, data collection methods, sample size, intervention integrity, and analysis. We adapted the MMAT for multi-method studies (studies where more than one research approach was utilized, but the data were not integrated) by assessing the methods in the study individually and then choosing the lowest quality rating assigned. For studies where the process evaluation was integrated into the study design, the quality of the entire study was assessed.
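
A minimal sketch of the multi-method adaptation described above, using the 0-100 MMAT rating scale; the example component ratings are hypothetical.

    # Sketch of the adaptation described above: rate each research approach in a
    # multi-method study separately, then take the lowest rating as the study's score.
    def adapted_mmat_score(component_ratings):
        """Return the overall rating for a multi-method study (the lowest component rating)."""
        valid = {0, 25, 50, 75, 100}
        if any(r not in valid for r in component_ratings):
            raise ValueError("MMAT ratings must be one of 0, 25, 50, 75, 100")
        return min(component_ratings)

    # Hypothetical example: qualitative component rated 75, survey component rated 50.
    print(adapted_mmat_score([75, 50]))   # -> 50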

Data extraction, analysis, and synthesis

Study data were extracted using standardized Excel forms. Only data reported in included studies were extracted. Variables extracted included the following: (1) study design, (2) process evaluation type (integrated vs. separate), (3) process evaluation terms used, (4) timing of data collection (e.g., pre- and post-implementation of intervention), (5) KT intervention type, (6) KT intervention recipient, (7) target behavior, and (8) theory. Studies were grouped and synthesized according to each of the above variables. Evidence tables were created to summarize and describe the studies included in this review.
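
To illustrate what one row of such an extraction form might hold, here is a hedged Python sketch of an extraction record. The field names mirror the eight variables listed above, and the sample values are hypothetical.

    # Sketch of a single data-extraction record mirroring the eight variables listed above.
    # The example values are hypothetical.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class ExtractionRecord:
        study_design: str
        process_evaluation_type: str          # "integrated" or "separate"
        process_evaluation_terms: List[str]
        timing_of_data_collection: str        # e.g., "post-intervention"
        kt_intervention_type: str
        kt_intervention_recipient: str
        target_behavior: str
        theory: Optional[str]                 # None if not theory-informed

    example = ExtractionRecord(
        study_design="qualitative",
        process_evaluation_type="separate",
        process_evaluation_terms=["barriers and facilitators"],
        timing_of_data_collection="post-intervention",
        kt_intervention_type="professional",
        kt_intervention_recipient="healthcare professionals",
        target_behavior="general management of a problem",
        theory=None,
    )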

Theoretical guidance

We extracted and analyzed data on any theoretical guidance that was identified and discussed for the process evaluation stage of the included studies. For the purpose of our systematic review, included studies were considered theoretically informed if the process evaluation used theory to (a) assist in the identification of appropriate outcomes, measures, and variables; (b) guide the evaluation of the KT process; (c) identify potential predictors or mediators; or (d) provide a framework for data analysis.

Study design

Of the 20,968 articles screened, 226 full-text articles were included in our review (Fig.  1 ). See Additional file  3 for a full citation list of included studies.


PRISMA flow diagram (Adapted from Moher et al. 2009)

Among these included articles, the following research designs were used: qualitative ( n  = 85, 37.6%), multi-methods ( n  = 55, 24.3%), quantitative descriptive ( n  = 44, 19.5%), mixed methods ( n  = 25, 11.1%), quantitative RCT ( n  = 14, 6.2%), and quantitative non-randomized ( n  = 3, 1.3%). See Table  2 .

Types of research design and associated quality of included studies ( n  = 226)

RCT randomized controlled trial

Process evaluation type and terms

A total of 136 (60.2%) of the included studies were separate (stand-alone) process evaluations, while the process evaluations of the remaining studies ( n  = 90, 39.8%) were integrated into the KT intervention evaluation. Process evaluation research designs included the following: qualitative ( n  = 98, 43.4%), multi-methods ( n  = 56, 24.8%), quantitative descriptive ( n  = 51, 22.6%), and mixed methods ( n  = 21, 9.3%). See Table  3 .

Process evaluation research design of included studies ( n  = 226)

The way in which each of the included studies described the purpose and focus of their process evaluation was synthesized and categorized thematically. Barriers and/or facilitators to implementation was the most widely reported term to describe the purpose and focus of the process evaluation (Table  4 ).

Thematic analysis of process evaluation terms used in included studies ( n  = 226)

*Some studies used multiple terms to describe the process evaluation and its focus

Methods and timing of data collection

Process evaluations had widespread variations in the methods of data collection, with individual interviews ( n  = 123) and surveys or questionnaires ( n  = 100) being the predominant methods (Table  5 ).

Methods of data collection of included studies ( n  = 226)

*Some studies had more than one method of data collection

The majority of process evaluations collected data post-intervention ( n  = 104, 46.0%). The remaining studies collected data pre- and post-intervention ( n  = 40, 17.7%); during and post-intervention ( n  = 29, 12.8%); during intervention ( n  = 25, 11.1%); pre-, during, and post-intervention ( n  = 18, 7.9%); pre- and during intervention ( n  = 5, 2.2%); or pre-intervention ( n  = 3, 1.3%). In 2 studies (0.9%), the timing of data collection was unclear. See Table  6 .

Timing of data collection of included studies ( n  = 226)

Intervention details (type, recipient, and target behavior)

Most of the studies ( n  = 154, 68.1%) identified healthcare professionals (HCPs) as the exclusive KT intervention recipient, while the remaining studies had combined intervention recipients including HCP and others ( n  = 59, 26.1%), and HCP and patients ( n  = 13, 5.8%). Utilizing the Cochrane Effective Practice and Organisation of Care (EPOC) intervention classification schema [ 28 ], 218 (96.5%) studies had professional type interventions, 5 (2.2%) studies had professional type and organizational type interventions, and 3 (1.3%) studies had professional type and financial type interventions. The most common KT intervention target behaviors were “General management of a problem” ( n  = 132), “Clinical prevention services” ( n  = 45), “Patient outcome” ( n  = 35), “Procedures” ( n  = 33), and “Patient education/advice” ( n  = 32). See Table  7 .

Intervention details of included studies ( n  = 226)

*Some studies had multiple targeted behaviors

Of the 226 studies, 38.1% ( n  = 86) were informed by theory (Table  8 ). The most frequently reported theories were as follows: (a) Rogers' Diffusion of Innovations Theory ( n  = 13), (b) Normalization Process Theory ( n  = 10), (c) Promoting Action on Research Implementation in Health Services Framework ( n  = 9), (d) Theory of Planned Behavior ( n  = 9), (e) Plan-Do-Study-Act Framework ( n  = 7), and (f) the Consolidated Framework for Implementation Research ( n  = 6).

Theories used by theory-guided studies ( n  = 86)

*Some studies had multiple theories guiding the process evaluation

The distribution of MMAT scores varied with study design (Table  2 ). The lowest scoring study design was multi-method, with 74.5% ( n  = 41) of multi-method studies scoring 50 or lower. Overall, many of the studies ( n  = 88, 38.9%) had an MMAT score of 50 or lower, with 29 (12.8%) studies scoring 25 and 7 (3.1%) studies scoring 0. Eighty-one studies (35.8%) scored 75, and 57 studies (25.2%) scored 100 (high quality). See Table  9 .

Distribution of MMAT scores (0 = lowest and 100 = highest score)

Our findings provide many insights into the current practices of KT researchers conducting integrated or separate process evaluations: the focus of these evaluations, the data collection considerations, and the often poor methodological quality and lack of theoretical guidance informing them.

The majority of included studies (60.2%) conducted a separate (stand-alone) rather than integrated process evaluation. As Moore and colleagues suggest, there are advantages and disadvantages of either (separated or integrated) approach [ 12 ]. Arguments for separate process evaluations focus on analyzing process data without knowledge of outcome analysis to prevent biasing interpretations of results. Arguments for integration include ensuring implementation data is integrated into outcome analysis and using the process evaluation to identify intermediate outcome data and causal processes while informing the integration of new measures into outcome data collection. Our findings highlight that there is no clear preference for separate or integrated process evaluations. The decision for separation or integration of the process evaluation should be carefully considered by study teams to ensure it is the best option for their study objectives.

Our findings draw attention to a wide variety of terms and foci used within process evaluations. We identified a lack of clear and consistent concepts for process evaluations and their multifaceted components, as well as an absence of standard recommendations on how process evaluations should be developed and conducted. This finding is supported by a literature overview on process evaluations in public health published by Linnan and Steckler in 2002 [ 29 ]. We would encourage researchers to employ terms that are utilized by other researchers to facilitate making meaningful comparisons across studies in the future and to be mindful of comprehensively including the key components of a process evaluation, context, implementation, and mechanisms of impact [ 12 ].

Our findings highlight two important aspects of process evaluation data collection: the timing and the type of data collected. In terms of data collection timing, almost half of the investigators collected their process evaluation data post-intervention (46%) without any pre-intervention or during intervention data collection. Surprisingly, only 17.7% of the included studies collected data pre- and post-intervention, and only 18 studies collected data pre-, during, and post-intervention. Process evaluations can provide useful information about intervention delivery: whether the interventions were delivered as planned (fidelity), the intervention dose, the intervention's reach, and how the context shaped the implementation process. Our findings suggest a current propensity to collect data after intervention delivery (as compared to before and/or during). It is unclear if our findings are the result of a lack of forethought to employ data collection pre- and during implementation, a lack of resources, or a reliance on data collection approaches post-intervention. This aside, based upon our findings, we recommend that KT researchers planning process evaluations consider data collection earlier in the implementation process to prevent challenges with retrospective data collection and to maximize the potential power of process evaluations. Consideration of the key components of process evaluations (context, implementation, and mechanisms of impact) is critically important to prevent inference-observation confusion from an exclusive reliance on outcome evaluations [ 12 ]. An intervention can have positive outcomes even when an intervention was not delivered as intended, as other events or influences can be shaping a context [ 30 ]. Conversely, an intervention may have limited or no effects for a number of reasons that extend beyond the ineffectiveness of the intervention, including a weak research design or improper implementation of the intervention [ 31 ]. Implicitly, the process evaluation framework by Moore and colleagues suggests that process evaluation data ideally need to be collected before and throughout the implementation process in order to capture all aspects of implementation [ 12 ].

In terms of data collection type, just over half (54.4%) of the studies utilized qualitative interviews as one form of data collection. Reflecting on the key components of process evaluations (context, implementation, and mechanisms of impact), the frequency of qualitative data collection approaches is lower than anticipated. Qualitative approaches such as interviewing are ideal for uncovering rich and detailed aspects of the implementation context, nuanced participant perspectives on the implementation processes, and the potential mediators to implementation impact. Taken together, the key components of a process evaluation (context, implementation, and mechanisms of impact) point, almost by default, to multi-method work. Consequently, we urge researchers to consider integrating qualitative and quantitative data into their process evaluation study designs to richly capture various perspectives. In addition to individual interviews, surveys, participant observation, focus groups, and document analysis could be used.

A major finding from this systematic review is the lack of methodological rigor in many of the process evaluations. Almost 40% of the studies included in this review had an MMAT score of 50 or less, but the scores varied significantly across the study designs used by the investigators. Moreover, the frequency of low MMAT scores for multi-method and mixed method studies suggests a tendency toward lower methodological quality, which could point to the challenging nature of these research designs [ 32 ] or a lack of reporting guidelines.

Our findings identified a lack of theoretical guidance employed and reported in the included process evaluation studies. It is important to note that the role of theory within evaluation is considered contentious by some [ 33 , 34 ]; conversely, there are increasing calls in the literature for the use of theory. Despite this tension, there are many reported advantages to theory-driven evaluations [ 29 , 33 , 34 ], yet more than 60% of the included studies were not informed by theory. Current research evidence suggests that using theory can help to design studies that increase KT and enable better interpretation and replication of findings of implementation studies [ 35 ]. In alignment with Moore and colleagues, we encourage researchers to consider utilizing theory when designing process evaluations. There is no shortage of KT theories available. Recently, Strifler and colleagues identified 159 KT theories, models, and frameworks in the literature [ 36 ]. In the words of Moore and colleagues who were citing the revised MRC guidance (2008), “an understanding of the causal assumptions underpinning the intervention and use of evaluation to understand how interventions work in practice are vital in building an evidence base that informs policy and practice” [ 9 ].

Limitations

As with all reviews, there is the possibility of incomplete retrieval of identified research; however, this review entailed a comprehensive search of published literature and rigorous review methods. Limitations include the eligibility restrictions (only published studies in the English language were included, for example), and data collection did not extend beyond data reported in included studies.

The current evidence base for process evaluations in KT is of weak quality. Policy makers and funding organizations should call for theory-based multi- or mixed-method designs with a complementary process evaluation component. Mixed-method designs, with an integrated process evaluation component, would help to inform decision makers about effective process evaluation approaches, and research funding organizations could further promote theory-based designs to guide the development and conduct of implementation studies with a rigorous process evaluation component. Achieving this goal may require well-assembled implementation teams including clinical experts, as well as strong researchers with methodological expertise.

We recommend that future investigators employ rigorous theory-guided multi or mixed method approaches to evaluate the processes of implementation of KT interventions. Our findings highlighted that to date, qualitative study designs in the form of separate (stand-alone) process evaluations are the most frequently reported approaches. The predominant data collection method of using qualitative interviews helps to better understand process evaluations and to answer questions about why the implementation processes work or not, but does not provide an answer about the effectiveness of the implementation processes used. In light of the work of Moore and colleagues [ 12 ], we advocate that future process evaluation investigators should use both qualitative and quantitative methods (mixed methods) with an integrated process evaluation component to evaluate implementation processes in KT research.

We identified the timing of data collection as another methodological weakness in this systematic review. It remains unclear why almost half of the included process evaluations collected data only post-implementation; this reliance on post-intervention data undermines the ability to evaluate the process of implementation as it unfolds. To provide high-certainty evidence, we advocate collecting pre-, during-, and post-implementation measures and reporting statistical uncertainty measures (e.g., standard deviation, standard error, p values, and confidence intervals). This would allow a rigorous assessment of implementation processes and sound recommendations supported by statistical evidence. Pre-implementation evaluation also helps to identify and address issues before implementation occurs. There is widespread acceptance that the generalizability of quantitative trials of KT interventions would be significantly enhanced by complementary process evaluations.
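To make this recommendation concrete, the sketch below computes the uncertainty measures mentioned above (standard deviation, standard error, and a 95% confidence interval) and a paired p value for hypothetical pre- and post-implementation measurements of a process indicator. The data, the adherence example, and the `describe` helper are assumptions for illustration, not values from any included study.

```python
import numpy as np
from scipy import stats

# Hypothetical process measure (e.g., % adherence to a guideline) collected
# before and after implementation at the same ten sites; illustrative only.
pre  = np.array([52, 48, 60, 55, 47, 50, 58, 53, 49, 56], dtype=float)
post = np.array([66, 59, 72, 70, 61, 64, 75, 68, 63, 71], dtype=float)

def describe(sample, confidence=0.95):
    """Mean, sample standard deviation, standard error, and confidence interval."""
    mean = sample.mean()
    sd = sample.std(ddof=1)                 # sample standard deviation
    se = sd / np.sqrt(len(sample))          # standard error of the mean
    ci = stats.t.interval(confidence, len(sample) - 1, loc=mean, scale=se)
    return mean, sd, se, ci

for label, sample in (("pre", pre), ("post", post)):
    mean, sd, se, ci = describe(sample)
    print(f"{label}: mean={mean:.1f}, SD={sd:.1f}, SE={se:.2f}, "
          f"95% CI=({ci[0]:.1f}, {ci[1]:.1f})")

# Paired t-test on the pre/post change at each site.
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"paired t-test: t={t_stat:.2f}, p={p_value:.4f}")
```

The paired t-test here is only one illustrative choice; the appropriate uncertainty measure depends on the study design and the data collected.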

Strong science and methodological guidance are needed to underpin and guide the design and execution of process evaluations in KT science. A theory-based approach to process evaluations of KT interventions would allow investigators to reach conclusions not only about the processes by which interventions were implemented and the outcomes they generated, but also about the reliability of the causal assumptions that link intervention processes and outcomes. Future research is needed to provide state-of-the-art recommendations on how to design, conduct, and report rigorous process evaluations as part of a theory-based, mixed methods evaluation of KT projects. Intervention theory should be used to inform the design of implementation studies so that the success or failure of the strategies used can be investigated, which could yield more generalizable findings to inform researchers and knowledge users about effective implementation strategies.

Supplementary information

Acknowledgements

We would like to thank CIHR for providing the funding for the systematic review. We would like to thank our Knowledge User Advisory Panel members for providing guidance and feedback, including Dr. Thomas Rotter, Brenda VandenBeld, Lisa Halma, Christine Jensen-Ross, Gayle Knapik, and Klaas VandenBeld. We would lastly like to acknowledge the contributions of Xuan Wu in data analysis.

Abbreviations

CIHR: Canadian Institutes of Health Research; KT: Knowledge translation; MMAT: Mixed Methods Appraisal Tool; MRC: Medical Research Council

Authors’ contributions

SDS conceptualized and designed the study and secured the study funding from CIHR. She led all aspects of the study process. TC conducted the search. TR, KHBM, RF, TP, and HMB contributed to the data collection. KHBM, RF, TP, and HMB contributed to the data analysis. TR and LH contributed to the data interpretation. All authors contributed to the manuscript drafts and reviewed the final manuscript. All authors read and approved the final manuscript.

Authors’ information

SDS holds a Canada Research Chair for Knowledge Translation in Child Health. LH holds a Canada Research Chair in Knowledge Synthesis and Translation.

Funding

Canadian Institutes of Health Research (CIHR) Knowledge Synthesis Grant #305365.

Availability of data and materials

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Shannon D. Scott, Email: [email protected] .

Thomas Rotter, Email: [email protected] .

Rachel Flynn, Email: rmflynn@ualberta.ca.

Hannah M. Brooks, Email: [email protected] .

Tabatha Plesuk, Email: plesuk@ualberta.ca.

Katherine H. Bannar-Martin, Email: kbannarm@gmail.com.

Thane Chambers, Email: thane@ualberta.ca.

Lisa Hartling, Email: [email protected] .

Supplementary information accompanies this paper at 10.1186/s13643-019-1161-y.

