Research Recommendations – Examples and Writing Guide

Research Recommendations

Definition:

Research recommendations refer to suggestions or advice given to someone who is looking to conduct research on a specific topic or area. These recommendations may include suggestions for research methods, data collection techniques, sources of information, and other factors that can help to ensure that the research is conducted in a rigorous and effective manner. Research recommendations may be provided by experts in the field, such as professors, researchers, or consultants, and are intended to help guide the researcher towards the most appropriate and effective approach to their research project.

Parts of Research Recommendations

Research recommendations can vary depending on the specific project or area of research, but typically they will include some or all of the following parts:

  • Research question or objective: This is the overarching goal or purpose of the research project.
  • Research methods: This includes the specific techniques and strategies that will be used to collect and analyze data. The methods will depend on the research question and the type of data being collected.
  • Data collection: This refers to the process of gathering information or data that will be used to answer the research question. This can involve a range of different methods, including surveys, interviews, observations, or experiments.
  • Data analysis: This involves the process of examining and interpreting the data that has been collected. This can involve statistical analysis, qualitative analysis, or a combination of both.
  • Results and conclusions: This section summarizes the findings of the research and presents any conclusions or recommendations based on those findings.
  • Limitations and future research: This section discusses any limitations of the study and suggests areas for future research that could build on the findings of the current project.

How to Write Research Recommendations

Writing research recommendations involves providing specific suggestions or advice to a researcher on how to conduct their study. Here are some steps to consider when writing research recommendations:

  • Understand the research question: Before writing research recommendations, it is important to have a clear understanding of the research question and the objectives of the study. This will help to ensure that the recommendations are relevant and appropriate.
  • Consider the research methods: Consider the most appropriate research methods that could be used to collect and analyze data that will address the research question. Identify the strengths and weaknesses of the different methods and how they might apply to the specific research question.
  • Provide specific recommendations: Provide specific and actionable recommendations that the researcher can implement in their study. This can include recommendations related to sample size, data collection techniques, research instruments, data analysis methods, or other relevant factors.
  • Justify recommendations: Justify why each recommendation is being made and how it will help to address the research question or objective. It is important to provide a clear rationale for each recommendation to help the researcher understand why it is important.
  • Consider limitations and ethical considerations: Consider any limitations or potential ethical considerations that may arise in conducting the research. Provide recommendations for addressing these issues or mitigating their impact.
  • Summarize recommendations: Provide a summary of the recommendations at the end of the report or document, highlighting the most important points and emphasizing how the recommendations will contribute to the overall success of the research project.

Example of Research Recommendations

Here is a sample set of research recommendations for students:

  • Further investigate the effects of X on Y by conducting a larger-scale randomized controlled trial with a diverse population.
  • Explore the relationship between A and B by conducting qualitative interviews with individuals who have experience with both.
  • Investigate the long-term effects of intervention C by conducting a follow-up study with participants one year after completion.
  • Examine the effectiveness of intervention D in a real-world setting by conducting a field study in a naturalistic environment.
  • Compare and contrast the results of this study with those of previous research on the same topic to identify any discrepancies or inconsistencies in the findings.
  • Expand upon the limitations of this study by addressing potential confounding variables and conducting further analyses to control for them.
  • Investigate the relationship between E and F by conducting a meta-analysis of existing literature on the topic.
  • Explore the potential moderating effects of variable G on the relationship between H and I by conducting subgroup analyses.
  • Identify potential areas for future research based on the gaps in current literature and the findings of this study.
  • Conduct a replication study to validate the results of this study and further establish the generalizability of the findings.

Applications of Research Recommendations

Research recommendations are important as they provide guidance on how to improve or solve a problem. The applications of research recommendations are numerous and can be used in various fields. Some of the applications of research recommendations include:

  • Policy-making: Research recommendations can be used to develop policies that address specific issues. For example, recommendations from research on climate change can be used to develop policies that reduce carbon emissions and promote sustainability.
  • Program development: Research recommendations can guide the development of programs that address specific issues. For example, recommendations from research on education can be used to develop programs that improve student achievement.
  • Product development: Research recommendations can guide the development of products that meet specific needs. For example, recommendations from research on consumer behavior can be used to develop products that appeal to consumers.
  • Marketing strategies: Research recommendations can be used to develop effective marketing strategies. For example, recommendations from research on target audiences can be used to develop marketing strategies that effectively reach specific demographic groups.
  • Medical practice: Research recommendations can guide medical practitioners in providing the best possible care to patients. For example, recommendations from research on treatments for specific conditions can be used to improve patient outcomes.
  • Scientific research: Research recommendations can guide future research in a specific field. For example, recommendations from research on a specific disease can be used to guide future research on treatments and cures for that disease.

Purpose of Research Recommendations

The purpose of research recommendations is to provide guidance on how to improve or solve a problem based on the findings of research. Research recommendations are typically made at the end of a research study and are based on the conclusions drawn from the research data. They offer actionable advice to individuals or organizations, helping them make informed decisions, develop effective strategies, or implement changes that address the issues identified in the research.

The main purpose of research recommendations is to facilitate the transfer of knowledge from researchers to practitioners, policymakers, or other stakeholders who can benefit from the research findings. Recommendations can help bridge the gap between research and practice by providing specific actions that can be taken based on the research results. By providing clear and actionable recommendations, researchers can help ensure that their findings are put into practice, leading to improvements in various fields, such as healthcare, education, business, and public policy.

Characteristics of Research Recommendations

Research recommendations are a key component of research studies and are intended to provide practical guidance on how to apply research findings to real-world problems. The following are some of the key characteristics of research recommendations:

  • Actionable: Research recommendations should be specific and actionable, providing clear guidance on what actions should be taken to address the problem identified in the research.
  • Evidence-based: Research recommendations should be based on the findings of the research study, supported by the data collected and analyzed.
  • Contextual: Research recommendations should be tailored to the specific context in which they will be implemented, taking into account the unique circumstances and constraints of the situation.
  • Feasible: Research recommendations should be realistic and feasible, taking into account the available resources, time constraints, and other factors that may impact their implementation.
  • Prioritized: Research recommendations should be prioritized based on their potential impact and feasibility, with the most important recommendations given the highest priority.
  • Communicated effectively: Research recommendations should be communicated clearly and effectively, using language that is understandable to the target audience.
  • Evaluated: Research recommendations should be evaluated to determine their effectiveness in addressing the problem identified in the research, and to identify opportunities for improvement.

Advantages of Research Recommendations

Research recommendations have several advantages, including:

  • Providing practical guidance: Research recommendations provide practical guidance on how to apply research findings to real-world problems, helping to bridge the gap between research and practice.
  • Improving decision-making: Research recommendations help decision-makers make informed decisions based on the findings of research, leading to better outcomes and improved performance.
  • Enhancing accountability: Research recommendations can help enhance accountability by providing clear guidance on what actions should be taken, and by providing a basis for evaluating progress and outcomes.
  • Informing policy development: Research recommendations can inform the development of policies that are evidence-based and tailored to the specific needs of a given situation.
  • Enhancing knowledge transfer: Research recommendations help facilitate the transfer of knowledge from researchers to practitioners, policymakers, or other stakeholders who can benefit from the research findings.
  • Encouraging further research: Research recommendations can help identify gaps in knowledge and areas for further research, encouraging continued exploration and discovery.
  • Promoting innovation: Research recommendations can help identify innovative solutions to complex problems, leading to new ideas and approaches.

Limitations of Research Recommendations

While research recommendations have several advantages, there are also some limitations to consider. These limitations include:

  • Context-specific: Research recommendations may be context-specific and may not be applicable in all situations. Recommendations developed in one context may not be suitable for another context, requiring adaptation or modification.
  • Implementation challenges: Implementation of research recommendations may face challenges, such as lack of resources, resistance to change, or lack of buy-in from stakeholders.
  • Limited scope: Research recommendations may be limited in scope, focusing only on a specific issue or aspect of a problem, while other important factors may be overlooked.
  • Uncertainty: Research recommendations may be uncertain, particularly when the research findings are inconclusive or when the recommendations are based on limited data.
  • Bias: Research recommendations may be influenced by researcher bias or conflicts of interest, leading to recommendations that are not in the best interests of stakeholders.
  • Timing: Research recommendations may be time-sensitive, requiring timely action to be effective. Delayed action may result in missed opportunities or reduced effectiveness.
  • Lack of evaluation: Research recommendations may not be evaluated to determine their effectiveness or impact, making it difficult to assess whether they are successful or not.


How to Write Recommendations in Research | Examples & Tips

Published on 15 September 2022 by Tegan George.

Recommendations in research are a crucial component of your discussion section and the conclusion of your thesis, dissertation, or research paper.

As you conduct your research and analyse the data you collected, perhaps there are ideas or results that don’t quite fit the scope of your research topic. Or maybe your results suggest further implications, or causal relationships between previously studied variables, beyond those covered in extant research.



Recommendations for future research should be:

  • Concrete and specific
  • Supported with a clear rationale
  • Directly connected to your research

Overall, strive to highlight ways other researchers can reproduce or replicate your results to draw further conclusions, and suggest different directions that future research can take, if applicable.

Relatedly, when making these recommendations, avoid:

  • Undermining your own work; instead, offer suggestions for how future studies can build upon it
  • Suggesting recommendations that are actually needed to complete your argument; your research should stand on its own merits
  • Using recommendations as a place for self-criticism; treat them as a natural extension point for your work


There are many different ways to frame recommendations, but the easiest is perhaps to follow the formula of research question → conclusion → recommendation. Here’s an example.

Conclusion: An important condition for controlling many social skills is mastering language. If children have a better command of language, they can express themselves better and are better able to understand their peers. Opportunities to practice social skills are thus dependent on the development of language skills.

As a rule of thumb, try to limit yourself to only the most relevant future recommendations: ones that stem directly from your work. While you can have multiple recommendations for each research conclusion, it is also acceptable to have one recommendation that is connected to more than one conclusion.

These recommendations should be targeted at your audience, specifically toward peers or colleagues in your field that work on similar topics to yours. They can flow directly from any limitations you found while conducting your work, offering concrete and actionable possibilities for how future research can build on anything that your own work was unable to address at the time of your writing.

See below for a full research recommendation example that you can use as a template to write your own.

The current study can be interpreted as a first step in the research on COPD speech characteristics. However, the results of this study should be treated with caution due to the small sample size and the lack of details regarding the participants’ characteristics.

Future research could further examine the differences in speech characteristics between exacerbated COPD patients, stable COPD patients, and healthy controls. It could also contribute to a deeper understanding of the acoustic measurements suitable for e-health measurements.


While it may be tempting to present new arguments or evidence in your thesis or dissertation conclusion, especially if you have a particularly striking argument you’d like to finish your analysis with, you shouldn’t. Theses and dissertations follow a more formal structure than this.

All your findings and arguments should be presented in the body of the text (more specifically, in the discussion section and results section). The conclusion is meant to summarize and reflect on the evidence and arguments you have already presented, not introduce new ones.

The conclusion of your thesis or dissertation should include the following:

  • A restatement of your research question
  • A summary of your key arguments and/or results
  • A short discussion of the implications of your research

For a stronger dissertation conclusion , avoid including:

  • Generic concluding phrases (e.g. “In conclusion…”)
  • Weak statements that undermine your argument (e.g. “There are good points on both sides of this issue.”)

Your conclusion should leave the reader with a strong, decisive impression of your work.

In a thesis or dissertation, the discussion is an in-depth exploration of the results, going into detail about the meaning of your findings and citing relevant sources to put them in context.

The conclusion is shorter and more general: it concisely answers your main research question and makes recommendations based on your overall findings.



Best Practice Recommendations for Replicating Experiments in Public Administration


Richard M Walker, Gene A Brewer, M Jin Lee, Nicolai Petrovsky, Arjen van Witteloostuijn, Best Practice Recommendations for Replicating Experiments in Public Administration, Journal of Public Administration Research and Theory, Volume 29, Issue 4, October 2019, Pages 609–626, https://doi.org/10.1093/jopart/muy047


Replication is an important mechanism through which broad lessons for theory and practice can be drawn in the applied interdisciplinary social science field of public administration. We suggest a common replication framework for public administration that is illustrated by experimental work in the field. Drawing on knowledge from other disciplines, together with our experience in replicating several experiments on topics such as decision making, organizational rules, and government–citizen relationships, we provide an overview of the replication process. We then distill this knowledge into seven decision points that offer a clear set of best practices on how to design and implement replications in public administration. We conclude by arguing that replication should be part of the normal scientific process in public administration to help to build valid middle-range theories and provide valuable lessons to practice.

Replication is an important part of the scientific process because it can help to establish the external validity of knowledge and thus the ability to generalize a study’s findings more broadly. Replication thus sits at the heart of scientific progress (e.g., Francis 2012 ; Freese 2007 ; Kuhn 1996 ; Nosek and Lakens 2014 ). The goal of replication is to confirm or reject theories under similar and dissimilar conditions to build confidence in (or falsify) them. In this way, theory testing is advanced, and knowledge is accumulated. Many disciplines rely heavily on replications to advance knowledge ( Hubbard, Vetter, and Eldon 1998 ; Open Science Collaboration 2015 ), but they are rare in public administration and management ( Walker, Brewer, and James 2017 ). 1

Public administration is an interdisciplinary social science field that emphasizes applied (“theory to practice”) applications. In this article, we argue that these two factors suggest the need to craft a replication framework for public administration to advance knowledge. Over half-a-century ago, Simon (1996) argued that public administration is a “design science” that devises courses of action designed to change existing outcomes into preferred ones. Recommendations from public administration scholarship can have tangible consequences for citizens; thus, prescriptions need to be well-founded. If the discipline of public administration is to successfully advance policy and practice, then research findings must be generalizable. In the applied social sciences, the context within which studies are conducted is considered to be a critical variable for the development of theory ( Freese 2007 ; Jilke, Meuleman, and Van de Walle 2014 ).

Most social and behavioral sciences are single disciplines with interdisciplinary interests, whereas public administration is an interdisciplinary discipline with many single interests. For example, much of psychology is concerned with questions about the human psyche and human behavior at the individual level. The interdisciplinary nature of public administration means that we are concerned with individual values, attitudes, and behaviors in social and political settings. Further, public administration research can include many different units of analysis (individuals, organizations, networks, countries, etc.). Although the scope of public administration is not necessarily broader or more important than other disciplines, the nature of public administration is interdisciplinary and spans the social sciences.

To help develop a clear set of arguments and illustrations, this article focuses primarily on the replication of experimental studies. However, much of what we say is relevant to other types of research, which also require rigorous replication. Hence, the steps we outline are generally applicable to all types of research, including qualitative studies ( Mele and Belardinelli Forthcoming ; Nowell and Albrecht Forthcoming ). Public administration scholars have increasingly adopted experimental research designs, which are characterized by random assignment of treatments or subjects into separate treatment and control groups, and conducted in real-world or artificial laboratory-like settings. These designs have been adopted because of their strong internal validity and ability to identify causal relationships ( Blom-Hansen, Morton, and Serritzlew 2015 ; James, Jilke, and Van Ryzin 2017 ). Yet even when the internal validity of an experimental study is strong, its external validity is often weak.

Researchers face many challenges in publishing replications. Notably, editors often prefer new findings on groundbreaking topics ( van Witteloostuijn 2016 ) rather than the steady hum of scientific progress inching forward. These expectations are reinforced by the reward and promotion practices of universities, in which faculty are expected to publish in leading or high-impact journals that demand strong findings on innovative topics. Such practices have led some to argue that research articles contain many Type I errors in which false positives abound ( Ioannidis 2005 ; van Witteloostuijn 2016 ). Replication publications are, nonetheless, on the rise across the social sciences (e.g., Francis 2012 ; Nosek and Lakens 2014 ) and in public administration ( Walker, Brewer, and James 2017 ), but much more needs to be done. It is therefore important to develop a replication framework for conducting experimental replications, particularly considering the costliness of experiments, the number of published studies with small Ns, and the contextual nature of the findings reported therein.

This article contributes to the debate on replication in public administration and beyond by offering best practice recommendations for experimental replications with a goal of improving theory and practice. Outlining clear steps for conducting experimental replications will inform researchers’ decisions on whether a study can be replicated and how the replication should be carried out. This article describes seven decision points. Central to these decision points are questions on criteria for determining the type of replication to be implemented. We rely on Tsang and Kwan’s (1999) classification of different types of replications to guide these decisions. Their framework offers conceptual clarity on the type of replication needed to extend the original study to different populations, designs, and analyses. The decision points are (1) deciding whether replicating the study is feasible, (2) assessing the internal validity of the original study and the replication, (3) making choices about statistical power, (4) choosing a critical test case, (5) establishing boundary conditions, (6) establishing content validity, and (7) deciding how to compare the findings.

The first section of this article reviews several rationales for replicating studies in the social sciences in general and catalogs replication experiences in different disciplines. In the second section, we introduce the seven decision points for designing and implementing replications which provides the underpinnings for our best practice recommendations. Following this, we illustrate our best practice recommendations through a replication conducted in Hong Kong in 2017 of Van Ryzin (2013) . The foundations for our best practice recommendations are drawn from the replication experiences on the topics reviewed and our experience in replicating field, laboratory, and survey experiments at the Laboratory for Public Management and Policy at City University of Hong Kong (e.g., Walker, Brewer, and James 2017 ). We conclude by proposing a “common replication framework” that would integrate replication into the normal scientific practices in public administration to improve the quality of scholarship and help develop mid-range theories and robust lessons for practice.

Scientific knowledge is bolstered when research designs have high validity. Researchers may not get it right the first time, but through repeated efforts, their findings amass and converge on the truth. Campbell and colleagues (e.g., Campbell and Stanley 1963 ; Cook and Campbell 1979 ) shaped researchers’ thinking when they distinguished between four essential forms of validity in social science research: internal validity, external validity, construct validity, and statistical conclusion validity. The authors further described various threats to validity and cautioned researchers to minimize them. For many decades, this has been textbook material, and it informs our decision points.

Researchers should strive for high levels of validity, according to Campbell and Stanley (1963) . However, there are tradeoffs that work against achieving a perfect balance. On the one hand, research designs tailored to optimize the testing of causal relationships with high internal validity often create artificial or contrived conditions that make the result very difficult to reproduce in other settings (low external validity). On the other hand, research designs aimed at broad generalizability (high external validity) often sacrifice control and rigor in the research design, making the results less certain (low internal validity). Public administration scholars have adopted the use of experimental research designs in the search for stronger internal validity. Nevertheless, in an applied social science discipline like public administration, external validity is also important. Hence, replication research designs often incur a trade-off between internal and external validity. External validity can then be produced through a string of internally valid studies that test boundary conditions.

A Review of the Major Replication Efforts to Date

Reproducibility

Reproducibility is “the extent to which consistent results are observed when scientific studies are repeated” ( Open Science Collaboration 2015 , 657). Some disciplines have experienced a “reproducibility crisis” because they could not reproduce high visibility research findings. In economics and finance, Dewald, Thursby, and Anderson (1986) attempted to verify findings reported in articles published in the Journal of Money, Credit, and Banking before and after the journal required authors to provide data and code to others upon request. For articles published before the mandate, 14 out of 62 authors contacted responded that they had lost or discarded their data; for articles published subsequently, the most striking observation was the high prevalence of errors, some of which led to substantially different conclusions. More recently, in psychology, the Open Science Collaboration’s (2015) multisite replication of 100 experiments challenged the hitherto prevalent views of much of the accumulated knowledge in the field when only 39% of the original effects were substantively replicated.

The reproducibility debate has led to calls for openness and transparency in all aspects of the research process ( Camerer et al. 2016 ; Ioannidis, 2005 ; Munafo et al. 2017 ; Nosek et al. 2015 ). In our view, public administration should institutionalize these norms of openness and transparency before a replication crisis emerges in the field. This is essential if replication is to become a normal part of public administration research. Given the relative youthfulness of experimental research designs in public administration, ensuring the full transparency of all published experimental studies is an essential prerequisite for conducting replications (decision point 1). 2 A leading voice in the transparency and openness debate has been the Center for Open Science, which has published Transparency and Openness Guidelines ( Nosek et al. 2015 ). Level 1 of these standards has been adopted by some public administration journals. 3 Whether journals adopt these standards or not, public administration scholars are urged to be open and transparent with their research designs, data, and code to facilitate replication.

Replication

Many existing articles provide guidance on the conduct of replications. We identified a sample of these articles by searching for titles containing the term “replicat*” in the Social Science Citation Index in the disciplines of economics, management, psychology, sociology, and public administration for the period 1970–2018. We turned up 971 articles, of which the vast majority were replication studies. Table 1 identifies three broad themes that are discernable in the literature: types of replication, academic practices, and recommendations and guidance for the conduct of replication. The table lists some useful studies and provides a brief summary of their recommendations for improving replication.

Table 1. Examples of Sources Addressing Replication Practices

Note: T = discusses types of replication; A = makes recommendations to the academic community to increase the prevalence of replications; R = makes recommendations for the conduct of replications.

The first theme includes scholars working from an exploratory social science perspective and seeking to understand and classify types of replication—we return to this topic in the next section. A second theme involves studies focusing on recommendations to promote the conduct and publication of replications. Recommendations typically include teaching replications in PhD programs to instill replication as a norm and changing the attitudes and behaviors of journals, editors, and reviewers ( Hubbard, Vetter, and Little 1998 ; Schmidt 2009 ). The third theme promotes the practice of replication. Recommendations include detailed guidance on implementing a direct replication ( Brandt et al. 2014 ), the development of a “good-enough” replication standard ( Singh, Ang, and Leong 2003 ), and a protocol based on the researchers involved in the replication, the population/subjects studied, and the methods and analysis ( Walker, Brewer, and James 2017 ). Bettis, Helfat, and Shaver (2016) move beyond questions of research design to consider how a replication advances knowledge and how to interpret its results, as do Fabrigar and Wegener (2016) , while Jilke et al. (2017) provide detailed guidance on measurement equivalence. Others have provided replication models such as RNICE (relevance, number, internal validity, contextual realism, and external validity), which was published in a public administration journal ( Pedersen and Stritch 2018 ). This model lists topics that need to be examined to evaluate a replication effort and is an important step forward in the emerging debate in public administration.

If replication agendas are to pay dividends, studies must be carefully chosen ( Bettis, Helfat, and Shaver 2016 ). Pedersen and Stritch (2018, 3) suggested that “as a rule, a ‘valuable’ replication aims to provide supporting (or contradicting) evidence about the existence of a phenomenon that audiences care about and deem important.” Detailed impact criteria have been offered by the Netherlands Organisation for Scientific Research, providing a useful guide for public administration scholars: The original study’s findings must be scientifically impactful (e.g., a breakthrough finding with much subsequent work built on it, or a study that has been highly cited) and/or take an important societal perspective (e.g., a critical policy). 4

This section introduces our main contribution: best practice recommendations for conducting replications in public administration research broadly and in experimental studies specifically. The best practice recommendations consist of key “decision points” to help researchers navigate the thicket of issues they face when planning a replication, such as assessing feasibility and determining the type of replication needed. Replications connect the existing body of knowledge with new knowledge. Therefore, they should be applied systematically ( Schmidt 2009 ). Smart decisions on these issues can increase a researcher’s impact, produce new knowledge for the field, and make scientific progress more efficient.

The best practice recommendations span several steps in the scientific method, such as testing and re-testing hypotheses to accumulate results that eventually converge and form scientific knowledge. Figure 1 depicts the main stages of replication: planning, implementing, and reporting results. The goals of replication and decisions on the type of replication to use (see below) influence the best practice recommendations. Thus, trade-offs between these choices are also considered.

Figure 1. Replication: Summary of Main Issues and Decision Points

Different Types of Replication

In selecting studies for replication, many decisions must be made, and a classification scheme of different types of replications is very helpful. Tsang and Kwan (1999) proposed a framework that identified four different forms of replication ( table 2 ). The framework contrasts several study elements to create a typology: studies using the “same measurement and analysis” or “different measurement and/or analysis,” and studies using the “same population” or a “different population.” Tsang and Kwan’s (1999) terminology suggests that different populations consist of different participants (e.g., diverse groups of citizens in different service areas or jurisdictions). Their ideas also apply to the additional dimension of context (e.g., the same citizens experiencing different policy domains or institutional settings).

Table 2. Source: Adapted from Tsang and Kwan (1999).

Tsang and Kwan (1999) called a replication with the same research procedures, measurements, analysis, population (although perhaps a different sample), and context an “exact replication.” Following Schmidt (2009) , we have relabeled this a “direct” replication because an “exact” or “literal” replication is actually not possible. At a minimum, time passes, and the subjects change. However, we have moved beyond Schmidt’s (2009 , 91) argument that a direct replication is just a “[r]epetition of an experimental procedure.” A study is repeated at a different point in time to test whether the findings hold and whether the concepts are still applicable. A direct replication could also involve a different sample of public managers, such as frontline workers, from a larger population of public managers. An example of a direct replication was implemented by Grimmelikhuijsen and Porumbescu (2017) on Van Ryzin (2013) . The replication yielded similar results and reinforced the findings reported in the original study.

The second type of replication also uses a sample from the same population, but it is labeled “conceptual.” A conceptual replication is an extension of the original study because it involves different measurements and analyses. A conceptual extension introduces new procedures and explores additional ramifications of the original finding(s). For example, using the same population, Van Ryzin, Riccucci, and Li (2017) extended their earlier study of representative bureaucracy ( Riccucci, Van Ryzin, and Li 2016 ) into a new policy domain: emergency preparedness. They were unable to verify their prior findings and concluded that “the symbolic effects of gender representation may be policy-specific” ( Van Ryzin, Riccucci, and Li 2017 , 1365). Scholars conducting replication studies should be aware that conceptual replication may require many repetitions using similar and dissimilar measurements and analyses to provide clear-cut answers.

The third and fourth types of replication involve varying degrees of extension because they are performed on populations that differ from those in the original study. These replications may also differ by using either the same or new measurements and/or analyses. For example, “empirical generalization” uses the same research design, measures, and analyses, but assesses whether the original findings hold up in different populations. Lee, Moon, and Kim (2017) extended Knott, Millar, and Verkuilen’s (2003) study of American university students’ decision making to Hong Kong and Korea, broadly confirming the original finding that individuals with limited information make incremental decisions. “Generalization and extension” replications use a different population while seeking to extend the original findings by adopting additional measurements and analyses. Under this type of replication, if the results are different from those reported in the original study, the discrepancies may be attributed to an altered research design and/or changes in the population.

The approaches to generalization and extension have varied in the public administration literature. For example, Olsen (2017) examined the hypotheses and findings from psychology to study action bias. George et al. (2017) replicated a Danish experiment on performance information and politicians’ preferences for spending and reform, adding an extension onto the role of strategic goals. As discussed in decision point 7, interpreting the replication results requires finesse. Nevertheless, generalizations and extensions represent an important step in testing boundary conditions (decision point 5) to refine theory and strengthen external validity.

Moving from direct replication to generalization and extension increases the confirmatory power of the replication ( Schmidt 2009 ). A direct replication tests whether the hypotheses and causal relationships in the original study hold up with minimal change. Adding to population, measurement, and analysis through “empirical generalization,” “conceptual,” and “generalization and extension” further tests the concepts and causal relationships in each replication. In an ideal world, with ample time and resources available, every replication would travel through the four stages shown in figure 2 to determine whether, or the extent to which, the findings can be confirmed at each stage—reflecting Singh, Ang, and Leong’s (2003) call for “good enough” replications. As Schmidt (2009 , 97) noted, systematically diverse populations, contexts, measures, and analyses are likely to result in variations from the original study that are suited to the multitrait–multimethod research design or a “systematic replication matrix.” To overcome the obstacles of resources and time, a “many-labs” research design has been employed, a topic we return to below.

Figure 2. Building Generalizability

Planning a Replication

Decision Point 1: Deciding If It Is Feasible to Replicate the Study

The goal of the first decision point is to determine whether it is feasible to replicate a study. Although the original study is likely to have been published in a scholarly journal, as we noted above, most public administration journals do not require all aspects of a research article’s design, procedures, and analysis to be fully disclosed. Given this, attempting a replication can be challenging.

If all information on the research design, analysis, and results is accessible, the researcher can move directly to decision points 2 and 3 to check for internal and conclusion (statistical) validity and determine which type of replication should be undertaken. However, it may not be possible to replicate a study if the original materials are not available or are insufficient. This includes information on participant recruitment, instructions, measures, procedures, and analysis of the final data set ( Brandt et al. 2014 ). In some cases, materials may be available, but the original study was conducted in a different language. For example, the research materials for Weibel, Rost, and Osterloh’s (2010) study were in German. To conduct a replication, these materials had to be requested from the authors and translated meticulously. Translation is costly, and translating somewhat technical research materials can result in misinterpretation, raising questions of construct validity (decision point 6).

If the full details are unavailable or the costs of replication are prohibitively high, there are three options. First, the replication can be abandoned. Second, if adequate study data and research materials are available, the researcher can undertake a “checking of analysis” ( Tsang and Kwan 1999 ). If the original findings can be reproduced based on the replicator’s ability to recreate the original study materials, the researcher can move to decision point 2 and onward to consider which type of replication to undertake. Third, if partial details are available, a final option is to use a different type of replication (e.g., a conceptual replication). This would also lead the researcher to decision point 2.

Information on the original materials may be available from the author(s). If the original authors are being consulted on how to access the materials, it may be necessary to consider a role for one or more of them in the replication. Benefits (including access to detailed knowledge on how the original study was conducted and how it can be replicated) can accrue from including the original team members. However, benefits also arise from maintaining independence from the original research team. Autonomy ensures that researchers have less emotional attachment to the findings and can be impartial: analyses of replications in education and psychology indicate that involving the original authors increases the likelihood of producing similar findings to the original article ( Makel and Plucker 2014 ; Makel, Plucker, and Hegarty 2012 ). If the replicating researcher desires independence from the original authors, but the full materials are unavailable, replication may not be possible, or a different type of replication may need to be undertaken.

Decision Point 2: Assessing the Internal Validity of the Original Study and the Replication

Some public administration experiments are well-designed. They are guided by a clear hypothesis (or a set of hypotheses) derived from a well-specified theoretical perspective. The measures and treatments fit with the theoretical concepts. Alternative explanations are carefully considered and to the greatest extent possible eliminated by the design, which results in a high degree of internal validity. In principle, such cases are suitable for replication. However, replications may still encounter serious threats to internal validity if such threats were present in the original study.

Internal validity is defined by the following questions: did the experimental stimulus, in fact, make some significant difference in the result, and did this covariation result from a causal relationship ( Cook and Campbell 1979 ; Shadish, Cook, and Campbell 2002 , 37)? Validity is a property of the inferences drawn from a design and concerns the sureness or truth of the results. Many well-known threats can weaken internal validity (e.g., ambiguous temporal precedence, selection bias, history, maturation, regression, attrition of respondents, testing, and instrumentation). Space does not permit a full review of all threats to internal validity, but when replicating a study, full consideration should be given to them.

Decision making on internal validity is a two-stage process. First, the research design of the original study must be carefully examined to ensure that internal validity is high and that fatal problems, such as ambiguous temporal precedence, are removed. Many threats to internal validity, such as attrition and history, occur during the implementation of a study. Original studies should clearly document these threats, but this does not always happen. Thus, it is important to systematically assess the original study’s design, procedures, and analysis. Second, when replicating, improvements to internal validity should be implemented to enable a “convincing replication” ( Brandt et al. 2014 ; Pedersen and Stritch 2018 ). Any remaining limitations should be documented when the results are reported.

Decision Point 3: Making Choices about Statistical Power

In public administration, one endemic threat to internal and conclusion validity is low statistical power. The definition of statistical power is the ability of a statistical test (at a level of statistical significance, α, specified by the researcher) to detect an effect size (δ, also specified by the researcher) that exists within the population. It follows that statistical power is always a contingent quantity—it is not defined except for specified values of statistical significance, α, and importantly, the effect size δ. Statistical power is a major consideration when deciding what to infer from an experimental finding. Low power can be a parsimonious explanation of nonsignificant findings ( Shadish, Cook, and Campbell 2002 ). Although the concept of statistical power is widely used, practice could be improved, as Lindsay (2015) notes for the field of experimental psychology.

In a power calculation, the level of statistical significance (α) is typically specified by convention, often at 0.05 [but see Meyer, van Witteloostuijn, and Beugelsdijk (2017) and Cumming (2014) on how to avoid the pitfalls of such a criterion]. For the effect size δ, researchers tend to follow prior research. 5 In a replication, δ can be based on the estimate(s) of the prior study. If researchers want to allow for the possibility of publication bias (which would mean the estimates of δ are greater in magnitude than the population value), they can choose a smaller value of the effect size, δ, for the power calculation ( Vazire 2016 , 4). Once the level of statistical significance and effect size have been specified, statistical power increases in the number of subjects and decreases in the variance of the outcome. 6 We discuss these two factors in turn.
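
To make these choices concrete, here is a minimal power-analysis sketch in Python using the statsmodels package; it solves for the per-group sample size of a simple two-group comparison at α = 0.05 and 80% power. The design and both effect sizes are hypothetical stand-ins for an estimate taken from the original study and a deliberately shrunken value that allows for publication bias.

```python
# A minimal sketch, assuming a two-group design; all numbers are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# delta = 0.50 stands in for the effect size reported by the original study
# (Cohen's d); delta = 0.35 is a smaller value chosen to allow for possible
# publication bias in the original estimate.
for delta in (0.50, 0.35):
    n_per_group = analysis.solve_power(effect_size=delta, alpha=0.05,
                                       power=0.80, ratio=1.0,
                                       alternative="two-sided")
    print(f"d = {delta:.2f}: about {n_per_group:.0f} subjects per group")
```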

The clearest solution to low statistical power is to increase the sample size. Often, for practical reasons, the number of subjects available for the replication is relatively limited. Even with a limited number of subjects, a replication can improve on the first experiment by taking steps to enhance statistical power. However, there may be a range of constraints that restrict the researcher from doing this. These include the costs of treatments plus compensation and difficult access to subjects. 7 Those who are interested in replication with a focus on maximizing power can take a number of steps to do so. Shadish, Cook, and Campbell (2002 , 46–7) provided a useful catalog of ways to increase power. For one, if the treatments are costly relative to the control conditions, or if each treatment is primarily compared with the control condition, the number of subjects in the control condition can be increased. Indeed, Orr (1999) shows that for T treatments and a control condition, power is maximized if a fraction 1/(T × 2) of the subjects is allocated to each treatment condition, and one-half of the subjects are allocated to the control condition.
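
As a quick illustration of Orr's allocation rule, the hypothetical sketch below divides a subject pool so that each of the T treatment arms receives a 1/(T × 2) share and the control arm receives the remaining half; the pool size and number of treatments are invented for the example.

```python
def orr_allocation(n_subjects: int, n_treatments: int) -> dict:
    """Allocate subjects as described above: a 1/(T x 2) share of the pool
    to each of the T treatment arms and the remaining half (plus any
    rounding remainder) to the control arm."""
    per_treatment = n_subjects // (2 * n_treatments)
    control = n_subjects - per_treatment * n_treatments
    return {"per_treatment": per_treatment, "control": control}

# Hypothetical example: 600 subjects and 3 treatment conditions.
print(orr_allocation(600, 3))  # {'per_treatment': 100, 'control': 300}
```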

Beyond increasing the number of subjects and/or the optimal allocation of subjects to experimental conditions, other strategies to increase statistical power focus on reducing the variance of the outcome. For replications, the variance in outcome can be estimated from the summary statistics of the study being replicated, if they are broken down and reported by experimental conditions. The following three strategies to decrease the outcome variance and thus increase statistical power are all compatible with both direct replication and empirical generalization. These strategies enhance power by reducing heterogeneity first so that likes are compared with likes. They increase power if—and only if—a variable has been measured before the experiment that is expected to correlate with the outcome (cf. Shadish, Cook, and Campbell 2002 ). How does this work?

The three strategies are matching, blocking, and stratifying. We define and illustrate them using the running example of a “nudge” experiment that seeks to identify what type of electric bill information leads households to reduce their consumption. In this example, there are N households as subjects and T different types of information on the electric bill, and the outcome is electricity consumption. One variable that could be used for matching, blocking, or stratifying is the physical size of each dwelling.

Matching means that there are N / T matches with the same scores on the matching variable. There would be N / T households matched on their square footage, and then the T different treatments would be randomly assigned within each match. Blocking is conceptually similar to matching but does not necessarily occur for the exact same scores, only for similar scores. In contrast, stratifying means there are fewer than N / T strata. For instance, strata may be the type of dwelling: apartment/condominium, town home, or single-family home. The N units would first be grouped into these three strata. Then, within each stratum, the T different treatments would be randomly assigned.

In addition to using household size, we could use the pretest electricity consumption as the matching, blocking, or stratifying variable. In either case, because the matching, blocking, or stratifying variables are correlated with the outcome, the power of the test to detect the effects of the T -1 treatments (each compared against the control condition) is increased relative to simply randomly assigning the T conditions across the N households. For each of the three strategies, the analysis must consider how the treatment was assigned, especially when the percentage assigned to a condition varies ( Gerber and Green 2012 ).
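
As an illustration of the stratifying strategy, a minimal sketch of within-stratum random assignment is given below; the household identifiers, dwelling-type strata, and condition labels are all invented for the running example.

```python
import random
from collections import defaultdict

def stratified_assignment(units, stratum_of, conditions, seed=1):
    """Shuffle the units within each stratum and cycle through the
    conditions, so that every stratum receives a near-balanced mix of
    the treatment and control conditions."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of[unit]].append(unit)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        for i, unit in enumerate(members):
            assignment[unit] = conditions[i % len(conditions)]
    return assignment

# Hypothetical usage: three bill-information treatments plus a control,
# stratified by type of dwelling.
households = [f"hh{i:02d}" for i in range(12)]
dwelling = {h: ["apartment", "town_home", "single_family"][i % 3]
            for i, h in enumerate(households)}
conditions = ["control", "info_A", "info_B", "info_C"]
print(stratified_assignment(households, dwelling, conditions))
```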

Apart from the three strategies just discussed, the introduction of covariates is standard practice in public administration. The logic behind controlling for covariates is fundamentally the same as that behind matching, blocking, and stratifying: reducing error variance by comparing treatment and control within more homogeneous subgroups of the subject pool. The difference is that this is done mathematically in the analysis after the experiment has been conducted. Instead of conducting a t -test or ANOVA of differences between groups, the covariates (e.g., students’ gender, major and location of subjects’ residence) are included as additional variables. For this purpose, it can be helpful to express the analysis as a regression with dummy variables for the treatments. This is mathematically identical to an ANOVA, but some will find it easier to interpret the findings ( Angrist and Pischke 2009 ). 8
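
A minimal sketch of this covariate-adjusted analysis is shown below, using simulated data for the hypothetical electricity-bill experiment; the variable names and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a hypothetical nudge experiment: the outcome is electricity use,
# and pretest consumption is a covariate correlated with the outcome.
rng = np.random.default_rng(0)
n = 300
treatment = rng.choice(["control", "info_A", "info_B"], size=n)
pretest_kwh = rng.normal(900, 150, size=n)
true_effect = np.select([treatment == "info_A", treatment == "info_B"],
                        [-30.0, -50.0], default=0.0)
kwh = 0.8 * pretest_kwh + true_effect + rng.normal(0, 60, size=n)
df = pd.DataFrame({"treatment": treatment, "pretest_kwh": pretest_kwh, "kwh": kwh})

# Treatment dummies plus the pretest covariate; dropping the covariate makes
# this mathematically identical to a one-way ANOVA of kwh on treatment.
model = smf.ols("kwh ~ C(treatment, Treatment('control')) + pretest_kwh",
                data=df).fit()
print(model.summary())
```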

Given the limited resources and growing expertise in replication, together with threats to internal validity, many of the original studies that have been replicated were underpowered. Therefore, the comparison between original and replicated results has been inconclusive. Paying close attention to statistical power when designing a replication study is an essential ingredient for maximizing the insights to be gained from the replication.

Implementing the Replication

Decision Point 4: Choosing the Critical Test Case

The next decision point is where to replicate, or to choose between replication sites. The choice of replication site requires a detailed consideration of context. Recently, several scholars have sought to develop a theory of context in public administration (e.g., Meier, Rutherford, and Avellaneda 2017 ; O’Toole and Meier 2015 ; Pollitt 2013 ). Their motivation stems from the equivocal findings of observational studies in different contexts, which suggests that context matters and is an important aspect of theory in public administration. The multidimensional nature of context across time and space has led to calls for a more intensive study of its key dimensions, such as the political, environmental, and internal factors affecting public organizations ( O’Toole and Meier 2015 ). Scholars further assert that context affects the relationships between variables in different settings by interacting with public administration practice, altering the relationship between the independent and dependent variables and thus reshaping theory. 9 These factors influence site selection decisions.

The first step in choosing the critical test case is to distinguish between two ideal types of studies (with much gray middle ground): confined versus universal. Confined studies produce findings that are purported to be specific to the local context, whereas universal studies claim global generalizability. In public administration, confined studies are the norm: studies in specific institutional contexts are often framed as tests of general theory. Generally, the argument is that public administration phenomena are context-specific and boundary conditions apply (see below).

One example is the role of national culture. What holds in China may not work in Germany, and vice versa. New public management (NPM) practices may be effective in similar contexts (e.g., Anglo-Saxon countries), but much less so or not at all elsewhere. In contrast, universal claims are said to hold everywhere and at all times, irrespective of context. For example, public service motivation and red tape have been presented as universal constructs with local variations, whereas agencification and autonomization are viewed as Scandinavian concepts with universal implications. This is not the place to discuss the type of claims researchers should make, but a study's location on the confined-universal spectrum implies an understanding of context and a choice of replication sites.

In public administration, confined studies are the rule rather than the exception, so we take this as our stepping stone. Our advice for the second step is to apply both of Mill's (1843) methods: difference (dissimilarity) and agreement (similarity). 10 The initial stage is to select sites where the study is likely to replicate (the method of agreement) or not (the method of difference). Applying this logic to public administration, we should include theoretically selected sites in the replication design. Take the above example again. Suppose that theory predicts that NPM practices (e.g., competitive tendering) are only effective in Anglo-Saxon countries. A powerful multisite replication would redo the original study in countries such as Australia, Ireland, the United Kingdom, and the United States (where, following the logic of the method of agreement, the original finding is expected to replicate well) and in nation-states such as Brazil, China, France, and Germany (where, through the lens of the method of difference, the original result is not expected to replicate). This example also illustrates the difference when replicating a universal study: site selection is irrelevant because the original result is expected to replicate everywhere.

Many replications fall into the gray zone with neither fully confined nor fully universal findings. In the context of the above NPM example, an argument could be made that the effectiveness of performance pay is positively associated with a country’s degree of individualism. In this case, our advice is to sample replication sites across the individualism spectrum, taking care to select diverse sites. The findings from the replication can then be used in a moderated meta-analysis with individualism as the critical moderator. The prediction is that the effect sizes will be larger with higher individualism (and perhaps small or nonsignificant at the collectivist end of the scale).
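To make this concrete, the following sketch shows one way such a moderated meta-analysis might be run, as an inverse-variance weighted meta-regression of site-level effect sizes on an individualism score. It assumes statsmodels is available, and all effect sizes, variances, and moderator values are hypothetical.

```python
# A minimal sketch of a moderated meta-analysis (meta-regression) across
# replication sites; the effect sizes, sampling variances, and individualism
# scores are hypothetical.
import numpy as np
import statsmodels.api as sm

effect_size = np.array([0.05, 0.12, 0.20, 0.31, 0.38])    # one estimate per site
variance = np.array([0.010, 0.012, 0.008, 0.011, 0.009])  # sampling variance per site
individualism = np.array([20.0, 35.0, 55.0, 70.0, 90.0])  # country-level moderator

# Weight each site by the inverse of its sampling variance; a positive,
# significant slope is consistent with larger effects in more individualistic sites.
X = sm.add_constant(individualism)
fit = sm.WLS(effect_size, X, weights=1.0 / variance).fit()
print(fit.params)    # intercept and slope on individualism
print(fit.pvalues)
```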

Decision Point 5: Establishing Boundary Conditions

A primary motive for replication in public administration is to establish whether research findings hold in different contexts and to understand what factors might affect the degree of generalization. Public administration replication often aims to make more explicit the moderators that explain why replication results vary, but boundary conditions may apply to many other aspects of a study, such as substituting a similar stimulus to achieve a desired effect, testing the cause–effect relationship on a different population, moving the experiment to a different setting, or assessing the consistency of results over time. An understanding of boundary conditions is important because, as discussed in decision point 4, the original study may have been confined, that is, designed for a specific and limited purpose rather than for generalization. In any case, the goal is to make all relevant boundary conditions explicit because this permits an understanding of "the accuracy of theoretical predictions for any context" (Busse, Kach, and Wagner 2017, 578). Put simply, it translates theory to practice and helps implement the design science agenda of public administration. The fifth decision point thus requires researchers to clearly state boundary conditions and hypothesize their expected findings when replicating an original study.

Choice of replication site will influence boundary conditions. If the strategy is to replicate a study in a very similar setting, the purpose is probably to increase the likelihood of confirming the results or to further refine the theory in the original study. The selection of a dissimilar setting suggests that the replication includes extensions that will push the boundaries of the original study to develop or adjust theory. Busse, Kach, and Wagner (2017) propose "inside-out" and "outside-in" approaches to exploring boundary conditions (figure 3). In an inside-out exploration (point A in figure 3), uncertainty about boundary conditions is low, and the accuracy of theoretical predictions is high. Researchers taking an inside-out approach are likely to implement replications that incrementally change populations, measures, or analyses [see Grimmelikhuijsen and Porumbescu's (2017) replication of Van Ryzin (2013)].

Figure 3. Setting Boundary Conditions. Note: Adapted from Busse, Kach, and Wagner (2017, 584).

With the outside-in approach (point B in figure 3), boundary conditions are highly uncertain, and the accuracy of theoretical predictions is low. A replication based on dissimilarity and uncertain boundary conditions seeks to test and develop a theory for a new context. In this case, the researcher would hypothesize that the results of the original study would not be present in the replication. Confirmation of this hypothesis does not equate to "replication failure" because the focus is on breaking new ground. This means that "the problem statement is less clear and the research project does not follow a fully straightforward plan but more likely involves some feedback loops and iterations" (Busse, Kach, and Wagner 2017, 584).

An inside-out approach to the question of boundary conditions has been applied to the replications conducted in Hong Kong on organizational design. The Hong Kong government was designed by the British colonial powers and thus reflects the combined structures, cultures, and practices of East and West. Externally, public bureaucracy in Hong Kong looks much like that of the West. However, Hong Kong's government and public services are much more centralized and hierarchical than those in most Western countries because of its duty-based and leadership-centric culture, which is typical of Eastern countries. Replications of Kaufmann and Feeney's (2014) study of red tape (Walker, Lee, and James 2017, 457) suggested the following proposition: "The results of replications on organizational design will be in the same direction as the original studies, but weakened by Hong Kong's diverse culture." Notable in this view is the likely greater tolerance of rules and of decisions made by more senior actors in the hierarchy. This assertion was upheld in the replication, which resulted in nonsignificant findings.

A clearer understanding of boundary conditions in replications can inform public administration on how we use knowledge to solve problems and devise solutions in different contexts. Indeed, exploration of boundary conditions fosters theory development, strengthens research validity, and mitigates the research–practice gap (Busse, Kach, and Wagner 2017).

Decision Point 6: Establishing Construct Validity

The sixth decision point concerns questions of construct validity, a topic that has recently been discussed in studies of comparative public administration (Jilke, Meuleman, and Van de Walle 2014; Jilke et al. 2017). In cross-cultural studies, bias has resulted from systematic differences in the measurement instruments (Poortinga 1989). More specifically, the measures used "do not correspond to cross-cultural differences in the construct purportedly measured by the instrument" (Matsumoto and Van de Vijver 2011, 18). At a basic level, translation from one language into another already comes with subtle differences in meaning (Harzing, Reiche, and Pudelko 2013), implying that replication work that crosses countries and languages requires careful assessment of cross-cultural and cross-language differences, in combination with equally careful translation and interpretation procedures.

Construct validity is important in any experimental replication, and particularly so when replications extend the original study to different populations. Researchers must ensure that the conceptual construct being measured is consistent in meaning across different settings and that it can be “mapped onto a measurement scale in the same way” ( Jilke et al. 2017 , 1295). This means that a concept and its interpretation by respondents on a scale are equivalent when the respondents hold the same views and variation is only related to context. In generalization and extension replications, researchers can make decisions about the suitability of the measures and explicitly build into their research design measures that are appropriate for the context of the new study. If decisions are not made about construct validity in the replicated study, it may suffer from construct, method, or item bias ( Jilke, Meuleman, and Van de Walle 2014 ). If the measures are not equivalent, the findings can suffer from construct bias (dissimilarity of latent concepts across countries), method bias (all types of biases that come from the methodological and procedural aspects of a survey), and item bias (different people understand or interpret the same survey item differently).

Construct validity can be challenging. The replication of Kaufmann and Feeney’s (2014) study (in Walker, Lee, and James 2017 ) illustrates some aspects of these challenges. Kaufmann and Feeney (2014) examined red tape and outcome favorability. The term “red tape” does not readily convey the same meaning in English as it does in Chinese. The Chinese translation of “red tape” is “繁文縟禮 (Fan-Wen-Ru-Jie).” The meaning is similar in both languages, but the origins differ substantially. The roots of Fan-Wen-Ru-Jie come from the Chinese history of Confucianism, and different schools of Confucian thought favor different interpretations. Furthermore, the direct back translation of Fan-Wen-Ru-Jie is “complicated wordings (writings or documents) and burdensome etiquette,” even though its definition is accepted as meaning “red tape.” To partially offset this potential bias, the replication of Kaufmann and Feeney (2014) was conducted in the English language. In short, when replicating an original study, the replication cannot simply copy or directly translate the original questionnaire because there may be cultural effects such as item inappropriateness or differential response styles ( Matsumoto and Van de Vijver 2011 ).

Taking questions of construct validity one step further, Jilke et al. (2017) suggested that measurement equivalence should be checked during replication, particularly when populations and measurement or analysis change (i.e., conceptual generalization, empirical generalization, and generalization and extension). Jilke et al. (2017) offered detailed guidance to test various levels of measurement equivalence using multiple group confirmatory factor analysis. Those attempting replications are encouraged to consult this source. Finally, questions of construct validity are bound up with decision point 1 regarding whether the original study can be replicated, and decision point 7 concerning the interpretation of the replication’s results, which we now turn to.

Reporting Results

Decision Point 7: Deciding How to Compare the Findings

Determining how the findings should be compared depends on the number of replications, their power, and the nature and extent of their heterogeneity (see the above discussions of boundary conditions and confined studies). Working with sharp rule-of-thumb threshold differences (e.g., if the p value of the replication is above .05 or the new effect size is half the original, the replication has failed) is unwise ( Cumming 2014 ). Actually, we advise against referring to replications as “successes” or “failures” because these judgments can be highly subjective. As a collateral benefit, removing the success–failure classification from our replication terminology makes replications less threatening for the original authors and hence increases the likelihood that replication work will be undertaken.

Describing the results of replications in more objective terms is also more consistent with the ideals of social science. In any replication, the degree of replicability should be carefully discussed using a series of benchmarks (e.g., power-corrected p values, full confidence intervals, and effect sizes). For an example of such careful reflection, we refer to Bouwmeester et al. (2017) , while recalling that care and attention are needed to interpret the statistical evidence ( Bordacconi and Larsen 2014 ). This is an appropriate methodology when only a limited number of replications are being analyzed, perhaps one or two.
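As an illustration of working with such benchmarks, the sketch below computes Cohen's d with approximate 95% confidence intervals for an original study and a single replication, so the two estimates can be placed side by side rather than judged against a pass/fail cutoff. The effect sizes and sample sizes are hypothetical, and scipy is assumed to be available.

```python
# A minimal sketch of comparing one replication with its original study using
# effect sizes and confidence intervals rather than a pass/fail cutoff.
# The d values and sample sizes below are hypothetical.
import numpy as np
from scipy.stats import norm

def cohens_d_ci(d, n1, n2, level=0.95):
    """Large-sample normal approximation to the CI for Cohen's d."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = norm.ppf(0.5 + level / 2)
    return d - z * se, d + z * se

studies = {
    "original":    (0.45, 120, 118),   # (d, n treatment, n control)
    "replication": (0.28, 260, 255),
}
for label, (d, n1, n2) in studies.items():
    lo, hi = cohens_d_ci(d, n1, n2)
    print(f"{label}: d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```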

As the number of replications increases, scholars can compare findings using meta-analytic techniques. Fabrigar and Wegener (2016) illustrate how meta-analytic calculations can be used on a small set of experiments to show how additional replications of the original study change meta-analytic indices. In their worked example, Fabrigar and Wegener (2016, 75) show that when meta-analytic calculations are applied to "fragile" replication results (i.e., ones that would fail by threshold criteria), effect sizes can still be detected across the population of studies. As the scale of the results being compared increases, for example in a many-labs study involving scholars from different research laboratories, more traditional meta-analytic techniques can be used to focus on the overall effect size and the variation therein.
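The following sketch illustrates the underlying idea with a simple fixed-effect (inverse-variance) pooling of an original study and successive replications; all effect sizes and variances are hypothetical, and this is a simplified stand-in for the fuller calculations discussed by Fabrigar and Wegener (2016).

```python
# A minimal sketch of fixed-effect (inverse-variance) pooling, showing how the
# meta-analytic estimate changes as replications are added to the original
# study; all effect sizes and variances are hypothetical.
import numpy as np

effects = np.array([0.45, 0.20, 0.15, 0.30])      # original study + three replications
variances = np.array([0.020, 0.015, 0.018, 0.012])

def pooled(effects, variances):
    weights = 1.0 / variances
    estimate = np.sum(weights * effects) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return estimate, se

for k in range(1, len(effects) + 1):
    est, se = pooled(effects[:k], variances[:k])
    print(f"first {k} study/studies: pooled effect = {est:.2f} (SE = {se:.3f})")
```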

In this section, we walk through the decision points as applied to a replication of Van Ryzin's (2013) experimental test of the expectancy disconfirmation theory (EDT) of citizen satisfaction. In Van Ryzin's (2013) online experimental vignette methodology, subjects received either low or high expectations delivered by the mayor of an imaginary US local government. The subjects were then asked to view either low- or high-performance street cleanliness photographs. The study confirmed the central hypothesis in EDT that "citizens judge public services not only on experienced service quality but also on an implicit comparison of service quality with prior expectations" (597). EDT was selected for replication because it has the potential to explain citizen satisfaction in a variety of locations and thus to become a valuable practical framework for governments. Four replications were conducted. The first was an empirical generalization that sought to use, to the extent possible, the same measures and analysis as the original study among a population of Hong Kong citizens. Further replications extended the original study to explore the boundary conditions.

Planning the Replication

The design of the original experiment and information on the treatments and processes were clearly reported in the article ( Van Ryzin 2013 , 602–6). In the replication, the decision was made to include the original study’s author, who became a co-investigator for the grant funding. This provided access to all of the research materials needed for design and analysis and tacit knowledge on the implementation of the original study. For example, the pretest questions discussed in the article were not published ( Van Ryzin 2013 , 603). Decision point 1 was passed.

Decision point 2 requires an examination of the original study's internal validity. EDT is typically studied using observational research methods, such as cross-sectional surveys with subjective measures of expectations and performance, which raise questions of causality and endogeneity. Van Ryzin (2013) applied an experimental research design to address these validity concerns, notably ambiguous temporal precedence. Careful reading of the article and discussion with the original author suggested that the threats to internal validity were low: subjects were drawn from a nationally representative sample and randomly assigned to relatively simple treatment conditions. Issues of maturation and attrition did not arise. Decision point 2 was thus satisfied.

Decision point 3 focuses on power and sample size. The sample size in the original study was 964 respondents, randomized across four arms of the study (arm 1 = 251, arm 2 = 257, arm 3 = 226, and arm 4 = 230). The original study did not discuss power analysis in relation to sample size. Analysis of the sample sizes reported (Van Ryzin 2013, 602) showed power at 0.95 [for effect sizes δ set equal to Van Ryzin's (2013) findings and the critical value α = 0.05]. The original study used path analyses, so the minimum sample size required for a power of 0.95 was 76, based on MANOVA for repeated measures between factors. Power analysis indicated that the target sample size for the replication should be 600 subjects. However, we recruited 1,000 subjects because the replication included four experiments run over a 6-month period and attrition was anticipated. The replication used a convenience sample, and no claims were made that it was representative of the Hong Kong population.
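For readers planning a similar calculation, the sketch below shows one way to solve for the per-arm sample size of a replication at a chosen power level using the statsmodels power module; the effect size entered here is a hypothetical Cohen's d rather than a value estimated from the original study.

```python
# A minimal sketch of solving for the per-arm sample size of a replication at a
# target power, assuming statsmodels; the effect size is a hypothetical Cohen's d.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(
    effect_size=0.30,        # hypothetical standardized effect size
    alpha=0.05,
    power=0.95,
    alternative="two-sided",
)
print(f"subjects required per arm ≈ {n_per_arm:.0f}")
```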

To date, all experimental tests of EDT and citizen satisfaction have been based in Western liberal democracies (e.g., Grimmelikhuijsen and Porumbescu 2017). Among these tests, a number have been replications [see Filtenborg, Gaardboe, and Sigsgaard-Rasmussen (2017) on Danish citizens; Grimmelikhuijsen and Porumbescu (2017) on US MTurk subjects]. Hong Kong was selected as a critical test case (decision point 4) because it is an Asian municipality that relies on observational research methods to study citizen satisfaction. The replications of EDT and citizen satisfaction supported the theorized causal relationships (Filtenborg, Gaardboe, and Sigsgaard-Rasmussen 2017; Grimmelikhuijsen and Porumbescu 2017). These confirmatory replications of EDT on citizen satisfaction, together with widespread evidence supporting the theory in studies of goods and products (Oliver 2010), suggested the method of agreement (i.e., that the original study was expected to replicate). We were further motivated to conduct the replication because, if claims of universality could be upheld, EDT offered the prospect of a new and more robust approach to measuring citizen satisfaction in the city.

The contexts in Hong Kong and the United States are very different, with large political, environmental, and internal variations. Hong Kong is governed by the Chinese principle of "one country, two systems" and reports directly to Beijing; it is, therefore, a more unitary system than the United States. Environmentally, social capital is lower in Hong Kong, and the government is very centralized and hierarchical [see Scott (2010) for a thorough review of the Hong Kong context]. However, EDT has been successfully applied to information technology products in Hong Kong (Thong, Hong, and Tam 2006). Given this, we adopted an inside-out approach to the question of boundary conditions raised in decision point 5. We expected that the theory would hold in Hong Kong, despite its very different economic system. To test this expectation and confirm the decisions on a method-of-agreement/inside-out replication, the first stage was to conduct an empirical generalization. For this, we sought to retain the original measurement and analysis as much as possible. However, the original vignettes were developed in the American context and were based around a mayor's announcement to a medium-sized city. In 2017, Hong Kong was a city of 7.36 million people. Therefore, to make the replication as realistic as possible for Hong Kong respondents (see the discussion of construct validity below), we changed the treatment from a citywide level to a Hong Kong district. For the generalization and extension replications, the expectation and performance treatments were further changed to capture the Hong Kong context. These replications examined street cleanliness, air quality, and secondary education because these policy issues were relevant in the local context.

EDT is a well-established framework in the management and marketing literature and, as noted above, has experienced increased popularity in public administration studies. The measures and manipulations exhibit good face validity. The topic in Van Ryzin’s EDT study was a technical and unambiguous public service: street cleanliness. The main concept’s construct validity (decision point 6) was therefore not overly problematic. For measurement purposes, we used the same wording for the expectation and performance manipulations and variable measurements as the original study. The performance manipulation in the original study included two pictures of a New York street showing different degrees of cleanliness. In the replication, a picture was taken of a Hong Kong street. One photograph (high performance) showed the original clean street; the other was manipulated using Adobe Photoshop to include litter (low performance). This was done to limit variation in the photographs to only litter (i.e., sharpen the treatment effect), and to enhance validity for the Hong Kong subjects. To avoid construct validity problems associated with translating English into traditional Chinese, all materials were prepared in English, which is widely spoken. Following these decisions and after finalizing the research design, the replication was implemented from February to August 2017.

Reporting the Results of the Replication

As noted above, comparing an original study with a replication can be challenging because of differences in context. However, for the replication of Van Ryzin (2013), a universal claim was advanced, implying that the results from the original study and the replication should be comparable. We analyzed the replications by implementing the same path analysis as the original study. The results showed the same pattern and direction in the replication as in the original study, and comparable levels of statistical significance: expectations ⇒ satisfaction (positive), expectations ⇒ disconfirmation (negative), performance ⇒ disconfirmation (positive), performance ⇒ satisfaction (positive), and disconfirmation ⇒ satisfaction (positive). Given that several replications were conducted, it was possible to make further comparisons between the original study, our study, and other replications (Filtenborg, Gaardboe, and Sigsgaard-Rasmussen 2017; Grimmelikhuijsen and Porumbescu 2017). Table 3 shows the expected relationships between the expectation and performance manipulations and satisfaction, along with the mean scores in each study. The mean scores consistently show that performance has a strong effect on satisfaction: satisfaction is high when performance is set high, whether expectations are set low or high, and low when performance is set low, whether expectations are set low or high. This brief preview of the results from the several replications of Van Ryzin (2013) suggests that the direction of the findings was comparable and that EDT was validated on three occasions, providing emerging evidence of the theory's universality.

Table 3. Comparison of Expectations and Performance Manipulations on Levels of Satisfaction in the Original Study and Replications

Note: Mean scores, 1 = very dissatisfied to 7 = very satisfied; unequal subscripts indicate a statistically significant difference within pairs (low expectations, high expectations). Data are available in the online supplementary materials. a = Van Ryzin (2013). b = Filtenborg, Gaardboe, and Sigsgaard-Rasmussen (2017); one decimal place reported.

Replication needs to become a normal scientific practice in public administration to improve the quality of scholarship, accumulate knowledge, develop mid-range theories, distill lessons for practice, and identify robust effect sizes and boundary conditions. Replication failures are not necessarily failures of good scientific practice; rather, they should be seen as the scientific process in action. 11 The time is ripe for action. If public administration is serious about science, validity, and generalization, scholars should view replications more positively and take up this common framework. Practically, this suggests designing and undertaking replications for knowledge building rather than for instrumental purposes, such as falsifying a study result that, itself, has little impact on theory and practice.

The scholarly community needs to expand the meaning of what we refer to as replication, which could have a more immediate and even greater cumulative effect on our field. Replications are not limited to cloning previous studies or re-running controlled laboratory or field experiments. The findings derived from other research designs, like case studies and survey research, should also be subject to rigorous replication and verification. The same best practices we have suggested, or similar ones, can be used to more efficiently corroborate knowledge derived from nonexperimental and quasi-experimental research designs ( Fabrigar and Wegener 2016 ). Researchers should scrutinize these findings and test their boundaries in other samples and settings with different model specifications and research techniques, such as comprehensive literature reviews, meta-analysis, and longitudinal panel studies. Such thorough, persistent efforts to vet important research findings can significantly increase confidence in public administration research and grow the knowledge base in our field. Moreover, rigorous research on boundary conditions will pay big dividends for public administration practice as researchers and practitioners advance the design science ambitions of our field.

Advancing a common replication framework can be achieved through collective action. Replications are an essential part of the scientific method that serves the common interest of the scientific community and beyond. Existing reward structures need to be modified to incentivize replication ( Everett and Earp 2015 ; van Witteloostuijn 2016 ). Currently, the field does not provide enough incentives for individual researchers to carry out and publish replication studies on a large enough scale. Nevertheless, public administration researchers do know something about solving this “tragedy of the commons.” Collective action problems can be overcome by group interventions (e.g., by altering the incentive structure and spreading the costs) or through individual initiatives (as when some individuals subordinate their self-interest and act for the common good). Both courses of action will help generate more replication studies in public administration.

From the experimental research design perspective taken in this article, many changes have taken place in the field since 1992, when Bozeman and Scott argued that experimental methods are not necessarily suited to studying public organizations and face ethical and logistical barriers (Bozeman and Scott 1992). The research agenda in public administration now embraces questions well suited to experimental methods, particularly questions of behavior (James, Jilke, and Van Ryzin 2017), institutional design (Meier and Smith 1994), and public service performance (Walker, Boyne, and Brewer 2010). Concerns over ethics and deception have been reduced, and logistical barriers have been lowered as research expertise in the field has grown and more resources have been dedicated to experimental studies (James, Jilke, and Van Ryzin 2017).

As resources for experiments grow, more experimental laboratories and research centers will become operational, creating opportunities for more extensive replication agendas. When these laboratories are linked and working on common agendas, progress will speed up and become more efficient. Multisite or "many-labs" public administration studies are now feasible (for instance, see Klein et al. 2014; Open Science Collaboration 2015), and we could also amass more replications by making a replication study a mandatory component of PhD programs. Multiple replications can subsequently be used to accumulate estimates for use in meta-analyses. Technological advances facilitate this: researchers now have access to larger, more diverse groups of subjects (e.g., through MTurk or eLancing); survey technologies (e.g., Qualtrics) permit simultaneous implementation of experiments across time and space; and datasets and protocols can be easily shared and stored. Indeed, many-labs studies present an ideal way to explore boundary conditions, and they provide a real opportunity to advance the theory and practice of public administration.

Neither institutions nor individuals can be expected to subordinate their personal interest to the greater good, even though some do. Granting organizations (e.g., the Netherlands Organisation for Scientific Research) and journals (e.g., the Journal of Behavioral Public Administration) 12 can incentivize replications. Researchers can choose to take up the replication agenda and might earmark a percentage of their publications for replications in their research area [LeBel (2015) recommends a ratio of 4:1]. Yet over the long term, these contributions may be insufficient to mainstream replication in public administration. This means that public administration replicators must become more strategic and resourceful, as we have alluded to in this article. The field should strive for better replications in the same way that we aim for better theory, research methods, and utilization. The field should trumpet the need for replications, try to refine and improve the process, and incentivize high-impact work in the same way we pursue other normatively important principles. The implementation of a common replication framework would place public administration at the forefront of the applied social sciences, according it a leading role in the development of solutions to complex, human-related, real-world problems that involve value judgments. Eventually, replication can become a virtuous cycle of theory testing, refinement, and application.

Replication offers public administration scholars the opportunity to address substantive questions and advance the theory and practice of public administration, in keeping with Simon's (1996) notion that public administration is a design science that devises courses of action to change existing situations into preferred ones. This major contribution is centered on testing the boundary conditions of current theory and practice using conceptual, generalization, and extension replications. In this article, we outline a set of best practices for public administration researchers who are planning to conduct replication studies. These practices have been gleaned from our experience in one of the first replication laboratories in public administration, peppered with advice from other disciplines. This protocol should help researchers avoid common problems, make wise decisions, and carry out better replications. The important elements include targeting the most impactful research findings for replication, shoring up internal validity before moving on to external validity, and identifying the most important boundary conditions to explore. Many useful practices have been suggested. We hope that this article will trigger debate on replication and that our colleagues will take up this agenda to help advance the scientific ambitions of the field, develop better mid-range theories, and glean useful lessons for practice.

This work was supported by: University Grants Committee, Research Grants Council (project no. CityU 11611516); Public Policy Research Funding Scheme from the Central Policy Unit of the Government of the Hong Kong Special Administrative Region (project no. 2015.A1.031.16A); City University of Hong Kong Department of Public Policy Conference grant and College of Liberal Arts and Social Sciences Capacity Building grant; National Research Foundation of Korea grant funded by the Korean Government (NRF-2017S1A3A2067636).

We would like to thank Mary Feeney, Bert George, Oliver James, Seb Jilke, Tobin Im, M. Jae Moon, Greta Nasi, Asmus Leth Olsen, Lars Tummers, Gregg Van Ryzin, and Xun Wu for contributing their ideas and thoughts to this project. Drafts of the article were presented at seminars at the Laboratory for Public Management and Policy City University of Hong Kong, Erasmus University Rotterdam, KU Leuven and SPEA, University of Indiana, and three conferences: Charting New Directions in Public Management Research, University of Bocconi, January 2018; The 2018 Yonsei International Public Management Conference, Seoul, January 2018; and the 22nd International Research Symposium on Public Management, Edinburgh, April 2018. All errors remain with the authors.

In this article, the term “public administration” includes public management.

We encourage openness and transparency in all forms of empirical research.

Public Administration Review , Journal of Behavioral Public Administration and the International Journal of Public Sector Management have signed up to Level 1 (voluntary disclosure): See https://www.publicadministrationreview.com/guidelines/ , https://journal-bpa.org/index.php/jbpa/transparency , and http://www.emeraldgrouppublishing.com/products/journals/author_guidelines.htm?id=ijpsm

https://www.nwo.nl/en/news-and-events/news/2017/social-sciences/second-call-for-proposals-replication-studies-open-for-applications.html

Some researchers report a calculation of power using the estimate of the effect size they obtained in the same study. Cumming (2014) shows that such post hoc power can take almost any value and is thus not informative. Statistical power is most useful when calculated in reference to an effect size δ specified ex ante .

While power calculations are best done in software, we provide a worked, simplified example. For this, we use a between-subjects design, which is representative of the majority of experiments in public administration. Suppose there is a treatment and a control condition (more conditions mean this calculation has to be repeated). The outcome (dependent variable) is a seven-point scale ranging from 1 to 7, with a mean of 3 and a standard deviation of 1 in each condition. Suppose we set the level of statistical significance at α = 0.05 in a one-tailed test, and the effect size δ at 0.2 standard deviations, which is 0.2 × 1 = 0.2 in our example. To calculate power, we use the critical values of the relevant statistical distribution, typically the t-distribution; for larger sample sizes it can be approximated by the standard normal distribution, which we use here. If the mean of the treated group is 3.2, the power for correctly rejecting the null hypothesis of no positive effect is approximately Pr[Z > 1.64 − δ/(σ√(1/N))]. For a typical N of 75 subjects for a comparison between two conditions, this yields a power of Pr[Z > 1.64 − 0.2/√(1/75)] = Pr[Z > 1.64 − 1.73] = Pr[Z > −0.09] = 1 − 0.4641 = 0.5359. That is, at the specified level of significance and the specified effect size, the null of no effect will be rejected in about 54% of trials. As noted before, for the specified target effect size, power increases with the sample size and decreases as the researcher chooses a stricter (smaller) level of statistical significance. Instead of setting the number of subjects in each condition (N) and solving for power (which is useful when assessing existing studies to be replicated), it can also be useful to set power at a given value and solve for the number of subjects required. This is useful when designing a new study or replication.
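The following sketch reproduces this simplified calculation in code and then turns it around to solve for the required N at a chosen power; it assumes scipy is available and retains the normal approximation and simplified standard error used in the example above.

```python
# A minimal sketch reproducing the simplified power calculation above and then
# solving the same expression for N at a chosen power; assumes the normal
# approximation and the sigma * sqrt(1/N) standard error used in the text.
from scipy.stats import norm

alpha = 0.05                      # one-tailed significance level
z_crit = norm.ppf(1 - alpha)      # ~1.645
delta, sigma, N = 0.2, 1.0, 75    # effect size (in SDs), outcome SD, subjects

se = sigma * (1.0 / N) ** 0.5             # simplified standard error from the example
power = norm.sf(z_crit - delta / se)      # Pr[Z > 1.64 - 1.73] ~= 0.54
print(f"power ≈ {power:.3f}")

# Turning the question around: fix power (e.g., 0.80) and solve for N.
z_power = norm.ppf(0.80)
n_required = ((z_crit + z_power) * sigma / delta) ** 2
print(f"N required for 80% power ≈ {n_required:.0f}")
```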

Sample size is also subject to diminishing returns, and excessively large samples can raise an ethical problem because they are wasteful and needlessly expose subjects to risk ( Bacchetti et al. 2005 ).

The analysis should be presented with and without covariates to aid interpretation.

It should be noted that cloned dependent variables may differ across settings because of unique attributes of the settings.

This is closely related to the issue of boundary conditions, which is elaborated in decision point 5.

An anonymous reviewer suggested this line of argument.

The Journal of Behavioral Public Administration is explicitly open to publishing replications of experimental work, including null findings; see www.journal-bpa.org .

Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly harmless econometrics: An empiricist's companion. Princeton, NJ: Princeton Univ. Press.

Bacchetti , Peter , Leslie E. Wolf , Mark R. Segal , and Charles E. McCulloch . 2005 . Ethics and sample size . American Journal of Epidemiology 161 : 105 – 10 .

Berthon , Pierre , Leyland Pitt , Michael Ewing , and Christopher L. Carr . 2002 . Potential research space in MIS: A framework for envisioning and evaluating research replication, extension, and generation . Information Systems Research 13 : 416 – 27 .

Bettis , Richard A. , Constance E. Helfat , and Myles J. Shaver . 2016 . The necessity, logic and forms of replication . Strategic Management Journal 37 : 2193 – 203 .

Blom-Hansen , Jens , Rebecca Morton , and Søren Serritzlew . 2015 . Experiments in public management research . International Public Management Journal 18 : 151 – 70 .

Bordacconi , Mats Joe , and Martin Vinæs Larsen . 2014 . Regression to causality: Regression-style presentation influences causal attribution . Research and Politics 1 : 1 – 6 .

Bouwmeester , Samantha , Peter P. J. L. Verkoeijen , Balazs Aczel , Fernando Barbosa , Laurent Bègue , Pablo Brañas-Garza , Thorsten G. H. Chmura , G. Cornelissen , F. S. Døssing , A. M. Espín , et al.  2017 . Registered replication report: Rand, Greene, and Nowak (2012) . Perspectives on Psychological Science 12 : 527 – 42 .

Bozeman , Barry , and Patrick Scott . 1992 . Laboratory experiments in public policy and management . Journal of Public Administration Research and Theory 2 : 293 – 313 .

Brandt , Mark J. , Hans Ijzerman , Ap Dijksterhuis , Frank J. Farach , Jason Geller , Roger Giner-Sorolla , James A. Grange , Marco Perugini , Jeffrey R. Spies , and Anna Van’t Veer . 2014 . The replication recipe: What makes for a convincing replication ? Journal of Experimental Social Psychology 50 : 217 – 24 .

Busse , Christian , Andrew P. Kach , and Stephan M. Wagner . 2017 . Boundary conditions: Why we need them, and when to consider them . Organizational Research Methods 20 : 574 – 609 .

Camerer , Colin F. , Anna Dreber , Eskil Forsell , Teck-Hua Ho , Jürgen Huber , Magnus Johannesson , Michael Kirchler , Johan Almenberg , Adam Altmejd , Taizan Chan , et al.  2016 . Evaluating replicability of laboratory experiments in economics . Science 351 : 1433 – 6 .

Campbell , D. T. , and J. C. Stanley , 1963 . Experimental and quasi-experimental designs for research . Boston : Houghton Mifflin .

Cook , Thomas, D. , and Donald T. Campbell . 1979 . Quasi-experimentation: Design and analysis issues for field settings . Boston, MA : Houghton Mifflin .

Cumming , Geoff . 2014 . The new statistics: Why and how . Psychological Science 25 : 7 – 29 .

Dewald , William G. , Jerry G. Thursby , and R. G. Anderson . 1986 . Replication in empirical economics: The journal of money, credit, and banking project . The American Economic Review 76 : 587 – 603 .

Everett , Jim A. C. , and Brian D. Earp . 2015 . A tragedy of the (academic) commons: Interpreting the replication crisis in psychology as a social dilemma for early-career researchers . Frontiers in Psychology 6 : 1152 .

Fabrigar , Leandre R. , and Duane T. Wegener . 2016 . Conceptualizing and evaluating the replication of research results . Journal of Experimental Social Psychology 66 : 68 – 80 .

Filtenborg, Anders Foged, Frederik Gaardboe, and Jesper Sigsgaard-Rasmussen. 2017. An experimental test of the expectancy disconfirmation theory of citizen satisfaction. Public Management Review 19: 1235–50.

Francis , Gregory . 2012 . The psychology of replication and the replication of psychology . Perspectives on Psychological Sciences 7 : 585 – 94 .

Freese , Jeremy . 2007 . Replication standards for quantitative social sciences. Why not sociology ? Sociological Methods & Research 36 : 153 – 72 .

Freese , Jeremy , and David Peterson . 2017 . Replication in social science . Annual Review of Sociology 43 : 147 – 65 .

George , Bert , Sebastian Desmidt , Poul A. Nielsen , and Martin Baekgaard . 2017 . Rational planning and politicians’ preferences for spending and reform: Replication and extension of a survey experiment . Public Management Review 19 : 1251 – 71 .

Gerber , Alan S. , and Donald P. Green . 2012 . Field experiments: Design, analysis, and interpretation . New York/London : Norton .

Grimmelikhuijsen, Stephan, and Gregory A. Porumbescu. 2017. Reconsidering the expectancy-disconfirmation model: Three experimental replications. Public Management Review 19: 1272–92.

Harzing , Anne-Wil , Sebastian Reiche , and Markus Pudelko . 2013 . Challenges in international survey research: A review with illustrations and suggested solutions for best practice . European Journal of International Management 7 : 112 – 34 .

Hubbard , Raymond , Daniel E. Vetter , and Eldon L. Little . 1998 . Replication in strategic management: Scientific testing for validity, generalizability and usefulness . Strategic Management Journal 19 : 243 – 54 .

Ioannidis , John P. A . 2005 . Why most published research findings are false . PLoS Medicine 2 : 696 – 701 .

James , Oliver , Sebastian R. Jilke , and Gregg G. Van Ryzin . 2017 . Experiments in public management research. Challenges and contributions . Cambridge : Cambridge Univ. Press .

Jilke , Sebastian , Bart Meuleman , and Steven Van de Walle . 2014 . We need to compare, but how? Measurement equivalence in comparative public administration . Public Administration Review 75 : 36 – 48 .

Jilke , Sebastian , Nicolai Petrovsky , Bart Meuleman , and Oliver James . 2017 . Measurement equivalence in replications of experiments: When and why it matters and guidance on how to determine equivalence . Public Management Review 19 : 1293 – 310 .

Kaufmann , Wesley , and Mary Feeney . 2014 . Beyond the rules: The effect of outcome favourability on red tape perceptions . Public Administration 92 : 178 – 91 .

Kerr , Gayle , Don E. Shultz , and Ian Lings . 2016 . “Someone should do something”: Replication and an agenda for collective action . Journal of Advertising 45 : 4 – 12 .

Klein , Richard A. , Kate A. Ratliff , Michelangelo Vianello , Reginald B. Adams , Jr. , Stephan Bahník , Michael L. Bernstein , Brian A. Nosek , et al.  2014 . Investigating variation in replicability: A “many labs” replication project . Social Psychology 45 : 142 – 52 .

Knott , Jack H. , Gary J. Miller , and Jay Verkuilen . 2003 . Adaptive incrementalism and complexity: Experiments with two-person cooperative signaling games . Journal of Public Administration Research and Theory 13 : 341 – 66 .

Kuhn , Thomas . 1996 . The structure of scientific revolutions . Chicago : Univ. of Chicago Press .

LeBel , Etienne P . 2015 . A new replication norm for psychology . Collabra 1 : 1 – 13 .

Lee , M. Jin , M. Jae Moon , and Jungsook Kim . 2017 . Insights from experiments with duopoly games: Rational incremental decision-making . Public Management Review 19 : 1328 – 51 .

Lindsay , D. Stephen . 2015 . Replication in psychological science . Psychological Science 26 : 1827 – 32 .

Makel , Matthew , and Jonathan A. Plucker . 2014 . Facts are more important than novelty: Replication in the education sciences . Educational Researcher 43 : 304 – 16 .

Makel , Matthew C. , Jonathan A. Plucker , and Boyd Hegarty . 2012 . Replications in psychology research: How often do they really occur ? Perspectives on Psychological Science 7 : 537 – 42 .

Matsumoto , David , and Fons. J. Van de Vijver . 2011 . Cross-cultural research methods in psychology . New York : Cambridge Univ. Press .

Meier, Kenneth J., Amanda Rutherford, and Claudia Avellaneda, eds. 2017. Comparative public management: Why national, environmental, and organizational context matters. Washington, DC: Georgetown Univ. Press.

Meier , Kenneth J. , and Kevin B. Smith . 1994 . Say it ain’t so Moe: Institutional design, policy effectiveness and drug policy . Journal of Public Administration Research and Theory 4 : 429 – 42 .

Mele , Valentina , and Paolo Belardinelli . Forthcoming . Mixed methods in public administration research: Selecting, sequencing and connecting . Journal of Public Administration Research and Theory .

Meyer , Klaus , Arjen van Witteloostuijn , and Sjoerd Beugelsdijk . 2017 . What’s in a p? Reassessing best practices for reporting hypothesis-testing research . Journal of International Business Studies 48 : 535 – 51 .

Mill , John S . 1843 . A system of logic . London : Longman .

Munafo, Marcus R., Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. 2017. A manifesto for reproducible science. Nature Human Behaviour 1: 0021.

Nosek , B. A. , G. Alter , G. C. Banks , D. Borsboom , S. D. Bowman , S. J. Breckler , S. Buck , C. D. Chambers , G. Chin , G. Christensen , et al.  2015 . Promoting an open research culture . Science 348 : 1422 – 5 .

Nosek , Brian A. , and Daniel Lakens . 2014 . Registered reports. A method to increase the credibility of published results . Social Psychology 45 : 137 – 41 .

Nowell , Branda , and Kate Albrecht , Forthcoming . A reviewer’s guide to qualitative rigor . Journal of Public Administration Research and Theory .

Oliver , Richard L . 2010 . Satisfaction: A behavioral perspective on the consumer . 2nd ed. Armonk, NY : M. E. Sharpe .

Olsen , Asmus Leth . 2017 . Responding to problems: Actions are rewarded independently of the outcome . Public Management Review 19 : 1352 – 64 .

Open Science Collaboration . 2015 . Estimating the reproducibility of psychological science . Science 349 : aac4716 .

Orr , Larry L . 1999 . Social experiments: Evaluating public programs with experimental methods . Thousand Oaks, CA : Sage .

O’Toole , Laurence J. , Jr. , and Kenneth J. Meier . 2015 . Public management, context and performance: In quest of a more general theory . Journal of Public Administration Research and Theory 25 : 237 – 56 .

Pedersen, Mogens Jin, and Justin M. Stritch. 2018. RNICE Model: Evaluating the contribution of replication studies in public administration and management research. Public Administration Review 78: 606–12.

Pollitt , Christopher , ed. 2013 . Context in public policy and management: The missing link? Cheltenham, UK : Edward Elgar .

Poortinga , Ype H . 1989 . Equivalence of cross‐cultural data: An overview of basic issues . International Journal of Psychology 24 : 737 – 56 .

Riccucci, Norma, Gregg G. Van Ryzin, and Huafang Li. 2016. Representative bureaucracy and the willingness to co-produce: An experimental study. Public Administration Review 76: 121–30.

Schmidt , Stefan . 2009 . Shall we really do it again? The powerful concept of replication is neglected in the social sciences . Review of General Psychology 13 : 90 – 100 .

Scott , Ian . 2010 . The public sector in Hong Kong: Government, policy, people . Hong Kong : Univ. of Hong Kong Press .

Shadish , William R. , Thomas D. Cook , and Donald T. Campbell . 2002 . Experimental and quasi-experimental designs for generalized causal inference . Boston : Houghton Mifflin .

Simon , Herbert A. 1996 . Sciences of the artificial , 3rd ed.. Cambridge, MA : MIT Press .

Singh , Kulwant , Siah Hwee Ang , and Siew Meng Leong . 2003 . Increasing replication for knowledge accumulation in strategy research . Journal of Management 29 : 533 – 49 .

Thong , James Y. L. , Se-Joon Hong , and Kar Yan Tam . 2006 . The effects of post-adoption beliefs on the expectation-confirmation model for information technology continuance . International Journal of Human-Computer Studies 64 : 799 – 810 .

Tsang , Eric W. K. , and Kai-Man Kwan . 1999 . Replication and theory development in organizational science: A critical realist perspective . Academy of Management Review 24 : 759 – 80 .

Uncles , Mark D. , and Simon Kwok . 2013 . Designing research with in-built differentiated replication . Journal of Business Research 66 : 1398 – 405 .

Van Ryzin, Gregg G. 2013. An experimental test of the expectancy-disconfirmation theory of citizen satisfaction. Journal of Public Administration Research and Theory 23: 597–614.

Van Ryzin , Gregg G. , Norma Riccucci , and Huafang Li . 2017 . Representative bureaucracy and its symbolic effect on citizens: A conceptual replication . Public Management Review 19 : 1365 – 80 .

Vazire , Simine . 2016 . Editorial . Social Psychological and Personality Science 7 : 3 – 7 .

Walker , Richard M. , George A. Boyne , and Gene A. Brewer . 2010 . Public management and performance: Research directions . Cambridge : Cambridge Univ. Press .

Walker , Richard M. , Gene A. Brewer , and Oliver James , Guest Editors. 2017 . Special issue: Replication, experiments and knowledge in public management research . Public Management Review 19 : 1221 – 379 .

Walker , Richard M. , Myong Jin Lee , and Oliver James . 2017 . Replications of experimental research: Implications for the study of public management . In Experiments in public management research: Challenges and contributions , eds O. James , S. Jilke , and G. G. Van Ryzin . Cambridge : Cambridge Univ. Press .

Weibel , Antoinette , Katja Rost , and Margit Osterloh . 2010 . Pay for performance in the public sector—Benefits and (hidden) costs . Journal of Public Administration Research and Theory 20 : 387 – 412 .

van Witteloostuijn , Arjen . 2016 . What happened to Popperian falsification? Publishing neutral and negative findings: Moving away from biased publication practices . Cross Cultural & Strategic Management 23 : 481 – 508 .


Experimental Research: Definition, Types, Design, Examples


Experimental research is a cornerstone of scientific inquiry, providing a systematic approach to understanding cause-and-effect relationships and advancing knowledge in various fields. At its core, experimental research involves manipulating variables, observing outcomes, and drawing conclusions based on empirical evidence. By controlling factors that could influence the outcome, researchers can isolate the effects of specific variables and make reliable inferences about their impact. This guide offers a step-by-step exploration of experimental research, covering key elements such as research design, data collection, analysis, and ethical considerations. Whether you're a novice researcher seeking to understand the basics or an experienced scientist looking to refine your experimental techniques, this guide will equip you with the knowledge and tools needed to conduct rigorous and insightful research.

What is Experimental Research?

Experimental research is a systematic approach to scientific inquiry that aims to investigate cause-and-effect relationships by manipulating independent variables and observing their effects on dependent variables. Experimental research primarily aims to test hypotheses, make predictions, and draw conclusions based on empirical evidence.

By controlling extraneous variables and randomizing participant assignment, researchers can isolate the effects of specific variables and establish causal relationships. Experimental research is characterized by its rigorous methodology, emphasis on objectivity, and reliance on empirical data to support conclusions.

Importance of Experimental Research

  • Establishing Cause-and-Effect Relationships : Experimental research allows researchers to establish causal relationships between variables by systematically manipulating independent variables and observing their effects on dependent variables. This provides valuable insights into the underlying mechanisms driving phenomena and informs theory development.
  • Testing Hypotheses and Making Predictions : Experimental research provides a structured framework for testing hypotheses and predicting the relationship between variables . By systematically manipulating variables and controlling for confounding factors, researchers can empirically test the validity of their hypotheses and refine theoretical models.
  • Informing Evidence-Based Practice : Experimental research generates empirical evidence that informs evidence-based practice in various fields, including healthcare, education, and business. Experimental research contributes to improving outcomes and informing decision-making in real-world settings by identifying effective interventions, treatments, and strategies.
  • Driving Innovation and Advancement : Experimental research drives innovation and advancement by uncovering new insights, challenging existing assumptions, and pushing the boundaries of knowledge. Through rigorous experimentation and empirical validation, researchers can develop novel solutions to complex problems and contribute to the advancement of science and technology.
  • Enhancing Research Rigor and Validity : Experimental research upholds high research rigor and validity standards by employing systematic methods, controlling for confounding variables, and ensuring replicability of findings. By adhering to rigorous methodology and ethical principles, experimental research produces reliable and credible evidence that withstands scrutiny and contributes to the cumulative body of knowledge.

Experimental research plays a pivotal role in advancing scientific understanding, informing evidence-based practice, and driving innovation across various disciplines. By systematically testing hypotheses, establishing causal relationships, and generating empirical evidence, experimental research contributes to the collective pursuit of knowledge and the improvement of society.

Understanding Experimental Design

Experimental design serves as the blueprint for your study, outlining how you'll manipulate variables and control factors to draw valid conclusions.

Experimental Design Components

Experimental design comprises several essential elements:

  • Independent Variable (IV) : This is the variable manipulated by the researcher. It's what you change to observe its effect on the dependent variable. For example, in a study testing the impact of different study techniques on exam scores, the independent variable might be the study method (e.g., flashcards, reading, or practice quizzes).
  • Dependent Variable (DV) : The dependent variable is what you measure to assess the effect of the independent variable. It's the outcome variable affected by the manipulation of the independent variable. In our study example, the dependent variable would be the exam scores.
  • Control Variables : These are factors that could influence the outcome but are kept constant or controlled so that the effect of the independent variable can be isolated. Controlling these variables helps ensure that any observed changes in the dependent variable can be attributed to the manipulation of the independent variable rather than to other factors.
  • Experimental Group : This group receives the treatment or intervention being tested. It's exposed to the manipulated independent variable. In contrast, the control group does not receive the treatment and serves as a baseline for comparison.

Types of Experimental Designs

Experimental designs can vary based on the research question, the nature of the variables, and the desired level of control. Here are some common types:

  • Between-Subjects Design : In this design, different groups of participants are exposed to varying levels of the independent variable. Each group represents a different experimental condition, and participants are only exposed to one condition. For instance, in a study comparing the effectiveness of two teaching methods, one group of students would use Method A, while another would use Method B.
  • Within-Subjects Design : Also known as repeated measures design, this approach involves exposing the same group of participants to all levels of the independent variable. Participants serve as their own controls, and the order of conditions is typically counterbalanced to control for order effects. For example, participants might be tested on their reaction times under different lighting conditions, with the order of conditions randomized to eliminate any research bias (a minimal counterbalancing sketch follows this list).
  • Mixed Designs : Mixed designs combine elements of both between-subjects and within-subjects designs. This allows researchers to examine both between-group differences and within-group changes over time. Mixed designs help study complex phenomena that involve multiple variables and temporal dynamics.
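
To make the counterbalancing idea above concrete, here is a minimal sketch in Python. The lighting conditions, participant labels, and sample size are illustrative assumptions rather than details from any particular study; the point is simply that using every possible condition order equally often spreads simple order effects evenly across the sample.

```python
# Minimal full-counterbalancing sketch for a within-subjects design.
# Condition names and participant count are hypothetical.
from itertools import permutations

conditions = ["dim", "normal", "bright"]            # assumed lighting conditions
orders = list(permutations(conditions))             # all 3! = 6 possible orders

participants = [f"P{i:02d}" for i in range(1, 13)]  # 12 hypothetical participants

# Assign orders round-robin so each order is used equally often,
# balancing simple order effects across the sample.
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for participant, order in schedule.items():
    print(participant, "->", " / ".join(order))
```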

Factors Influencing Experimental Design Choices

Several factors influence the selection of an appropriate experimental design:

  • Research Question : The nature of your research question will guide your choice of experimental design. Some questions may be better suited to between-subjects designs, while others may require a within-subjects approach.
  • Variables : Consider the number and type of variables involved in your study. A factorial design might be appropriate if you're interested in exploring multiple factors simultaneously. Conversely, if you're focused on investigating the effects of a single variable, a simpler design may suffice.
  • Practical Considerations : Practical constraints such as time, resources, and access to participants can impact your choice of experimental design. Depending on your study's specific requirements, some designs may be more feasible or cost-effective than others.
  • Ethical Considerations : Ethical concerns, such as the potential risks to participants or the need to minimize harm, should also inform your experimental design choices. Ensure that your design adheres to ethical guidelines and safeguards the rights and well-being of participants.

By carefully considering these factors and selecting an appropriate experimental design, you can ensure that your study is well-designed and capable of yielding meaningful insights.

Experimental Research Elements

When conducting experimental research, understanding the key elements is crucial for designing and executing a robust study. Let's explore each of these elements in detail to ensure your experiment is well-planned and executed effectively.

Independent and Dependent Variables

In experimental research, the independent variable (IV) is the factor that the researcher manipulates or controls, while the dependent variable (DV) is the measured outcome or response. The independent variable is what you change in the experiment to observe its effect on the dependent variable.

For example, in a study investigating the effect of different fertilizers on plant growth, the type of fertilizer used would be the independent variable, while the plant growth (height, number of leaves, etc.) would be the dependent variable.

Control Groups and Experimental Groups

Control groups and experimental groups are essential components of experimental design. The control group serves as a baseline for comparison and does not receive the treatment or intervention being studied. Its purpose is to provide a reference point to assess the effects of the independent variable.

In contrast, the experimental group receives the treatment or intervention and is used to measure the impact of the independent variable. For example, in a drug trial, the control group would receive a placebo, while the experimental group would receive the actual medication.

Randomization and Random Sampling

Randomization is the process of randomly assigning participants to different experimental conditions to minimize biases and ensure that each participant has an equal chance of being assigned to any condition. Randomization helps control for extraneous variables and increases the study's internal validity.

Random sampling, on the other hand, involves selecting a representative sample from the population of interest to generalize the findings to the broader population. Random sampling ensures that each member of the population has an equal chance of being included in the sample, reducing the risk of sampling bias.
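
The two ideas are easy to confuse, so the following minimal Python sketch contrasts them. The population size, sample size, and group labels are illustrative assumptions only.

```python
# Random sampling decides WHO enters the study;
# random assignment decides WHICH condition each participant receives.
import random

random.seed(42)  # fixed seed only to make the illustration reproducible

population = [f"person_{i}" for i in range(1000)]   # hypothetical sampling frame

# Random sampling: every population member has an equal chance of selection.
sample = random.sample(population, k=60)

# Random assignment: shuffle the sample, then split it into two equal groups
# so each participant has an equal chance of either condition.
random.shuffle(sample)
experimental_group, control_group = sample[:30], sample[30:]

print(len(experimental_group), "experimental,", len(control_group), "control")
```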

Replication and Reliability

Replication involves repeating the experiment to confirm the results and assess the reliability of the findings. It is essential for ensuring the validity of scientific findings and building confidence in the robustness of the results. A study that can be replicated consistently across different settings and by various researchers is considered more reliable. Researchers should strive to design experiments that are easily replicable and transparently report their methods to facilitate replication by others.

Validity: Internal, External, Construct, and Statistical Conclusion Validity

Validity refers to the degree to which an experiment measures what it intends to measure and the extent to which the results can be generalized to other populations or contexts. There are several types of validity that researchers should consider:

  • Internal Validity : Internal validity refers to the extent to which the study accurately assesses the causal relationship between variables. Internal validity is threatened by factors such as confounding variables, selection bias, and experimenter effects. Researchers can enhance internal validity through careful experimental design and control procedures.
  • External Validity : External validity refers to the extent to which the study's findings can be generalized to other populations or settings. External validity is influenced by factors such as the representativeness of the sample and the ecological validity of the experimental conditions. Researchers should consider the relevance and applicability of their findings to real-world situations.
  • Construct Validity : Construct validity refers to the degree to which the study accurately measures the theoretical constructs of interest. Construct validity is concerned with whether the operational definitions of the variables align with the underlying theoretical concepts. Researchers can establish construct validity through careful measurement selection and validation procedures.
  • Statistical Conclusion Validity : Statistical conclusion validity refers to the accuracy of the statistical analyses and conclusions drawn from the data. It ensures that the statistical tests used are appropriate for the data and that the conclusions drawn are warranted. Researchers should use robust statistical methods and report effect sizes and confidence intervals to enhance statistical conclusion validity.

By addressing these elements of experimental research and ensuring the validity and reliability of your study, you can conduct research that contributes meaningfully to the advancement of knowledge in your field.

How to Conduct Experimental Research?

Embarking on an experimental research journey involves a series of well-defined phases, each crucial for the success of your study. Let's explore the pre-experimental, experimental, and post-experimental phases to ensure you're equipped to conduct rigorous and insightful research.

Pre-Experimental Phase

The pre-experimental phase lays the foundation for your study, setting the stage for what's to come. Here's what you need to do:

  • Formulating Research Questions and Hypotheses : Start by clearly defining your research questions and formulating testable hypotheses. Your research questions should be specific, relevant, and aligned with your research objectives. Hypotheses provide a framework for testing the relationships between variables and making predictions about the outcomes of your study.
  • Reviewing Literature and Establishing Theoretical Framework : Dive into existing literature relevant to your research topic and establish a solid theoretical framework. Literature review helps you understand the current state of knowledge, identify research gaps, and build upon existing theories. A well-defined theoretical framework provides a conceptual basis for your study and guides your research design and analysis.

Experimental Phase

The experimental phase is where the magic happens – it's time to put your hypotheses to the test and gather data. Here's what you need to consider:

  • Participant Recruitment and Sampling Techniques : Carefully recruit participants for your study using appropriate sampling techniques. The sample should be representative of the population you're studying to ensure the generalizability of your findings. Consider factors such as sample size, demographics, and inclusion criteria when recruiting participants.
  • Implementing Experimental Procedures : Once you've recruited participants, it's time to implement your experimental procedures. Clearly outline the experimental protocol, including instructions for participants, procedures for administering treatments or interventions, and measures for controlling extraneous variables. Standardize your procedures to ensure consistency across participants and minimize sources of bias.
  • Data Collection and Measurement : Collect data using reliable and valid measurement instruments. Depending on your research questions and variables of interest, data collection methods may include surveys, observations, physiological measurements, or experimental tasks. Ensure that your data collection procedures are ethical, respectful of participants' rights, and designed to minimize errors and biases.

Post-Experimental Phase

In the post-experimental phase, you make sense of your data, draw conclusions, and communicate your findings to the world. Here's what you need to do:

  • Data Analysis Techniques : Analyze your data using appropriate statistical techniques. Choose methods that are aligned with your research design and hypotheses. Standard statistical analyses include descriptive statistics, inferential statistics (e.g., t-tests, ANOVA), regression analysis, and correlation analysis. Interpret your findings in the context of your research questions and theoretical framework (a minimal analysis sketch follows this list).
  • Interpreting Results and Drawing Conclusions : Once you've analyzed your data, interpret the results and draw conclusions. Discuss the implications of your findings, including any theoretical, practical, or real-world implications. Consider alternative explanations and limitations of your study and propose avenues for future research. Be transparent about the strengths and weaknesses of your study to enhance the credibility of your conclusions.
  • Reporting Findings : Finally, communicate your findings through research reports, academic papers, or presentations. Follow standard formatting guidelines and adhere to ethical standards for research reporting. Clearly articulate your research objectives, methods, results, and conclusions. Consider your target audience and choose appropriate channels for disseminating your findings to maximize impact and reach.
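
As a concrete illustration of one analysis named in the list above, the sketch below runs an independent-samples t-test and reports an effect size alongside the p-value. The scores are fabricated, and SciPy is assumed to be available; this is a minimal sketch, not a full analysis pipeline.

```python
# Minimal sketch: Welch's t-test comparing two groups, plus Cohen's d.
# All numbers are fabricated for illustration.
import statistics
from scipy import stats

control = [72, 68, 75, 70, 74, 69, 71, 73]      # hypothetical control-group scores
treatment = [78, 74, 81, 77, 79, 75, 80, 76]    # hypothetical experimental-group scores

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

# Effect size (Cohen's d) using the pooled standard deviation.
n1, n2 = len(treatment), len(control)
s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
cohens_d = (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```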

By meticulously planning and executing each experimental research phase, you can generate valuable insights, advance knowledge in your field, and contribute to scientific progress.

As you navigate the intricate phases of experimental research, leveraging Appinio can streamline your journey toward actionable insights. With our intuitive platform, you can swiftly gather real-time consumer data, empowering you to make informed decisions with confidence. Say goodbye to the complexities of traditional market research and hello to a seamless, efficient process that puts you in the driver's seat of your research endeavors.

Ready to revolutionize your approach to data-driven decision-making? Book a demo today and discover the power of Appinio in transforming your research experience!


Experimental Research Examples

Understanding how experimental research is applied in various contexts can provide valuable insights into its practical significance and effectiveness. Here are some examples illustrating the application of experimental research in different domains:

Market Research

Experimental studies are crucial in market research for testing hypotheses, evaluating marketing strategies, and understanding consumer behavior. For example, a company may conduct an experiment to determine the most effective advertising message for a new product. Participants could be exposed to different versions of an advertisement, each emphasizing different product features or appeals.

By measuring variables such as brand recall, purchase intent, and brand perception, researchers can assess the impact of each advertising message and identify the most persuasive approach.

Software as a Service (SaaS)

In the SaaS industry, experimental research is often used to optimize user interfaces, features, and pricing models to enhance user experience and drive engagement. For instance, a SaaS company may conduct A/B tests to compare two versions of its software interface, each with a different layout or navigation structure.

Researchers can identify design elements that lead to higher user satisfaction and retention by tracking user interactions, conversion rates, and customer feedback. Experimental research also enables SaaS companies to test new product features or pricing strategies before full-scale implementation, minimizing risks and maximizing return on investment.
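
As a rough sketch of how such an A/B test might be analysed, the example below compares conversion rates for two interface variants with a two-proportion z-test. The visitor and conversion counts are invented, and statsmodels is assumed to be installed.

```python
# Minimal A/B-test sketch: two-proportion z-test on conversion rates.
# Counts are fabricated for illustration.
from statsmodels.stats.proportion import proportions_ztest

conversions = [342, 293]   # conversions for variant A and variant B (hypothetical)
visitors = [5100, 5050]    # visitors exposed to each variant (hypothetical)

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)

rate_a = conversions[0] / visitors[0]
rate_b = conversions[1] / visitors[1]
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z_stat:.2f}  p = {p_value:.4f}")
```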

Business Management

Experimental research is increasingly utilized in business management to inform decision-making, improve organizational processes, and drive innovation. For example, a business may conduct an experiment to evaluate the effectiveness of a new training program on employee productivity. Participants could be randomly assigned to either receive the training or serve as a control group.

By measuring performance metrics such as sales revenue, customer satisfaction, and employee turnover, researchers can assess the training program's impact and determine its return on investment. Experimental research in business management provides empirical evidence to support strategic initiatives and optimize resource allocation.

Healthcare

In healthcare, experimental research is instrumental in testing new treatments, interventions, and healthcare delivery models to improve patient outcomes and quality of care. For instance, a clinical trial may be conducted to evaluate the efficacy of a new drug in treating a specific medical condition. Participants are randomly assigned to either receive the experimental drug or a placebo, and their health outcomes are monitored over time.

By comparing the effectiveness of the treatment and placebo groups, researchers can determine the drug's efficacy, safety profile, and potential side effects. Experimental research in healthcare informs evidence-based practice and drives advancements in medical science and patient care.

These examples illustrate the versatility and applicability of experimental research across diverse domains, demonstrating its value in generating actionable insights, informing decision-making, and driving innovation. Whether in market research or healthcare, experimental research provides a rigorous and systematic approach to testing hypotheses, evaluating interventions, and advancing knowledge.

Experimental Research Challenges

Even with careful planning and execution, experimental research can present various challenges. Understanding these challenges and implementing effective solutions is crucial for ensuring the validity and reliability of your study. Here are some common challenges and strategies for addressing them.

Sample Size and Statistical Power

Challenge : Inadequate sample size can limit your study's generalizability and statistical power, making it difficult to detect meaningful effects. Small sample sizes increase the risk of Type II errors (false negatives) and reduce the reliability of your findings.

Solution : Increase your sample size to improve statistical power and enhance the robustness of your results. Conduct a power analysis before starting your study to determine the minimum sample size required to detect the effects of interest with sufficient power. Consider factors such as effect size, alpha level, and desired power when calculating sample size requirements. Additionally, consider using techniques such as bootstrapping or resampling to augment small sample sizes and improve the stability of your estimates.
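
For illustration, here is a minimal a-priori power analysis for a two-group design using statsmodels (assumed to be installed). The effect size, alpha, and power values are common textbook defaults used here as assumptions, not recommendations for any specific study.

```python
# Minimal power-analysis sketch: sample size per group for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # assumed Cohen's d (a "medium" effect)
    alpha=0.05,        # significance level
    power=0.80,        # desired probability of detecting the effect
)

print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64 per group
```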

To enhance the reliability of your experimental research findings, you can leverage our Sample Size Calculator. By determining the optimal sample size based on your desired margin of error, confidence level, and standard deviation, you can ensure the representativeness of your survey results. Don't let an inadequate sample size undermine the validity of your study; unlock the power of precise research planning!

Confounding Variables and Bias

Challenge : Confounding variables are extraneous factors that co-vary with the independent variable and can distort the relationship between the independent and dependent variables. Confounding variables threaten the internal validity of your study and can lead to erroneous conclusions.

Solution : Implement control measures to minimize the influence of confounding variables on your results. Random assignment of participants to experimental conditions helps distribute confounding variables evenly across groups, reducing their impact on the dependent variable. Additionally, consider using matching or blocking techniques to ensure that groups are comparable on relevant variables. Conduct sensitivity analyses to assess the robustness of your findings to potential confounders and explore alternative explanations for your results.

Researcher Effects and Experimenter Bias

Challenge : Researcher effects and experimenter bias occur when the experimenter's expectations or actions inadvertently influence the study's outcomes. This bias can manifest through subtle cues, unintentional behaviors, or unconscious biases, leading to invalid conclusions.

Solution : Implement double-blind procedures whenever possible to mitigate researcher effects and experimenter bias. Double-blind designs conceal information about the experimental conditions from both the participants and the experimenters, minimizing the potential for bias. Standardize experimental procedures and instructions to ensure consistency across conditions and minimize experimenter variability. Additionally, consider using objective outcome measures or automated data collection procedures to reduce the influence of experimenter bias on subjective assessments.

External Validity and Generalizability

Challenge : External validity refers to the extent to which your study's findings can be generalized to other populations, settings, or conditions. Limited external validity restricts the applicability of your results and may hinder their relevance to real-world contexts.

Solution : Enhance external validity by designing studies closely resembling real-world conditions and populations of interest. Consider using diverse samples that represent the target population's demographic, cultural, and ecological variability. Conduct replication studies in different contexts or with different populations to assess the robustness and generalizability of your findings. Additionally, consider conducting meta-analyses or systematic reviews to synthesize evidence from multiple studies and enhance the external validity of your conclusions.

By proactively addressing these challenges and implementing effective solutions, you can strengthen the validity, reliability, and impact of your experimental research. Remember to remain vigilant for potential pitfalls throughout the research process and adapt your strategies as needed to ensure the integrity of your findings.

Advanced Topics in Experimental Research

As you delve deeper into experimental research, you'll encounter advanced topics and methodologies that offer greater complexity and nuance.

Quasi-Experimental Designs

Quasi-experimental designs resemble true experiments but lack random assignment to experimental conditions. They are often used when random assignment is impractical, unethical, or impossible. Quasi-experimental designs allow researchers to investigate cause-and-effect relationships in real-world settings where strict experimental control is challenging. Common examples include:

  • Non-Equivalent Groups Design : This design compares two or more groups that were not created through random assignment. While similar to between-subjects designs, non-equivalent group designs lack the random assignment of participants, increasing the risk of confounding variables.
  • Interrupted Time Series Design : In this design, multiple measurements are taken over time before and after an intervention is introduced. Changes in the dependent variable are assessed over time, allowing researchers to infer the impact of the intervention.
  • Regression Discontinuity Design : This design involves assigning participants to different groups based on a cutoff score on a continuous variable. Participants just above and below the cutoff are treated as if they were randomly assigned to different conditions, allowing researchers to estimate causal effects.

Quasi-experimental designs offer valuable insights into real-world phenomena but require careful consideration of potential confounding variables and limitations inherent to non-random assignment.
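
To make the regression discontinuity idea concrete, the sketch below simulates a sharp cutoff and estimates the jump in the outcome at the threshold with an ordinary least squares model. The cutoff, data, and true effect are all fabricated, and pandas and statsmodels are assumed to be installed; this is a minimal sketch, not a full RD analysis (which would also consider bandwidth choice and robustness checks).

```python
# Minimal sharp regression-discontinuity sketch on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
score = rng.uniform(0, 100, size=500)                 # running variable (e.g., entry score)
treated = (score >= 60).astype(int)                   # assignment rule: cutoff at 60
outcome = 20 + 0.3 * score + 5 * treated + rng.normal(0, 4, size=500)  # true jump = 5

df = pd.DataFrame({
    "outcome": outcome,
    "treated": treated,
    "centered": score - 60,                           # centre the running variable at the cutoff
})

# Allow separate slopes on each side of the cutoff; the `treated`
# coefficient estimates the discontinuity (treatment effect) at the threshold.
model = smf.ols("outcome ~ treated * centered", data=df).fit()
print(model.params["treated"])                        # should be close to the simulated jump of 5
```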

Factorial Designs

Factorial designs involve manipulating two or more independent variables simultaneously to examine their main effects and interactions. By systematically varying multiple factors, factorial designs allow researchers to explore complex relationships between variables and identify how they interact to influence outcomes. Common types of factorial designs include:

  • 2x2 Factorial Design : This design manipulates two independent variables, each with two levels. It allows researchers to examine the main effects of each variable as well as any interaction between them (a minimal analysis sketch follows this list).
  • Mixed Factorial Design : In this design, one independent variable is manipulated between subjects, while another is manipulated within subjects. Mixed factorial designs enable researchers to investigate both between-subjects and within-subjects effects simultaneously.
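
As a concrete illustration of the 2x2 case mentioned above, the sketch below fits a two-way ANOVA with statsmodels (assumed installed). The factor names and scores are fabricated purely to show how both main effects and the interaction come out of a single model.

```python
# Minimal 2x2 factorial sketch: two-way ANOVA with interaction.
# Factors, levels, and scores are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "feedback": ["yes"] * 8 + ["no"] * 8,                    # factor A: feedback given?
    "timing":   (["immediate"] * 4 + ["delayed"] * 4) * 2,   # factor B: test timing
    "score":    [82, 85, 88, 84, 75, 78, 77, 76,
                 70, 72, 74, 71, 69, 68, 71, 70],            # hypothetical outcomes
})

# C(...) marks each factor as categorical; '*' expands to both
# main effects plus their interaction.
model = smf.ols("score ~ C(feedback) * C(timing)", data=df).fit()
print(anova_lm(model, typ=2))   # main effects and interaction in one table
```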

Factorial designs provide a comprehensive understanding of how multiple factors contribute to outcomes and offer greater statistical efficiency compared to studying variables in isolation.

Longitudinal and Cross-Sectional Studies

Longitudinal studies involve collecting data from the same participants over an extended period, allowing researchers to observe changes and trajectories over time. Cross-sectional studies, on the other hand, involve collecting data from different participants at a single point in time, providing a snapshot of the population at that moment. Both longitudinal and cross-sectional studies offer unique advantages and challenges:

  • Longitudinal Studies : Longitudinal designs allow researchers to examine developmental processes, track changes over time, and identify causal relationships. However, longitudinal studies require long-term commitment, are susceptible to attrition and dropout, and may be subject to practice effects and cohort effects.
  • Cross-Sectional Studies : Cross-sectional designs are relatively quick and cost-effective, provide a snapshot of population characteristics, and allow for comparisons across different groups. However, cross-sectional studies cannot assess changes over time or establish causal relationships between variables.

Researchers should carefully consider the research question, objectives, and constraints when choosing between longitudinal and cross-sectional designs.

Meta-Analysis and Systematic Reviews

Meta-analysis and systematic reviews are methods for synthesizing findings from multiple studies and drawing robust conclusions; meta-analysis does so quantitatively, while systematic reviews may be quantitative or narrative. These methods offer several advantages:

  • Meta-Analysis : Meta-analysis combines the results of multiple studies using statistical techniques to estimate overall effect sizes and assess the consistency of findings across studies. Meta-analysis increases statistical power, enhances generalizability, and provides more precise estimates of effect sizes (a minimal pooling sketch follows this list).
  • Systematic Reviews : Systematic reviews involve systematically searching, appraising, and synthesizing existing literature on a specific topic. Systematic reviews provide a comprehensive summary of the evidence, identify gaps and inconsistencies in the literature, and inform future research directions.
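
The core calculation behind a simple fixed-effect meta-analysis is inverse-variance weighting, sketched below with fabricated effect sizes and standard errors; a real meta-analysis would also examine heterogeneity and possibly use a random-effects model.

```python
# Minimal fixed-effect (inverse-variance) pooling sketch.
# Effect sizes and standard errors are fabricated for illustration.
import numpy as np

effects = np.array([0.42, 0.31, 0.55, 0.18, 0.47])   # per-study standardized effects
se = np.array([0.12, 0.09, 0.15, 0.08, 0.11])        # per-study standard errors

weights = 1.0 / se**2                                # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect = {pooled:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```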

Meta-analysis and systematic reviews are valuable tools for evidence-based practice, guiding policy decisions, and advancing scientific knowledge by aggregating and synthesizing empirical evidence from diverse sources.

By exploring these advanced topics in experimental research, you can expand your methodological toolkit, tackle more complex research questions, and contribute to deeper insights and understanding in your field.

Experimental Research Ethical Considerations

When conducting experimental research, it's imperative to uphold ethical standards and prioritize the well-being and rights of participants. Here are some key ethical considerations to keep in mind throughout the research process:

  • Informed Consent : Obtain informed consent from participants before they participate in your study. Ensure that participants understand the purpose of the study, the procedures involved, any potential risks or benefits, and their right to withdraw from the study at any time without penalty.
  • Protection of Participants' Rights : Respect participants' autonomy, privacy, and confidentiality throughout the research process. Safeguard sensitive information and ensure that participants' identities are protected. Be transparent about how their data will be used and stored.
  • Minimizing Harm and Risks : Take steps to mitigate any potential physical or psychological harm to participants. Conduct a risk assessment before starting your study and implement appropriate measures to reduce risks. Provide support services and resources for participants who may experience distress or adverse effects as a result of their participation.
  • Confidentiality and Data Security : Protect participants' privacy and ensure the security of their data. Use encryption and secure storage methods to prevent unauthorized access to sensitive information. Anonymize data whenever possible to minimize the risk of data breaches or privacy violations.
  • Avoiding Deception : Minimize the use of deception in your research and ensure that any deception is justified by the scientific objectives of the study. If deception is necessary, debrief participants fully at the end of the study and provide them with an opportunity to withdraw their data if they wish.
  • Respecting Diversity and Cultural Sensitivity : Be mindful of participants' diverse backgrounds, cultural norms, and values. Avoid imposing your own cultural biases on participants and ensure that your research is conducted in a culturally sensitive manner. Seek input from diverse stakeholders to ensure your research is inclusive and respectful.
  • Compliance with Ethical Guidelines : Familiarize yourself with relevant ethical guidelines and regulations governing research with human participants, such as those outlined by institutional review boards (IRBs) or ethics committees. Ensure that your research adheres to these guidelines and that any potential ethical concerns are addressed appropriately.
  • Transparency and Openness : Be transparent about your research methods, procedures, and findings. Clearly communicate the purpose of your study, any potential risks or limitations, and how participants' data will be used. Share your research findings openly and responsibly, contributing to the collective body of knowledge in your field.

By prioritizing ethical considerations in your experimental research, you demonstrate integrity, respect, and responsibility as a researcher, fostering trust and credibility in the scientific community.

Conclusion for Experimental Research

Experimental research is a powerful tool for uncovering causal relationships and expanding our understanding of the world around us. By carefully designing experiments, collecting data, and analyzing results, researchers can make meaningful contributions to their fields and address pressing questions.

However, conducting experimental research comes with responsibilities. Ethical considerations are paramount to ensure the well-being and rights of participants, as well as the integrity of the research process. Researchers can build trust and credibility in their work by upholding ethical standards and prioritizing participant safety and autonomy.

Furthermore, as you continue to explore and innovate in experimental research, you must remain open to new ideas and methodologies. Embracing diversity in perspectives and approaches fosters creativity and innovation, leading to breakthrough discoveries and scientific advancements. By promoting collaboration and sharing findings openly, we can collectively push the boundaries of knowledge and tackle some of society's most pressing challenges.

How to Conduct Research in Minutes?

Discover the power of Appinio, the real-time market research platform revolutionizing experimental research. With Appinio, you can access real-time consumer insights to make better data-driven decisions in minutes. Join the thousands of companies worldwide who trust Appinio to deliver fast, reliable consumer insights.

Here's why you should consider using Appinio for your research needs:

  • From questions to insights in minutes:  With Appinio, you can conduct your own market research and get actionable insights in record time, allowing you to make fast, informed decisions for your business.
  • Intuitive platform for anyone:  You don't need a PhD in research to use Appinio. Our platform is designed to be user-friendly and intuitive so that anyone can easily create and launch surveys.
  • Extensive reach and targeting options:  Define your target audience from over 1200 characteristics and survey them in over 90 countries. Our platform ensures you reach the right people for your research needs, no matter where they are.


Enago Academy

Experimental Research Design — 6 mistakes you should never make!


From their school days, students perform scientific experiments that provide results that define and prove the laws and theorems of science. These experiments rest on a strong foundation of experimental research designs.

An experimental research design helps researchers execute their research objectives with more clarity and transparency.

In this article, we will not only discuss the key aspects of experimental research designs but also the issues to avoid and problems to resolve while designing your research study.

Table of Contents

What Is Experimental Research Design?

Experimental research design is a framework of protocols and procedures created to conduct experimental research with a scientific approach using two sets of variables. Herein, the first set of variables acts as a constant, which is used to measure the differences in the second set. Experimental research is one of the most rigorous forms of quantitative research.

Experimental research helps a researcher gather the necessary data for making better research decisions and determining the facts of a research study.

When Can a Researcher Conduct Experimental Research?

A researcher can conduct experimental research in the following situations —

  • When time is an important factor in establishing a relationship between the cause and effect.
  • When there is an invariable or never-changing behavior between the cause and effect.
  • Finally, when the researcher wishes to understand the importance of the cause and effect.

Importance of Experimental Research Design

To publish significant results, choosing a quality research design forms the foundation on which to build the research study. Moreover, an effective research design helps establish quality decision-making procedures, structures the research to allow easier data analysis, and addresses the main research question. Therefore, it is essential to devote undivided attention and time to creating an experimental research design before beginning the practical experiment.

By creating a research design, a researcher is also giving oneself time to organize the research, set up relevant boundaries for the study, and increase the reliability of the results. Through all these efforts, one could also avoid inconclusive results. If any part of the research design is flawed, it will reflect on the quality of the results derived.

Types of Experimental Research Designs

Based on the methods used to collect data in experimental studies, the experimental research designs are of three primary types:

1. Pre-experimental Research Design

A researcher may use a pre-experimental research design when a group, or multiple groups, are kept under observation after factors of cause and effect have been applied. The pre-experimental design will help researchers understand whether further investigation is necessary for the groups under observation.

Pre-experimental research is of three types —

  • One-shot Case Study Research Design
  • One-group Pretest-posttest Research Design
  • Static-group Comparison

2. True Experimental Research Design

A true experimental research design relies on statistical analysis to prove or disprove a researcher’s hypothesis. It is one of the most accurate forms of research because it provides specific scientific evidence. Furthermore, out of all the types of experimental designs, only a true experimental design can establish a cause-effect relationship within a group. However, in a true experiment, a researcher must satisfy these three factors —

  • There is a control group that is not subjected to changes and an experimental group that will experience the changed variables
  • A variable that can be manipulated by the researcher
  • Random distribution of the variables

This type of experimental research is commonly observed in the physical sciences.

3. Quasi-experimental Research Design

The word “quasi” means “resembling.” A quasi-experimental design is similar to a true experimental design. However, the difference between the two is the assignment of the control group. In this research design, an independent variable is manipulated, but the participants of a group are not randomly assigned. This type of research design is used in field settings where random assignment is either irrelevant or not required.

The classification of the research subjects, conditions, or groups determines the type of research design to be used.


Advantages of Experimental Research

Experimental research allows you to test your idea in a controlled environment before taking the research to clinical trials. Moreover, it provides the best method to test your theory because of the following advantages:

  • Researchers have firm control over variables to obtain results.
  • The subject area does not limit the effectiveness of experimental research; it can be applied for research purposes in any field.
  • The results are specific.
  • After the results are analyzed, research findings from the same dataset can be repurposed for similar research ideas.
  • Researchers can identify the cause and effect of the hypothesis and further analyze this relationship to determine in-depth ideas.
  • Experimental research makes an ideal starting point. The collected data could be used as a foundation to build new research ideas for further studies.

6 Mistakes to Avoid While Designing Your Research

There is no order to this list, and any one of these issues can seriously compromise the quality of your research. You could refer to the list as a checklist of what to avoid while designing your research.

1. Invalid Theoretical Framework

Usually, researchers neglect to check whether their hypothesis can logically be tested. If your research design does not rest on basic assumptions or postulates, then it is fundamentally flawed and you need to rework your research framework.

2. Inadequate Literature Study

Without a comprehensive research literature review, it is difficult to identify and fill the knowledge and information gaps. Furthermore, you need to clearly state how your research will contribute to the research field, either by adding value to the pertinent literature or challenging previous findings and assumptions.

3. Insufficient or Incorrect Statistical Analysis

Statistical results are among the most trusted forms of scientific evidence. The ultimate goal of a research experiment is to obtain valid and sustainable evidence. Therefore, incorrect statistical analysis could affect the quality of any quantitative research.

4. Undefined Research Problem

This is one of the most basic aspects of research design. The research problem statement must be clear; to achieve that, you must set a framework for developing research questions that address the core problems.

5. Research Limitations

Every study has limitations. You should anticipate and incorporate those limitations into your conclusion, as well as into the basic research design. Include a statement in your manuscript about any perceived limitations and how you considered them while designing your experiment and drawing your conclusions.

6. Ethical Implications

Ethics is the most important yet least discussed topic. Your research design must include ways to minimize any risk to your participants while still addressing the research problem or question at hand. If you cannot uphold ethical norms alongside your research study, your research objectives and validity could be questioned.

Experimental Research Design Example

In an experimental design, a researcher gathers plant samples and then randomly assigns half the samples to photosynthesize in sunlight and the other half to be kept in a dark box without sunlight, while controlling all the other variables (nutrients, water, soil, etc.).

By comparing their outcomes in biochemical tests, the researcher can confirm that the changes in the plants were due to the sunlight and not the other variables.

Experimental research is often the final form of a study in the research process and is considered to provide conclusive and specific results. But it is not suitable for every research question: it requires substantial resources, time, and money, and it is not easy to conduct unless a solid foundation of research has been built. Yet it is widely used in research institutes and commercial industries because it yields some of the most conclusive results within the scientific approach.

Have you worked on research designs? How was your experience creating an experimental design? What difficulties did you face? Do write to us or comment below and share your insights on experimental research designs!

Frequently Asked Questions

Randomization is important in experimental research because it helps ensure unbiased results. It also strengthens the measurement of the cause-effect relationship in the particular group of interest.

Experimental research design lays the foundation of a study and structures the research to establish a quality decision-making process.

There are three types of experimental research designs: pre-experimental research design, true experimental research design, and quasi-experimental research design.

The differences between a true experimental and a quasi-experimental design are: 1. In quasi-experimental research, assignment to the control group is non-random, unlike in true experimental design, where assignment is random. 2. A true experiment always has a control group; in quasi-experimental research, a control group may not always be present.

Experimental research establishes a cause-effect relationship by testing a theory or hypothesis using experimental and control groups. In contrast, descriptive research describes a study or a topic by defining its variables and answering questions related to them.


National Academies Press: OpenBook

Undergraduate Research Experiences for STEM Students: Successes, Challenges, and Opportunities (2017)

Chapter 9: Conclusions and Recommendations

Practitioners designing or improving undergraduate research experiences (UREs) can build on the experiences of colleagues and learn from the increasingly robust literature about UREs and the considerable body of evidence about how students learn. The questions practitioners ask themselves during the design process should include questions about the goals of the campus, program, faculty, and students. Other factors to consider when designing a URE include the issues raised in the conceptual framework for learning and instruction, the available resources, how the program or experience will be evaluated or studied, and how to design the program from the outset to incorporate these considerations, as well as how to build in opportunities to improve the experience over time in light of new evidence. (Some of these topics are addressed in Chapter 8 .)

Colleges and universities that offer or wish to offer UREs to their students should undertake baseline evaluations of their current offerings and create plans to develop a culture of improvement in which faculty are supported in their efforts to continuously refine UREs based on the evidence currently available and evidence that they and others generate in the future. While much of the evidence to date is descriptive, it forms a body of knowledge that can be used to identify research questions about UREs, both those designed around the apprenticeship model and those designed using the more recent course-based undergraduate research experience (CURE) model. Internships and other avenues by which undergraduates do research provide many of the same sorts of experiences but are not well studied. In any case, it is clear that students value these experiences; that many faculty do as well; and that they contribute to broadening participation in science, technology, engineering, and mathematics (STEM) education and careers. The findings from the research literature reported in Chapter 4 provide guidance to those designing both opportunities to improve practical and academic skills and opportunities for students to “try out” a professional role of interest.

Little research has been done that provides answers to mechanistic questions about how UREs work. Additional studies are needed to know which features of UREs are most important for positive outcomes with which students and to gain information about other questions of this type. This additional research is needed to better understand and compare different strategies for UREs designed for a diversity of students, mentors, and institutions. Therefore, the committee recommends steps that could increase the quantity and quality of evidence available in the future and makes recommendations for how faculty, departments, and institutions might approach decisions about UREs using currently available information. Multiple detailed recommendations about the kinds of research that might be useful are provided in the research agenda in Chapter 7 .

In addition to the specific research recommended in Chapter 7 , in this chapter the committee provides a series of interrelated conclusions and recommendations related to UREs for the STEM disciplines and intended to highlight the issues of primary importance to administrators, URE program designers, mentors to URE students, funders of UREs, those leading the departments and institutions offering UREs, and those conducting research about UREs. These conclusions and recommendations are based on the expert views of the committee and informed by their review of the available research, the papers commissioned for this report, and input from presenters during committee meetings. Table 9-1 defines categories of these URE “actors,” gives examples of specific roles included in each category, specifies key URE actions for which that category is responsible, and lists the conclusions and recommendations the committee views as most relevant to that actor category.

RESEARCH ON URES

Conclusion 1: The current and emerging landscape of what constitutes UREs is diverse and complex. Students can engage in STEM-based undergraduate research in many different ways, across a variety of settings, and along a continuum that extends and expands upon learning opportunities in other educational settings. The following characteristics define UREs. Due to the variation in the types of UREs, not all experiences include all of the following characteristics in the same way; experiences vary in how much a particular characteristic is emphasized.


  • They engage students in research practices including the ability to argue from evidence.
  • They aim to generate novel information with an emphasis on discovery and innovation or to determine whether recent preliminary results can be replicated.
  • They focus on significant, relevant problems of interest to STEM researchers and, in some cases, a broader community (e.g., civic engagement).
  • They emphasize and expect collaboration and teamwork.
  • They involve iterative refinement of experimental design, experimental questions, or data obtained.
  • They allow students to master specific research techniques.
  • They help students engage in reflection about the problems being investigated and the work being undertaken to address those problems.
  • They require communication of results, either through publication or presentations in various STEM venues.
  • They are structured and guided by a mentor, with students assuming increasing ownership of some aspects of the project over time.

UREs are generally designed to add value to STEM offerings by promoting an understanding of the ways that knowledge is generated in STEM fields and to extend student learning beyond what happens in the small group work of an inquiry-based course. UREs add value by enabling students to understand and contribute to the research questions that are driving the field for one or more STEM topics or to grapple with design challenges of interest to professionals. They help students understand what it means to be a STEM researcher in a way that would be difficult to convey in a lecture course or even in an inquiry-based learning setting. As participants in a URE, students can learn by engaging in planning, experimentation, evaluation, interpretation, and communication of data and other results in light of what is already known about the question of interest. They can pose relevant questions that can be solved only through investigative or design efforts—individually or in teams—and attempt to answer these questions despite the challenges, setbacks, and ambiguity of the process and the results obtained.

The diversity of UREs reflects the reality that different STEM disciplines operate from varying traditions, expectations, and constraints (e.g., lab safety issues) in providing opportunities for undergraduates to engage in research. In addition, individual institutions and departments have cultures that promote research participation to various degrees and at different stages in students’ academic careers. Some programs emphasize design and problem solving in addition to discovery. UREs in different disciplines can take many forms (e.g., apprentice-style, course-based, internships, project-based), but the definitional characteristics described above are similar across different STEM fields.

Furthermore, students in today’s university landscape may have opportunities to engage with many different types of UREs throughout their education, including involvement in a formal program (which could include mentoring, tutoring, research, and seminars about research), an apprentice-style URE under the guidance of an individual or team of faculty members, an internship, or enrolling in one or more CUREs or in a consortium- or project-based program.

Conclusion 2: Research on the efficacy of UREs is still in the early stages of development compared with other interventions to improve undergraduate STEM education.

  • The types of UREs are diverse, and their goals are even more diverse. Questions and methodologies used to investigate the roles and effectiveness of UREs in achieving those goals are similarly diverse.
  • Most of the studies of UREs to date are descriptive case studies or use correlational designs. Many of these studies report positive outcomes from engagement in a URE.
  • Only a small number of studies have employed research designs that can support inferences about causation. Most of these studies find evidence for a causal relationship between URE participation and subsequent persistence in STEM. More studies are needed to provide evidence that participation in UREs is a causal factor in a range of desired student outcomes.

Taking the entire body of evidence into account, the committee concludes that the published peer-reviewed literature to date suggests that participation in a URE is beneficial for students.

As discussed in the report’s Introduction (see Chapter 1 ) and in the research agenda (see Chapter 7 ), the committee considered descriptive, causal, and mechanistic questions in our reading of the literature on UREs. Scientific approaches to answering descriptive, causal, and mechanistic questions require deciding what to look for, determining how to examine it, and knowing appropriate ways to score or quantify the effect.

Descriptive questions ask what is happening without making claims as to why it is happening—that is, without making claims as to whether the research experience caused these changes. A descriptive statement about UREs only claims that certain changes occurred during or after the time the students were engaged in undergraduate research. Descriptive studies cannot determine whether any benefits observed were caused by participation in the URE.

Causal questions seek to discover whether a specific intervention leads to a specific outcome, other things being equal. To address such questions, causal evidence can be generated from a comparison of carefully selected groups that do and do not experience UREs. The groups can be made roughly equivalent by random assignment (ensuring that URE and non-URE groups are the same on average as the sample size increases) or by controlling for an exhaustive set of characteristics and experiences that might render the groups different prior to the URE. Other quasi-experimental strategies can also be used. Simply comparing students who enroll in a URE with students who do not is not adequate for determining causality because there may be selection bias. For example, students already interested in STEM are more likely to seek out such opportunities and more likely to be selected for such programs. Instead the investigator would have to compare future enrollment patterns (or other measures) between closely matched students, some of whom enrolled in a URE and some of whom did not. Controlling for selection bias to enable an inference about causation can pose significant challenges.

Questions of mechanism or of process also can be explored to understand why a causal intervention leads to the observed effect. Perhaps the URE enhances a student’s confidence in her ability to succeed in her chosen field or deepens her commitment to the field by exposing her to the joy of discovery. Through these pathways that act on the participant’s purposive behavior, the URE enhances the likelihood that she persists in STEM. The question for the researcher then becomes what research design would provide support for this hypothesis of mechanism over other candidate explanations for why the URE is a causal factor in STEM persistence.

The committee has examined the literature and finds a rich descriptive foundation for testable hypotheses about the effects of UREs on student outcomes. These studies are encouraging; a few of them have generated evidence that a URE can be a positive causal factor in the progression and persistence of STEM students. The weight of the evidence has been descriptive; it relies primarily on self-reports of short-term gains by students who chose to participate in UREs and does not include direct measures of changes in the students’ knowledge, skills, or other measures of success across comparable groups of students who did and did not participate in UREs.

While acknowledging the scarcity of strong causal evidence on the benefits of UREs, the committee takes seriously the weight of the descriptive evidence. Many of the published studies of UREs show that students who participate report a range of benefits, such as increased understanding of the research process, encouragement to persist in STEM, and support that helps them sustain their identity as researchers and continue with their plans to enroll in a graduate program in STEM (see Chapter 4 ). These are effective starting points for causal studies.

Conclusion 3: Studies focused on students from historically underrepresented groups indicate that participation in UREs improves their persistence in STEM and helps to validate their disciplinary identity.

Various UREs have been specifically designed to increase the number of historically underrepresented students who go on to become STEM majors and ultimately STEM professionals. While many UREs offer one or more supplemental opportunities to support students’ academic or social success, such as mentoring, tutoring, summer bridge programs, career or graduate school workshops, and research-oriented seminars, those designed for underrepresented students appear to emphasize such features as integral and integrated components of the program. In particular, studies of undergraduate research programs targeting underrepresented minority students have begun to document positive outcomes such as degree completion and persistence in interest in STEM careers ( Byars-Winston et al., 2015 ; Chemers et al., 2011 ; Jones et al., 2010 ; Nagda et al., 1998 ; Schultz et al., 2011 ). Most of these studies collected data on apprentice-style UREs, in which the undergraduate becomes a functioning member of a research group along with the graduate students, postdoctoral fellows, and mentor.

Recommendation 1: Researchers with expertise in education research should conduct well-designed studies in collaboration with URE program directors to improve the evidence base about the processes and effects of UREs. This research should address how the various components of UREs may benefit students. It should also include additional causal evidence for the individual and additive effects of outcomes from student participation in different types of UREs. Not all UREs need be designed to undertake this type of research, but it would be very useful to have some UREs that are designed to facilitate these efforts to improve the evidence base .

As the focus on UREs has grown, so have questions about their implementation. Many articles have been published describing specific UREs (see Chapter 2 ). Large amounts of research have also been undertaken to explore more generally how students learn, and the resulting body of evidence has led to the development and adoption of “active learning” strategies and experiences. If a student in a URE has an opportunity to, for example, analyze new data or to reformulate a hypothesis in light of the student’s analysis, this activity fits into the category that is described as active learning. Surveys of student participants and unpublished evaluations provide additional information about UREs but do not establish causation or determine the mechanism(s). Consequently, little is currently known about the mechanisms of precisely how UREs work and which aspects of UREs are most powerful. Important components that have been reported include student ownership of the URE project, time to tackle a question iteratively, and opportunities to report and defend one’s conclusions ( Hanauer and Dolan, 2014 ; Thiry et al., 2011 ).

There are many unanswered questions and opportunities for further research into the role and mechanism of UREs. Attention to research design as UREs are planned is important; more carefully designed studies are needed to understand the ways that UREs influence a student’s education and to evaluate the outcomes that have been reported for URE participants. Appropriate studies, which include matched samples or similar controls, would facilitate research on the ways that UREs benefit students, enabling both education researchers and implementers of UREs to determine optimal features for program design and giving the community a more robust understanding of how UREs work.

See the research agenda ( Chapter 7 ) for specific recommendations about research topics and approaches.

Recommendation 2: Funders should provide appropriate resources to support the design, implementation, and analysis of some URE programs that are specifically designed to enable detailed research establishing the effects on participant outcomes and on other variables of interest such as the consequences for mentors or institutions.

Not all UREs need to be the subject of extensive study. In many cases, a straightforward evaluation is adequate to determine whether the URE is meeting its goals. However, to achieve more widespread improvement in both the types and quality of the UREs offered in the future, additional evidence about the possible causal effects and mechanisms of action of UREs needs to be systematically collected and disseminated. This includes a better understanding of the implementation differences for a variety of institutions (e.g., community colleges, primarily undergraduate institutions, research universities) to ensure that the desired outcomes can translate across settings. Increasing the evidence about precisely how UREs work and which aspects of UREs are most powerful will require careful attention to study design during planning for the UREs.

Not all UREs need to be designed to achieve this goal; many can provide opportunities to students by relying on pre-existing knowledge and iterative improvement as that knowledge base grows. However, for the knowledge base to grow, funders must provide resources for some URE designers and social science researchers to undertake thoughtful and well-planned studies on causal and mechanistic issues. This will maximize the chances for the creation and dissemination of information that can lead to the development of sustainable and effective UREs. These studies can result from a partnership formed as the URE is designed and funded, or evaluators and social scientists could identify promising and/or effective existing programs and then raise funds on their own to support the study of those programs to answer the questions of interest. In deciding upon the UREs that are chosen for these extensive studies, it will be important to consider whether, collectively, they are representative of UREs in general. For example, large and small UREs at large and small schools, targeted at both introductory and advanced students and topics, should be studied.

CONSTRUCTION OF URES

Conclusion 4: The committee was unable to find evidence that URE designers are taking full advantage of the information available in the education literature on strategies for designing, implementing, and evaluating learning experiences. STEM faculty members do not generally receive training in interpreting or conducting education research. Partnerships between those with expertise in education research and those with expertise in implementing UREs are one way to strengthen the application of evidence on what works in planning and implementing UREs.

As discussed in Chapters 3 and 4 , there is an extensive body of literature on pedagogy and how people learn; helping STEM faculty to access the existing literature and incorporate those concepts as they design UREs could improve student experiences. New studies that specifically focus on UREs may provide more targeted information that could be used to design, implement, sustain, or scale up UREs and facilitate iterative improvements. Information about the features of UREs that elicit particular outcomes or best serve certain populations of students should be considered when implementing a new instantiation of an existing model of a URE or improving upon an existing URE model.

Conclusion 5: Evaluations of UREs are often conducted to inform program providers and funders; however, they may not be accessible to others. While these evaluations are not designed to be research studies and often have small sample sizes, they may contain information that could be useful to those initiating new URE programs and those refining UREs. Increasing access to these evaluations and to the accumulated experience of the program providers may enable URE designers and implementers to build upon knowledge gained from earlier UREs.

As discussed in Chapter 1 , the committee searched for evaluations of URE programs in several different ways but was not able to locate many published evaluations to study. Although some evaluations were found in the literature, the committee could not determine a way to systematically examine the program evaluations that have been prepared. The National Science Foundation and other funders generally require grant recipients to submit evaluation data, but that information is not currently aggregated and shared publicly, even for programs that are using a common evaluation tool. 1

Therefore, while program evaluation likely serves a useful role in providing descriptive data about a program for the institutions and funders supporting the program, much of the summative evaluation work that has been done to date adds relatively little to the broader knowledge base and overall conversations around undergraduate research. Some of the challenges of evaluation include budget and sample size constraints.

Similarly, it is difficult for designers of UREs to benefit systematically from the work of others who have designed and run UREs in the past because of the lack of an easy and consistent mechanism for collecting, analyzing, and sharing data. If these evaluations were more accessible they might be beneficial to others designing and evaluating UREs by helping them to gather ideas and inspiration from the experiences of others. A few such stories are provided in this report, and others can be found among the many resources offered by the Council on Undergraduate Research 2 and on other websites such as CUREnet. 3

Recommendation 3: Designers of UREs should base their design decisions on sound evidence. Consultations with education and social science researchers may be helpful as designers analyze the literature and make decisions on the creation or improvement of UREs. Professional development materials should be created and made available to faculty. Educational and disciplinary societies should consider how they can provide resources and connections to those working on UREs.

Faculty and other organizers of UREs can use the expanding body of scholarship as they design or improve the programs and experiences offered to their students. URE designers will need to make decisions about how to adapt approaches reported in the literature to make the programs they develop more suitable to their own expertise, student population(s), and available resources. Disciplinary societies and other national groups, such as those focused on improving pedagogy, can play important roles in bringing these issues to the forefront through events at their national and regional meetings and through publications in their journals and newsletters. They can develop repositories for various kinds of resources appropriate for their members who are designing and implementing UREs. The ability to travel to conferences and to access and discuss resources created by other individuals and groups is a crucial aspect of support (see Recommendations 7 and 8 for further discussion).

___________________

1 Personal knowledge of Janet Branchaw, member of the Committee on Strengthening Research Experiences for Undergraduate STEM Students.

2 See www.cur.org [November 2016].

3 See curenet.cns.utexas.edu [November 2016].

See Chapter 8 for specific questions to consider when one is designing or implementing UREs.

CURRENT OFFERINGS

Conclusion 6: Data at the institutional, state, or national levels on the number and type of UREs offered, or who participates in UREs overall or at specific types of institutions, have not been collected systematically. Although the committee found that some individual institutions track at least some of this type of information, we were unable to determine how common it is to do so or what specific information is most often gathered.

There is no one central database or repository that catalogs UREs at institutions of higher education, the nature of the research experiences they provide, or the relevant demographics (student, departmental, and institutional). The lack of comprehensive data makes it difficult to know how many students participate in UREs; where UREs are offered; and if there are gaps in access to UREs across different institutional types, disciplines, or groups of students. One of the challenges of describing the undergraduate research landscape is that students do not have to be enrolled in a formal program to have a research experience. Informal experiences, for example a work-study job, are typically not well documented. Another challenge is that some students participate in CUREs or other research experiences (such as internships) that are not necessarily labeled as such. Institutional administrators may be unaware of CUREs that are already part of their curriculum. (For example, establishment of CUREs may be under the purview of a faculty curriculum committee and may not be recognized as a distinct program.) Student participation in UREs may occur at their home institution or elsewhere during the summer. Therefore, it is very difficult for a science department, and likely any other STEM department, to know what percentage of their graduating majors have had a research experience, let alone to gather such information on students who left the major. 4

4 This point was made by Marco Molinaro, University of California, Davis, in a presentation to the Committee on Strengthening Research Experience for Undergraduate STEM Students, September 16, 2015.

Conclusion 7: While data are lacking on the precise number of students engaged in UREs, there is some evidence of a recent growth in course-based undergraduate research experiences (CUREs), which engage a cohort of students in a research project as part of a formal academic experience.

There has been an increase in the number of grants and the dollar amount spent on CUREs over the past decade (see Chapter 3 ). CUREs can be particularly useful in scaling UREs to reach a much larger population of students ( Bangera and Brownell, 2014 ). By using a familiar mechanism—enrollment in a course—a CURE can provide a more comfortable route for students unfamiliar with research to gain their first experience. CUREs also can provide such experiences to students with diverse backgrounds, especially if an institution or department mandates participation sometime during a student’s matriculation. Establishing CUREs may be more cost-effective at schools with little on-site research activity. However, designing a CURE is a new and time-consuming challenge for many faculty members. Connecting to nationally organized research networks can provide faculty with helpful resources for the development of a CURE based around their own research or a local community need, or these networks can link interested faculty to an ongoing collaborative project. Collaborative projects can provide shared curriculum, faculty professional development and community, and other advantages when starting or expanding a URE program. See the discussion in the report from a convocation on Integrating Discovery-based Research into the Undergraduate Curriculum ( National Academies of Sciences, Engineering, and Medicine, 2015 ).

Recommendation 4: Institutions should collect data on student participation in UREs to inform their planning and to look for opportunities to improve quality and access.

Better tracking of student participation could lead to better assessment of outcomes and improved quality of experience. Such metrics could be useful for both prospective students and campus planners. An integrated institutional system for research opportunities could facilitate the creation of tiered research experiences that allow students to progress in skills and responsibility and create support structures for students, providing, for example, seminars in communications, safety, and ethics for undergraduate researchers. Institutions could also use these data to measure the impact of UREs on student outcomes, such as student success rates in introductory courses, retention in STEM degree programs, and completion of STEM degrees.

While individual institutions may choose to collect additional information depending on their goals and resources, relevant student demographics and the following design elements would provide baseline data (an illustrative record layout is sketched after the list). At a minimum, such data should include

  • Type of URE;
  • Each student’s discipline;
  • Duration of the experience;
  • Hours spent per week;
  • When the student began the URE (e.g., first year, capstone);
  • Compensation status (e.g., paid, unpaid, credit); and
  • Location and format (e.g., on home campus, on another campus, internship, co-op).
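As noted above, one possible way an institution might encode these baseline elements for campus-level aggregation is sketched below. The field names and the CSV format are illustrative choices, not requirements from the report.

```python
# Hypothetical record layout mirroring the baseline elements listed above;
# field names and the CSV format are illustrative, not prescribed.
from dataclasses import dataclass, asdict
import csv

@dataclass
class UREParticipationRecord:
    student_id: str
    ure_type: str          # e.g., "apprentice-style", "CURE", "internship"
    discipline: str        # the student's discipline or major
    duration_weeks: int    # duration of the experience
    hours_per_week: float
    start_stage: str       # e.g., "first year", "capstone"
    compensation: str      # e.g., "paid", "unpaid", "credit"
    location_format: str   # e.g., "home campus", "other campus", "co-op"

def write_records(records, path="ure_participation.csv"):
    """Write participation records to a CSV for campus-level aggregation."""
    fields = list(UREParticipationRecord.__dataclass_fields__)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for record in records:
            writer.writerow(asdict(record))
```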

National aggregation of some of the student participation variables collected by various campuses might be considered by funders. The existing Integrated Postsecondary Education Data System database, organized by the National Center for Education Statistics at the U.S. Department of Education, may be a suitable repository for certain aspects of this information.

Recommendation 5: Administrators and faculty at all types of colleges and universities should continually and holistically evaluate the range of UREs that they offer. As part of this process, institutions should:

  • Consider how best to leverage available resources (including off-campus experiences available to students and current or potential networks or partnerships that the institution may form) when offering UREs so that they align with their institution’s mission and priorities;
  • Consider whether current UREs are both accessible and welcoming to students from various subpopulations across campus (e.g., historically underrepresented students, first generation college students, those with disabilities, non-STEM majors, prospective kindergarten-through-12th-grade teachers); and
  • Gather and analyze data on the types of UREs offered and the students who participate, making this information widely available to the campus community and using it to make evidence-based decisions about improving opportunities for URE participation. This may entail devising or implementing systems for tracking relevant data (see Conclusion 4 ).

Resources available for starting, maintaining, and expanding UREs vary from campus to campus. At some campuses, UREs are a central focus and many resources are devoted to them. At other institutions—for example, many community colleges—UREs are seen as extra, and new resources may be required to ensure availability of courses and facilities. Resource-constrained institutions may need to focus more on ensuring that students are aware of potential UREs that already exist on campus and elsewhere in near proximity to campus. All institutional discussions about UREs must consider both the financial resources and physical resources (e.g., laboratories, field stations, engineering design studios) required, while remembering that faculty time is a crucial resource. The incentives and disincentives for faculty to spend time on UREs are significant. Those institutions with an explicit mission to promote undergraduate research may provide more recognition and rewards to departments and faculty than those with another focus. The culture of the institution with respect to innovation in pedagogy and support for faculty development also can have a major influence on the extent to which UREs are introduced or improved.

Access to UREs may vary across campus and by department, and participation in UREs may vary across student groups. It is important for campuses to consider the factors that may facilitate or discourage students from participation in UREs. Inconsistent procedures or a faculty preference for students with high grades or previous research experience may limit options for some student populations.

UREs often grow based on the initiative of individual faculty members and other personnel, and an institution may not have complete or even rudimentary knowledge of all of the opportunities available or whether there are gaps or inconsistencies in its offerings. A uniform method for tracking the UREs available on a given campus would be useful to students and would provide a starting point for analyzing the options. Tracking might consist of notations in course listings and, where feasible, on student transcripts. Analysis might consider the types of UREs offered, the resources available to each type of URE, and variations within or between various disciplines and programs. Attention to whether all students or groups of students have appropriate access to UREs would foster consideration of how to best allocate resources and programming on individual campuses, in order to focus resources and opportunities where they are most needed.

Conclusion 8: The quality of mentoring can make a substantial difference in a student’s experiences with research. However, professional development in how to be a good mentor is not available to many faculty or other prospective mentors (e.g., graduate students, postdoctoral fellows).

Engagement in quality mentored research experiences has been linked to self-reported gains in research skills and productivity as well as retention in STEM (see Chapter 5 ). Quality mentoring in UREs has been shown to increase persistence in STEM for historically underrepresented students ( Hernandez et al., 2016 ). In addition, poor mentoring during UREs has been shown to decrease retention of students ( Hernandez et al., 2016 ).

More generally, good mentoring in the STEM environment has been positively associated with self-reported gains in identity as a STEM researcher, a sense of belonging, and confidence to function as a STEM researcher ( Byars-Winston et al., 2015 ; Chemers et al., 2011 ; Pfund et al., 2016 ; Thiry et al., 2011 ). The frequency and quality of mentee-mentor interactions have been associated with students’ reports of persistence in STEM, with mentoring directly or indirectly improving both grades and persistence in college. For students from historically underrepresented ethnic/racial groups, quality mentoring has been associated with self-reported enhanced recruitment into graduate school and research-related career pathways ( Byars-Winston et al., 2015 ). Therefore, it is important to ensure that faculty and other mentors receive proper development of their mentoring skills.

Recommendation 6: Administrators and faculty at colleges and universities should ensure that all who mentor undergraduates in research experiences (this includes faculty, instructors, postdoctoral fellows, graduate students, and undergraduates serving as peer mentors) have access to appropriate professional development opportunities to help them grow and succeed in this role.

Although many organizations recognize effective mentors (e.g., the National Science Foundation’s Presidential Awards for Excellence in Science, Mathematics, and Engineering Mentoring), there currently are no standard criteria for selecting, evaluating, or recognizing mentors specifically for UREs. In addition, there are no requirements that mentors meet some minimum level of competency before engaging in mentoring or participate in professional development to obtain a baseline of knowledge and skills in mentoring, including cultural competence in mentoring diverse groups of students. Traditionally, the only experience required for being a mentor is having been mentored, regardless of whether the experience was negative or positive ( Handelsman et al., 2005 ; Pfund et al., 2015 ). Explicit consideration of how the relationships are formed, supported, and evaluated can improve mentor-mentee relationships. To ensure that the mentors associated with a URE are prepared appropriately, thereby increasing the chances of a positive experience for both mentors and mentees, all prospective mentors should prepare for their role. Available resources include the Entering Mentoring course (see Pfund et al., 2015 ) and the book Successful STEM Mentoring Initiatives for Underrepresented Students ( Packard, 2016 ).

A person who is an ineffective mentor for one student might be inspiring for another, and the setting in which the mentoring takes place (e.g., a CURE or apprentice-style URE, a laboratory or field-research environment) may also influence mentor effectiveness. Thus, there should be some mechanism for monitoring such relationships during the URE, or there should be opportunity for a student who is unhappy with the relationship to seek other mentors. Indeed, cultivating a team of mentors with different experiences and expertise may be the best strategy for any student. A parallel volume to the Entering Mentoring curriculum mentioned above, Entering Research Facilitator’s Manual ( Branchaw et al., 2010 ), is designed to help students with their research mentor-mentee relationships and to coach them on building teams of mentors to guide them. As mentioned in Chapter 5 , the Entering Research curriculum also contains information designed to support a group of students as they go through their first apprentice-style research experience, each working in separate research groups and also meeting together as a cohort focused on learning about research.

PRIORITIES FOR THE FUTURE

Conclusion 9: The unique assets, resources, priorities, and constraints of the department and institution, in addition to those of individual mentors, impact the goals and structures of UREs. Schools across the country are showing considerable creativity in using unique resources, repurposing current assets, and leveraging student enthusiasm to increase research opportunities for their students.

Given current calls for UREs and the growing conversation about their benefits, an increasing number of two- and four-year colleges and universities are increasing their efforts to support undergraduate research. Departments, institutions, and individual faculty members influence the precise nature of UREs in multiple ways and at multiple levels. The physical resources available, including laboratories, field stations, and engineering design studios and testing facilities, make a difference, as does the ability to access resources in the surrounding community (including other parts of the campus). Institutions with an explicit mission to promote undergraduate research may provide more time, resources (e.g., financial, support personnel, space, equipment), and recognition and rewards to departments and faculty in support of UREs than do institutions without that mission. The culture of the institution with respect to innovation in pedagogy and support for faculty development also affects the extent to which UREs are introduced or improved.

Development of UREs requires significant time and effort. Whether or not faculty attempt to implement UREs can depend on whether departmental or institutional reward and recognition systems compensate for or even recognize the time required to initiate and implement them. The availability of national consortia can help to alleviate many of the time and logistical problems but not those obstacles associated with recognition and resources.

It will be harder for faculty to find the time to develop UREs at institutions where they are required to teach many courses per semester, although in some circumstances faculty can teach CUREs that also advance their own research ( Shortlidge et al., 2016 ). Faculty at community colleges generally have the heaviest teaching expectations, little or no expectations or incentives to maintain a research program, limited access to lab or design space or to scientific and engineering journals, and few resources to undertake any kind of a research program. These constraints may limit the extent to which UREs can be offered to the approximately 40 percent of U.S. undergraduates who are enrolled in the nation’s community colleges (which collectively also serve the highest percentage of the nation’s underrepresented students). 5

Recommendation 7: Administrators and faculty at all types of colleges and universities should work together within and, where feasible, across institutions to create a culture that supports the development of evidence-based, iterative, and continuous refinement of UREs, in an effort to improve student learning outcomes and overall academic success. This should include the development, evaluation, and revision of policies and practices designed to create a culture supportive of the participation of faculty and other mentors in effective UREs. Policies should consider pedagogy, professional development, cross-cultural awareness, hiring practices, compensation, promotion (incentives, rewards), and the tenure process.

Colleges and universities that would like to expand or improve the UREs offered to their students should consider the campus culture and climate and the incentives that affect faculty choices. Those campuses that cultivate an environment supportive of the iterative and continuous refinement of UREs and that offer incentives for evaluation and evidence-based improvement of UREs seem more likely to sustain successful programs. Faculty and others who develop and implement UREs need support to be able to evaluate their courses or programs and to analyze evidence to make decisions about URE design. This kind of support may be fostered by expanding the mission of on-campus centers for learning and teaching to focus more on UREs or by providing incentives for URE developers from the natural sciences and engineering to collaborate with colleagues in the social sciences or colleges of education with expertise in designing studies involving human subjects. Supporting closer communication between URE developers and the members of the campus Institutional Review Board may help projects to move forward more seamlessly. Interdepartmental and intercampus connections (especially those between two- and four-year institutions) can be valuable for linking faculty with the appropriate resources, colleagues, and diverse student populations. Faculty who have been active in professional development on how students learn in the classroom may have valuable experiences and expertise to share.

5 See http://nces.ed.gov/programs/coe/indicator_cha.asp [November 2016].

The refinement or expansion of UREs should build on evidence from data on student participation, pedagogy, and outcomes, which are integral components of the original design. As UREs are validated and refined, institutions should make efforts to facilitate connections among different departments and disciplines, including the creation of multidisciplinary UREs. Student engagement in learning in general, and with UREs more specifically, depends largely on the culture of the department and the institution and on whether students see their surroundings as inclusive and energetic places to learn and thrive. A study that examined the relationship between campus missions and the five benchmarks for effective educational practice (measured by the National Survey of Student Engagement) showed that different programs, policies, and approaches may work better, depending on the institution’s mission ( Kezar and Kinzie, 2006 ).

The Council on Undergraduate Research (2012) document Characteristics of Excellence in Undergraduate Research outlines several best practices for UREs based on the apprenticeship model (see Chapter 8 ). That document is not the result of a detailed analysis of the evidence but is based on the extensive experiences and expertise of the council’s members. It suggests that undergraduate research should be a normal part of the undergraduate experience regardless of the type of institution. It also identifies changes necessary to include UREs as part of the curriculum and culture changes necessary to support curricular reform, co-curricular activities, and modifications to the incentives and rewards for faculty to engage with undergraduate research. In addition, professional development opportunities specifically designed to help improve the pedagogical and mentoring skills of instructional staff in using evidence-based practices can be important for a supportive learning culture.

Recommendation 8: Administrators and faculty at all types of colleges and universities should work to develop strong and sustainable partnerships within and between institutions and with educational and professional societies for the purpose of sharing resources to facilitate the creation of sustainable URE programs.

Networks of faculty, institutions, regionally and nationally coordinated URE initiatives, professional societies, and funders should be strengthened

to facilitate the exchange of evidence and experience related to UREs. These networks could build on the existing work of professional societies that assist faculty with pedagogy. They can help provide a venue for considering the policy context and larger implications of increasing the number, size, and scope of UREs. Such networks also can provide a more robust infrastructure, to improve the sustainability and expansion of URE opportunities. The sharing of human, financial, scientific, and technical resources can strengthen the broad implementation of effective, high-quality, and more cost-efficient UREs. It may be especially important for community colleges and minority-serving institutions to engage in partnerships in order to expand the opportunities for undergraduates (both transfer and technical students) to participate in diverse UREs (see discussion in National Academies of Sciences, Engineering, and Medicine, 2015 , and Elgin et al., 2016 ). Consortia can facilitate the sharing of resources across disciplines and departments within the same institution or at different institutions, organizations, and agencies. Consortia that employ research methodologies in common can share curriculum, research data collected, and common assessment tools, lessening the time burden for individual faculty and providing a large pool of students from which to assess the efficacy of individual programs.

Changes in the funding climate can have substantial impacts on the types of programs that exist, iterative refinement of programs, and whether and how programs might be expanded to broaden participation by more undergraduates. For those institutions that have not yet established URE programs or are at the beginning phases of establishing one, mechanisms for achieving success and sustainability may include increased institutional ownership of programs of undergraduate research, development of a broad range of programs of different types and funding structures, formation of undergraduate research offices or repurposing some of the responsibilities and activities of those which already exist, and engagement in community promotion and dissemination of student accomplishments (e.g., student symposia, support for undergraduate student travel to give presentations at professional meetings).

Over time, institutions must develop robust plans for ensuring the long-term sustained funding of high-quality UREs. Those plans should include assuming that more fiscal responsibility for sustaining such efforts will be borne by the home institution as external support for such efforts decreases and ultimately ends. Building UREs into the curriculum and structure of a department’s courses and other programs, and thus its funding model, can help with sustainability. Partnerships with nonprofit organizations and industry, as well as seeking funding from diverse agencies, can also facilitate programmatic sustainability, especially if the UREs they fund can also support the mission and programs of the funders (e.g., through research internships or through CUREs that focus on community-based research questions and challenges). Partnerships among institutions also may have greater potential to study and evaluate student outcomes from URE participation across broader demographic groups and to reduce overall costs through the sharing of administrative or other resources (such as libraries, microscopes, etc.).

REFERENCES

Bangera, G., and Brownell, S.E. (2014). Course-based undergraduate research experiences can make scientific research more inclusive. CBE–Life Sciences Education, 13(4), 602-606.

Branchaw, J.L., Pfund, C., and Rediske, R. (2010). Entering Research Facilitator's Manual: Workshops for Students Beginning Research in Science. New York: Freeman & Company.

Byars-Winston, A.M., Branchaw, J., Pfund, C., Leverett, P., and Newton, J. (2015). Culturally diverse undergraduate researchers' academic outcomes and perceptions of their research mentoring relationships. International Journal of Science Education, 37(15), 2533-2554.

Chemers, M.M., Zurbriggen, E.L., Syed, M., Goza, B.K., and Bearman, S. (2011). The role of efficacy and identity in science career commitment among underrepresented minority students. Journal of Social Issues, 67(3), 469-491.

Council on Undergraduate Research. (2012). Characteristics of Excellence in Undergraduate Research. Washington, DC: Council on Undergraduate Research.

Elgin, S.C.R., Bangera, G., Decatur, S.M., Dolan, E.L., Guertin, L., Newstetter, W.C., San Juan, E.F., Smith, M.A., Weaver, G.C., Wessler, S.R., Brenner, K.A., and Labov, J.B. (2016). Insights from a convocation: Integrating discovery-based research into the undergraduate curriculum. CBE–Life Sciences Education, 15, 1-7.

Hanauer, D., and Dolan, E. (2014). The Project Ownership Survey: Measuring differences in scientific inquiry experiences. CBE–Life Sciences Education, 13, 149-158.

Handelsman, J., Pfund, C., Lauffer, S.M., and Pribbenow, C.M. (2005). Entering Mentoring. Madison, WI: The Wisconsin Program for Scientific Teaching.

Hernandez, P.R., Estrada, M., Woodcock, A., and Schultz, P.W. (2016). Protégé perceptions of high mentorship quality depend on shared values more than on demographic match. Journal of Experimental Education. Available: http://www.tandfonline.com/doi/full/10.1080/00220973.2016.1246405 [November 2016].

Jones, P., Selby, D., and Sterling, S.R. (2010). Sustainability Education: Perspectives and Practice Across Higher Education. New York: Earthscan.

Kezar, A.J., and Kinzie, J. (2006). Examining the ways institutions create student engagement: The role of mission. Journal of College Student Development, 47(2), 149-172.

National Academies of Sciences, Engineering, and Medicine. (2015). Integrating Discovery-Based Research into the Undergraduate Curriculum: Report of a Convocation. Washington, DC: National Academies Press.

Nagda, B.A., Gregerman, S.R., Jonides, J., von Hippel, W., and Lerner, J.S. (1998). Undergraduate student-faculty research partnerships affect student retention. Review of Higher Education, 22, 55-72. Available: http://scholar.harvard.edu/files/jenniferlerner/files/nagda_1998_paper.pdf [February 2017].

Packard, P. (2016). Successful STEM Mentoring Initiatives for Underrepresented Students: A Research-Based Guide for Faculty and Administrators. Sterling, VA: Stylus.

Pfund, C., Branchaw, J.L., and Handelsman, J. (2015). Entering Mentoring: A Seminar to Train a New Generation of Scientists (2nd ed.). New York: Macmillan Learning.

Pfund, C., Byars-Winston, A., Branchaw, J.L., Hurtado, S., and Eagan, M.K. (2016). Defining attributes and metrics of effective research mentoring relationships. AIDS and Behavior, 20, 238-248.

Schultz, P.W., Hernandez, P.R., Woodcock, A., Estrada, M., Chance, R.C., Aguilar, M., and Serpe, R.T. (2011). Patching the pipeline: Reducing educational disparities in the sciences through minority training programs. Educational Evaluation and Policy Analysis, 33(1), 95-114.

Shortlidge, E.E., Bangera, G., and Brownell, S.E. (2016). Faculty perspectives on developing and teaching course-based undergraduate research experiences. BioScience, 66(1), 54-62.

Thiry, H., Laursen, S.L., and Hunter, A.B. (2011). What experiences help students become scientists? A comparative study of research and other sources of personal and professional gains for STEM undergraduates. Journal of Higher Education, 82(4), 358-389.


Undergraduate research has a rich history, and many practicing researchers point to undergraduate research experiences (UREs) as crucial to their own career success. There are many ongoing efforts to improve undergraduate science, technology, engineering, and mathematics (STEM) education that focus on increasing the active engagement of students and decreasing traditional lecture-based teaching, and UREs have been proposed as one way to advance these efforts and may be a key strategy for broadening participation in STEM. In light of these proposals, questions have been asked about what is known about student participation in UREs, best practices in URE design, and the evidence for beneficial outcomes from UREs.

Undergraduate Research Experiences for STEM Students provides a comprehensive overview of and insights about the current and rapidly evolving types of UREs, in an effort to improve understanding of the complexity of UREs in terms of their content, their surrounding context, the diversity of the student participants, and the opportunities for learning provided by a research experience. This study analyzes UREs by considering them as part of a learning system that is shaped by forces related to national policy, institutional leadership, and departmental culture, as well as by the interactions among faculty, other mentors, and students. The report provides a set of questions to be considered by those implementing UREs as well as an agenda for future research that can help answer questions about how UREs work and which aspects of the experiences are most powerful.


Selecting and Improving Quasi-Experimental Designs in Effectiveness and Implementation Research

Margaret A. Handley

1 Department of Epidemiology and Biostatistics, Division of Infectious Disease Epidemiology, University of California, San Francisco, CA

2 General Internal Medicine and UCSF Center for Vulnerable Populations, San Francisco Zuckerberg General Hospital and Trauma Center, University of California, San Francisco, CA, 1001 Potrero Avenue, Box 1364, San Francisco, CA 94110

Courtney Lyles

Charles McCulloch, Adithya Cattamanchi

3 Division of Pulmonary and Critical Care Medicine and UCSF Center for Vulnerable Populations, San Francisco Zuckerberg General Hospital and Trauma Center, University of California, San Francisco, CA, 1001 Potrero Avenue, San Francisco, CA 94110

Interventional researchers face many design challenges when assessing intervention implementation in real-world settings. Intervention implementation requires ‘holding fast’ on internal validity needs while incorporating external validity considerations (such as uptake by diverse sub-populations, acceptability, cost, sustainability). Quasi-experimental designs (QEDs) are increasingly employed to achieve a better balance between internal and external validity. Although these designs are often referred to and summarized in terms of logistical benefits versus threats to internal validity, there is still uncertainty about: (1) how to select from among various QEDs, and (2) strategies to strengthen their internal and external validity. We focus on commonly used QEDs (pre-post designs with non-equivalent control groups, interrupted time series, and stepped wedge designs) and discuss several variants that maximize internal and external validity at the design, execution, and analysis stages.

INTRODUCTION

Public health practice involves implementation or adaptation of evidence-based interventions into new settings in order to improve health for individuals and populations. Such interventions typically involve one or more of the “7 Ps” (programs, practices, principles, procedures, products, pills, and policies) ( 9 ). Increasingly, both public health and clinical research have sought to generate practice-based evidence on a wide range of interventions, which in turn has led to a greater focus on intervention research designs that can be applied in real-world settings ( 2 , 8 , 9 , 10 , 20 , 25 , 26 ).

Randomized controlled trials (RCTs) in which individuals are assigned to intervention or control (standard-of-care or placebo) arms are considered the gold standard for assessing causality and as such are a first choice for most intervention research. Random allocation minimizes selection bias and maximizes the likelihood that measured and unmeasured confounding variables are distributed equally, enabling any difference in outcomes between intervention and control arms to be attributed to the intervention under study. RCTs can also involve random assignment of groups (e.g., clinics, worksites or communities) to intervention and control arms, but a large number of groups are required in order to realize the full benefits of randomization. Traditional RCTs strongly prioritize internal validity over external validity by employing strict eligibility criteria and rigorous data collection methods.
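As a purely illustrative aside, random allocation of groups (rather than individuals) to study arms can be sketched in a few lines; the clinic names and the even split below are assumptions for illustration only.

```python
# Illustrative only: random allocation of groups (e.g., clinics) to intervention
# and control arms. Clinic names and the even split are assumptions.
import random

def randomize_clusters(clusters, seed=42):
    """Randomly split cluster names into intervention and control arms."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

print(randomize_clusters(["clinic_A", "clinic_B", "clinic_C", "clinic_D"]))
```

With only a handful of clusters, randomization alone cannot guarantee balance, which is one reason a large number of groups is needed to realize its full benefits.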

Alternative research methods are needed to test interventions for their effectiveness in many real-world settings—and later, when evidence-based interventions are known, for spreading or scaling up these interventions to new settings and populations ( 23 , 40 ). In real-world settings, random allocation of the intervention may not be possible or fully under the control of investigators because of practical, ethical, social, or logistical constraints. For example, when partnering with communities or organizations to deliver a public health intervention, it might not be acceptable that only half of individuals or sites receive an intervention. The timing of intervention roll-out might also be determined by an external process outside the control of the investigator, such as a mandated policy. In addition, when self-selected groups are expected to participate in a program as part of routine care, random assignment raises ethical concerns, such as withholding or delaying a potentially effective treatment or providing a less effective treatment to one group of participants ( 49 ). As described by Peters et al., “implementation research seeks to understand and work within real world conditions, rather than trying to control for these conditions or to remove their influence as causal effects” ( 40 ). For all of these reasons, a blending of the design components of clinical effectiveness trials and implementation research is feasible and desirable, and this review covers both. Such blending of effectiveness and implementation components within a study can provide benefits beyond either research approach alone ( 14 ), for example by leading to faster uptake of interventions by simultaneously testing implementation strategies.

Since assessment of intervention effectiveness and implementation in real-world settings requires increased focus on external validity (including consideration of factors enhancing intervention uptake by diverse sub-populations, acceptability to a wide range of stakeholders, cost, and sustainability) ( 34 ), interventional research designs are needed that are more relevant to the potential, ‘hoped for’ treatment population than an RCT, and that achieve a better balance between internal and external validity. Quasi-experimental designs (QEDs), which first gained prominence in social science research ( 11 ), are increasingly being employed to fill this need. [ BOX 1 HERE: Definitions used in this review].

DEFINITIONS AND TERMS USED IN PAPER

QEDs test causal hypotheses but, in lieu of fully randomized assignment of the intervention, seek to define a comparison group or time period that reflects the counter-factual ( i.e., outcomes if the intervention had not been implemented) ( 43 ). QEDs seek to identify a comparison group or time period that is as similar as possible to the treatment group or time period in terms of baseline (pre-intervention) characteristics. QEDs can include partial randomization, such as in stepped wedge designs (SWDs) when there is pre-determined (and non-random) stratification of sites but the order in which sites within each stratum receive the intervention is assigned randomly. For example, strata that are determined by size or perceived ease of implementation may be assigned to receive the intervention first; however, within those strata the specific sites themselves are randomly selected to receive the intervention across the time intervals included in the study. In all cases, the key threat to internal validity of QEDs is a lack of similarity between the comparison and intervention groups or time periods due to differences in characteristics of the people, sites, or time periods involved.
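The partial randomization described above can be sketched as follows; the site names, strata, and one-site-per-period roll-out are illustrative assumptions rather than a prescribed procedure.

```python
# Sketch of the partial randomization described above: strata are ordered by a
# pre-determined (non-random) criterion, while sites within each stratum are
# randomly ordered across roll-out periods. Site names are illustrative.
import random

def stepped_wedge_schedule(strata, seed=0):
    """strata: list of lists of site names, already ordered by the non-random
    criterion (e.g., larger or easier-to-implement sites first).
    Returns a dict mapping each site to its roll-out period (1, 2, ...)."""
    rng = random.Random(seed)
    schedule, period = {}, 1
    for sites in strata:
        order = list(sites)
        rng.shuffle(order)        # random order within the stratum
        for site in order:
            schedule[site] = period
            period += 1           # one additional site crosses over each period
    return schedule

# Example: the two larger sites roll out first, but which of them goes first
# is decided at random.
print(stepped_wedge_schedule([["big_A", "big_B"], ["small_C", "small_D"]]))
```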

Previous reviews in this journal have focused on the importance and use of QEDs and other methods to enhance causal inference when evaluating the impact of an intervention that has already been implemented ( 4 , 8 , 9 , 18 ). Design approaches in this case often include creating a post-hoc comparison group for a natural experiment or identifying pre- and post-intervention data to then conduct an interrupted time series study. Analysis-phase approaches often utilize techniques such as pre-post comparisons, regression adjustment, propensity scores, difference-in-differences, synthetic controls, interrupted time series, regression discontinuity, and instrumental variables ( 4 , 9 , 18 ). Although these articles summarize key components of QEDs (e.g., interrupted time series), as well as analysis-focused strategies (regression adjustment, propensity scores, difference-in-differences, synthetic controls, and instrumental variables), there is still uncertainty about: (1) how to select from among various QEDs in the pre-implementation design phase, and (2) strategies to strengthen internal and external validity before and during the implementation phase.

In this paper, we discuss the a priori choice of a QED when evaluating the impact of an intervention or policy for which the investigator has some element of design control related to: 1) order of intervention allocation (including random and non-random approaches); 2) selecting sites or individuals; and/or 3) timing and frequency of data collection. In the next section, we discuss the main QEDs used for prospective evaluations of interventions in real-world settings and their advantages and disadvantages with respect to addressing threats to internal validity [ BOX 2 HERE: Common Threats to Internal Validity of Quasi-Experimental Designs Evaluating Interventions in ‘Real World’ Settings]. Following this summary, we discuss opportunities to strengthen their internal validity, illustrated with examples from the literature. Then we propose a decision framework for key decision points that lead to different QED options. We conclude with a brief discussion of incorporating additional design elements to capture the full range of relevant implementation outcomes in order to maximize external validity.

Common Threats to Internal Validity of Quasi-Experimental Designs Evaluating Interventions in ‘Real World’ Settings

QUASI-EXPERIMENTAL DESIGNS FOR PROSPECTIVE EVALUATION OF INTERVENTIONS

Table 1 summarizes the main QEDs that have been used for prospective evaluation of health interventions in real-world settings: pre-post designs with a non-equivalent control group, interrupted time series, and stepped wedge designs. We do not include pre-post designs without a control group in this review because, in general, QEDs are primarily those designs that identify a comparison group or time period that is as similar as possible to the treatment group or time period in terms of baseline (pre-intervention) characteristics ( 50 ). Below, we describe features of each QED, considering strengths and limitations and providing examples of their use.

Table 1. Overview of Commonly Used QEDs in Intervention Research*

1. Pre-Post With Non-Equivalent Control Group

The first type of QED highlighted in this review is perhaps the most straightforward type of intervention design: the pre-post comparison study with a non-equivalent control group. In this design, the intervention is introduced at a single point in time to one or more sites, with a pre-test and post-test evaluation period at both the intervention and control sites; the pre-post differences between these sites are then compared. In practice, interventions using this design are often delivered at a higher level, such as to entire communities or organizations [ Figure 1 here]. In this design, the investigators identify additional site(s) that are similar to the intervention site to serve as a comparison/control group. However, these control sites differ in some way from the intervention site(s), and thus the term “non-equivalent” is important and clarifies that there are inherent differences between the treatment and control groups ( 15 ).

[Figure 1. Illustration of the Pre-Post Non-Equivalent Control Group Design]
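One common way to analyze this design (a hedged sketch, not necessarily the approach taken in any particular study cited here) is a difference-in-differences regression, in which the coefficient on the site-by-period interaction estimates the intervention effect. The data frame and column names below are hypothetical.

```python
# Hedged sketch of a difference-in-differences analysis for a pre-post design
# with a non-equivalent control group. The DataFrame and column names
# (treated, post, outcome) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def did_estimate(df: pd.DataFrame) -> float:
    """df has one row per individual with columns:
    treated (1 = intervention site), post (1 = post-intervention period),
    outcome (numeric). The coefficient on treated:post is the DiD estimate."""
    model = smf.ols("outcome ~ treated + post + treated:post", data=df).fit()
    return model.params["treated:post"]
```

The interaction term removes both the fixed difference between the sites and the shared pre-post change, so the estimate is credible only if the sites would have followed parallel trends in the absence of the intervention.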

The strengths of pre-post designs lie mainly in their simplicity; data collection is usually required at only a few time points (although sometimes more). However, pre-post designs can be affected by several of the threats to internal validity of QEDs presented here. The largest challenges are related to: 1) ‘history bias,’ in which events unrelated to the intervention (also referred to as secular trends) occur before or during the intervention period and affect the outcome, positively or negatively ( 39 ); and 2) selection bias, because the non-equivalent control groups are likely to differ from the intervention sites in a number of meaningful ways that affect the outcome of interest and can bias results.

At the design stage, the first step in improving internal validity is the selection of a non-equivalent control group(s) for which some balance in the distribution of known risk factors can be established. This can be challenging, as there may not be adequate information available to determine how ‘equivalent’ the comparison group is with respect to relevant covariates.

It can be useful to obtain pre-test data or baseline characteristics to improve the comparability of the two groups. In the most controlled situations within this design, the investigators might include elements of randomization or matching for individuals in the intervention or comparison site, to attempt to balance the covariate distribution. Implicit in this approach is the assumption that the greater the similarity between groups, the smaller the likelihood that confounding will threaten inferences of causality of effect for the intervention ( 33 , 47 ). Thus, it is important to select this group or multiple groups with as much specificity as possible.

To enhance causal inference for pre-post designs with non-equivalent control groups, the best strategies improve the comparability of the control group with regard to potential covariates that are related to the outcome of interest but are not themselves under investigation. One strategy involves creating a cohort and then using targeted sampling to inform matching of individuals within the cohort. Matching can be based on demographic and other important factors (e.g., measures of health care access or time period). This design in essence creates a matched, nested case-control design.

Collection of additional data once sites are selected cannot in itself reduce bias, but it can inform the examination of the association of interest and provide data supporting an interpretation consistent with a reduced likelihood of bias. These data collection strategies include: 1) extra data collection at additional pre- or post-intervention time points (moving the design closer to an interrupted time series and allowing examination of potential threats from maturation and history bias), and 2) collection of data on other dependent variables, with a priori assessment of how they will ‘react’ with time-dependent variables. A detailed analysis can then provide information on the potential effects on the outcome of interest (to understand potential underlying threats due to history bias).

Additionally, there are analytic strategies that can improve the interpretation of this design, such as: 1) analysis of multiple non-equivalent control groups, to determine whether the intervention effects are robust across different conditions or settings (e.g., using sensitivity analysis); 2) examination of a smaller critical window of the study in which the intervention would plausibly be expected to have the most impact; and 3) identification of subgroups of individuals within the intervention community who are known to have received high vs. low exposure to the intervention, in order to investigate a potential “dose-response” effect. Table 2 provides examples of studies using the pre-post non-equivalent control group design that have employed one or more of these approaches to improve the study’s internal validity.

Improving Quasi-Experimental Designs: Internal and External Validity Considerations

Cousins et al utilized a non-equivalent control selection strategy that leveraged a recent cross-sectional survey on drinking among college-age students at six universities in New Zealand ( 16 ). Of the six sites in the original survey, five were selected to provide non-equivalent control group data for the one intervention campus. The campus intervention targeted young adult drinking-related problems and other outcomes, such as aggressive behavior, using an environmental intervention with a community liaison and a campus security program (also known as a Campus Watch program). The original cross-sectional survey was administered nationally to students using a web-based format and was repeated in the years soon after the Campus Watch intervention was implemented at the one site. Benefits of the design include a consistent sampling frame at each control site, such that sites could be combined as well as evaluated separately, and collection of additional data on alcohol sales and consumption over the study period to support inference. In a study by Wertz et al ( 48 ), a non-equivalent control group was created by matching insured patients with diabetes and/or hypertension who were eligible for a health coaching program but opted out of it to those who opted in. Matching was based on propensity scores using demographic and socioeconomic factors and medical center location, and a longitudinal cohort was created prior to the intervention (see Basu et al 2017 for more on this approach).
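
The propensity-score matching step described for Wertz et al can be outlined as follows. This is a generic sketch under simplified assumptions; the data, covariates, and variable names are hypothetical rather than taken from that study:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Hypothetical patient-level data: 'opted_in' marks health-coaching participants;
# the covariates stand in for demographic/socioeconomic factors and site.
n = 500
df = pd.DataFrame({
    "age": rng.normal(55, 10, n),
    "income": rng.normal(50, 15, n),
    "site": rng.integers(0, 4, n),
})
df["opted_in"] = (rng.random(n) < 0.4).astype(int)

# 1. Estimate propensity scores: the probability of opting in given covariates.
X = pd.get_dummies(df[["age", "income", "site"]], columns=["site"])
ps_model = LogisticRegression(max_iter=1000).fit(X, df["opted_in"])
df["pscore"] = ps_model.predict_proba(X)[:, 1]

# 2. Match each opted-in patient to the nearest opted-out patient by propensity score.
treated = df[df["opted_in"] == 1]
controls = df[df["opted_in"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(controls[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = pd.concat([treated, controls.iloc[idx.ravel()]])

# 'matched' is the cohort used for the pre-post comparison between opted-in and
# matched opted-out patients.
```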

In the pre-post malaria-prevention example from Gambia, the investigators studied the effect of introducing insecticide-treated bed nets on malaria rates and collected additional data to evaluate the internal validity assumptions of their design ( 1 ). In this study, the investigators introduced bed nets at the village level, using communities not receiving the bed nets as control sites. To strengthen internal validity, they collected additional data that enabled them to: 1) determine whether the reduction in malaria rates was most pronounced during the rainy season within the intervention communities, as this was a biologically plausible exposure period in which they could expect the largest effect size difference between intervention and control sites; and 2) examine use patterns for the bed nets, based on how much insecticide was present in the nets over time (after regular washing occurred), which aided in calculating a “dose-response” effect of exposure to the bed nets among a subsample of individuals in the intervention community.

2. Interrupted Time Series

An interrupted time series (ITS) design involves collection of outcome data at multiple time points before and after an intervention is introduced at a given point in time at one or more sites ( 6 , 13 ). The pre-intervention outcome data are used to establish an underlying trend that is assumed to continue unchanged in the absence of the intervention under study (i.e., the counterfactual scenario). Any change in outcome level or trend from the counterfactual scenario in the post-intervention period is then attributed to the impact of the intervention. The most basic ITS design uses a regression model with only three time-based covariates, which estimate the pre-intervention slope (the outcome trend before the intervention), a “step” or change in level (the difference between the observed and predicted outcome level at the first post-intervention time point), and a change in slope (the difference between the post- and pre-intervention outcome trends) ( 13 , 32 ) [ Figure 2 here].

[Figure 2. Interrupted Time Series Design]

Whether used for evaluating a natural experiment or, as is the focus here, for prospective evaluation of an intervention, the appropriateness of an ITS design depends on the nature of the intervention and outcome and on the type of data available. An ITS design requires the pre- and post-intervention periods to be clearly differentiated; when used prospectively, the investigator therefore needs to have control over the timing of the intervention. ITS analyses typically involve outcomes that are expected to change soon after an intervention is introduced or after a well-defined lag period. For example, for outcomes such as cancer or incident tuberculosis, which develop long after an intervention is introduced and at a variable rate, it is difficult to clearly separate the pre- and post-intervention periods. Last, an ITS analysis requires at least three time points in the pre- and post-intervention periods to assess trends. In general, a larger number of time points is recommended, particularly when the expected effect size is small, when data at nearby time points are more similar to each other (i.e., autocorrelation), or when confounding effects (e.g., seasonality) are present. It is also important for investigators to consider any changes to data collection or recording over time, particularly if such changes are associated with introduction of the intervention.

In comparison to simple pre-post designs in which the average outcome level is compared between the pre- and post-intervention periods, the key advantage of ITS designs is that they evaluate for intervention effect while accounting for pre-intervention trends. Such trends are common due to factors such as changes in the quality of care, data collection and recording, and population characteristics over time. In addition, ITS designs can increase power by making full use of longitudinal data instead of collapsing all data to single pre- and post-intervention time points. The use of longitudinal data can also be helpful for assessing whether intervention effects are short-lived or sustained over time.

While the basic ITS design has important strengths, the key threat to internal validity is the possibility that factors other than the intervention are affecting the observed changes in outcome level or trend. Changes over time in factors such as the quality of care, data collection and recording, and population characteristics may not be fully accounted for by the pre-intervention trend. Similarly, the pre-intervention time period, particularly when short, may not capture seasonal changes in an outcome.

Detailed reviews have been published of variations on the basic ITS design that can be used to enhance causal inference. In particular, the addition of a control group can be especially useful for assessing the presence of seasonal trends and other potential time-varying confounders ( 52 ). Zombre et al ( 52 ) maintained a large number of control sites during the extended study period and were able to examine variations in seasonal trends as well as clinic-level characteristics, such as workforce density and sustainability. In addition to including a control group, several analysis-phase strategies can be employed to strengthen causal inference, including adjustment for time-varying confounders and accounting for autocorrelation.
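
A minimal sketch of the basic segmented regression model described above, applied to simulated data, with heteroskedasticity- and autocorrelation-consistent (HAC) standard errors as one common way of accounting for autocorrelation. The data and parameter values are invented for illustration and are not from any study cited here:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulated monthly outcome: 24 pre-intervention and 24 post-intervention points.
n_pre, n_post = 24, 24
time = np.arange(n_pre + n_post)
post = (time >= n_pre).astype(int)
time_since = np.where(post == 1, time - n_pre, 0)
outcome = 50 + 0.2 * time - 5 * post - 0.3 * time_since + rng.normal(0, 2, len(time))

df = pd.DataFrame({"outcome": outcome, "time": time,
                   "post": post, "time_since": time_since})

# Segmented regression: 'time' estimates the pre-intervention slope, 'post' the
# step change at the interruption, and 'time_since' the change in slope.
# Newey-West (HAC) standard errors are one common way to handle autocorrelation.
its = smf.ols("outcome ~ time + post + time_since", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 3}
)
print(its.params)
```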

3. Stepped Wedge Designs

Stepped wedge designs (SWDs) involve a sequential roll-out of an intervention to participants (individuals or clusters) over several distinct time periods ( 5 , 7 , 22 , 24 , 29 , 30 , 38 ). SWDs can include cohort designs (with the same individuals in each cluster in the pre- and post-intervention steps) and repeated cross-sectional designs (with different individuals in each cluster in the pre- and post-intervention steps) ( 7 ). In the SWD, there is a unidirectional, sequential roll-out of an intervention to clusters (or individuals) over different time periods. Initially all clusters (or individuals) are unexposed to the intervention, and then at regular intervals, selected clusters cross over (or ‘step’) into a time period in which they receive the intervention [ Figure 3 here]. All clusters receive the intervention by the last time interval (although not all individuals within clusters necessarily receive the intervention). Data are collected on all clusters such that each contributes data during both control and intervention time periods. The order in which clusters receive the intervention can be assigned randomly or using some other approach when randomization is not possible. For example, in settings with geographically remote or difficult-to-access populations, a non-random order can maximize efficiency with respect to logistical considerations.

[Figure 3. Illustration of the stepped wedge study design: intervention roll-out over time. Adapted from Turner et al 2017]
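
To make the roll-out structure and analysis concrete, the sketch below builds a simple stepped-wedge schedule on simulated data and fits a mixed-effects model with period fixed effects, a treatment indicator, and a cluster random intercept, which is one commonly used analysis for SWD data. The clusters, numbers, and effect sizes are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

n_clusters, n_periods, per_cell = 8, 5, 20
cluster_effect = rng.normal(0, 0.5, n_clusters)  # cluster-level heterogeneity

# Stepped-wedge schedule: cluster c crosses over in period c // 2 + 1, so all
# clusters are unexposed in period 0 and all are exposed by the final period.
rows = []
for c in range(n_clusters):
    step = c // 2 + 1
    for t in range(n_periods):
        treat = int(t >= step)
        for _ in range(per_cell):
            y = 10 + 0.5 * t + 2.0 * treat + cluster_effect[c] + rng.normal(0, 1)
            rows.append({"cluster": c, "period": t, "treat": treat, "outcome": y})
df = pd.DataFrame(rows)

# Period fixed effects absorb secular trends shared across clusters, 'treat'
# estimates the intervention effect, and the cluster random intercept accounts
# for within-cluster correlation.
swd = smf.mixedlm("outcome ~ C(period) + treat", data=df, groups=df["cluster"]).fit()
print(swd.params["treat"])
```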

The practical and social benefits of the stepped wedge design have been summarized in recent reviews ( 5 , 22 , 24 , 27 , 29 , 36 , 38 , 41 , 42 , 45 , 46 , 51 ). In addition to addressing the general concerns with RCTs discussed earlier, advantages of SWDs include the logistical convenience of a staggered roll-out of the intervention, which enables a smaller staff to be distributed across different implementation start times and allows multi-level interventions to be integrated into practice or ‘real world’ settings (referred to as the feasibility benefit). This benefit also applies to studies of de-implementation, prior to a new approach being introduced. For example, with a staggered roll-out it is possible to build in a transition cohort, such that sites can adjust to the integration of the new intervention while also switching over to de-implementing a prior practice. For a specified time period there may be ‘mixed’ or incomplete data, which can be excluded from the data analysis. However, a longer roll-out duration for practical reasons such as this switching carries costs in the form of threats to internal validity, discussed below.

There are several limitations to the SWD. These generally involve trade-offs: having design control over the intervention roll-out, often for logistical reasons, can create ‘down the road’ threats to internal validity. These roll-out-related threats include potential lagged intervention effects for non-acute outcomes; possible fatigue, and associated higher drop-out rates, among clusters assigned to receive the intervention later as they wait for cross-over; fidelity losses for key intervention components over time; and potential contamination of later clusters ( 22 ). Another drawback of the SWD is that it involves data assessment at each point when a new cluster receives the intervention, substantially increasing the burden and cost of data collection unless collection can be automated or uses existing data sources. Because the SWD often has more clusters receiving the intervention towards the end of the intervention period than in earlier time periods, there is a potential concern about temporal confounding at this stage. The SWD is also not well suited for evaluating intervention effects on delayed health outcomes (such as chronic disease incidence) and is most appropriate for outcomes that occur relatively soon after each cluster starts receiving the intervention. Finally, as logistical necessity often dictates selecting a design with smaller numbers of clusters, there are related challenges in the statistical analysis. To use standard software, the common recommendation is to have at least 20 to 30 clusters ( 35 ).

Stepped wedge designs can embed improvements that enhance internal validity, mimicking the strengths of RCTs. These generally focus on efforts to reduce bias or achieve balance in covariates across sites and over time, and/or to compensate as much as possible for practical decisions made at the implementation stage that affect the distribution of the intervention over time and by site. The most widely used approaches, discussed in order of benefit to internal validity, are: 1) partial randomization; 2) stratification and matching; 3) embedding data collection at critical points in time, such as with a phasing-in of intervention components; and 4) creating a transition cohort or wash-out period. The most important of these SWD elements is random assignment of clusters to the time at which they cross over into the intervention period. In addition, utilizing data on time-varying covariates/confounders, either to stratify clusters and then randomize within strata (partial randomization) or to match clusters on known covariates in the absence of randomization, is a technique often employed to minimize bias and reduce confounding. Finally, maintaining control over the number and timing of data collection points over the study period can be beneficial in several ways. First, it can allow for data analysis strategies that incorporate cyclical temporal trends (such as seasonally mediated risk for the outcome, as with flu or malaria) or other underlying temporal trends. Second, it can enable phased interventions to be studied for the contribution of the different components included in each phase (e.g., passive then active intervention components), or can enable ‘pausing’ time, as when a structured wash-out or transition cohort is created for practical reasons (e.g., one intervention or practice is stopped/de-implemented and a new one is introduced) (see Figure 4 ).

[Figure 4. Illustration of the stepped wedge study design: summary of exposed and unexposed cluster time. Adapted from Hemming 2015]
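
Stratified (partial) randomization of the cross-over order can be as simple as shuffling clusters within strata defined by a known covariate (here clinic size) and interleaving the strata across steps. A minimal sketch with hypothetical clinic names:

```python
import random

random.seed(42)

# Hypothetical clinics grouped into strata by size (a known covariate).
strata = {
    "small": ["clinic_A", "clinic_B", "clinic_C", "clinic_D"],
    "large": ["clinic_E", "clinic_F", "clinic_G", "clinic_H"],
}

# Shuffle within each stratum, then interleave strata so that each step of the
# wedge contains a mix of small and large clinics (balancing size over time).
shuffled = {s: random.sample(clinics, len(clinics)) for s, clinics in strata.items()}
order = []
for step in range(len(strata["small"])):
    for s in strata:
        order.append((step + 1, shuffled[s][step]))

for step, clinic in order:
    print(f"step {step}: {clinic}")
```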

Table 2 provides examples of studies using SWDs that have used one or more of the design approaches described above to improve the internal validity of the study. In the study by Killam et al 2010 ( 31 ), a non-randomized SWD was used to evaluate a complex clinic-based intervention for integrating antiretroviral (ART) treatment into routine antenatal care for post-partum women in Zambia. The design involved matching clinics by size and an inverse roll-out to balance sizes across the four groups. The inverse roll-out involved four strata of clinics, grouped by size, with two clinics in each stratum. The roll-out was sequenced across these eight clinics such that one smaller clinic began earlier, with three clinics of increasing size receiving the intervention afterwards. This was then followed by a descending order of clinics by size for the remaining roll-out, ending with the smallest clinic. The inverse roll-out enabled the investigators to start with a smaller clinic, to work out the logistical considerations, while also shaping the roll-out so as to avoid clustering of smaller or larger clinics in any one step of the intervention.

A second design feature of this study involved the use of a transition cohort or wash-out period (see Figure 4 ) (also used in the Morrison et al 2015 study) ( 19 , 37 ). This approach can be used when an existing practice is being replaced with a new intervention but there is ambiguity as to which group an individual should be assigned to while integration efforts are underway. In the Killam study, the concern was that women might be identified as ART-eligible in the control period but actually enroll in and initiate ART at an antenatal clinic during the intervention period. To account for the ambiguity of this transition period, patients with an initial antenatal visit more than 60 days prior to the date of implementing ART in the intervention sites were excluded. For analysis of the primary outcome, patients were categorized into three mutually exclusive categories: a referral-to-ART cohort, an integrated ART-in-the-antenatal-clinics cohort, and a transition cohort. It is important to note that the time period for a transition cohort can add considerable time to an intervention roll-out, especially when there is to be a de-implementation of an existing practice that involves a wide range of staff or activities. As well, the exclusion of data during this phase can reduce the study’s power if not built into the sample size considerations at the design phase.

Morrison et al 2015 ( 37 ) used a randomized cluster design, with additional stratification and randomization within relevant sub-groups, to examine a two-part quality improvement intervention focused on clinician uptake of patient cooling procedures for post-cardiac arrest care in hospital settings (referred to as Targeted Temperature Management). In this study, 32 hospitals were stratified into two groups based on intensive care unit size (< 10 beds vs ≥ 10 beds) and then randomly assigned to four different time periods to receive the intervention. The phased implementation included both passive (generic didactic training regarding the intervention) and active (tailored support addressing site-specific barriers identified in the passive phase) components. This study exemplifies some of the best uses of SWDs in the context of QI interventions that have multiple components or for which there may be a passive and an active phase, as is often the case with interventions that are layered onto systems-change requirements (e.g., electronic records improvements/customization) or relate to sequenced guideline implementation (as in this example).

Studies using a wait-list partial randomization design are also included in Table 2 ( 24 , 27 , 42 ). These types of studies are well suited to settings where there is routine enumeration of a cohort based on specific eligibility criteria, such as enrollment in a health plan or employment group, or from a disease-based registry, such as for diabetes ( 27 , 42 ). It has also been reported that this design can increase efficiency and statistical power in contrast to cluster-based trials, a crucial consideration when the number of participating individuals or groups is small ( 22 ).

The study by Grant et al uses a variant of the SWD in which individuals within a setting are enumerated and then randomized to receive the intervention. In this example, employees at a large mining facility in South Africa who had previously screened positive for HIV at the company clinic as part of mandatory testing were invited, in random sequence, to attend a workplace HIV clinic to initiate preventive treatment for TB (isoniazid), during the years before ART was more widely available. Individuals contributed follow-up time to the “pre-clinic” phase from the baseline date established for the cohort until the actual date of their first clinic visit, and to the “post-clinic” phase thereafter. Clinic visits every 6 months were used to identify incident TB events. Because the investigators were interested in the reduction in TB incidence among all workers at the mine and not just those attending the clinic, the effect of the intervention (the provision of clinic services) was estimated for the entire study population (incidence rate ratio), irrespective of whether individuals actually received isoniazid.
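
The incidence rate ratio mentioned here is simply the ratio of the TB incidence rate in the post-clinic person-time to the rate in the pre-clinic person-time. A toy calculation with invented numbers, not figures from the Grant et al study:

```python
# Toy incidence rate ratio (IRR): events per person-years of follow-up
# contributed to the pre-clinic and post-clinic phases (made-up numbers).
events_pre, person_years_pre = 30, 1200.0
events_post, person_years_post = 18, 1500.0

rate_pre = events_pre / person_years_pre
rate_post = events_post / person_years_post
irr = rate_post / rate_pre
print(f"IRR = {irr:.2f}")  # values below 1 suggest lower TB incidence after roll-out
```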

CONSIDERATIONS IN CHOOSING BETWEEN QEDs

We present a decision ‘map’ (Figure 5) to assist in selecting among QEDs and to highlight the design features that warrant particular attention [ Figure 5 here].

[Figure 5. Quasi-Experimental Design Decision-Making Map]

First, at the top of the flow diagram ( 1 ), consider whether you can collect data at multiple time points in the pre- and post-intervention periods. Ideally, you will be able to select more than two time points. If you cannot, then having multiple sites would allow for a pre-post design with a non-equivalent control group. If you can have more than two time points for the study assessments, you next need to determine whether you can include multiple sites ( 2 ). If not, then you can consider a single-site ITS. If you can have multiple sites, you can choose between a SWD and a multiple-site ITS based on whether the roll-out is observed over multiple time points (SWD) or there is only one intervention time point (controlled multiple-site ITS).
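
The decision logic described above can be restated as a small helper function; the parameter names are illustrative, not from the paper:

```python
def choose_qed(multiple_time_points: bool, multiple_sites: bool,
               rollout_over_multiple_time_points: bool) -> str:
    """Return a candidate QED following the decision map described above."""
    if not multiple_time_points:
        # Only pre and post measurements: rely on a comparison site instead.
        return "pre-post design with non-equivalent control group (requires multiple sites)"
    if not multiple_sites:
        return "single-site interrupted time series"
    if rollout_over_multiple_time_points:
        return "stepped wedge design"
    return "controlled (multiple-site) interrupted time series"

print(choose_qed(multiple_time_points=True, multiple_sites=True,
                 rollout_over_multiple_time_points=True))
```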

STRATEGIES TO STRENGTHEN EXTERNAL VALIDITY

In a recent article in this journal ( 26 ), it was observed that there is an unavoidable trade-off between internal and external validity: with greater control over a study there is stronger evidence for internal validity, but that control may jeopardize some of the external validity of that stronger evidence. Nonetheless, there are design strategies for quasi-experimental studies that can be undertaken to improve internal validity without abandoning considerations of external validity. These are described below across all three study designs.

1. Examine variation of acceptability and reach among diverse sub-populations

One of the strengths of QEDs is that they are often employed to examine intervention effects in real-world settings and, often, in more diverse populations and settings. Consequently, with adequate examination of participant characteristics and setting-related factors, it can be possible to interpret findings for critical groups for which there may be no existing evidence of an intervention effect. For example, in the Campus Watch intervention ( 16 ), the investigators over-sampled the Maori indigenous population in order to stratify the results and investigate whether the program was effective for this under-studied group. In the study by Zombré et al ( 52 ) on health care access in Burkina Faso, the authors examined clinic density characteristics to determine their impact on sustainability.

2. Characterize fidelity and measures of implementation processes

Some of the most important outcomes to examine in these QED studies include whether the intervention was delivered as intended (i.e., fidelity), whether it was maintained over the entire study period (i.e., sustainability), and whether the outcomes can be examined by level of fidelity within or across sites. As well, when a complex intervention is related to a policy or guideline shift and implementation requires logistical adjustments (such as phased roll-outs to embed the intervention or to train staff), QEDs more truly mimic real-world constraints. As a result, capturing implementation processes is critical, as they can describe important variation in uptake and inform interpretation of the findings for external validity. As described by Prost et al ( 41 ), for example, it is essential to capture what occurs during such phased intervention roll-outs, following established guidelines for the development of complex interventions, including efforts to define and protocolize activities before their implementation ( 17 , 18 , 28 ). However, QEDs are often conducted by teams with strong interests in adapting the intervention or ‘learning by doing’, which can limit interpretation of findings if not planned into the design. In the study by Bailet et al ( 3 ), for example, the investigators refined the intervention based on year 1 data and then applied it in years 2–3, collecting additional data on training and measurement fidelity at that later stage. This phasing aspect of implementation generates a tension between protocolizing interventions and adapting them as they go along. When this is the case, additional designs for the intervention roll-out, such as adaptive or hybrid designs, can also be considered.

3. Conduct community or cohort-based sampling to improve inference

External validity can be improved when the intervention is applied to entire communities, as with some of the community-randomized studies described in Table 2 ( 12 , 21 ). In these cases, the results are closer to the conditions that would apply if the interventions were conducted ‘at scale’, with a large proportion of a population receiving the intervention. In some cases QEDs also afford greater access for intervention research to be conducted in remote or difficult-to-reach communities, where the cost and logistical requirements of an RCT may be prohibitive or may require alteration of the intervention or staffing support to levels that would never be feasible in real-world application.

4. Employ a model or framework that covers both internal and external validity

Frameworks can help enhance the interpretability of many kinds of studies, including QEDs, and can help ensure that information on essential implementation strategies is included in the results ( 44 ). Although several of the case studies summarized in this article included measures that can improve external validity (such as sub-group analysis of which participants were most impacted, and process and contextual measures that can affect variation in uptake), none formally employed an implementation framework. Green and Glasgow (2006) ( 25 ) have outlined several useful criteria for gauging the extent to which an evaluation study also provides measures that enhance interpretation of external validity; those employing QEDs could identify relevant components and frameworks to include in reported findings.

It has been observed that it is more difficult to conduct a good quasi-experiment than to conduct a good randomized trial ( 43 ). Although QEDs are increasingly used, it is important to note that randomized designs are still preferred over quasi-experiments except where randomization is not possible. In this paper we present three important QEDs and variants nested within them that can increase internal validity while also improving external validity considerations, and present case studies employing these techniques.

1 It is important to note that if such randomization were possible at the site level, based on similar sites, a cluster randomized controlled trial would be an option.

Advantages and Limitations of Experiments for Researching Participatory Enterprise Modeling and Recommendations for Their Implementation

  • Conference paper
  • First Online: 17 November 2022

  • Anne Gutschmidt (ORCID: orcid.org/0000-0001-8038-4435)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 456)

Included in the following conference series:

  • IFIP Working Conference on The Practice of Enterprise Modeling

Participatory enterprise modeling (PEM) means that stakeholders become directly involved in the process of creating enterprise models. Based on their different perspectives, they discuss and exchange knowledge and ideas in joint meetings and, with the support of modeling experts, they collaboratively create the models. Although there is a lot of empirical and theoretical work on group work and collaboration that we can build on, there are still many aspects of PEM that we should research. The participatory approach is claimed to lead to higher model quality and commitment; empirical evidence, however, is still scarce. Moreover, there are many factors that might influence productivity and the outcome of participatory modeling projects, such as facilitation methods or the tools used for modeling. In this paper, I will discuss the special value, but also the methodical challenges and limitations, of experimental studies on PEM compared to surveys and case studies. I will give methodical recommendations on how to design and implement experiments on PEM and discuss how they can eventually add to case studies carried out in companies.

