A guide to open science practices for animal research

Contributed equally to this work with: Kai Diederich, Kathrin Schmitt

Affiliation German Federal Institute for Risk Assessment, German Centre for the Protection of Laboratory Animals (Bf3R), Berlin, Germany

* E-mail: [email protected]

  • Kai Diederich, 
  • Kathrin Schmitt, 
  • Philipp Schwedhelm, 
  • Bettina Bert, 
  • Céline Heinl

Published: September 15, 2022

  • https://doi.org/10.1371/journal.pbio.3001810

Translational biomedical research relies on animal experiments and provides the underlying proof of practice for clinical trials, which places an increased duty of care on translational researchers to derive the maximum possible output from every experiment performed. The implementation of open science practices has the potential to initiate a change in research culture that could improve the transparency and quality of translational research in general, as well as increasing the audience and scientific reach of published research. However, open science has become a buzzword in the scientific community that can often miss the mark when it comes to practical implementation. In this Essay, we provide a guide to open science practices that can be applied throughout the research process, from study design, through data collection and analysis, to publication and dissemination, to help scientists improve the transparency and quality of their work. As open science practices continue to evolve, we also provide an online toolbox of resources that we will update continually.

Citation: Diederich K, Schmitt K, Schwedhelm P, Bert B, Heinl C (2022) A guide to open science practices for animal research. PLoS Biol 20(9): e3001810. https://doi.org/10.1371/journal.pbio.3001810

Copyright: © 2022 Diederich et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors received no specific funding for this work.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: All authors are employed at the German Federal Institute for Risk Assessment and part of the German Centre for the Protection of Laboratory Animals (Bf3R), which developed and hosts animalstudyregistry.org, a preregistration platform for animal studies, and animaltestinfo.de, a database for non-technical project summaries (NTS) of approved animal study protocols within Germany.

Abbreviations: CC, Creative Commons; CIRS-LAS, critical incident reporting system in laboratory animal science; COVID-19, Coronavirus Disease 2019; DOAJ, Directory of Open Access Journals; DOI, digital object identifier; EDA, Experimental Design Assistant; ELN, electronic laboratory notebook; EU, European Union; IMSR, International Mouse Strain Resource; JISC, Joint Information Systems Committee; LIMS, laboratory information management system; MGI, Mouse Genome Informatics; NC3Rs, National Centre for the Replacement, Refinement and Reduction of Animals in Research; NTS, non-technical summary; RRID, Research Resource Identifier

Introduction

Over the past decade, the quality of published scientific literature has been repeatedly called into question by the failure of large replication studies or meta-analyses to demonstrate sufficient translation from experimental research into clinical successes [ 1 – 5 ]. At the same time, the open science movement has gained more and more advocates across various research areas. By sharing all of the information collected during the research process with colleagues and with the public, scientists can improve collaborations within their field and increase the reproducibility and trustworthiness of their work [ 6 ]. Thus, the International Reproducibility Networks have called for more open research [ 7 ].

However, open science practices have not been adopted to the same degree in all research areas. In psychology, which was strongly affected by the so-called reproducibility crisis, the open science movement initiated real practical changes leading to a broad implementation of practices such as preregistration or sharing of data and material [ 8 – 10 ]. By contrast, biomedical research is still lagging behind. Open science might be of high value for research in general, but in translational biomedical research, it is an ethical obligation. It is the responsibility of the scientist to transparently share all data collected to ensure that clinical research can adequately evaluate the risks and benefits of a potential treatment. When Russell and Burch published “The Principles of Humane Experimental Technique” in 1959, scientists started to implement their 3Rs principle to answer the ethical dilemma of animal welfare in the face of scientific progress [ 11 ]. Scientists addressed this dilemma by replacing animal experiments wherever possible, reducing the number of animals to a strict minimum, and refining the procedures where animals still have to be used. However, in recent years, whether the 3Rs principle is sufficient to fully address ethical concerns about animal experiments has been questioned [ 12 ].

Most people tolerate the use of animals for scientific purposes only under the basic assumption that the knowledge gained will advance research in crucial areas. This implies that performed experiments are reported in a way that enables peers to benefit from the collected data. However, recent studies suggest that a large proportion of animal experiments are never actually published. For example, scientists working within the European Union (EU) have to write an animal study protocol for approval by the competent authorities of the respective country before performing an animal experiment [ 13 ]. In these protocols, scientists have to describe the planned study and justify every animal required for the project. By searching for publications resulting from approved animal study protocols from 2 German University Medical Centers, Wieschowski and colleagues found that only 53% of approved protocols led to a publication after 6 years [ 14 ]. Using a similar approach, Van der Naald and colleagues determined a publication rate of 60% at the Utrecht Medical Center [ 15 ]. In a follow-up survey, the respective researchers named so-called “negative” or null-hypothesis results as the main cause for not publishing outcomes [ 15 ]. The current scientific system is shaped by publishers, funders, and institutions and motivates scientists to publish novel, surprising, and positive results, revealing one of the many structural problems that open science initiatives are targeting. Non-publication not only strongly contradicts ethical values, but it also compromises the quality of published literature by leading to overestimation of effect sizes [ 16 , 17 ]. Furthermore, publications of animal studies too often show poor reporting that strongly impairs the reproducibility, validity, and usefulness of the results [ 18 ]. Unfortunately, the idea that negative or equivocal findings can also contribute to the gain of scientific knowledge is frequently neglected.

So far, the scientific community using animals has shown limited engagement with the open science movement. Due to the strong controversy surrounding animal experiments, scientists have been reluctant to share information on the topic. Additionally, translational research is highly competitive and researchers tend to be secretive about their ideas until they are ready for publication or patent [ 19 , 20 ]. However, this lack of openness could also point to a lack of knowledge and training on the many open science options that are available and suitable for animal research. Researchers have to be convinced of the benefits of open science practices, not only for science in general, but also for the individual researcher and each single animal. Yet, the key players in the research system are already starting to value open science practices. An increasing number of journals request open sharing of data, funders pay for open access publications, and institutions consider open science practices in hiring decisions. Open science practices can improve the quality of work by enabling valuable scientific input from peers at the early stages of research projects. Furthermore, the extended communication that open science practices offer can draw attention to research, help to expand networks of collaborators, and lead to new project opportunities or follow-up positions. Thus, open science practices can be a driver for careers in academia, particularly those of early career researchers.

Beyond these personal benefits, improving transparency in translational biomedical research can boost scientific progress in general. By bringing to light all the recorded research outputs that until now have remained hidden, the publication bias and the overestimation of effect sizes can be reduced [ 17 ]. Large-scale sharing of data can help to synthesize research outputs in preclinical research that will enable better decision-making for clinical research. Disclosing the whole research process will help to uncover systematic problems and support scientists in thoroughly planning their studies. In the long run, we predict that the implementation of open science practices will reduce the number of animals used in experiments that are unintentionally repeated because earlier negative results went unreported, as well as in method development, by helping researchers avoid experimental dead ends that are often not published. More collaborations and sharing of materials and methods can further reduce the number of animal experiments used for the implementation of new techniques.

Open science can and should be implemented at each step of the research process ( Fig 1 ). A vast number of tools are already available that were either conceived directly for animal research or can easily be adapted. In this Essay, we provide an overview of open science tools that improve transparency, reliability, and animal welfare in translational in vivo biomedical research by supporting scientists in clearly communicating their research and in working collaboratively. Table 1 lists the most prominent open science tools we discuss, together with their respective links. We have structured this Essay to guide you through which tools can be used at each stage of the research process, from planning and conducting experiments, through to analyzing data and communicating the results. However, many of these tools can be used at several different stages. Table 1 has been deposited on Zenodo and will be updated continuously [ 21 ].

Fig 1. Application of open science practices at each step of the research process can maximize the impact of performed animal experiments. The implementation of these practices will lead to less time pressure at the end of a project. Because most of these open science practices are connected, spending more time in the planning phase and while conducting the experiments will save time during data analysis and publication of the study. Indeed, consulting reporting guidelines early on, preregistering a statistical plan, and writing down crucial experimental details in an electronic lab notebook will strongly accelerate the writing of a manuscript. If protocols or even electronic lab notebooks were made public, simply citing these would simplify the writing of publications. Similarly, if a data management plan is well designed before data collection starts, analyzing the data and depositing it in a public repository, as is increasingly required, will be fast. NTS, non-technical summary.

https://doi.org/10.1371/journal.pbio.3001810.g001

Table 1. The most prominent open science tools discussed in this Essay, together with their respective links.

https://doi.org/10.1371/journal.pbio.3001810.t001

Planning the study

Transparent practices can be adopted at every stage of the research process. However, to ensure full effectiveness, it is highly recommended to engage in detailed planning before the start of the experiment. This can prevent valuable time from being lost at the end of the study due to careless decisions being made at the beginning. Clarifying data management at the start of a project can help avoid filing chaos that can be very time-consuming to untangle. Keeping clear track of a project and study design will also help if new colleagues are included later on in the project or if entire project parts are handed over. In addition, all texts written on the rationale and hypothesis of the study or method descriptions, or design schemes created during the planning phase can be used in the final publications ( Fig 1 ). Similarly, the information required for preregistration of animal studies or for reporting according to the ARRIVE guidelines is an extension of the details required for ethical approval [ 22 , 23 ]. Thus, the time burden within the planning phase is often overestimated. Furthermore, the thorough planning of experiments can avoid the unnecessary use of animals by preventing wrong avenues from being pursued.

Implementing open scientific practices at the beginning of a project does not mean that the idea and study plan must be shared immediately, but rather is critical for making the entire workflow transparent at the end of the project. However, optional early sharing of information can enable peers to give feedback on the study plan. Studies potentially benefit more from this a priori input than they would from the classical a posteriori peer-review process.

Most people perceive guidelines as advice that instructs on how to do something. However, it is sometimes useful to consider the term in its original meaning: "the line that guides us". In this sense, following guidelines is not simply fulfilling a duty, but is a process that can help to design a sound research study and, as such, guidelines should be consulted at the planning stage of a project. The PREPARE guidelines are a list of important points that should be thought out before starting a study involving animal experiments in order to reduce the waste of animals, promote alternatives, and increase the reproducibility of research and testing [ 24 ]. The PREPARE checklist helps to thoroughly plan a study and focuses on improving the communication and collaboration between all involved participants of the study (i.e., animal caretakers and scientists). Indeed, open science begins with the communication within a research facility. The checklist is currently available in 33 languages, and the responsible team from Norecopa, Norway's 3R center, takes requests for translations into further languages.

The UK Reproducibility Network has also published several guiding documents (primers) on important topics for open and reproducible science. These address issues such as data sharing [ 25 ], open access [ 26 ], open code and software [ 27 ], and preprints [ 28 ], as well as preregistration and registered reports [ 27 ]. Consultation of these primers is not only helpful in the relevant phases of the experiment but is also encouraged in the planning phase.

Although the ARRIVE guidelines are primarily a reporting guideline specifically designed for preparing a publication containing animal data, they can also support researchers when planning their experiments [ 22 , 23 ]. Going through the ARRIVE website, researchers will find tools and explanations that can support them in planning their experiments [ 29 ]. Consulting the ARRIVE checklist at the beginning of a project can help in deciding what details need to be documented while conducting the experiments. This is particularly advisable, given that compliance with the ARRIVE guidelines is still poor [ 18 ].

Experimental design

To maximize the validity of performed experiments and the knowledge gained, designing the study well is crucial. It is important that the chosen animal species reflects the investigated disease well and that basic characteristics of the animal, such as sex or age, are considered carefully [ 30 ]. The Canadian Institutes of Health Research provides a collection of resources on the integration of sex and gender in biomedical research with animals, including tips and tools for researchers and reviewers [ 31 ]. Additionally, it is advisable to avoid unnecessary standardization of biological and environmental factors that can reduce the external validity of results [ 32 ]. Meticulous statistical planning can further optimize the use of animals. Free-to-use online tools for calculating sample sizes, such as G*Power or the inVivo software package for R, can further support animal researchers in designing their statistical plan [ 33 , 34 ]. Randomized allocation to groups can be supported by tools designed for scientists, such as Research Randomizer, but also by simple online random number generators [ 35 ]. Furthermore, it might be advisable when designing the study to incorporate pathological analyses into the experimental plan. Optimal planning of tissue collection, performance of pathological procedures according to accepted best practices, and use of optimal pathological analysis and reporting methods can add some extra knowledge that would otherwise be lost. This can improve the reproducibility and quality of translational biomedicine, especially, but not exclusively, in animal studies with morphological endpoints. In all animal studies, unexpected deaths in experimental animals can occur and be the cause of lost data or missed opportunities to identify health problems [ 36 , 37 ].
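To make the statistical planning described above more concrete, the sketch below shows how an a priori sample size calculation and a randomized group allocation could look in Python, using statsmodels and NumPy as an alternative to the G*Power and R tools named above. The effect size, power, seed, and group labels are illustrative assumptions only, not recommendations for any particular study.

```python
# Minimal sketch: a priori sample size calculation and randomized allocation.
# All numerical values and group names are illustrative assumptions.
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sided, two-sample t-test
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=1.0,  # expected standardized effect (Cohen's d)
                                    alpha=0.05,       # significance level
                                    power=0.8)        # desired statistical power
n_per_group = int(np.ceil(n_per_group))
print(f"Animals required per group: {n_per_group}")

# Reproducible random allocation of animal IDs to treatment groups
rng = np.random.default_rng(seed=42)  # documenting the seed makes the allocation traceable
animal_ids = [f"mouse_{i:02d}" for i in range(1, 2 * n_per_group + 1)]
shuffled = rng.permutation(animal_ids)
groups = {"control": list(shuffled[:n_per_group]),
          "treatment": list(shuffled[n_per_group:])}
print(groups)
```

Such a script, or the equivalent G*Power or EDA output, can be attached to a preregistration or an animal study protocol to document how the group sizes and the allocation were determined.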

To support researchers in designing their animal research, the National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs) has also developed the Experimental Design Assistant (EDA) [ 38 , 39 ]. This online tool helps researchers to better structure in vivo research by creating detailed schemes of the study design. It provides feedback on the entered design, drawing researchers' attention to crucial decisions in the project. The resulting schemes can be used to transparently share the study design by uploading them to a study preregistration, enclosing them in a grant application, or submitting them with a final manuscript. The EDA can be used for different study designs in diverse scenarios and helps to communicate researchers' plans to others [ 40 ]. The EDA might be of particular interest for clarifying very complex study designs involving multiple experimental groups. Working with the EDA might appear rather complex at the beginning, but the NC3Rs provides regular webinars that can help to answer any questions that arise.

Preregistration

Preregistration is an effective tool to improve the quality and transparency of research. To preregister their work, scientists must determine crucial details of the study before starting any experiment. Changes occurring during a study can be outlined at the end. A preregistered study plan should include at least the hypothesis and determine all the parameters that are known in advance. A description of the planned study design and statistical analysis will enable reviewers and peers to better retrace the workflow. It can prevent the intentional exploitation of analytical flexibility to reach p-values below a certain significance level (p-hacking) or hypothesizing after the results are known (HARKing). With preregistration, scientists can also claim their idea at an early stage of their research with a citable individual identifier that labels the idea as their own. Some open preregistration platforms also provide a digital object identifier (DOI), which makes the registered study citable. Three public registries actively encourage the preregistration of animal studies conducted around the world: OSF registry, preclinicaltrials.eu, and animalstudyregistry.org [ 41 – 45 ]. Scientists can choose the registry according to their needs. Preregistering a study in a public registry supports scientists in planning their study and, later, in critically reevaluating their own work and assessing its limitations and potential.

As an alternative to public registries, researchers can also submit their study plan to one of hundreds of journals already publishing registered reports, including many journals open to animal research [ 8 ]. A submitted registered report passes through 2 stages of peer review. In the first stage, reviewers comment on the idea and the study design. After an in-principle acceptance, researchers can conduct their study as planned. If the authors conduct the experiments as described in the accepted study protocol, the journal will publish the final study regardless of the outcome. This might be an attractive option, especially for early career researchers, as a manuscript is published at the beginning of a project with the guarantee of a future final publication.

The benefits of preregistration can already be observed in clinical research, where registration has been mandatory for most trials for more than 20 years. Preregistration in clinical research has helped to make known what has been tested and not just what worked and was published, and the implementation of trial registration has strongly reduced the number of publications reporting significant treatment effects [ 46 ]. In animal research, with its unrealistically high percentage of positive results, preregistration seems to be particularly worthwhile.

Research data management

To get the most out of performed animal experiments, effective sharing of data at the end of the study is essential. Sharing research data optimally is complex and needs to be prepared in advance. Thus, data management can be seen as one part of planning a study thoroughly. Many funders have recognized the value of the original research data and request a data management plan from applicants in advance [ 25 , 47 ]. Various freely available tools such as DMPTool or DMPonline already exist to design a research data management plan that complies with the requirements of different funders [ 48 , 49 ]. The data management plan defines the types of data collected, describes how they are handled, and names the persons responsible throughout the data lifecycle. This includes collecting, analyzing, archiving, and sharing the data. Finally, a data management plan enables long-term access and the possibility for reuse by peers. Developing such a plan, whether it is required by funders or not, will later simplify the application of the FAIR data principle (see section on the FAIR data principle). The Longwood Medical Area Research Data Management Working Group from the Harvard Medical School developed a checklist to assist researchers in optimally managing their data throughout the data lifecycle [ 50 ]. Similarly, the Joint Information Systems Committee (JISC) provides a comprehensive research data management toolkit, including a checklist for researchers planning their project [ 51 ]. Consulting this checklist in the planning phase of a project can prevent common errors in research data management.

Non-technical project summary

One instrument specifically conceived to create transparency on animal research for the general public is the so-called non-technical project summary (NTS). All animal protocols approved within the EU must be accompanied by these comprehensible summaries. NTSs are intended to inform the public about ongoing animal experiments. They are anonymous and include information on the objectives and potential benefits of the project, the expected harm, the number of animals, the species, and a statement of compliance with the requirements of the 3Rs principle. However, beyond simply informing the public, NTSs can also be used for meta-research to help identify new research areas with an increased need for new 3R technologies [ 52 , 53 ]. If they fulfill a minimum quality threshold, NTSs become an excellent tool for appropriately communicating the scientific value of the approved protocol and for meta-scientists to generate added value by systematically analyzing these summaries [ 54 , 55 ]. In 2021, the EU launched the ALURES platform ( Table 1 ), where NTSs from all member states are published together, opening up opportunities for EU-wide meta-research. NTSs are, in contrast to other open science practices, mandatory in the EU. However, instead of thinking of them as an annoying duty, it might be worth thoroughly drafting the NTS to support the goals of more transparency towards the public, enabling an open dialogue and reducing extreme opinions.

Conducting the experiments

Once the experiments begin, documentation of all necessary details is essential to ensure the transparency of the workflow. This includes methodological details that are crucial for replicating experiments, but also failed attempts that could help peers to avoid experiments that do not work in the future. All information should be stored in such a way that it can be found easily and shared later. In this area, many new tools have emerged in recent years ( Table 1 ). These tools will not only make research transparent for colleagues, but also help to keep track of one’s own research and improve internal collaboration.

Electronic laboratory notebooks

Electronic laboratory notebooks (ELNs) are an important pillar of research data management and open science. ELNs facilitate the structured and harmonized documentation of the data generation workflow, ensure data integrity, and keep track of all modifications made to the original data based on an audit trail option. Moreover, ELNs simplify the sharing of data and support collaborations within and outside the research group. Methodological details and research data become searchable and traceable. There is an extensive body of literature providing advice on selecting and implementing an ELN depending on specific needs and research areas, a detailed discussion of which is beyond the scope of this Essay [ 56 – 58 ]. Some ELNs are connected to a laboratory information management system (LIMS) that provides an animal module supporting the tracking of animal details [ 59 ]. However, as research involving animals is highly heterogeneous, this might not be the only decision point, and we cannot recommend a specific ELN that is suitable for all animal research.

ELNs are already established in the pharmaceutical industry and their use is on the rise among academics as well. However, due to concerns around costs for licenses, data security, and loss of flexibility, many research institutions still fear the expenses that the introduction of such a system would incur [ 56 ]. Nevertheless, an increasing number of academic institutions are implementing ELNs and appreciating the associated benefits [ 60 ]. If your institution already has an ELN, it might be easiest to just use the option available in the research environment. If not, the Harvard Medical School provides an extensive and updated overview of various features of different ELNs that can support scientists in choosing the appropriate one for their research [ 61 ]. There are many commercial ELN products, which may be preferred when much of the administrative workload is to be outsourced. However, open-source products such as eLabFTW or openBIS provide a greater opportunity for customization to meet specific needs of individual research institutions [ 62 – 64 ]. A huge number of options are available depending on the resources and the features required. Some scientists might prefer generic note-taking tools such as Evernote or just a simple Word document that offers infinite flexibility, but specific ELNs can further support good record-keeping practice by providing immutability, automated backups, standardized methods, and protocols to follow. Clearly defining the specific requirements expected might help to choose an adequate system that would improve the quality of the record compared to classical paper laboratory notebooks.

Sharing protocols

Adequate sharing of methods in translational biomedical sciences is key to reproducibility. Several repositories exist that simplify the publication and exchange of protocols. Writing down methods at the end of the project bears the risk that crucial details might be missing [ 65 ]. On protocols.io, scientists can note all methodological details of a procedure, complete them with uploaded documents, and keep them for personal use or share them with collaborators [ 66 ]. Authors can also decide at any point in time to make their protocol public. Protocols published on protocols.io receive a DOI and become citable; they can be commented on by peers and adapted according to the needs of the individual researcher. protocols.io files from established protocols can also be submitted, together with some context and sample datasets, to PLOS ONE, where they can be peer-reviewed and potentially published [ 67 , 68 ]. Depending on whether researchers are affiliated with academia or industry and whether files are shared internally or publicly, protocols.io can be free of charge or come with costs. Other journals also encourage their authors to deposit their protocols in a freely accessible repository, such as Protocol Exchange from the Nature Portfolio [ 69 ]. Another option might be to separately submit a protocol that was validated by its use in an already published research article to an online and peer-reviewed journal specific for research protocols, such as Bio-Protocol. A multitude of journals, including eLife and Science, already collaborate with Bio-Protocol and recommend that authors publish their methods there [ 70 ]. Bio-Protocol has no submission fees and is freely available to all readers. Both protocols.io and Bio-Protocol allow the illustration of complex scientific methods by uploading videos to published protocols. In addition, protocols can be deposited in a general research repository such as the Open Science Framework (OSF repository) and referenced in appropriate publications.

Sharing critical incidents

Sharing critical or even adverse events that occur in the context of animal experimentation can help other scientists to avoid committing the same mistakes. The system of sharing critical incidents is already established in clinical practice and helps to improve medical care [ 71 , 72 ]. The online platform critical incident reporting system in laboratory animal science (CIRS-LAS) represents the first preclinical equivalent to these clinical systems [ 73 ]. With this web-based tool, critical incidents in animal research can be reported anonymously without registration. An expert panel helps to analyze the incident to encourage an open dialogue. Critical incident reporting is still very marginal in animal research, and the procedures performed are highly variable. These factors make a systematic analysis and a targeted search for incidents difficult. However, it may be of special interest for methods that are broadly used in animal research, such as anesthesia. Indeed, broadly feeding this system with data on errors occurring in standard procedures today could help avoid critical incidents in the future and refine animal experiments.

Sharing animals, organs, and tissue

When we think about open science, sharing results and data is often the focus. However, sharing material is also part of a collaborative and open research culture that could help to greatly reduce the number of experimental animals used. When an animal is killed to obtain specific tissue or organs, the remainder is mostly discarded. This may constitute a wasteful practice, as surplus tissue can be used by other researchers for different analyses. More animals are currently killed as surplus than are used in experiments, demonstrating the potential for sharing these animals [ 74 , 75 ].

Sharing information on generated surplus is therefore not only economical, but also an effective way to reduce the number of animals used for scientific purposes. The open-source software Anishare is a straightforward way for breeders of genetically modified lines to promote their surplus offspring or organs within an institution [ 76 ]. The database AniMatch ( Table 1 ) connects scientists within Europe who are offering tissue or organs with scientists seeking this material. Scientists already sharing animal organs can support this process by describing it in publications and making peers aware of this possibility [ 77 ]. Specialized research communities also allow the worldwide, collaborative sharing of animal tissue or animal-derived products that are typically used in these fields via the SEARCH framework [ 78 , 79 ]. Depositing transgenic mouse lines into one of several repositories for mouse strains can help to further minimize the effort of producing new transgenic lines and, most importantly, reduce the number of surplus animals by supporting the cryopreservation of mouse lines. The International Mouse Strain Resource (IMSR) can be used to help find an adequate repository or to help scientists seeking a specific transgenic line find a match [ 80 ].

Analyzing the data

Animal researchers have to handle increasingly complex data. Imaging, electrophysiological recording, or automated behavioral tracking, for example, produce huge datasets. Data can be shared as raw numerical output but also as images, videos, sounds, or other forms from which raw numerical data can be generated. As the heterogeneity and the complexity of research data increases, infinite possibilities for analysis emerge. Transparently reporting how the data were processed will enable peers to better interpret reported results. To get the most out of performed animal experiments, it is crucial to allow other scientists to replicate the analysis and adapt it to their research questions. It is therefore highly recommended to use formats and tools during the analysis that allow a straightforward exchange of code and data later on.

Transparent coding

The use of non-transparent analysis code has led to a lack of reproducibility of results [ 81 ]. Sharing code is essential for complex analyses and enables other researchers to reproduce results and perform follow-up studies, and citable code gives credit for the development of new algorithms ( Table 1 ). Jupyter Notebooks are a convenient way to share data science pipelines that may use a variety of coding languages, including Python, R, or MATLAB, and also to share the results of analyses in the form of tables, diagrams, images, and videos. Notebooks contain source code and can be published or collaboratively shared on platforms like GitHub or GitLab, where version control of source code is implemented. The data-archiving tool Zenodo can be used to archive a repository on GitHub and create a DOI for the archive, thereby making the content citable. Using a free and open-source programming language such as R or Python will increase the number of researchers who can work with the published code. Best practice for research software is to publish the source code with a license that allows modification and redistribution.
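As an illustration of what such a shareable analysis could look like, the sketch below shows a minimal Python script (or notebook cell) that reads raw data, computes summary statistics, and writes a derived output file that can be deposited alongside the code. The file and column names are hypothetical and only serve to illustrate the structure.

```python
# Minimal sketch of a shareable analysis step; file and column names are hypothetical.
import pandas as pd
from scipy import stats

data = pd.read_csv("behaviour_scores.csv")          # raw data exported from the experiment
control = data.loc[data["group"] == "control", "latency_s"]
treatment = data.loc[data["group"] == "treatment", "latency_s"]

# Simple two-sample comparison; the derived summary can be archived with the code
t_stat, p_value = stats.ttest_ind(control, treatment)
summary = data.groupby("group")["latency_s"].agg(["count", "mean", "std"])
summary.to_csv("summary_statistics.csv")

print(summary)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Publishing such a script together with the raw data, a list of package versions (e.g., a requirements file), and an open license allows peers to rerun the analysis exactly and adapt it to their own questions.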

Choice of data visualization

Choosing the right format for the visualization of data can increase its accessibility to a broad scientific audience and enable peers to better judge the validity of the results. Studies based on animal research often work with very small sample sizes. Visualizing these data in histograms may lead to an overestimation of the outcomes. Choosing the right dot plot, which makes all recorded points visible while drawing attention to the summary rather than to individual points, can further improve the intuitive understanding of a result. If the sample size is too low, it might not be meaningful to visualize error bars. A variety of freely available tools already exists that can support scientists in creating the most appropriate graphs for their data [ 82 ]. In particular, when representing microscopy results or heat maps, it should be kept in mind that a large part of the population cannot perceive the classical red and green representation [ 83 ]. Opting for color-blind-safe color maps and checking images with free tools such as Color Oracle ( Table 1 ) can increase the accessibility of graphs. Multiple journals have already addressed flaws in data visualization and have introduced new policies that will accelerate the uptake of transparent representation of results.
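As a sketch of such a visualization, the following Python/matplotlib example plots every individual animal as a dot and marks each group mean with a horizontal line, using two colors from the color-blind-safe Okabe-Ito palette. The group names and simulated values are purely illustrative.

```python
# Minimal sketch of a dot plot showing all individual animals plus the group mean.
# Group names and simulated values are illustrative only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=7)
groups = {"control": rng.normal(10, 2, size=8), "treatment": rng.normal(13, 2, size=8)}

fig, ax = plt.subplots(figsize=(4, 3))
for i, (name, values) in enumerate(groups.items()):
    x = np.full(values.size, float(i)) + rng.uniform(-0.08, 0.08, values.size)  # small jitter
    ax.plot(x, values, "o", color="#0072B2", alpha=0.7)          # individual animals (blue)
    ax.hlines(values.mean(), i - 0.2, i + 0.2, color="#D55E00")  # group mean (vermilion)

ax.set_xticks(range(len(groups)))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("outcome (arbitrary units)")
fig.tight_layout()
fig.savefig("dotplot.png", dpi=300)
```

Because every data point remains visible, readers can judge the spread and sample size directly instead of relying on bars or error bars alone.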

Publication of all study outcomes

Open science practices have received much attention in the past few years when it comes to publication of the results. However, it is important to emphasize that although open science tools have their greatest impact at the end of the project, good study preparation and sharing of the study plan and data early on can greatly increase the transparency at the end.

The FAIR data principle

To maximize the impact and outcome of a study, and to make the best long-term use of data generated through animal experiments, researchers should publish all data collected during their research according to the FAIR data principle. That means the data should be findable, accessible, interoperable, and reusable. The FAIR principle is thus an extension of open access publishing. Data should not only be published without paywalls or other access restrictions, but also in such a manner that they can be reused and further processed by others. For this, legal as well as technical requirements must be met by the data. To achieve this, the GoFAIR initiative has developed a set of principles that should be taken into account as early as at the data collection stage [ 49 , 84 ]. In addition to extensively described and machine-readable metadata, these principles include, for example, the application of globally persistent identifiers, the use of open file formats, and standardized communication protocols to ensure that humans and machines can easily download the data. A well-chosen repository to upload the data is then just the final step to publish FAIR data.
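As a rough illustration of what machine-readable metadata for a deposited dataset might look like, the snippet below writes a small JSON record containing a persistent identifier, creator information, license, and file format. The field names loosely follow common repository metadata (DataCite-style), and all values, identifiers, and names are hypothetical.

```python
# Minimal sketch of machine-readable dataset metadata; all values are hypothetical.
import json

metadata = {
    "identifier": "10.5281/zenodo.0000000",      # hypothetical DOI assigned by the repository
    "title": "Open-field behaviour of mice under enriched housing (example record)",
    "creators": [{"name": "Doe, Jane", "orcid": "0000-0000-0000-0000"}],
    "license": "CC0-1.0",                        # unrestricted reuse, as often recommended for data
    "formats": ["text/csv"],                     # open, non-proprietary file format
    "keywords": ["animal research", "open data", "FAIR"],
    "relatedIdentifiers": [
        {"relation": "IsSupplementTo", "identifier": "10.1371/journal.pbio.xxxxxxx"}
    ],
}

with open("dataset_metadata.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```

Repositories typically collect this kind of information through their upload forms, so depositing data in a well-chosen repository largely automates the creation of such records.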

FAIR data can strongly increase the knowledge gained from performed animal experiments. Thus, the same data can be analyzed by different researchers and could be combined to obtain larger sample sizes, as already occurs in the neuroimaging community, which works with comparable datasets [ 85 ]. Furthermore, the sharing of data enables other researchers to analyze published datasets and estimate measurement reliabilities to optimize their own data collection [ 86 , 87 ]. This will help to improve the translation from animal research into the clinic and simultaneously reduce the number of animal experiments in the future.

Reporting guidelines

In preclinical research, the ARRIVE guidelines are the current state of the art when it comes to reporting data based on animal experiments [ 22 , 23 ]. The ARRIVE guidelines have been endorsed by more than 1,000 journals that ask scientists to comply with them when reporting their outcomes. Since the ARRIVE guidelines have not had the expected impact on the transparency of reporting in animal research publications, a more rigorous update has been developed to facilitate their application in practice (ARRIVE 2.0 [ 23 ]). We believe that the ARRIVE guidelines can be more effective if they are implemented at a very early stage of the project (see section on guidelines). Some more specialized reporting guidelines have also emerged for individual research fields that rely on animal studies, such as endodontology [ 88 ]. The EQUATOR Network collects all guidelines and makes them easily findable with the search tool on its website ( Table 1 ). MERIDIAN also offers a one-stop shop for all reporting guidelines involving the use of animals across all research sectors [ 89 ]. It is thus worth checking for new reporting guidelines before preparing a manuscript to maximize the transparency of described experiments.

Identifiers

Persistent identifiers for published work, authors, or resources are key for making public data findable by search engines and are thus a prerequisite for compliance with FAIR data principles. The most common identifier for publications is a DOI, which makes the work citable. A DOI is a globally unique string assigned by the International DOI Foundation to identify content permanently and provide a persistent link to its location on the Internet. An ORCID ID is used as a personal persistent identifier and is recommended for unambiguously identifying an author ( Table 1 ). This will avoid confusion between authors with the same name or in the case of name changes or changes of affiliation. Research Resource Identifiers (RRID) are unique ID numbers that help to transparently report research resources. RRIDs also apply to animals to clearly identify the species used. RRIDs help avoid confusion between different or changing names of genetic lines and, importantly, make them machine findable. The RRID Portal helps scientists find a specific RRID or create one if necessary ( Table 1 ). In the context of genetically altered animal lines, correct naming is key. The Mouse Genome Informatics (MGI) Database is the authoritative source of official names for mouse genes, alleles, and strains [ 90 ].

Preprint publication

Preprints have enjoyed unprecedented success, particularly during the height of the Coronavirus Disease 2019 (COVID-19) pandemic, when the need for rapid dissemination of scientific knowledge was critical. The publication process for scientific manuscripts in peer-reviewed journals usually requires a considerable amount of time, ranging from a few months to several years, mainly due to the lengthy review process and inefficient editorial procedures [ 91 , 92 ]. Preprints typically precede formal publication in scientific journals and do not go through a peer review process, thus facilitating the prompt, open dissemination of important scientific findings within the scientific community. However, submitted papers are usually screened and checked for plagiarism. Preprints are assigned a DOI so they can be cited. Once a preprint is published in a journal, its status is automatically updated on the preprint server. The preprint is linked to the publication via CrossRef and mentioned accordingly on the website of the respective preprint platform.

After initial skepticism, most publishers now allow papers to be posted on preprint servers prior to submission. An increasing number of journals even allow direct submission of a preprint to their peer review process. The US National Institutes of Health and the Wellcome Trust, among other funders, also encourage prepublication and permit researchers to cite preprints in their grant applications. There are now numerous preprint repositories for different scientific disciplines. ASAPbio provides a searchable database of preprint servers that can help in identifying the one that best matches an individual’s needs [ 93 ]. The most popular repository for animal research is bioRxiv, which is hosted by the Cold Spring Harbor Laboratory ( Table 1 ).

The early exchange of scientific results is particularly important for animal research. This acceleration of the publication process can help other scientists to adapt their research or could even prevent animal experiments if other scientists become aware that an experiment has already been done before starting their own. In addition, preprints can help to increase the visibility of research. Journal articles that have a corresponding preprint publication have higher citation and Altmetric counts than articles without a preprint [ 94 ]. Furthermore, the publication of preprints can help to combat publication bias, which represents a major problem in animal research [ 16 ]. Since journals and readers prioritize cutting-edge studies with positive results over inconclusive or negative results, researchers are reluctant to invest time and money in a manuscript that is unlikely to be accepted in a high-impact journal.

In addition to the option of publishing as preprint, other alternative publication formats have recently been introduced to facilitate the publication of research results that are hard to publish in traditional peer-reviewed journals. These include micro publications, data repositories, data journals, publication platforms, and journals that focus on negative or inconclusive results. The tool fiddle can support scientists in choosing the right publication format [ 95 , 96 ].

Open access publication

Publishing open access is one of the most established open science strategies. In contrast to the FAIR data principle, the term open access publication usually refers to the publication of a manuscript on a platform that is accessible free of charge—in translational biomedical research, this is mostly in the form of a scientific journal article. Originally, publications accessible free of charge were the answer to the paywalls established by renowned publishing houses, which led to social inequalities within and outside the research system. In translational biomedical research, the ethical aspect of urgently needed transparency is another argument in favor of open access publication, as these studies will not only be findable, but also internationally readable.

There are different ways of open access publishing; the 2 main routes are gold open access and green open access. Numerous journals now offer gold open access, which refers to the immediate and fully accessible publication of an article. The Directory of Open Access Journals (DOAJ) provides a complete and updated list of high-quality, peer-reviewed open access journals [ 97 ]. Charité–Universitätsmedizin Berlin offers a specific tool for biomedical open access journals that supports animal researchers in choosing an appropriate journal [ 49 ]. In addition, the Sherpa Romeo platform is a straightforward way to identify publisher open access policies on a journal-by-journal basis, including information on preprints, but also on licensing of articles [ 51 ]. Hybrid open access refers to openly accessible articles in otherwise paywalled journals. By contrast, green open access refers to the publication of a manuscript or article in a repository that is mostly operated by institutions and/or universities. The publication can be exclusively on the repository or in combination with a publisher. In the quality-assured, global Directory of Open Access Repositories (openDOAR), scientists can find thousands of indexed open access repositories [ 49 ]. The publisher often sets an embargo during which the authors cannot make the publication available in the repository, which can restrict the combined model. It is worth mentioning that gold open access is usually more expensive for the authors, as they have to pay an article processing charge. However, the article’s outreach is usually much higher than the outreach of an article in a repository or available exclusively as subscription content [ 98 ]. Diamond open access refers to publications and publication platforms that can be read free of charge by anyone interested and for which no costs are incurred by the authors either. It is the simplest and fairest form of open access for all parties involved, as no one is prevented from participating in scientific discourse by payment barriers. For now, it is not as widespread as the other forms because publishers have to find alternative sources of revenue to cover their costs.

As social media and the researcher’s individual public outreach are becoming increasingly important, it should be remembered that the accessibility of a publication should not be confused with the licensing under which the publication is made available. In order to be able to share and reuse one’s own work in the future, we recommend looking for journals that allow publications under the Creative Commons licenses CC BY or CC BY-NC. This also allows the immediate combination of gold and green open access.

Creative commons licenses

Attributing Creative Commons (CC) licenses to scientific content can make research broadly available and clearly specifies the terms and conditions under which people can reuse and redistribute the intellectual property, namely publications and data, while giving credit where it is due [ 49 ]. As copyright laws vary from country to country and legal texts are difficult for outsiders to understand, the CC licenses are designed to be easily understandable and are available in 41 languages. This way, users can easily avoid accidental misuse. The CC initiative developed a tool that enables researchers to find the license that best fits their interests [ 49 ]. Since the licenses are based on a modular concept ranging from relatively unrestricted licenses (CC BY, free to use, credit must be given) to more restricted licenses (CC BY-NC-ND, only free to share for non-commercial purposes, credit must be given), one can find an appropriate license even for the most sensitive content. Publishing under an open CC license will not only make the publication easy to access but can also help to increase its reach. It can stimulate other researchers and the interested public to share this article within their network and to make the best future use of it. Bear in mind that datasets published independently from an article may receive a different CC license. In terms of intellectual property, data are not protected in the same way as articles, which is why the CC initiative in the United Kingdom recommends publishing them under a CC0 (“no rights reserved”) license or the Public Domain Mark. This gives everybody the right to use the data freely. In an animal ethics sense, this is especially important in order to get the most out of data derived from animal experiments.

Data and code repositories

Sharing research data is essential to ensure reproducibility and to facilitate scientific progress. This is particularly true in animal research and the scientific community increasingly recognizes the value of sharing research data. However, even though there is increasing support for the sharing of data, researchers still perceive barriers when it comes to doing so in practice [ 99 – 101 ]. Many universities and research institutions have established research data repositories that provide continuous access to datasets in a trusted environment. Many of these data repositories are tied to specific research areas, geographic regions, or scientific institutions. Due to the growing number and overall heterogeneity of these repositories, it can be difficult for researchers, funding agencies, publishers, and academic institutions to identify appropriate repositories for storing and searching research data.

Recently, several web-based tools have been developed to help in the selection of a suitable repository. One example is Re3data, a global registry of research data repositories that includes repositories from various scientific disciplines. The extensive database can be searched by country, content (e.g., raw data, source code), and scientific discipline [ 49 ]. A similar tool to help find a data archive specific to the field is FAIRsharing, based at Oxford University [ 102 ]. If there is no appropriate subject-specific data repository or one seems unsuitable for the data, there are general data repositories, such as Open Science Framework, figshare, Dryad, or Zenodo. To ensure that data stored in a repository can be found, a DOI is assigned to the data. Choosing the right license for the deposited code and data ensures that authors get credit for their work.

Publication and connection of all outcomes

If scientists have used all available open science tools during the research process, then publishing and linking all outcomes represents the well-deserved harvest ( Fig 2 ). At the end of a research process, researchers will not just have 1 publication in a journal. Instead, they might have a preregistration, a preprint, a publication in a journal, a dataset, and a protocol. Connecting these outcomes in a way that enables other scientists to better assess the results and follow the links between these publications will be key. There are many examples of good open science practices in laboratory animal science, but we want to highlight one of them to show how this could be achieved. Blenkuš and colleagues investigated how mild stress-induced hyperthermia can be assessed non-invasively by thermography in mice [ 103 ]. The study was preregistered with animalstudyregistry.org, which is referred to in their publication [ 104 ]. A deviation from the originally preregistered hypothesis was explained in the manuscript, and the supplementary material was uploaded to figshare [ 105 ].

Fig 2. Application of open science practices can increase the reproducibility and visibility of a research project at the same time. By publishing different research outputs with more detailed information than can be included in a journal article, researchers enable peers to replicate their work. Reporting according to guidelines and using transparent visualization will further improve this reproducibility. The more research products that are generated, the more credit can be attributed. By communicating on social media or additionally publishing slides from delivered talks or posters, more attention can be drawn to the work. Additionally, publishing open access and making the work machine-findable makes it accessible to an even broader audience of peers.

https://doi.org/10.1371/journal.pbio.3001810.g002

It might also be helpful to provide all resources from a project in a single repository, such as the Open Science Framework, which also integrates other tools that might have been used, like GitHub or protocols.io.

Communicating your research

Once all outcomes of the project are shared, it is time to address the targeted peers. Social media is an important instrument to connect research communities [ 106 ]. In particular, Twitter is an effective way to communicate research findings or related events to peers [ 107 ]. In addition, specialized platforms like ResearchGate can support the exchange of practical experiences ( Table 1 ). When all resources related to a project are kept in one place, sharing this link is a straightforward way to reach out to fellow scientists.

With the increasing number of publications, science communication has become more important in recent years. Transparent science that communicates openly with the public contributes to strengthening society’s trust in research.

Conclusions

Plenty of open science tools are already available and the number of tools is constantly growing. Translational biomedical researchers should seize this opportunity, as it could contribute to a significant improvement in the transparency of research and fulfill their ethical responsibility to maximize the impact of knowledge gained from animal experiments. Over and above this, open science practices also bear important direct benefits for the scientists themselves. Indeed, the implementation of these tools can increase the visibility of research and is becoming increasingly important when applying for grants or in recruitment decisions. Already, more and more journals and funders require activities such as data sharing. Several institutions have established open science practices as evaluation criteria alongside publication lists, impact factor, and h-index for panels deciding on hiring or tenure [ 108 ]. For new adopters, it is not necessary to apply all available practices at once. Implementing single tools can be a safe approach to slowly improve the outreach and reproducibility of one’s own research. The more open science products that are generated, the more reproducible the work becomes, but also the more the visibility of a study increases ( Fig 2 ).

As other research fields, such as social sciences, are already a step ahead in the implementation of open science practices, translational biomedicine can profit from their experiences [ 109 ]. We should thus keep in mind that open science comes with some risks that should be minimized early on. Indeed, the more open science practices become incentivized, the more researchers could be tempted to get a transparency quality label that might not be justified. When a study is based on a bad hypothesis or poor statistical planning, this cannot be fixed by preregistration, as prediction alone is not sufficient to validate an interpretation [ 110 ]. Furthermore, a boom of data sharing could disconnect data collectors and analysts, bearing the risk that researchers performing the analysis lack understanding of the data. The publication of datasets could also promote a “parasitic” use of a researcher’s data and lead to scooping of outcomes [ 111 ]. Stakeholders could counteract such a risk by promoting collaboration instead of competition.

During the COVID-19 pandemic, we have seen an explosion of preprint publications. This unseen acceleration of science might be the adequate response to a pandemic; however, the speeding up science in combination with the “publish or perish” culture could come at the expense of the quality of the publication. Nevertheless, a meta-analysis comparing the quality of reporting between preprints and peer-reviewed articles showed that the quality of reporting in preprints in the life sciences is at most slightly lower on average compared to peer-reviewed articles [ 112 ]. Additionally, preprints and social media have shown during this pandemic that a premature and overconfident communication of research results can be overinterpreted by journalists and raise unfounded hopes or fears in patients and relatives [ 113 ]. By being honest and open about the scope and limitations of the study and choosing communication channels carefully, researchers can avoid misinterpretation. It should be noted, however, that by releasing all methodological details and data in research fields such as viral engineering, where a dual use cannot be excluded, open science could increase biosecurity risk. Implementing access-controlled repositories, application programming interfaces, and a biosecurity risk assessment in the planning phase (i.e., by preregistration) could mitigate this threat [ 114 ].

Publishing in open access journals often involves higher publication costs, which makes it more difficult for institutes and universities from low-income countries to publish there [ 115 ]. Equity has been identified as a key aim of open science [ 116 ]. It is vital, therefore, that existing structural inequities in the scientific system are not unintentionally reinforced by open science practices. Early career researchers have been the main drivers of the open science movement in other fields even though they are often in vulnerable positions due to short contracts and hierarchical and strongly networked research environments. Supporting these early career researchers in adopting open science tools could significantly advance this change in research culture [ 117 ]. However, early career researchers can already benefit by publishing registered reports or preprints that can provide a publication much faster than conventional journal publications. Communication in social media can help them establish a network enabling new collaborations or follow-up positions.

Even though open science comes with some risks, the benefits easily overweigh these caveats. If a change towards more transparency is accompanied by the implementation of open science in the teaching curricula of the universities, most of the risks can be minimized [ 118 ]. Interestingly, we have observed that open science tools and infrastructure that are specific to animal research seem to mostly come from Europe. This may be because of strict regulations within Europe for animal experiments or because of a strong research focus in laboratory animal science along with targeted research funding in this region. Whatever the reason might be, it demonstrates the important role of research policy in accelerating the development towards 3Rs and open science.

Overall, it seems inevitable that open science will eventually prevail in translational biomedical research. Scientists should not wait for the slow-moving incentive framework to change their research habits, but should take pioneering roles in adopting open science tools and working towards more collaboration, transparency, and reproducibility.

Acknowledgments

The authors gratefully acknowledge the valuable input and comments from Sebastian Dunst, Daniel Butzke, and Nils Körber that have improved the content of this work.

  • View Article
  • PubMed/NCBI
  • Google Scholar
  • 6. Cary Funk MH, Brian Kennedy, Courtney Johnson. Americans say open access to data and independent review inspire more trust in research findings. Pew Research Center Website: Pew Research Center; 2019. Available from: https://www.pewresearch.org/science/2019/08/02/americans-say-open-access-to-data-and-independent-review-inspire-more-trust-in-research-findings/ .
  • 7. International Reproducibility Networks. International Networks Statement UK Reproducibility Network Website: UK Reproducibility Network. 2021. Available from: https://cpb-eu-w2.wpmucdn.com/blogs.bristol.ac.uk/dist/b/631/files/2021/09/International-Networks-Statement-v1.0.pdf .
  • 13. Article 36 of Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 amended by Regilation (EU) 2019/1010 of the European Parliament and of the Council of 5 June 2019. OJEU. 2010;L276:36.
  • 19. American Association for Cancer Research. Editorial Policies. 2021. Available from: https://aacrjournals.org/content/authors/editorial-policies .
  • 21. Diederich K, Schmitt K, Schwedhelm P, Bert B, Heinl C. Open Science Toolbox for Animal Research. Zenodo. 2022. Available from: https://zenodo.org/record/6497560 .
  • 29. NC3R. ARRIVE guidelines. NC3R Website. Available from: https://arriveguidelines.org/ .
  • 31. Canadian Institutes of Health Research. How to integrate sex and gender into research. Website of the Canadian Institutes of Health Research: Canadian Institutes of Health Research. 2019 [cited 2019 Aug 21]. Available from: https://cihr-irsc.gc.ca/e/50836.html .
  • 33. Simon T, Bate RAC. InVivoStat. Available from: https://invivostat.co.uk/ .
  • 35. Urbaniak G, Plous S. Research randomizer (version 4.0) [computer software]. 2013.
  • 47. Medical Research Council’s. Data sharing policy. UK Research and Innovation Website 2021. Available from: https://www.ukri.org/publications/mrc-data-sharing-policy/ .
  • 48. University of California Curation Center. DMPTool. 2021. Available from: https://dmptool.org/ .
  • 49. Digital Curation Centre. DMPOnline. Available from: https://dmponline.dcc.ac.uk/ . Digital Curation Centre; 2021.
  • 50. Harvard Longwood Medical Area Research Data Management Working Group. Biomedical Data Lifecycle. Harvard Medical School Website: Harvard Medical School; 2021. Available from: https://datamanagement.hms.harvard.edu/about/what-research-data-management/biomedical-data-lifecycle .
  • 51. Joint Information Systems Committee. Research data management toolkit JISC Website: JISC; 2018. Available from: https://www.jisc.ac.uk/guides/rdm-toolkit .
  • 54. German Centre for the Protection of Laboratory Animals (Bf3R). NTPs—Nicht Technische Projektzusammenfassungen 3R-SMART; 2020. Available from: https://www.3r-smart.de/index.php?id=6895 .
  • 55. Understanding Animal Research. Guide to writing non-technical summaries concordat on openness on animal research in the UK2018. Available from: https://concordatopenness.org.uk/guide-to-writing-non-technical-summaries .
  • 56. Gerlach B, Untucht C, Stefan A. Electronic Lab Notebooks and Experimental Design Assistants. In: Bespalov A, Michel MC, Steckler T, editors. Good Research Practice in Non-Clinical Pharmacology and Biomedicine. Cham: Springer International Publishing; 2020. p. 257–75.
  • 58. Adam BL, Birte L. ELN Guide: electronic laboratory notebooks in the context of research data management and good research practice–a guide for the life sciences. Cologne, Germany: ZB MED–Information Centre for Life Sciences; 2021.
  • 59. AgileBio. LabCollector Website https://labcollector.com/labcollector-lims/features/modules/animals-module/2022 . Available from: https://labcollector.com/labcollector-lims/features/modules/animals-module/ .
  • 61. Harvard Longwood Medical Area Research Data Management Working Group. Electronic Lab Notebook Comparison Matrix. Zenodo. 2021.
  • 70. Bio-protocol. Collaborating Journals bio-protocol website2021. Available from: https://bio-protocol.org/default.aspx?dw=Collaborating .
  • 76. Dinkel H. anishare: GitHub; [updated June 2018]. Available from: https://github.com/hdinkel/anishare .
  • 89. O’Connor AM. MERIDIAN: Menagerie of Reporting guidelines Involving Animals. Iowa State University; 2022. Available from: https://meridian.cvm.iastate.edu/ .
  • 90. The Jackson Laboratory. Mouse Nomenclature Home Page at the Mouse Genome Informatics website World Wide Web: The Jackson Laboratory,Bar Harbor, Maine. Available from: http://www.informatics.jax.org/mgihome/nomen/index.shtml .
  • 97. Directory of Open Access Journals. Find open access journals & articles. Available from: https://doaj.org/ . Directory of Open Access Journals, [DOAJ]; 2021.
  • 98. Gold Open Access research has greater societal impact as used more outside of academia [press release]. Springer Nature Website: Springer. Nature. 2020;30:2020.
  • 104. Franco NH. Can we use infrared thermography for assessing emotional states in mice?—A comparison between handling-induced stress by different techniques. Available from: animalstudyregistry.org . German Federal Institute for Risk Assessment (BfR); 2020. https://doi.org/10.17590/asr.0000224
  • Open access
  • Published: 10 July 2020

Guidelines for planning and conducting high-quality research and testing on animals

  • Adrian J. Smith   ORCID: orcid.org/0000-0002-8375-0805 1  

Laboratory Animal Research volume  36 , Article number:  21 ( 2020 ) Cite this article

15k Accesses

25 Citations

46 Altmetric

Metrics details

There are important scientific, legal and ethical reasons for optimising the quality of animal research and testing. Concerns about the reproducibility and translatability of animal studies are now being voiced not only by those opposed to animal use, but also by scientists themselves.

Many of the attempts to improve reproducibility have, until recently, focused on ways in which the reporting of animal studies can be improved. Many reporting guidelines have been written. Better reporting cannot, however, improve the quality of work that has already been carried out - for this purpose better planning is required.

Planning animal studies should involve close collaboration with the animal facility where the work is to be performed, from as early a stage as possible. In this way, weaknesses in the protocol will be detected and changes can be made before it is too late. Improved planning must focus on more than the “mathematical” elements of experimental design such as randomisation, blinding and statistical methods. This should include focus on practical details such as the standard of the facility, any need for education and training, and all the factors which can improve animal welfare.

The PREPARE ( Planning Research and Experimental Procedures on Animals: Recommendations for Excellence ) checklist was developed to help scientists be more aware of all the issues which may affect their experiments. The checklist is supported by comprehensive webpages containing more information, with links to the latest resources that have been developed for each topic on the list.

Introduction

There is now widespread international acceptance for the 3R-concept ( Replacement, Reduction, Refinement [ 1 ]) when planning research or testing which may involve the use of animals or animal tissue:

Replacement where possible with non-animal methods

Reduction of the number of animals to the minimum which achieves a valid result, and

Refinement of the care and use of those animals which must be used, to maximise animal welfare and data quality.

The three Rs are now part of animal research legislation in many countries [ 2 ]. In Europe, the European Union Directive 2010/63 explicitly states that Replacement is the ultimate aim [ 3 ]. An assessment of the need to use animals at all must therefore be the first stage of the process when planning preclinical research or testing. The large range of alternatives now available is beyond the scope of this paper, but there are many sources of information of this topic (e.g. [ 4 ]).

If animal use is unavoidable, attention must be paid to a long list of known variables which may affect the data collected from them. Unlike test-tube ingredients, animals are complex individuals, differing in their genetic make-up, microbial composition, and behavioural responses to their environment and procedures to which they are subjected. Again, a review of all these factors is beyond the scope of this paper, but information on the effects of these variables is available (e.g. [ 5 ]).

In addition to the legal and scientific incentives, there are good ethical reasons for aiming for the highest possible quality of animal-based research and testing. This is particularly important to remember within basic research in academia, where scientists may be rewarded for the publication of new knowledge rather than for the application of their research results.

In most cases, animal research and testing is performed to learn more about another species, usually humans, rather than to shed more light on the species being used as a model. This work must, therefore, be valid, robust and translatable. As Ritskes-Hoitinga & Wever [ 6 ] remarked: ‘we need a cultural change in which researchers are rewarded for producing valid and reproducible results that are relevant to patients, and for doing justice to the animals being used’. Ensuring translatability is difficult enough in itself [ 7 ], and it is totally dependent upon well-planned studies.

Quality does not come automatically: it necessitates detailed planning from day one, to take into account the effects of the internal and external parameters which affect the animals’ response to a procedure. In addition, the animal facility must have a large number of routine procedures in place, both to maintain the stability of the environment and to tackle any emergencies which may arise. Many scientists who do not work on a regular basis within an animal facility are probably unaware of the number and subtlety of many of these factors. Input from the facility’s veterinary staff will be central to this process.

Guidelines for planning and conducting animal-based studies help both scientists and animal facilities to discuss the issues mentioned above at an early stage, while it is still possible to make improvements in the protocol. Scientists may need to be reminded that the greatest source of variation is likely to come from the animals themselves, rather than from their treatments. Scientists may assume that the facility is dealing with these issues, but this is not always the case. The classic studies by Crabbe and coworkers, who set up standardised behavioural tests on inbred mouse strains in different laboratories simultaneously, showed how unforeseen variables can lead to significant differences in results [ 8 , 9 ].

Fortunately, the need for detailed planning guidelines is becoming clearer, because the quality of animal experiments is now increasingly being criticised, not just by opponents of animal research but also by scientists themselves (e.g. [ 10 , 11 , 12 , 13 , 14 ]). The use of strong words such as ‘research waste’ and ‘false results’ (e.g. [ 15 , 16 ]) is becoming commonplace.

Unfortunately, initiatives to solve the reproducibility crisis tend often to focus on just two of the issues: the more “mathematical” elements of experimental design, and better reporting (e.g. [ 17 ]). These issues are of course important, and include the following items, among others:

Publication bias (reporting only positive results)

Low statistical power

P -value hacking (manipulating data to obtain statistical significance)

HARKing (Hypothesising After the Results are Known)

Lack of randomisation and blinding

Norecopa has made a collection of literature references about these concerns [ 18 ].

However, those familiar with the workings of an animal facility can add many additional and important issues to this list, which may be less conspicuous but which are equally critical to the validity of an experiment. These may be grouped into:

Artefacts caused by internal factors such as genetic diversity and subclinical infections

Artefacts caused by external effects such as transport, cage conditions, re-grouping of animals, food deprivation and the procedure itself

The need for contingency plans to reduce or avoid these and other risks in the facility

Reporting does not improve the standard of experiments

Good reporting is of course important, to allow readers to evaluate the scientific quality of the publication and the strength of the conclusions drawn by the authors. Insistence on better reporting is not new. When Laboratory Animal Science as we know it today was under development in the second half of the last century, focus was placed at an early stage on the low standard of reporting in the scientific literature. In a classic paper, Jane Smith and colleagues [ 19 ] examined the descriptions of laboratory animals, and the procedures for which they were used, in 149 scientific papers published in 8 major journals from 1990 to 1991. The percentages of papers not reporting basic details about the animals were alarmingly high (e.g. sex: 28%; age: 52%; weight: 71%; source: 53%), and 30% of the papers did not mention how many animals were used. The percentages were even higher for environmental factors such as room temperature (72%), photoperiod (72%), relative humidity (89%) and the number of animals per cage (73%).

Many reporting guidelines have been written since the 1980s, to encourage improvements. These include both general guidance (e.g. [ 20 , 21 , 22 , 23 , 24 ]) and guidelines written for specific types of experiment (e.g. [ 25 , 26 , 27 , 28 ]).

It is vitally important to remember that better reporting of an experiment which has already been performed cannot improve the quality of that experiment. A good salesman may manage to sell more burnt cakes if he describes them well (and if he is a good psychologist), but they will still be burnt and they will not taste better. To improve a cake, one must go back to the kitchen and modify the ingredients and/or the baking conditions. In the case of animal studies, just as in the kitchen, the quality of the result is dependent upon planning and conducting, not reporting.

This has been well demonstrated by the way in which the ARRIVE ( Animal Research: Reporting of In Vivo Experiments ) guidelines for reporting animal experiments [ 23 ] have been received and implemented. A new version of ARRIVE was developed in 2019 [ 29 ], because, as the authors point out, despite endorsement by more than a thousand journals, only a small number of these journals actively enforce compliance. Indeed, a Swiss study revealed that 51% of researchers using journals that had endorsed ARRIVE had even never heard of them [ 30 ]. The authors of ARRIVE concluded that most journals are unlikely to be able to provide the resources needed to ensure compliance with all the items on the original checklist. The new version of the ARRIVE guidelines has a shorter checklist of ‘essential’ items, to try and increase compliance. This situation demonstrates clearly how important the planning stage is for the quality of scientific papers.

Scientists should contact the animal facility as soon as they have concrete plans of conducting animal studies. Collaboration between scientists and facility staff will be needed to discuss all stages of the study, up to and including the end of the study which involves depopulation, decontamination and waste disposal. An essential part of this process is attention to the needs of the facility staff. This includes, among other things, their education and training, personal protection, their workload, and means of ensuring adequate staffing levels at all times during the study.

Preparation for preclinical studies: a modern definition of the 3Rs

The concept of the three Rs (Replacement, Reduction and Refinement) developed by Russell and Burch over 60 years ago [ 1 ] was written in an era when the most pressing need was to reduce the inhumanity of animal experiments. Technology at that time did not offer the same potential to replace such experiments as is available today - neither was there so much focus on reducing animal numbers by more sophisticated experimental design.

So today, preparation for robust, valid and humane preclinical studies should go beyond a mere search for more humane methods, using more contemporary definitions of the three Rs [ 31 ]:

Replacement is not just the use of methods which achieve a given purpose without procedures on animals, but also about total avoidance of animal use (Non-Animal Models, NAMs) by innovative approaches to scientific problems, for example by studies directly on human tissue

Reduction is about obtaining comparable information from fewer animals, or for obtaining more information from the same number of animals. Today, reduction also focuses on optimalisation of experimental design so that experiments are robust and reproducible

Refinement methods minimise pain, suffering or distress, but also improve animals’ well-being, since modern research demonstrates that this affects the quality of the data collected from the animals. Modern technology can be harnessed to refine the methods and equipment we use on animals.

Animals that are in harmony with their surroundings will provide more reliable scientific data in an experiment, because the parameters measured will reflect the treatment they are given, rather than being affected by stress. It is indeed true that ‘happy animals give better science’ [ 32 , 33 ].

For these reasons, scientists must be given comprehensive guidelines for planning any experiments which may involve the use of animals, or material taken from animals.

The PREPARE guidelines

Based on the authors’ experiences over the past 30 years in designing and supervising animal experiments, comprehensive guidelines for planning animal studies have been constructed, called PREPARE ( Planning Research and Experimental Procedures on Animals: Recommendations for Excellence ) [ 34 ].

PREPARE contains a checklist, which serves as a reminder of items that should be considered before and during the study, see Fig. 1 . This checklist is available in over 20 languages.

figure 1

The PREPARE checklist (available at https://norecopa.no/PREPARE/prepare-checklist ). From Smith, AJ, Clutton, RE, Lilley, E, Hansen KEAa, Brattelid, T. PREPARE: Guidelines for planning animal research and testing. Laboratory Animals, 2018;52:135–141. DOI: https://doi.org/10.1177/0023677217724823 . Published under Open Access, Creative Commons licence CC BY-NC 4.0

Many of these items will need their own sets of checklists or standard operating procedures, in the same way that pilots, however experienced, use many checklists, even on routine flights, before, during and after the flight. Many of these checklists will be produced by the animal facility itself. Scientists should, however, check that these are in place, and discuss their contents with the facility.

Importantly, and unlike many reporting guidelines, the PREPARE checklist is supported by a website which provides more information on each of the 15 main topics on the checklist ( https://norecopa.no/PREPARE ). The website gives more complete guidance in the form of text and links to quality guidelines and scientific papers. This website is continually updated as new knowledge develops.

The PREPARE guidelines contain, of course, many of the elements found in reporting guidelines. However, PREPARE contains additional material about issues that can have dramatic effects on the scientific validity of the research, as well as on health and safety, and animal welfare.

Contingency plans and resources

Human nature is such that we tend to believe that accidents only happen to others. If this belief is followed in an animal facility, it will not only put the outcome of a scientific study at risk, but it will also endanger the health and lives of both the animals and personnel who are directly or indirectly involved in the study. As indicated above, it is important to ensure the quality of the whole process from obtaining the animals to disposal of waste and decontamination after the study.

A competent animal facility is one that “hopes for the best but is prepared for the worst” . Facilities with comprehensive and realistic contingency plans will be well placed to tackle disasters, including lockdown situations in connection with a pandemic. There are many resources available that describe the general principles involved, but these must be tailored to the local conditions at each facility. Building a contingency plan from scratch is a time-consuming affair, but it is an excellent insurance policy for the day when a threatening situation arises. Those lacking such a plan should begin with a risk assessment of the facility and its activities, and start by writing contingency plans for the most important of these scenarios.

In its simplest terms, risk assessment is the consequence of a threat multiplied by the likelihood of it occurring. The consequences of the threat include the level of tolerance of the event occurring, which may be anything from “totally unacceptable” to “acceptable within certain limits”. Assessments should be performed at several levels, since threats and their consequences may differ, depending upon where and when they occur, for example:

at the facility level (e.g. the consequences of flooding or fire)

at the room level (e.g. the consequences of power outages to vital equipment)

in connection with specific types of research (e.g. risk of human infection)

It is wise to construct a contingency plan based upon the assumption that ‘what can go wrong will go wrong at some time’ [ 35 ], and that this will happen when it is least convenient, for example during public holidays when staffing levels may be low.

Clearly, both the design of animal studies and the production of contingency plans must involve close collaboration between management, scientists and technical staff, including external suppliers of equipment and services.

The Covid-19 pandemic has demonstrated the importance of being adequately prepared. Animal facilities have had to quickly write contingency plans to tackle situations which were barely imaginable before the outbreak. This work has demanded enormous time and energy, at the expense of conducting research, and it has left many facilities with the unpleasant task of having to euthanise large numbers of healthy animals. Clearly, it is easier to tackle these situations, however improbable they may seem, if the majority of the issues which may arise have already been discussed, and plans made to tackle them.

At the time of writing, some specific advice on contingency plans for the Covid-19 pandemic is beginning to emerge, and existing advice on disaster planning is being re-examined (see [ 36 ]).

Collaboration between scientists and animal care staff

There are many good reasons for early and close collaboration between scientists and the staff at the animal facility where they hope to carry out the work. This collaboration should include dialogue with the animal carers and technicians, not just with the managers. Some of the reasons include the following:

the staff have a moral right to know what will happen to animals in their care.

they will be more motivated to look for ways of refining the study. This will improve both animal welfare and the scientific quality, including reliability of the data being collected from the animals.

the animal care staff know the possibilities, and the limitations, of the animal facility best. They are less likely to play limitations down for fear of the study being transferred to another facility.

they often possess a large range of practical skills and are good at lateral thinking from one study to another - they may be able to suggest a refinement which they have already seen in another species.

they know the animals best

the animals know them best

lack of involvement of the animal care staff creates anxiety, depression and opposition to animal research, as well as limiting creativity which might improve the experiments

A mutually respectful dialogue between technical and academic staff will also help resolve issues quickly which may otherwise cause disagreements later, such as the division of labour and responsibilities all stages of the study. It will also help to avoid the loss of important data due to misunderstandings about who was to collect it.

Culture of care and challenge

To facilitate this dialogue, steps should be taken to foster a culture of care among all members of the staff and research teams. This is actively encouraged in European legislation [ 3 ]. Animal research will inevitably, from time to time, involve studies where sentient creatures exhibit pain, suffering and distress. It is therefore vital to consider the mental health of those caring for these animals or observing this, to avoid compassion fatigue. In Europe, an International Culture of Care network has been established, to share experiences in implementing such a culture [ 37 ].

Closely related to a culture of care is the concept of a Culture of Challenge [ 38 ]. This is all about ‘looking for the acceptable, rather than choosing the accepted’. Comments such as “we have always done it that way” or “we do it as often as necessary” should automatically start a discussion about how to change these habits.

It can be hoped that the current focus on poor reproducibility in animal studies can be turned into an initiative to ensure better planning of all stages, rather than focusing on improving reporting. Otherwise, we are in danger of wasting time, discussing the quality of the lock on the door of the stable from which the horse has already escaped [ 39 ].

Availability of data and materials

This review paper does not include original data. Figure  1 is from a previous paper, published under Open Access, Creative Commons licence CC BY-NC 4.0.

Russell WMS, Burch RL. The principles of humane experimental technique. London: Methuen; 1959.

Javier G. Laboratory animals: regulations and recommendations for the care and use of animals in research. 2nd ed. ISBN 9780128498804. London: Academic Press; 2018.

European Commission. Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes. 2010. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:276:0033:0079:en:PDF . Accessed 15 May 2020.

Google Scholar  

EURL-ECVAM. Finding information on alternative methods. 2020 https://ec.europa.eu/jrc/en/eurl/ecvam/knowledge-sharing-3rs/finding-information-alternative-methods . Accessed 15 May 2020.

Norecopa. Housing and husbandry. 2020a. https://norecopa.no/prepare/12-housing-and-husbandry . Accessed 15 May 2020.

Ritskes-Hoitinga M, Wever K. Improving the conduct, reporting, and appraisal of animal research. BMJ. 2018. https://doi.org/10.1136/bmj.j4935 .

Pound P, Ritskes-Hoitinga M. Is it possible to overcome issues of external validity in preclinical animal research? Why most animal models are bound to fail. J Transl Med. 2018. https://doi.org/10.1186/s12967-018-1678-1 .

Crabbe JC, Wahlsten D, Dudek BC. Genetics of mouse behavior: interactions with laboratory environment. Science. 1999;284:1670–2.

Article   CAS   Google Scholar  

Wahlsten D, Metten P, Phillips TJ, Boehm LL II, Burkhart-Kasch C, Dorow J, et al. Different data from different labs: Lessons from studies of gene-environment interaction. J Neurobiol. 2003. https://doi.org/10.1002/neu.10173 .

Avey MT, Moher D, Sullivan KJ, Fergusson D, Griffin G, Grimshaw JM, et al. The devil is in the details: incomplete reporting in preclinical animal research. PLoS One. 2016;11:e0166733. https://doi.org/10.1371/journal.pone.0166733 .

Article   CAS   PubMed   PubMed Central   Google Scholar  

Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016. https://doi.org/10.1038/533452a .

Bradbury AG, Eddleston M, Clutton RE. Pain management in pigs undergoing experimental surgery; a literature review. Br J Anaesthesiol. 2016. https://doi.org/10.1093/bja/aev301 .

Enserink M. Sloppy reporting on animal studies proves hard to change. Science. 2017. https://doi.org/10.1126/science.357.6358.1337 .

Skibba R. Swiss survey highlights potential flaws in animal studies. Nature. 2016. https://doi.org/10.1038/nature.2016.21093 .

Ioannidis JPA. Why Most published research findings are false. PLoS Med. 2005. https://doi.org/10.1371/journal.pmed.0020124 .

Macleod MR. Biomedical research: increasing value, reducing waste. Lancet. 2014;383. https://doi.org/10.1016/S0140-6736(13)62329-6 .

Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A manifesto for reproducible science. Nat Hum Behav. 2017. https://doi.org/10.1038/s41562-016-0021 .

Norecopa. Experimental design and reporting: concerns. 2020b. https://norecopa.no/concerns . Accessed 15 May 2020.

Smith JA, Birke L, Sadler D. Reporting animal use in scientific papers. Lab Anim. 1997. https://doi.org/10.1258/002367797780596176 .

Brattelid T, Smith AJ. Guidelines for reporting the results of experiments on fish. Lab Anim. 2000. https://doi.org/10.1258/002367700780457590 .

Ellery AW. Guidelines for specification of animals and husbandry methods when reporting the results of animal experiments. Working Committee for the Biological Characterization of Laboratory Animals / GV-SOLAS. Lab Anim. 1985. https://doi.org/10.1258/002367785780942714 .

Hooijmans CR, Leenaars M, Ritskes-Hoitinga M. A gold standard publication checklist to improve the quality of animal studies, to fully integrate the three Rs, and to make systematic reviews more feasible. Altern Lab Anim. 2010. https://doi.org/10.1177/026119291003800208 .

Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol. 2010. https://doi.org/10.1371/journal.pbio.1000412 .

Öbrink KJ, Rehbinder C. Animal definition: a necessity for the validity of animal experiments? Lab Anim. 2000. https://doi.org/10.1258/002367700780457608 .

Bramhall M, Flórez-Vargas O, Stevens R, Brass A, Cruickshank S. Quality of methods reporting in animal models of colitis. Inflamm Bowel Dis. 2015. https://doi.org/10.1097/MIB.0000000000000369 .

Guidelines for the treatment of animals in behavioural research and teaching. Anim Behav. 2020. https://doi.org/10.1016/j.anbehav.2019.11.002 .

Smith MM, Clarke EC, Little CB. Considerations for the design and execution of protocols for animal research and treatment to improve reproducibility and standardization: DEPART well-prepared and ARRIVE safely. Osteoarthr Cartil. 2017. https://doi.org/10.1016/j.joca.2016.10.016 .

STAIR Consensus Conferences. 2017. http://www.thestair.org/ . Accessed 15 May 2020.

Du Sert NP, Hurst V, Ahluwalia A, Alam S, Avey MT, Baker M, et al. The ARRIVE guidelines 2019: updated guidelines for reporting animal research. bioRxiv. 2019. https://doi.org/10.1101/703181 .

Reichlin TS, Vogt L, Wurbel H. The researchers’ view of scientific rigor-survey on the conduct and reporting of in vivo research. PLoS One. 2016. https://doi.org/10.1371/journal.pone.0165999 .

Norecopa. The three R’s. 2020c. https://norecopa.no/alternatives/the-three-rs . Accessed 15 May 2020.

NC3Rs. Training techniques for less stressed laboratory rodents. 2019. https://www.nc3rs.org.uk/news/highlights-2019-nc3rsiat-animal-technicians-symposium#rodenttraining . Accessed 18 May 2020.

Poole T. Happy animals make good science. Lab Anim. 1997. https://doi.org/10.1258/002367797780600198 .

Smith AJ, Clutton RE, Lilley E, Hansen KEA, Brattelid T. PREPARE: guidelines for planning animal research and testing. Lab Anim. 2018a. https://doi.org/10.1177/0023677217724823 .

Murphy’s Law. Wikipedia. 2020. https://en.wikipedia.org/wiki/Murphy%27s_law . Accessed 15 May 2020.

Norecopa. Be PREPARED. 2020d. https://norecopa.no/be-prepared . Accessed 15 May 2020.

Norecopa (2020e): Culture of care. https://norecopa.no/CoC . Accessed 15 May 2020.

Louhimies S. Refinement facilitated by the Culture of Care. In: ALTEX Proceedings of the EUSAAT 2015-Linz 2005 Congress, 20-23 September, Linz, vol. 4; 2015. p. 154. http://eusaat-congress.eu/images/2015/Abstractbook_EUSAAT_2015_Linz_2015.pdf . Accessed 15 May 2020.

Smith AJ, Clutton RE, Lilley E, Hansen KEA, Brattelid T. Improving animal research: PREPARE before you ARRIVE. BMJ. 2018b. https://doi.org/10.1136/bmj.k760 .

Download references

Acknowledgements

The author acknowledges the contributions of the co-authors of the PREPARE guidelines and other colleagues within Laboratory Animal Science over many years, which led to the development of the PREPARE guidelines and many of the principles mentioned in this paper.

The work of Norecopa has been supported by a large number of sponsors ( https://norecopa.no/sponsors ). No specific funding was obtained for the production of this paper. The author thanks the Universities Federation of Animal Welfare (UFAW), U.K., for funding the open access publication of the paper from which Fig.  1 is taken.

Author information

Authors and affiliations.

Norecopa, P.O. Box 750 Sentrum, 0106, Oslo, Norway

Adrian J. Smith

You can also search for this author in PubMed   Google Scholar

Contributions

Adrian Smith is sole author and contributor to this paper. The author read and approved the final manuscript.

Authors’ information

Adrian Smith is Secretary of Norecopa, which is affiliated to the Norwegian Veterinary Institute. The views expressed in this paper are his own, and not necessarily those of Norecopa or the Veterinary Institute.

Corresponding author

Correspondence to Adrian J. Smith .

Ethics declarations

Competing interests.

Adrian Smith is lead author of the PREPARE guidelines and has editorial responsibility for creation and updating of the Norecopa website ( https://norecopa.no ), both of which are cited several times in this paper. He has no other competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Smith, A.J. Guidelines for planning and conducting high-quality research and testing on animals. Lab Anim Res 36 , 21 (2020). https://doi.org/10.1186/s42826-020-00054-0

Download citation

Received : 20 May 2020

Accepted : 25 June 2020

Published : 10 July 2020

DOI : https://doi.org/10.1186/s42826-020-00054-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Reproducibility

Laboratory Animal Research

ISSN: 2233-7660

  • Submission enquiries: Access here and click Contact Us
  • General enquiries: [email protected]

animal research study design

  • Search Menu
  • Advance articles
  • Themed Issues
  • Author Guidelines
  • Open Access
  • About ILAR Journal
  • About the Institute for Laboratory Animal Research
  • Editorial Board
  • Advertising and Corporate Services
  • Self-Archiving Policy
  • Dispatch Dates
  • Journals on Oxford Academic
  • Books on Oxford Academic

Issue Cover

Article Contents

Introduction, determining a suitable research strategy, choosing a model, principles of experimental design, designing powerful experiments: controlling variation and choosing an appropriate sample size, statistical analysis, presentation of results, appendix: a numerical example.

  • < Previous

Design and Statistical Methods in Studies Using Animal Models of Development

Michael F.W. Festing, Ph.D., has retired from the MRC Toxicology Unit, University of Leicester, UK. Dr. Festing continues to lecture, publish, and consult on statistics and genetics.

  • Article contents
  • Figures & tables
  • Supplementary Data

Michael F. W. Festing, Design and Statistical Methods in Studies Using Animal Models of Development, ILAR Journal , Volume 47, Issue 1, 2006, Pages 5–14, https://doi.org/10.1093/ilar.47.1.5

  • Permissions Icon Permissions

Experiments involving neonates should follow the same basic principles as most other experiments. They should be unbiased, be powerful, have a good range of applicability, not be excessively complex, and be statistically analyzable to show the range of uncertainty in the conclusions. However, investigation of growth and development in neonatal multiparous animals poses special problems associated with the choice of “experimental unit” and differences between litters: the “litter effect.” Two main types of experiments are described, with recommendations regarding their design and statistical analysis: First, the “between litter design” is used when females or whole litters are assigned to a treatment group. In this case the litter, rather than the individuals within a litter, is the experimental unit and should be the unit for the statistical analysis. Measurements made on individual neonatal animals need to be combined within each litter. Counting each neonate as a separate observation may lead to incorrect conclusions. The number of observations for each outcome (“n”) is based on the number of treated females or whole litters. Where litter sizes vary, it may be necessary to use a weighted statistical analysis because means based on more observations are more reliable than those based on a few observations. Second, the more powerful “within-litter design” is used when neonates can be individually assigned to treatment groups so that individuals within a litter can have different treatments. In this case, the individual neonate is the experimental unit, and “n” is based on the number of individual pups, not on the number of whole litters. However, variation in litter size means that it may be difficult to perform balanced experiments with equal numbers of animals in each treatment group within each litter. This increases the complexity of the statistical analysis. A numerical example using a general linear model analysis of variance is provided in the Appendix. The use of isogenic strains should be considered in neonatal research. These strains are like immortal clones of genetically identical individuals (i.e., they are uniform, stable, and repeatable), and their use should result in more powerful experiments. Inbred females mated to males of a different inbred strain will produce F1 hybrid offspring that will be uniform, vigorous, and genetically identical. Different strains may develop at different rates and respond differently to experimental treatments.

The principles of experimental design are universal. They apply equally, for example, to experiments in the life sciences involving humans, animals, plants, and cell cultures. However, in some areas of research, the experimental subjects may have characteristics that necessitate special attention if the experiments are to be designed well and analyzed correctly. Experiments involving neonates of multiparous species are just such a special case. Investigators must identify the correct “experimental unit” (EU 1 ) and take “litter effect” into account for the experiments to afford correct results. These critical aspects of experimental design are discussed below.

There are several different types of investigation, which include but are not limited to the following: observational studies, pilot studies, exploratory experiments, confirmatory studies, and experiments that seek parameter estimates. The first example, observational studies , do not involve the imposition of an experimental treatment. The comparison of animals of two different genotypes is an observational study even though it may have the appearance of being an experiment. Because it is not possible to assign a genotype to an individual at random, it is the investigator's responsibility to ensure that the animals are, to the extent possible, identical in all other respects apart from their genotype. However, the statistical methods used for observational and experimental studies are essentially the same.

Pilot studies are usually small investigations, sometimes involving only a single animal, with the aim of testing the logistics of a proposed study, and sometimes of gaining preliminary data to be used in the design of a more definitive experiment. For example, a pilot study could be used to assess whether dose levels are appropriate, and to gain information on likely responses and variability.

Exploratory experiments look at the pattern of response to some treatment but are not based on a formal, testable hypothesis. Often many outcomes (characters) are measured, requiring multiple statistical tests. Even though one may use a correction of the p values (e.g., Bonferroni's method of dividing the chosen critical value [usually 0.05] by the number of statistical tests) ( Roberts and Russo 1999 ), exploratory experiments tend to generate more questions than they provide answers. They are usually used to generate hypotheses to be tested in a confirmatory study , where the aim is to test some formal, prestated, preferably quite simple hypothesis. Experiments may also be done to estimate parameters such as dose-response curves, means, and proportions.

There is surprisingly little discussion of the concept of “models” in biomedical research despite their extensive use ( Festing 2004 ). According to the American philosopher Marx Wartofsky, “Theories, hypotheses, models and analogies I take all to be species of a genus, and my thesis is best stated directly by characterizing this genus, as representation (although “imaging” or “mirroring” will do quite as well)” ( Wartofsky 1979 ). He goes on to say, “There is an additional trivial truth, which may strike some people as shocking: anything can be a model of anything else! This is to say no more than that between any two things in the universe there is some property they both share….”

Although the preceding statements are of little help in deciding whether or not a particular animal or in vitro system is a good model of humans, it does at least clarify the fact that models do not have to resemble the thing being modeled in every respect. Indeed, in some cases it is essential for the model to be different from the thing being modeled. Rodents are used widely as models of humans because they are small and economical. The availability of isogenic strains is also an advantage because they make it possible to do efficient experiments using fewer animals and scientific resources. The critical factor is whether the model is like humans for the specific system being modeled, such as the growth and differentiation of some organ or biochemical characteristic, or the response of neonates to xenobiotics.

The basic principles of experimental design were formulated many years ago ( Fisher 1960 ), and they remain unchanged. To understand the ensuing brief discussion of these principles, however, it is first necessary to understand the two special characteristics of neonates that strongly influence the design and statistical analysis of experiments involving them.

“Experimental Unit”

Experiments normally involve a number of subjects, or EUs, in each treatment group to afford information about interunit variation and a comparison with the variation between treatment groups. Each EU must be capable of being assigned to a different treatment group, and the data recorded on the individual EUs are subjected to the statistical analysis.

The EU in animal research is commonly the individual animal. However, in research involving neonates, if the pregnant female or the whole litter is subjected to an experimental treatment, the female or the whole litter, not the individual neonate, is the EU, because individual pups within a litter do not receive different treatments (although see below). It is incorrect to use the data from individual pups because the number of independent observations (“n”) would be too large and the results would be incorrect, potentially leading to false-positive results ( Raubertas et al. 1999 ; Zorrilla 1997 ). Values from individual neonates may be taken into account, for example, by averaging them. Such averaging could improve the precision of the litter mean, although they do not contribute as individual EUs ( Haseman and Hogan 1975 ).

Because litters vary in size, if all the neonates are measured in each litter, the averages will vary in precision according to the number of pups per litter. It may be advantageous to use a weighted statistical analysis when evaluating the results. Pups from large litters may also be smaller and less developed than those from smaller litters, so if size (e.g., crown-rump length) is an important outcome, it may be important to correct for this difference in the statistical analysis. Where the outcome is a binary variable such as “normal/abnormal,” a full statistical analysis may require advanced statistical methods ( Hunt and Bowman 2004 ; Yamamoto and Yanagimoto 1994 ).

If individual pups within a litter are subjected to different treatments either postnatally or as a result of surgical or other intervention on the pregnant female, then “n” will be based on the number of individual pups in a treatment group, and the individual pup is the EU. It is possible to have an experiment that is a mixture of a between-litter and a within-litter design. For example, if pregnant females receive one of two or more treatments (e.g., a drug treatment or a vehicle control), and then after birth the neonates within each litter receive additional individual treatments (e.g., some but not all receive a vitamin supplement), then for the drug treatment the pregnant female is the EU, while for the vitamin supplement the neonate is the EU. This design, known as a “split-plot” experimental design ( Cox 1958 ), is often useful although the statistical analysis probably requires professional advice.

“Litter Effect”

In most cases, individual neonates within a litter are more similar than individuals from different litters; in other words, litters differ in a wide range of characteristics. If genetically heterogeneous animals are being used, then individuals within a litter will be full sibs and genetically more similar than unrelated animals. Both pre- and postnatally, animals also tend to have a similar environment. For example, animals from a large litter may be relatively small and immature. There may even be inaccuracies in recording time of birth so that some litters appear to be older than they really are.

It is important to consider “litter effect” when designing an experiment that involves neonates as the EUs. Suppose, for example, that the experiment involves treating some of the neonates with a hormone, while others receive a placebo. Operationally it would be most convenient to treat whole litters, because then pups would not need to be individually identified before weaning. However, in such a case, the litter rather than the individual pups will be the EU. Each litter will be an “n” of one rather than the number of pups in the litter. In contrast, if pups within a litter can be individually identified and assigned to the treatments, then the pups will be the EUs, and each pup will be an “n” of one. However, in this case although the pups within a litter will tend to be quite similar (e.g., in a character such as weight), there may be large differences between pups having the same treatment but in different litters. It will be necessary to remove these differences between litters in the analysis because otherwise, the power of the experiment to detect treatment effects will be severely reduced.

An additional complication is that litters vary in size, so it may be difficult to obtain a balanced design with equal numbers of animals on each treatment within every litter. As a result, it may even be difficult to calculate treatment means. A numerical example of the analysis of a within-litter experiment illustrating some of these problems is given in the Appendix.

Some litter effects due to the common environment of litter mates may gradually disappear once the animals are weaned and are no longer dependent on milk supply. However, litter effects due to the genetic similarity of full sibs will remain for the life of the animals, assuming studies are performed using genetically heterogeneous animals such as Sprague-Dawley rats or any breed of rabbits.

Cross-fostering soon after birth may reduce but will not entirely eliminate litter effects. For example, cross-fostering did not eliminate a litter effect associated with susceptibility to dental caries ( Peeling and Looker 1987 ), a highly inherited character, in outbred Sprague-Dawley rats, or an effect on growth rate ( Raubertas et al. 1999 ). Standardization of postnatal litter size is a common practice and is likely to reduce, but not eliminate, between-litter variability associated with maternal effects such as limitations in milk yield. One commercial company pooled all 2-day-old Sprague-Dawley pups and made up single sex litters of 12 young. Most female pups were discarded at this age because demand was almost entirely for males. Females left without a litter were returned to the breeding colony where they soon became pregnant again without any apparent problems ( Lane-Peter et al. 1968 ). Such a procedure will reduce but not eliminate litter effects because females will still differ in milk yield. It may increase the variability within a litter because individuals will no longer be full siblings, and the procedure is likely to be practical only in breeding colonies where large numbers of females litter at the same time. Nevertheless, it may be worth investigating for neonatal research because it would be very convenient for all litters to have the same number of pups.

Requirements for a Well-designed Experiment

The principles of good experimental design have been known for many years ( Cox 1958 ). These principles are described very briefly as follows.

Absence of bias must be ensured through the use of the use of randomization and blinding. Animals must be selected and assigned to the treatment groups in such a way that there is no systematic difference among groups before starting or during the conduct of the experiment. These factors may be mistaken for the effects of the treatment. This goal is usually achieved by assigning animals (or other experimental subjects) to the treatment groups using a formal randomization system. Subsequent housing and necessary measurements should be in random order. Randomization distributes uncontrolled variation among the groups with equal probability.

The exact method of randomization depends on the design of the experiment. In the most simple “completely randomized” design (i.e., in a between-litter experiment), subjects (e.g., pregnant females) are simply assigned to treatments regardless of their characteristics. Thus, if a teratology experiment involves 20 treated and 20 control pregnant rats, 20 bits of paper could have the letter “C” and 20 the letter “T” written on them. These would be placed in a receptacle and thoroughly shaken. A piece of paper would then be withdrawn, and the first rat would be assigned to the indicated treatment. This process would be repeated with all of the remaining rats. When the neonate is to be the EU in a within-litter experiment, randomization must be done separately within each litter. Again, it is possible to use physical randomization, tables of random numbers, or random numbers generated by a computer.

Ideally, subjects should be identified by codes so that the investigator and other staff members are blind with respect to the treatment groups to the extent possible. Blinding is likely to be particularly important when there is a subjective element to recording observations (e.g., when reading and scoring histological preparations). It is unacceptable, for example, to score, measure, or record data from all of the controls first, and subsequently from each treatment group, because standards may change as the scorer becomes more expert. Thus, all manipulations and recording of information should be done either in random order or in such a way as to take account of any time trends, with treatment groups equally represented at each time point.

A powerful experiment is one that has a high probability of detecting a difference between treatment groups, assuming that a difference exists. Power depends on the relationship between the variability of the experimental subjects, the size of the treatment effect, and the sample size (discussed in more detail below). Large experiments are likely to be expensive and may exceed the available resources of a facility, so it is worth spending some time and effort to choose uniform experimental material that is sensitive to the effects of the treatment. Thus, if the experimental subjects are adult animals (as in a teratogenesis experiment), they should be closely matched for age, weight, genotype (e.g., by using an isogenic strain where practical), and previous history.

Choosing the Strain or Breed

There are many different strains of mice ( www.informatics.jax.org ) and rats ( www.rgdb.mcw.org ) as well as several breeds of rabbits, dogs, and other species. It may be possible to choose one or more strains that are sensitive to the proposed treatments, although for the larger species it is usually necessary to use whatever is available.

Isogenic strains (inbred strains and F1 hybrids between two such strains) of mice and rats are widely available and have many useful properties ( Beck et al. 2000 ; Festing 1999a , b ; Festing and Fisher 2000 ). They resemble immortal clones of genetically identical individuals in some respects. Tissue and organ grafts between individuals of the same isogenic strain are not immunologically rejected and therefore such strains could be of particular value for studies involving such procedures.

Isogenic strains remain genetically constant for many generations and have an international distribution, so that work involving the same strains can be replicated throughout the world. A single individual can be genotyped at loci of interest, which will serve to genotype all animals of that strain. Thus, a genetic profile of the genes present in each strain can be built up by all investigators working on that strain. The genetic authenticity of the animals can be tested using a small sample of DNA. Each strain has a unique set of characteristics, which may make a particular strain valuable for a particular type of study. Some care must be taken in interpreting results if a single inbred strain is used because it represents only a single genotype. However, the interpretation of results is also not easy when using an outbred stock because generally little is known about its genotype.

One disadvantage of inbred strains for neonatal research is that they often have a poor breeding performance, which may limit their use. When the individual neonate is the EU (in a within-litter experiment), it may be worth using inbred mothers mated to a male of a different inbred strain. The pups will then be F1 hybrids, which are vigorous and uniform. Litter size is about 30% larger than when pure isogenic strains are used. When the mother is the EU, it may be worthwhile to use F1 hybrids, which breed exceptionally well as a result of hybrid vigor ( Festing 1976 ). The sire could be either another F1 hybrid of the same strain, in which case the pups will be genetically heterogeneous F2 hybrids, or the females could be backcrossed to one of her parental strains so that the pups would be backcross individuals that, although genetically heterogeneous, are less variable than F2 hybrids.

Outbred stocks such as Sprague-Dawley or Wistar rats and Swiss mice are used widely, but the scientific case for doing so is questionable ( Festing 1999b ). Animals from different breeders will be genetically different even though they may have the same name. The genotype of any individual will be unknown, the stock is subject to genetic drift over a period of time, the actual degree of genetic heterogeneity is usually unknown, and few methods of genetic quality control are available. It is not even possible to distinguish genetically between Wistar and Sprague-Dawley rats ( Festing 1999b ). Thus, it is necessary to balance the advantage of better breeding performance against these disadvantages.

Designing the Experiment

After choosing the EU (the female, the litter, or the individual neonate), it is necessary to determine the number and type of treatments. It may be useful to perform a small pilot study to define dose levels and clarify logistics. It may be necessary to study male and female neonates separately, in which case a factorial design including both sexes in the one experiment may be appropriate (see below, Increasing the Range of Applicability). The outcomes (characters) to be measured or counted must also be decided. Where measurements are possible, they are frequently more precise than a "count" (number of positive/negative), and greater precision requires fewer EUs. Each neonate may provide several numerical observations; for example, thought should be given to methods of analyzing individual growth curves within an overall analysis. A microarray experiment may result in thousands of observations from each individual, so the method of statistical analysis of the resulting data should always be considered at this design stage.

Determining Sample Size

The usual way of estimating sample size is to use a power analysis. The success of using this tool depends on a mathematical relationship between several variables, as shown in Figure 1 . However, a serious limitation of this method is that it depends critically on the estimate of the standard deviation. This value is not available because the experiment has not yet been done, so it must be estimated from a previous experiment or from the literature. Unfortunately, because standard deviations can vary substantially between different experiments, the power calculations can provide only an indication of the appropriate size of an experiment. This should be interpreted with common sense and in relation to available facilities.

Figure 1. The variables involved in a power analysis for a two-sample t-test. Usually the effect size of interest, the significance level, the sidedness of the test, the variability of the material, and the power are specified, which determines the required sample size. Alternatively, if the sample size is fixed due to resource limitations, the method can be used to assess the power or the detectable effect size.

It is easiest to describe the method for a character with a measurement outcome that can be analyzed using an unpaired t-test, such as a teratology experiment with two groups, treated and control. Six variables are involved. Usually the significance level and sidedness of the test are specified (often the significance level "α" is set at 0.05 with a two-sided test), and the variability of the material (i.e., the standard deviation) is taken from a previous study or the literature. When the neonate is the EU, it is necessary to estimate the standard deviation from the pooled standard deviations within litters and treatment groups. The effect size is the minimum difference in means between the two groups the investigator considers to be of biological or clinical importance. Somewhat arbitrarily, the power (i.e., the chance that the study will find a statistically significant effect of the specified size) is usually set somewhere between 80 and 95%. It is then possible to estimate the required sample size.

For the calculations, a number of dedicated computer programs such as nQuery Advisor ( Elashoff 2000 ) are available. In addition, many statistical packages such as MINITAB have routines for power analysis, and there are a number of free sites on the web (e.g., http://www.biomath.info ), where one can enter data to obtain estimates of required sample sizes. In some circumstances, such as when resources are limited, the sample size may be fixed and the power analysis can then be used to estimate the power of the proposed experiment (i.e., the chance that the specified effect is likely to be detected). The calculations are similar for a binary variable (normal/abnormal) with two groups, but the specification becomes more difficult when there are several treatment groups, or when the data are not appropriate for a parametric analysis ( Dell et al. 2002 ).
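For readers who prefer a scriptable alternative to the dedicated programs mentioned above, the same calculation can be sketched with the statsmodels package in Python. The standard deviation and effect size below are illustrative placeholders, not values from any particular experiment.

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative inputs: a pooled standard deviation from a previous study and
# the smallest difference in means judged to be biologically important.
sd = 2.3           # estimated standard deviation (e.g., pooled within litters)
effect = 2.0       # minimum difference of interest, same units as the outcome
d = effect / sd    # standardized effect size

analysis = TTestIndPower()

# Sample size per group for a two-sided test, alpha = 0.05, 90% power.
n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.90,
                                   alternative="two-sided")
print(round(n_per_group))

# If the sample size is fixed by available resources, solve for power instead.
achievable_power = analysis.solve_power(effect_size=d, nobs1=10, alpha=0.05,
                                        alternative="two-sided")
print(achievable_power)
```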

An alternative method of sample size determination is the so-called “resource equation method,” which depends on the law of diminishing returns. This method is useful for small and complex biological experiments that involve several treatment groups for which the results are to be analyzed using the analysis of variance. In such a situation, it is difficult to use a power analysis. The experiment should be of an appropriate size if the error degrees of freedom in an analysis of variance are somewhere between 10 and 20 ( Festing et al. 2002 ; Mead 1988 ). This case reduces to the very simple equation:

X = N – T – B + 1,

where N is the total number of observations, T is the number of treatments, B is the number of blocks (litters for a within-litter experiment), and X should be between approximately 10 and 20.

For a within-litter experiment with three treatments, an average litter size of six, and a proposal to use five litters,

X = (6 × 5) – 3 – 5 + 1 = 23.

The limits of X being between 10 and 20 can be liberally interpreted, so this proposed experiment would be of an appropriate size, although just beyond the suggested upper limit.
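The resource equation is simple enough to check in a couple of lines; the sketch below merely restates X = N − T − B + 1 for the worked example above.

```python
def resource_equation_x(n_total, n_treatments, n_litters):
    """Error degrees of freedom in a randomized block (within-litter) ANOVA:
    X = N - T - B + 1; aim for a value of roughly 10 to 20."""
    return n_total - n_treatments - n_litters + 1

# The proposed within-litter experiment: 5 litters of about 6 pups, 3 treatments.
print(resource_equation_x(n_total=6 * 5, n_treatments=3, n_litters=5))  # 23
```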

The experiment described in the Appendix has X = 50, which is more than twice as large as suggested by this method. A repeated analysis of the data in the Appendix using only the first three litters gives X = 23 and a p value for treatments of 0.007 compared with p = 0.001 using six litters. Thus, if the experiment had been performed with approximately half the number of animals, the conclusions would have been about the same. Compared with the power analysis, the resource equation method is somewhat crude. Nevertheless, it often seems to work in practice, particularly when relatively large treatment effects are expected.

Increasing the Range of Applicability: Factorial Designs

It is often important to know the extent to which a response to a treatment can be generalized. Is the same response found in males and females, or in different strains of animals, or with different diets? Does the presence of some drug or chemical alter response? Factorial experimental designs allow such questions to be examined without requiring any substantial increase in resources. A typical example might be to learn whether alcohol potentiates the effect of a teratogen in rats. If, for example, the basic plan was to have 20 pregnant females as controls and 20 treated with the teratogen, then the effect of alcohol might be studied by administering alcohol to half the rats in each group. There would then be four groups of 10 pregnant females with or without alcohol and with or without the teratogen. At first it seems as though the group size has been reduced from 20 to 10 rats, but in fact the effect of the teratogen is still determined by comparing those receiving the teratogen (20 rats) and those that do not receive it (20 rats). Similarly, the effect of the alcohol is determined by comparing the 20 rats that receive it with the 20 rats that do not receive it. Finally, any potentiating effect of alcohol is determined by seeing whether the difference in fetal weight, number of abnormalities, and other factors between the teratogen-treated and -untreated rats is greater in the group receiving alcohol than in those that do not receive it.

Factorial designs can also be used for within-litter experiments. Pups could be sexed and assigned separately at random to either a control or a treated group. There would then be four groups within each litter: male and female controls and male and female treated. The experiment could then be analyzed (probably using an analysis of variance) to determine whether the pups responded to the treatment, averaging across sexes; whether the measured outcome (e.g., weaning weight) differed between males and females, averaging across treatments; and whether the response to the treatment differed between the two sexes.
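As a sketch of how such a within-litter factorial might be analyzed, the Python code below fits litter as a blocking factor together with sex, treatment, and their interaction. The file name and column names are hypothetical placeholders for data arranged with one row per pup.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data: one row per pup with litter, sex, treatment ("treat"),
# and weaning weight; the file and column names are placeholders.
df = pd.read_csv("weaning_weights.csv")  # columns: litter, sex, treat, weight

# Litter as a blocking factor plus the sex-by-treatment factorial.
model = smf.ols("weight ~ C(litter) + C(sex) * C(treat)", data=df).fit()
print(anova_lm(model, typ=2))
# The C(sex):C(treat) row tests whether the response to treatment
# differs between male and female pups.
```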

Factorial designs provide a way of obtaining more information from the same scientific resources at relatively little extra cost. Any number of factors (e.g., treatments, strain, sex, diet) can be involved, and each can have any number of levels (i.e., there can be any number of dose levels within a factor). The main extra cost is the increase in the complexity of the experiment, which could lead to mistakes, and the increased complexity of the statistical analysis. Splitting groups into a number of subgroups does not lead to any substantial loss of power, provided the experiment is not too small.

Avoiding Excessive Complexity

Complex experiments may lead to mistakes and invalid conclusions. All experiments should be planned ahead, with written protocols and standard operating procedures. It is appropriate to alter experiments while they are in progress only in exceptional cases (e.g., for ethical reasons). Animal care staff should be regarded as integral and valued members of the research team. If mistakes occur, it is vital to acknowledge them, rather than cover them up, so that staff members are not made to fear that they will be in serious trouble if they make a mistake.

No experiment should be started without the investigator having a clear idea of how the results will be analyzed statistically, although it may be necessary to modify the analysis later in the light of the actual results. For example, it may be necessary to transform scales or to account for missing observations. The statistical analysis is a basic and integral part of the experimental design. Timeliness also matters: experiments should normally be analyzed as soon as they are completed so that the results can inform subsequent experiments (e.g., by adjusting dose levels or altering the timing of observations).

The aim of the statistical analysis is to obtain summarized results that may be easily understood and that clarify the range of uncertainty in the conclusions. Access to a good statistical textbook is highly recommended. A basic assumption is that the EUs are a random sample from a population of such units (real or hypothetical), and the aim is to make inferences about the population from the sample. The accuracy of these inferences will depend mainly on the biological variability of the EUs and the sample size, assuming that the experiment has been designed well to avoid bias. Clearly, if the sample size is very small and/or the variation is large, then only rough estimates of the population characteristics will be available.

It is essential to use a good-quality statistical package. Spreadsheets such as EXCEL are adequate for storing and manipulating the raw data, but they should not be used for the main statistical analysis. The output is often not standard, and a spreadsheet fails to provide the range of methods available in a dedicated package. For example, the statistical analysis presented in the Appendix could not be done using EXCEL. Packages such as SPSS, MINITAB, SAS, Statistica, GraphPad, GLIM, Genstat, and BMDP are readily available and have been tested thoroughly for errors. One or more are usually available on most institutional networks.

The first step in the analysis should be to screen the data for errors. Histograms and dotplots showing individual observations (e.g., as in Figure 2 in the Appendix), possibly plotted against dose levels, or plots of two outcomes likely to be correlated will often show whether there are any serious outliers. Any outliers should be individually checked against notebooks or original printouts to ensure that they are not transcription errors, and should be corrected if necessary. Outliers that appear to be valid should not be discarded at this stage. Many outcomes of measurement data, particularly concentrations of a substance, have a log-normal distribution, with most numbers being relatively low but with a few very high. If this is the case, the data can be transformed by taking logarithms or square roots of the raw observations. This step frequently removes outliers and allows parametric statistical methods—usually a t test or an analysis of variance (ANOVA)—to be used in the analysis. These parametric methods depend on the assumption that the residuals (deviations of each observation from group means) have a normal distribution and the variation is approximately the same in each group.
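A short, hedged sketch of this screening step is shown below, using simulated log-normally distributed concentrations; the grouping, sample sizes, and 3-SD flagging rule are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated, roughly log-normal concentration data for two groups.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": ["control"] * 10 + ["treated"] * 10,
    "conc": np.concatenate([rng.lognormal(1.0, 0.5, 10),
                            rng.lognormal(1.4, 0.5, 10)]),
})

# Flag values more than 3 SD from their group mean for checking against the
# raw records (flagged values are not discarded automatically).
z = df.groupby("group")["conc"].transform(lambda x: (x - x.mean()) / x.std())
print(df[z.abs() > 3])

# Log-transform to make the distribution more symmetrical, then use a
# parametric test on the transformed scale.
df["log_conc"] = np.log(df["conc"])
print(stats.ttest_ind(df.loc[df.group == "control", "log_conc"],
                      df.loc[df.group == "treated", "log_conc"]))
```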

One way to deal with one or two persistent outliers is to perform the statistical analysis with and without them. If it makes no difference to the conclusions, then they can be retained. However, if the conclusions depend entirely on one or a few outliers, and these appear to be perfectly valid data points, the results should be treated with caution. Outliers that are more than 3 standard deviations from the mean (assuming an approximately normal distribution) are automatically rejected by some authors; but again, it may be worth seeing what effect the outliers have on the overall conclusions.

When it is not possible to normalize badly skewed data using a scale transformation, and when the aim is to compare groups, it may be necessary to analyze the data using nonparametric methods such as the Mann-Whitney or Wilcoxon test. Dose response curves are normally estimated using some form of regression analysis. A numerical example illustrating the statistical analysis of a within-litter experiment using the analysis of variance is shown in the Appendix.

Scientific papers are often written in such a way as almost to obscure exactly what the investigators did. In theory, sufficient information should be given so that others can repeat the studies. Unfortunately, in a surprisingly large proportion of papers, it is difficult or impossible to determine exactly how many animals were used, or how many separate experiments were involved.

Guidelines are available for the design and statistical analysis of experiments using animals (e.g., Festing and Altman 2002 ), and they include a number of suggestions for presenting results:

Label and number each experiment;

State the number of animals used in each experiment, along with the purpose of each experiment;

Identify the species, breed, and/or strain of animals complying with agreed international nomenclature rules where these are available (e.g., for rats and mice, WWW.informatics.jax.org );

Provide details of husbandry (e.g., diet and housing) to the extent allowed by the journal editor;

Describe efforts where possible to minimize pain, distress, or lasting harm to the animals;

Describe methods of statistical analysis, with references in the case of any unusual methods used;

Identify the statistical software used;

Avoid excess decimal places where means, proportions, or differences are presented;

Include measures of variation (e.g., standard deviations, standard errors, or, preferably, confidence intervals [ Altman 1991 ; Altman et al. 2000 ]);

Identify the number of observations for every mean, including those shown graphically. It is not adequate to make statements such as “the number in each group ranged from four to 10.” Where possible, tabulate means in columns for ease of comparison.

Use graphs to illustrate points that are difficult to show in tables or in the text. Where possible, show individual observations rather than means with error bars because this presentation more clearly indicates the distribution of the observations. If error bars are used, explain clearly whether they are standard deviations, standard error, or confidence intervals.

Again, the main aim in presenting the results should be to state as clearly and succinctly as possible exactly what was done and what results were obtained.

Appendix: A Numerical Example

Consider the weaning weight of 59 unsexed Sprague-Dawley rats (real data), including one that died as a missing observation ( Table 1 ). When pups were 2 days old, each litter was split, and the pups were assigned at random to a control group, a "low-dose" group, or a "high-dose" group (simulated by subtracting 0.5 g from the low-dose group and 1.0 g from the high-dose group). Within each litter, to the extent possible, the same number of pups were assigned to each treatment, and pups were individually marked for subsequent identification. The sex of the pups was not recorded. The aim of the statistical analysis is to determine whether the treatments altered weaning weight, and if so to what extent. (Note: It should be a reduction of approximately 0.5 g and 1.0 g in the low- and high-dose groups, respectively.)

Data for the numerical example. The table shows weaning weight (g) of six litters of Sprague-Dawley rats assigned to three treatments: control, low, and high doses. Weights are real data, but treatments are simulated (see text).

X, missing observation due to death of animal.

(1) Mean of litter by treatment means. These means are biased (see 4, below).

(2) Mean of all animals in a treatment group, ignoring litter. These means are biased (see 4, below).

(3) Differences between least squares means give the best unbiased estimate of the treatment differences.

(4) Numbers in parentheses show the size of the treatment effect (control mean − dose mean) estimated from these means. The least squares means give the best unbiased estimate of the size of the treatment effect.

The first step in analyzing such data is to examine it graphically to learn whether there are any obvious outliers and to obtain a visual impression of the situation (see plot in Figure 2 ). In this case, there are no obvious outliers. However, the litter effect is very obvious and clearly there is considerable variation within each litter. Although there is a tendency for the controls to weigh more than the treated groups (e.g., in litter 6), in litter 2 the lightest pup is a control.

Figure 2. Weaning weight by litter number and treatment for the numerical example. Note that some random variation or "jitter" has been applied on the X-axis to avoid too much overlap between points (see text for details).

Anyone planning to make a career in animal research is strongly advised to familiarize him- or herself with the analysis of variance as it is the most appropriate statistical method for dealing with most data arising from formal experiments like this one. A good introduction to the methods is given by Roberts and Russo (1999) , and it is also described in detail in most statistical textbooks.

The data in Table 1 can be analyzed using a two-way (treatment and litter) analysis of variance "without interaction." A t test would be entirely inappropriate because there are more than two groups, and it is necessary to account for the litter effect. The ANOVA quantifies the variation associated with treatments, litters, and the remaining "residual" or "error" variation. It is assumed that the response is the same in each litter apart from sampling variation (hence "without interaction"). However, there is a problem with these data as they stand. The usual two-way ANOVA assumes that there are equal numbers in each treatment group within each litter. In this case, there is one missing observation in litter 2 and two extra animals in litters 5 (high-dose) and 6 (control). The data could be adjusted by discarding at random two animals from the subgroups containing extra animals, and replacing the value for the animal that died by an appropriate value. Missing values can be worked out using formulae available in most of the older textbooks (e.g., Cochran and Cox 1957 ). In situations where there is more than one animal in a litter-by-treatment subgroup, as in this case, it would probably be sufficiently accurate (although not strictly correct) to replace the missing value with the mean of the rest of the animals in the group. Having a balanced design used to be almost essential because otherwise the calculations were extremely tedious. However, modern statistical packages now make it possible to do a "general linear model" ANOVA, which is capable of accommodating unequal numbers in each group, so a balanced design is no longer so essential.

A general linear model ANOVA of the data in Table 1 is shown in Table 2 . Note that whereas in the normal ANOVA there is a heading labeled “Sums of Squares” (or simply SS), in this case there are two headings “Seq SS” and “Adj SS,” with the two being slightly different for the litter effect. The ANOVA shows an F value of 7.99 and a p value of 0.001 for the treatment effect (abbreviated Trt). The “least squares means” presented in Table 2 are marginally different from the simple means and weighted means presented in Table 1 (all three types of means are shown in Table 1 ) inasmuch as they take account of the unequal group sizes.

General linear model analysis of variance of the data in Table 1

Wt, weight; SS, sums of squares; DF, degrees of freedom; Seq, sequential; Adj, adjusted; F, variance ratio (a test statistic like Student's t); Trt, treatment; SE, standard error; p, probability that a difference as large as or larger than the one observed could have arisen by chance; T-value, Student's t.
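A general linear model ANOVA of this kind can be reproduced in Python with statsmodels, assuming the Table 1 data have been arranged in a long format with one row per pup; the file and column names below are my own placeholders, not part of the original analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Long-format version of Table 1: one row per pup (placeholder names).
df = pd.read_csv("table1_weaning_weights.csv")  # columns: litter, treat, weight

# Two-way ANOVA without interaction, fitted as a general linear model so that
# unequal subgroup sizes and the missing pup are handled correctly.
model = smf.ols("weight ~ C(litter) + C(treat)", data=df).fit()
print(anova_lm(model, typ=2))  # adjusted (Type II) sums of squares

# Least-squares (adjusted) treatment means, analogous to Table 2: predict for
# every litter-by-treatment combination, then average over litters.
grid = pd.DataFrame([(l, t) for l in df["litter"].unique()
                     for t in df["treat"].unique()],
                    columns=["litter", "treat"])
grid["fit"] = model.predict(grid)
print(grid.groupby("treat")["fit"].mean())
```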

It is often necessary to use a post hoc comparison to determine which means differ from which. When the aim is to compare the means of the treatment groups with the control, Dunnett's test is appropriate (shown in Table 2 ). If the aim is to compare each mean with every other mean, other post hoc comparisons are appropriate (e.g., Tukey's test [ Roberts and Russo 1999 ]). Dunnett's test subtracts the mean of the control group from the mean of each of the other groups and then either gives a 95% confidence interval (CI) for the difference or uses a t test to assess whether it differs from zero. Both approaches are shown in Table 2. Note that the differences between the three groups are larger than the simulated treatment effects of -0.5 g and -1.0 g in the low- and high-dose groups, respectively, because the groups already differed by chance.
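For completeness, a Dunnett-type comparison can be sketched with scipy.stats.dunnett (available in SciPy 1.11 and later). Note that this simple call compares the raw group values and ignores the litter blocking, so it will not reproduce the litter-adjusted comparisons of Table 2 exactly; it is shown only to illustrate the mechanics, using the same placeholder file as the sketch above.

```python
import pandas as pd
from scipy import stats  # stats.dunnett requires SciPy >= 1.11

df = pd.read_csv("table1_weaning_weights.csv")  # placeholder file name

control = df.loc[df["treat"] == "control", "weight"]
low = df.loc[df["treat"] == "low", "weight"]
high = df.loc[df["treat"] == "high", "weight"]

# Compare each dose group with the control (ignoring the litter blocking).
res = stats.dunnett(low, high, control=control)
print(res.pvalue)                 # one p value per comparison with the control
print(res.confidence_interval())  # simultaneous 95% CIs for the differences
```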

In this case, the 95% CIs for the means can be calculated by hand. The error mean square of 5.26 is the pooled within-group variance, so the standard deviation is the square root of this value, or 2.29. Standard errors are calculated by dividing 2.29 by the square root of the number of observations in each mean (19 in the control and low-dose groups, 20 in the high-dose group). The 95% CI is estimated from the formula given below (also shown in most statistical textbooks):

M − SE × t(0.05, d.f.)  to  M + SE × t(0.05, d.f.),

where M is the observed mean, SE is the standard error of the mean, and t(0.05, d.f.) is the value of Student's t at the 0.05 level of significance for the degrees of freedom used in estimating the variance, which is 50 ( Table 2 ). The means can now be presented as follows:

Control mean = 48.1 (95% CI 47.0, 49.1);

Low-dose mean = 46.2 (95% CI 45.1, 47.2);

High-dose mean = 45.2 (95% CI 44.1, 46.2).

These confidence intervals could be used as error bars in a bar diagram.
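The hand calculation can be checked with a few lines of Python; small rounding differences from the values quoted above are expected because the group means are given to only one decimal place.

```python
import math
from scipy import stats

error_ms = 5.26           # error (residual) mean square from the ANOVA
error_df = 50             # error degrees of freedom
sd = math.sqrt(error_ms)  # pooled within-group SD, about 2.29
t_crit = stats.t.ppf(0.975, error_df)  # two-sided 5% point of Student's t

for name, mean, n in [("control", 48.1, 19), ("low", 46.2, 19), ("high", 45.2, 20)]:
    se = sd / math.sqrt(n)
    print(f"{name}: {mean:.1f} (95% CI {mean - t_crit * se:.1f}, "
          f"{mean + t_crit * se:.1f})")
```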

Finally, if one performed a similar experiment, but treated whole litters rather than doing a within-litter experiment, the EU would be the litter, rather than the individual pup within the litter. To determine how many litters would be needed, assume for simplicity that there would be only a control and a high-dose group. The question can be addressed using a power analysis as described above. The standard deviation of litter means in Table 1 is 5.68 g.

If one decided that a treatment effect (difference between treated and control groups) of 4 g in mean pup weight would be of scientific interest, and the experiment should have 90% power and a significance level of 0.05 with a two-sided t-test, then, using the power calculator in MINITAB, 44 litters in each group would be required. Thus, the between-litter experiment would involve a total of 88 litters and, at an average of 9.7 pups per litter, over 850 pups, yet it would only be capable of detecting an effect of 4.0 g, compared with a resolution of 2.9 g in the within-litter experiment involving six litters and only 59 pups. Clearly, between-litter designs should only be used in situations where there is no alternative, such as in teratology experiments.
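The same calculation can be sketched in Python with statsmodels; it should give a figure close to the 44 litters per group obtained from MINITAB, with minor differences attributable to rounding.

```python
from statsmodels.stats.power import TTestIndPower

sd_litter_means = 5.68  # standard deviation of litter means from Table 1
effect = 4.0            # difference in mean pup weight judged important (g)

n_litters = TTestIndPower().solve_power(effect_size=effect / sd_litter_means,
                                        alpha=0.05, power=0.90,
                                        alternative="two-sided")
print(n_litters)  # roughly 43-44 litters per group, in line with MINITAB
```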

References

Altman DG. 1991. Practical Statistics for Medical Research. London: Chapman and Hall.

Altman DG, Machin D, Bryant TN, Gardiner MJ. 2000. Statistics with Confidence. London: BMJ Press.

Beck JA, Lloyd S, Hafezparast M, Lennon-Pierce M, Eppig JT, Festing MFW, Fisher EMC. 2000. Genealogies of mouse inbred strains. Nat Genet 24:23–25.

Cochran WG, Cox GM. 1957. Experimental Designs. New York: John Wiley & Sons, Inc.

Cox DR. 1958. Planning Experiments. New York: John Wiley & Sons.

Dell R, Holleran S, Ramakrishnan R. 2002. Sample size determination. ILAR J 43:207–213.

Elashoff JD. 2000. nQuery Advisor Version 4.0 User's Guide. Cork: Statistical Solutions.

Festing MFW. 1976. Effects of marginal malnutrition on the breeding performance of inbred and F1 hybrid mice: a diallel study. In: Antikatzides T, ed. The Laboratory Animal in the Study of Reproduction. Stuttgart: Gustav Fischer. p 99–114.

Festing MFW. 1999a. Introduction to laboratory animal genetics. In: Poole T, ed. The UFAW Handbook on the Care and Use of Laboratory Animals. Harlow: Longman Scientific and Technical. p 61–94.

Festing MFW. 1999b. Warning: The use of genetically heterogeneous mice may seriously damage your research. Neurobiol Aging 20:237–244.

Festing MFW. 2004. Is the use of animals in biomedical research still necessary in 2002? Unfortunately, "yes." Altern Anim Res 32(S1):733–739.

Festing MFW, Altman DG. 2002. Guidelines for the design and statistical analysis of experiments using laboratory animals. ILAR J 43:233–243.

Festing MFW, Fisher EMC. 2000. Mighty mice. Nature 404:815.

Festing MFW, Overend P, Gaines Das R, Cortina Borja M, Berdoy M. 2002. The Design of Animal Experiments. London: Laboratory Animals Ltd.

Fisher RA. 1960. The Design of Experiments. New York: Hafner Publishing Company, Inc.

Haseman JK, Hogan MD. 1975. Selection of the experimental unit in teratology studies. Teratology 12:165–171.

Hunt DL, Bowman D. 2004. A parametric model for detecting hormetic effects in developmental toxicity studies. Risk Anal 24:65–72.

Lane-Petter W, Lane-Petter ME, Boutwell CW. 1968. Intensive breeding of rats. I. Crossfostering. Lab Anim 2:35–39.

Mead R. 1988. The Design of Experiments. Cambridge: Cambridge University Press.

Peeling AN, Looker T. 1987. Problem of standardising growth rates for animals suckled in separate litters. Growth 51:165–169.

Raubertas RF, Davis BA, Bowen WH, Pearson SK, Watson GE. 1999. Litter effects on caries in rats and implications for experimental design. Caries Res 33:164–169.

Roberts MJ, Russo R. 1999. A Student's Guide to the Analysis of Variance. London: Routledge.

Wartofsky MW. 1979. Models: Representation and the Scientific Understanding. Dordrecht: D. Reidel Publishing Company.

Yamamoto E, Yanagimoto T. 1994. Statistical methods for the beta-binomial model in teratology. Environ Health Perspect 102(Suppl 1):25–31.

Zorrilla EP. 1997. Multiparous species present problems (and possibilities) to developmentalists. Dev Psychobiol 30:141–150.

Abbreviations used in this article: ANOVA, analysis of variance; 95% CI, 95% confidence interval; EU, experimental unit.


How to Design Experiments in Animal Behaviour

16. Cutting-Edge Research at Trifling Cost

Series Article · Published: 04 February 2021 · Volume 26, pages 105–125 (2021)

Raghavendra Gadagkar

I have had multiple aims in writing this series of articles. My primary aim has been to show how simple and innovative experiments can be performed at almost no cost, by nearly anyone, to create significant new knowledge. The history of science shows that this is true in most areas of scientific research, albeit to varying degrees. I have focussed on the field of animal behaviour both because I am more familiar with this field than with others and because animal behaviour is especially well suited to such low-cost research. It has also been my aim, of course, to discuss the principles of ethology (the scientific study of animal behaviour) through the medium of these experiments. My motivation in writing this series is to bring social prestige to low-cost research, make the practice of science more inclusive and democratic, and empower large numbers of people to become knowledge producers rather than merely remain knowledge consumers. The people I especially have in mind are the less-endowed sections of society, including, but not restricted to, underdeveloped countries, marginalised institutions and individuals, students, the general public, amateurs, and all those with little or no access to large research grants and sophisticated laboratory facilities, for whatever reason.


Suggested Reading

L W Drew, Are we losing the science of taxonomy? As need grows, numbers and training are failing to keep up, BioScience , Vol.61, pp.942–946, December 2011.


E O Wilson, Half-earth: Our Planet’s Fight For Life , First edition. Liveright Publishing Corporation, a division of W.W. Norton & Company, New York, 2016.


T Saunders, Taxonomy-The neglected science of discovery, Newsroom , 22 April 2019. https://www.newsroom.co.nz/@health-science/2019/04/22/544490?slug=taxonomy-the-neglected-science-of-discovery (accessed 09 December 2020).

R Gadagkar, Ropalidia , in Encyclopedia of Social Insects , C. Starr, Ed. Springer International Publishing, Cham, pp.1–11, 2021.

R Gadagkar et al. , Insights and opportunities in insect social behavior, Current Opinion in Insect Science , Vol.34, pp.ix–xx, August 2019.

R C Lewontin, The Genetic Basis of Evolutionary Change , Columbia University Press, New York, 1974.

R Gadagkar, The Evolution of a Biologist in an Interdisciplinary Environment, in 25 Jahre Wissenschaftskolleg zu Berlin, 1981–2006. The Wissenschaftskolleg and beyond. To Joachim Nettelbeck, Secretary of the Kolleg from 1981-2012 , D Grimm and R Meyer-Kalkus, Eds. Akademie Verlag GmbH, Berlin, 2006.

N Oreskes, Why Trust Science? Princeton University Press, Princeton, NJ, USA and Oxford, UK., 2019.


R Gadagkar, A Review of: “ Why Trust Science?” by Naomi Oreskes, Princeton University Press, Current Science , Vol.118, pp. 1464–1466, 2020.

S Klein, We Are All Stardust , The Experiment, LLC, New York, 2015.

V S Ramachandran and S Blakeslee, Phantoms in the Brain: Probing the Mysteries of the Human Mind , 1st ed, William Morrow, New York, 1998.

Y Martel, Life of Pi , Canongate, Edinburgh, 2012.

J D Watson, The Double Helix: A Personal Account of the Discovery of the Structure of DNA , Penguin Books Ltd., England, 1970.

K Lorenz, King Solomon’s Ring — New Light on Animal Ways , Thomas Y. Crowell Company, New York, 1952.

R Gadagkar, Science as a hobby: How and why I came to study the social life of an Indian primitively eusocial wasp, Current Science , Vol.100, pp.845–858, 2011.

R Gadagkar, Half a century of worship at “Tata’s temple of science”, Resonance: journal of science education , Vol.25, No.5, p. 727–733, 2020.


Acknowledgements

I thank my editor T N C Vidya for her encouragement, patience and sound advice throughout this series.

Author information

Authors and Affiliations

Centre for Ecological Sciences, Indian Institute of Science, Bangalore, 560 012, India


Corresponding author

Correspondence to Raghavendra Gadagkar .

Additional information

Raghavendra Gadagkar is DST Year of Science Chair Professor at the Centre for Ecological Sciences, Indian Institute of Science, Bangalore, Honorary Professor at JNCASR, and Non-Resident Permanent Fellow of the Wissenschaftskolleg (Institute for Advanced Study), Berlin. During the past 40 years he has established an active school of research in the area of animal behaviour, ecology and evolution. Understanding the origin and evolution of cooperation in animals, especially in social insects such as ants, bees and wasps, is a major goal of his research. http://ces.iisc.ac.in/hpg/ragh ; https://www.researchgate.net/profile/Raghavendra_Gadagkar

Some passages in this article are reprinted from Suggested Readings [4, 5, 15 and 16].


About this article

Gadagkar, R. How to Design Experiments in Animal Behaviour. Reson 26 , 105–125 (2021). https://doi.org/10.1007/s12045-020-1108-6

Published: 04 February 2021

Issue Date: January 2021

DOI: https://doi.org/10.1007/s12045-020-1108-6

  • Animal behaviour
  • low-cost research
  • science funding
  • grant-free research
  • democratizing science
  • diversity in science
  • Research article
  • Open access
  • Published: 05 July 2018

Animal experimental research design in critical care

  • Justin S. Merkow 1 ,
  • Janine M. Hoerauf 1 ,
  • Angela F. Moss 2 ,
  • Jason Brainard 1 ,
  • Lena M. Mayes 1 ,
  • Ana Fernandez-Bustamante 1 ,
  • Susan K. Mikulich-Gilbertson 3 , 4 &
  • Karsten Bartels 1  

BMC Medical Research Methodology, volume 18, Article number: 71 (2018)


Background

Limited translational success in critical care medicine is thought to be in part due to inadequate methodology, study design, and reporting in preclinical studies. The purpose of this study was to compare reporting of core features of experimental rigor: blinding, randomization, and power calculations in critical care medicine animal experimental research. We hypothesized that these study design characteristics were more frequently reported in 2015 versus 2005.

Methods

We performed an observational bibliometric study to grade manuscripts on blinding, randomization, and power calculations. Chi-square tests and logistic regression were used for analysis. Inter-rater agreement was assessed using kappa and Gwet's AC1.

Results

A total of 825 articles from seven journals were included. In 2005, power estimations were reported in 2%, randomization in 35%, and blinding in 20% (n = 482). In 2015, these metrics were included in 9%, 47%, and 36% of articles (n = 343). The increase in proportion for the metrics tested was statistically significant (p < 0.001, p = 0.002, and p < 0.001).

Conclusions

Only a minority of published manuscripts in critical care medicine journals reported on recommended study design steps to increase rigor. Routine justification for the presence or absence of blinding, randomization, and power calculations should be considered to better enable readers to assess potential sources of bias.


Background

Despite a significant increase in the volume of biomedical research over the past decade, there has been limited translational success in clinical medicine [ 1 , 2 ]. Reproducibility specifically for animal research is low [ 3 , 4 , 5 ]. In attempts to address this problem, the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines as well as the revised National Institutes of Health grant application process have proposed standards for research involving animals to enhance the quality of experimental design, study conduct, and analysis of results [ 6 , 7 , 8 ]. These steps are intended to reduce bias and ultimately improve reproducibility and facilitate the translation of biomedical research to novel clinical applications that improve patient outcomes. Additionally, there is an ethical dilemma regarding animal welfare as well as financial waste related to permitting investment into research without tangible returns [ 9 ]. Specifically for the field of critical care medicine, small studies have shown that animal research methodology, study design, and reporting tends to lack rigor in several important areas [ 10 , 11 , 12 , 13 ].

Improvements in reporting of key experimental design features could enable readers to better judge sources of bias and eventually enhance validity and likelihood of translation. The objective of our study was to evaluate all critical care journals and compare reported animal experimental research in 2005 vs. 2015 regarding power analysis, randomization, and blinding procedures. Our hypothesis was that there had been increased implementation of these methods in 2015 compared to 2005. Also, we sought to provide information on the status quo of reported experimental design features to promote rigor.

Methods

We performed an observational bibliometric analysis of animal research published in critical care medicine journals using PRISMA and STROBE guidelines [ 14 , 15 ]. Journals were selected based on their inclusion in the Thomson Reuters™ Journal Citation Reports® subject category "Critical Care Medicine" [ 16 ]. A PubMed search included animal experimental studies published in 2005 and 2015. Our primary search criterion was that the article was reporting on an animal study based on an experiment. Animals were further defined as: "any of a kingdom of living things composed of many cells typically differing from plants in capacity for active movement, in rapid response to stimulation, in being unable to carry on photosynthesis, and lack of cellulose cell walls" [ 17 ]. We excluded meta-analyses, case reports, historical articles, letters, review articles, and editorials. One investigator manually assessed the PubMed search results for animal experimental studies. Then, the PubMed filter "other animals" was applied to the initial search results to detect any animal experimental studies not found in the manual search. Journals that did not publish at least ten animal studies in both 2005 and 2015 were excluded from the analysis (Fig.  1 ). To assess consistency in the identification of manuscripts reporting on animal experimental research, a second investigator blinded to the results of the first investigator independently searched two journals that were randomly selected from the seven journals included in this study.

Figure 1. Study flow diagram.

Next, we rated all animal studies selected. A computer-generated randomization scheme was used to randomize articles by both year and journal before the analysis (Excel, Microsoft Co., Redmond, WA). Studies were analyzed using their full-text Portable Document Format (PDF). Reporting of power analysis, randomization, and blinding was then graded using a 0–3 point scale (0 = not mentioned, 1 = mentioned but specified as not performed, 2 = performed but no details given, 3 = performed and details given) [ 18 ]. To assess inter-rater agreement for criterion ratings, we randomly selected 10% of the total articles for re-rating by a second investigator blinded to the results of the first investigator.

Statistical analysis

To address the primary hypothesis, ordinal scale rating scores were collapsed into binary (performed/not performed) variables. Chi-square tests were used to examine overall trends in reporting of quality metrics for 2005 and 2015. Simple logistic regression with time as a continuous covariate was used to estimate the effect of time on quality metrics performed and reported in published articles. The reference group was “not performed”, and odds ratios were calculated for the entire 10-year increment in time.

To assess the relationship between year of study and degree of reporting of quality metrics (as ordinal variables), the Wilcoxon Rank Sum test was used. Proportional odds models for ordinal logistic regression were used to calculate an odds ratio for the increase in reporting of metrics in 2015 compared to 2005. The proportional odds assumptions were verified by the Score Test.

Inter-rater agreement was assessed for each of the three metrics (power, randomization, and blinding) using Cohen's kappa and Gwet's AC1 [ 19 ]. Gwet's AC1 is an alternative inter-rater reliability coefficient to Cohen's kappa that is more stable in the presence of high prevalence and unbalanced marginal probabilities [ 19 , 20 ]. Inter-rater agreement for the identification of animal study articles was assessed using the kappa coefficient. The level of agreement was interpreted using the scale for interpretation of kappa [ 21 ]. The statistical analysis was done in SAS 9.4 (SAS Institute, Cary, NC). Statistical tests were performed adjusting for multiple comparisons using the Bonferroni method to maintain an overall 0.05 level of significance.

Power analysis

For the power-analysis metric, we assumed a 12% absolute increase in reporting incidence over a 10-year interval, compared as two independent proportions [ 18 ]. We anticipated a baseline reporting level of 5% in 2005 and a reporting level of 17% in 2015. A total of 141 studies in each year (282 total) would yield 80% power to detect an absolute difference in the proportion of metrics identified of at least 12% as significant.

For the randomization metric, we assumed a 13% absolute increase in reporting incidence over a 10-year interval, compared as two independent proportions [ 18 ]. We anticipated a baseline reporting level of 41% in 2005 and a reporting level of 54% in 2015. A total of 307 studies in each year (614 total) would yield 80% power to detect an absolute difference in the proportion of metrics identified of at least 13% as significant.

For the blinding metric, we assumed a 21% absolute increase in reporting incidence over a 10-year interval, compared as two independent proportions [ 18 ]. We anticipated a baseline reporting level of 26% in 2005 and a reporting level of 47% in 2015. A total of 109 studies in each year (218 total) would yield 80% power to detect an absolute difference in the proportion of metrics identified of at least 21% as significant.

All power calculations were done using G*Power, version 3.1.9.2. To maintain a 0.05 significance level across the three outcome metrics, the Bonferroni method for multiple comparisons was used to adjust the alpha to 0.017.
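The paper states that G*Power was used for these calculations. As an illustration only, the classical normal-approximation formula for comparing two independent proportions, coded below, yields group sizes consistent with those quoted (141, 307, and 109 per year) when the Bonferroni-adjusted two-sided alpha of 0.017 and 80% power are used; this is a reconstruction of the calculation, not the authors' documented procedure.

```python
import math
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.017, power=0.80):
    """Classical normal-approximation sample size per group for a two-sided
    comparison of two independent proportions."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

print(n_per_group(0.05, 0.17))  # power-analysis metric -> about 141 per year
print(n_per_group(0.41, 0.54))  # randomization metric  -> about 307 per year
print(n_per_group(0.26, 0.47))  # blinding metric       -> about 109 per year
```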

Results

After excluding critical care journals that did not publish at least ten animal studies in each year, seven journals comprising 825 articles (482 in 2005, 343 in 2015) were included in the analysis. They included: American Journal of Respiratory and Critical Care Medicine, Burns, Critical Care, Critical Care Medicine, Journal of Neurotrauma, Resuscitation, and Shock. The odds of any of the three metrics being performed in 2015 were higher than in 2005. The breakdown of the changes in reporting frequencies for each journal is depicted in Fig.  2 . For power analysis, the odds were 4.52 times (1.86,11.0) higher, for randomization 1.64 times (1.16,2.31) higher, and for blinding 2.22 times (1.51,3.25) higher in 2015 compared to 2005 (Table  1 ).
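As an illustration of the kind of comparison reported here, the counts below are reconstructed approximately from the percentages and article totals given in the abstract for the randomization metric; the unadjusted odds ratio comes out close to the reported 1.64, while the chi-square p value will differ somewhat from the published figure because the counts are rounded reconstructions.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Approximate counts for the randomization metric: about 35% of 482 articles
# in 2005 and 47% of 343 articles in 2015.
yes_2005, n_2005 = round(0.35 * 482), 482
yes_2015, n_2015 = round(0.47 * 343), 343

table = np.array([[yes_2005, n_2005 - yes_2005],
                  [yes_2015, n_2015 - yes_2015]])

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)

# Unadjusted odds ratio for reporting randomization in 2015 versus 2005.
odds_ratio = (table[1, 0] * table[0, 1]) / (table[1, 1] * table[0, 0])
print(odds_ratio)  # close to the reported 1.64
```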

Figure 2. Frequencies of recommended study design feature reporting per journal. Comparison was made using the chi-square test.

The highest rating of “performed and details given” was present in 2005 vs. 2015 for power analysis in 2% vs. 8%, for randomization in 3% vs. 8%, and for blinding in 7% vs. 13% of manuscripts. An article published in 2015 was 3.26 (1.61,6.61) times more likely to have a higher level of reporting of power analyses than in 2005. 2015 articles were 1.67 (1.21,2.32) times more likely to have a higher level of reporting of randomization than in 2005, and the odds of a higher level of reporting of blinding was 2.10 (1.45,3.04) times greater in 2015 compared to 2005 (Table  2 ).

For the binary ratings, observed agreement between the two investigators for the 82 articles assessed was 0.95, 0.93, and 0.90 for power, randomization, and blinding respectively. Cohen’s Kappa values indicated moderate agreement for power, almost perfect agreement for randomization, and substantial agreement for blinding. Gwet’s AC1 values indicated almost perfect agreement beyond that which occurs by chance alone (Table  3 ). Observed agreement between the two investigators in identifying all articles reporting animal experimental research from two randomly selected journals for inclusion/exclusion in this study was 0.99. The kappa coefficient indicates almost perfect agreement beyond that which occurs by chance alone (0.97 (95% CI 0.94,0.99)).
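A sketch of how such agreement statistics can be computed on binary (performed/not performed) ratings is shown below; the ratings are made up for illustration, scikit-learn is used for Cohen's kappa, and the small AC1 function implements the two-rater, binary-category form of Gwet's formula as I understand it.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary ratings (1 = performed, 0 = not performed) from two
# raters assessing the same articles.
rater1 = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0])
rater2 = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0])

print(cohen_kappa_score(rater1, rater2))

def gwet_ac1(a, b):
    """Gwet's AC1 for two raters and two categories (binary ratings)."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)            # observed agreement
    pi = (a.mean() + b.mean()) / 2  # average marginal probability of a "1"
    pe = 2 * pi * (1 - pi)          # AC1 chance-agreement term
    return (po - pe) / (1 - pe)

print(gwet_ac1(rater1, rater2))
```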

Discussion

The quality of research and reporting of animal studies in critical care medicine journals is an area of increased interest, especially as reproducibility and successful translation of basic science results to clinical application has been low [ 22 , 23 , 24 ]. In addition to impeding progress in the development of novel therapies, these issues also present ethical concerns [ 9 , 25 , 26 , 27 ]. In attempts to improve animal research quality, initiatives such as the ARRIVE guidelines have been created to improve the methodological rigor and to enhance translation [ 8 ]. To date, there are few studies examining the reporting of recommended experimental design features to increase scientific rigor and reduce bias in animal experimental critical care research.

In our study, we evaluated the methodological quality of animal research in critical care journals in 2005 and 2015 and found a significant increase in the reporting of power analyses, randomization, and blinding. Our hypothesis that these metrics are more commonly reported in 2015 compared to 2005 was confirmed. Introduced in 2010, the ARRIVE guidelines [ 8 ] may have been one of several factors that led to the improved reporting of recommended study design features in 2015. Our analysis using an ordinal scoring system still found the lowest rating category to be the most common one for every criterion assessed, even in 2015. Contemporary research in the field of critical care reports on recommended procedures to improve experimental design rigor in only a minority of manuscripts. This is in line with the limited published literature on this topic. Bara et al. [ 13 ] reviewed 77 animal research articles published in critical care journals over a six-month period in 2012. They found that 61% reported randomization, but only 6% of these reported some type of allocation concealment and only 2% reported a method of randomization.

Huet et al. [ 12 ] highlighted the importance of enhancing animal research quality, including improving the use of the 3Rs (replacement, reduction, refinement), which are the guiding principles for ethical animal testing [ 28 , 29 , 30 ]. They emphasized, however, that there continues to be poor adherence to these recommendations. Festing et al. [ 3 ] emphasized the historical significance of animal research and the major contributions resulting from it: animal research has led to the advancement of immunization medicine, the use of vitamins in almost eliminating diseases such as scurvy and rickets, and the discovery of insulin and its effect on metabolic diseases. Yet, they also identified a lack of adherence to good practices of research design as a major impediment to progress in medicine.

Although enhanced translation is the ultimate goal of measures to improve experimental design rigor, it remains to be determined if there has been an improvement in reproducibility or successful translation of animal experimental research results. Given the significant time lag between the description of basic science results and publication of clinical trial results, proof of a direct relationship between reported experimental design rigor and translation to novel therapies for critically ill patients will be challenging. It is also possible that some articles may not have described quality metrics that were in fact utilized in the research protocol. In addition, editors and reviewers may have recommended reporting according to the more recent ARRIVE [ 8 ] guidelines during the review process. The observed difference between 2005 and 2015 may, therefore, reflect more a change in reporting as opposed to a change in experimental practices. Of note, an innovative online tool, the “Experimental Design Assistant” was introduced in October 2015 as a guide for researchers to assist in the rigorous design of experiments [ 31 ]. However, none of the articles included in our study mentioned utilizing this resource. Further, our search strategy may not have detected all animal research articles in critical care journals in the two time periods examined. However, almost perfect agreement existed between two independent investigators in this regard. Critical care relevant research is published in other (non-critical care medicine specific) journals, and we did not include non-critical care journals in this study. Indeed, when comparing 2005 to 2015, the annual number of animal experimental manuscripts published in critical care journals decreased by 139 articles. This contrasts with findings that overall, publications in the medical literature have been increasing in the last decade [ 32 , 33 ]. Finally, publication bias was not assessed in this study. Publication bias likely has a significant impact on the quality of animal research and its ability to be translated into successful clinical trials [ 34 , 35 ].

Conclusions

The application and reporting of recommended quality metrics in animal experimental research published in critical care medicine journals continue to be modest. However, the increase in reported measures aimed to improve experimental design quality and reduce sources of bias in 2015 compared to 2005 is promising. Reporting of blinding, randomization, and sample size estimates should be encouraged in future animal experimental publications in critical care medicine. The routine justification for the presence or absence of these study design features should be considered in reports on animal experimental research.

Abbreviations

ARRIVE: Animal Research: Reporting of In Vivo Experiments guidelines

NIH: National Institutes of Health

1. Kilkenny C, Parsons N, Kadyszewski E, et al. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One. 2009;4:e7824.

2. Atkinson G, Batterham AM, Dowdall N, Thompson A, Van Drongelen A. From animal cage to aircraft cabin: an overview of evidence translation in jet lag research. Eur J Appl Physiol. 2014;114:2459–68.

3. Festing MF, Nevalainen T. The design and statistical analysis of animal experiments: introduction to this issue. ILAR J. 2014;55:379–82.

4. Begley CG, Ioannidis JP. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015;116:116–26.

5. Garner JP. The significance of meaning: why do over 90% of behavioral neuroscience results fail to translate to humans, and what can we do to fix it? ILAR J. 2014;55:438–56.

6. Collins FS, Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014;505:612–3.

7. Galley HF. Mice, men, and medicine. Br J Anaesth. 2010;105:396–400.

8. Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biol. 2010;8:e1000412.

9. Ferdowsian HR, Beck N. Ethical and scientific considerations regarding animal testing and research. PLoS One. 2011;6:e24059.

10. Bara M, Joffe AR. The ethical dimension in published animal research in critical care: the public face of science. Crit Care. 2014;18:R15.

11. Uhlig C, Krause H, Koch T, Gama de Abreu M, Spieth PM. Anesthesia and monitoring in small laboratory mammals used in anesthesiology, respiratory and critical care research: a systematic review on the current reporting in top-10 impact factor ranked journals. PLoS One. 2015;10:e0134205.

12. Huet O, de Haan JB. The ethical dimension in published animal research in critical care: the dark side of our moon. Crit Care. 2014;18:120.

13. Bara M, Joffe AR. The methodological quality of animal research in critical care: the public face of science. Ann Intensive Care. 2014;4:26.

14. von Elm E, Altman DG, Egger M, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61:344–9.

15. Moher D, Liberati A, Tetzlaff J, Altman DG, and the PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151:264–9, W64.

16. Thomson Reuters, Institute for Scientific Information, National Library of Medicine (U.S.). Web of Science. MEDLINE. New York: Thomson Reuters; 2016.

17. Merriam-Webster Inc. Merriam-Webster's collegiate dictionary. 11th ed. Springfield: Merriam-Webster, Inc.; 2003. p. 1623.

18. Hoerauf JM, Moss AF, Fernandez-Bustamante A, Bartels K. Study design rigor in animal-experimental research published in anesthesia journals. Anesth Analg. 2018;126:217–22.

19. Gwet KL. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008;61:29–48.

20. Wongpakaran N, Wongpakaran T, Wedding D, Gwet KL. A comparison of Cohen's Kappa and Gwet's AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples. BMC Med Res Methodol. 2013;13:61.

21. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.

22. Dyson A, Singer M. Animal models of sepsis: why does preclinical efficacy fail to translate to the clinical setting? Crit Care Med. 2009;37:S30–7.

23. Xiong Y, Mahmood A, Chopp M. Animal models of traumatic brain injury. Nat Rev Neurosci. 2013;14:128–42.

24. Reynolds PS. Twenty years after: do animal trials inform clinical resuscitation research? Resuscitation. 2012;83:16–7.

25. Hess KR. Statistical design considerations in animal studies published recently in cancer research. Cancer Res. 2011;71:625.

26. Landis SC, Amara SG, Asadullah K, et al. A call for transparent reporting to optimize the predictive value of preclinical research. Nature. 2012;490:187–91.

27. Couzin-Frankel J. When mice mislead. Science. 2013;342:922–3.

28. Blache D, Martin GB, Maloney SK. Towards ethically improved animal experimentation in the study of animal reproduction. Reprod Domest Anim. 2008;43 Suppl 2:8–14.

29. Schuppli CA, Fraser D, McDonald M. Expanding the three Rs to meet new challenges in humane animal experimentation. Altern Lab Anim. 2004;32:525–32.

30. Leenaars M, Savenije B, Nagtegaal A, van der Vaart L, Ritskes-Hoitinga M. Assessing the search for and implementation of the Three Rs: a survey among scientists. Altern Lab Anim. 2009;37:297–303.

31. Percie du Sert N, Bamsey I, Bate ST, et al. The Experimental Design Assistant. PLoS Biol. 2017;15:e2003779.

32. Druss BG, Marcus SC. Growth and decentralization of the medical literature: implications for evidence-based medicine. J Med Libr Assoc. 2005;93:499–501.

33. Larsen PO, von Ins M. The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index. Scientometrics. 2010;84:575–603.

34. Sena ES, van der Worp HB, Bath PM, Howells DW, Macleod MR. Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol. 2010;8:e1000344.

35. ter Riet G, Korevaar DA, Leenaars M, et al. Publication bias in laboratory animal research: a survey on magnitude, drivers, consequences and potential solutions. PLoS One. 2012;7:e43404.

Funding

This work was supported by the National Institutes of Health Award Number K23DA040923 to Karsten Bartels. In addition, this work was supported by NIH Award Number UL1TR001082. The content of this report is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The NIH had no involvement in study design, collection, analysis, interpretation of data, writing of the report, or the decision to submit the article for publication.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and affiliations

Department of Anesthesiology, Medicine, and Surgery, University of Colorado, School of Medicine, Anschutz Medical Campus, 12401 E. 17th Ave., Leprino Office Building, 7th Floor, MS B-113, Aurora, CO, 80045, USA

Justin S. Merkow, Janine M. Hoerauf, Jason Brainard, Lena M. Mayes, Ana Fernandez-Bustamante & Karsten Bartels

Adult and Child Center for Health Outcomes and Delivery Science, University of Colorado, School of Medicine, Aurora, Colorado, USA

Angela F. Moss

Department of Psychiatry, University of Colorado, School of Medicine, Aurora, Colorado, USA

Susan K. Mikulich-Gilbertson

Department of Biostatistics & Informatics, University of Colorado, School of Public Health, Aurora, Colorado, USA


Contributions

JSM wrote the manuscript. JSM and JMH contributed to the conduct of the study and data collection. JSM, JMH, and KB contributed to the conduct of the study. KB and AFM contributed to the study design. AFM, SKMG, and KB contributed to data analysis. JB, LMM, AFB, and SKMG played major roles interpreting the data. All authors (JSM, JMH, AFM, JB, LMM, AFB, SKMG, and KB) critically revised the manuscript. KB secured funding for the study and performed manuscript preparation. All authors approved the final version.

Corresponding author

Correspondence to Karsten Bartels .

Ethics declarations

Ethics approval and consent to participate.

Not applicable. This was an observational bibliometric study.

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Merkow, J.S., Hoerauf, J.M., Moss, A.F. et al. Animal experimental research design in critical care. BMC Med Res Methodol 18 , 71 (2018). https://doi.org/10.1186/s12874-018-0526-6


Received : 13 December 2017

Accepted : 19 June 2018

Published : 05 July 2018

DOI : https://doi.org/10.1186/s12874-018-0526-6


Keywords: Critical care, Study design


1. Study Design

For each experiment, provide brief details of the study design, including:

  • The groups being compared, including control groups. If no control group has been used, the rationale should be stated.
  • The experimental unit (e.g. a single animal, litter, or cage of animals).

The choice of control or comparator group depends on the experimental objective. Negative controls are used to determine whether a difference between groups is caused by the intervention (e.g. wild-type vs. genetically modified animals, placebo vs. active treatment, sham surgery vs. surgical intervention). Positive controls can be used to support the interpretation of negative results or to determine whether an expected effect is detectable. It may not be necessary to include a separate control with no active treatment if, for example, the experiment aims to compare a treatment administered by different methods (e.g. intraperitoneal administration vs. oral gavage), or if animals are used as their own control in a longitudinal study. A pilot study, such as one designed to test the feasibility of a procedure, might also not require a control group.

For complex study designs, a visual representation is more easily interpreted than a text description, so a timeline diagram or flow chart is recommended. Diagrams facilitate the identification of which treatments and procedures were applied to specific animals or groups of animals, and at what point in the study these were performed. They also help to communicate complex design features such as whether factors are crossed or nested (hierarchical/multi-level designs), blocking (to reduce unwanted variation, see item 4 – Randomisation), or repeated measurements over time on the same experimental unit (repeated measures designs); see [1-3] for more information on different design types. The Experimental Design Assistant (EDA) is a platform that supports researchers in the design of in vivo experiments; it can be used to generate diagrams to represent any type of experimental design [4].

For each experiment performed, clearly report all groups used. Selectively excluding some experimental groups (for example because the data are inconsistent, or conflict with the narrative of the paper) is misleading and should be avoided  [5] . Ensure that test groups, comparators and controls (negative or positive) can be identified easily. State clearly if the same control group was used for multiple experiments, or if no control group was used. 

  • Festing MF and Altman DG (2002). Guidelines for the design and statistical analysis of experiments using laboratory animals. ILAR journal . http://www.ncbi.nlm.nih.gov/pubmed/12391400
  • Bate ST and Clark RA (2014). The design and statistical analysis of animal experiments. Cambridge University Press. https://www.cambridge.org/core/books/design-and-statistical-analysis-of-animal-experiments/BDD758F3C49CF5BEB160A9C54ED48706
  • Ruxton G and Colegrave N (2017). Experimental design for the life sciences. Fourth Edition. Oxford University Press. https://global.oup.com/academic/product/experimental-design-for-the-life-sciences-9780198717355?cc=us&lang=en&
  • Percie du Sert N, Bamsey I, Bate ST, Berdoy M, Clark RA, Cuthill I, Fry D, Karp NA, Macleod M, Moon L, Stanford SC and Lings B (2017). The Experimental Design Assistant. PLoS Biol . doi: 10.1371/journal.pbio.2003779
  • The BMJ. Scientific misconduct. (Access Date: 10 January 2020). Available at: https://www.bmj.com/about-bmj/resources-authors/forms-policies-and-checklists/scientific-misconduct

Example 1 

“The DAV1 study is a one-way, two-period crossover trial with 16 piglets receiving amoxicillin and placebo at period 1 and only amoxicillin at period 2. Amoxicillin was administered orally with a single dose of 30 mg.kg-1. Plasma amoxicillin concentrations were collected at same sampling times at each period: 0.5, 1, 1.5, 2, 4, 6, 8, 10 and 12 h.”  [1]

Example 2  

“Example of a study plan created using the Experimental Design Assistant showing a simple comparative study for the effect of two drugs on the metastatic spread of two different cancer cell lines. Block randomisation has been used to create 3 groups containing an equal number of zebrafish embryos injected with either cell line, and each group will be treated with a different drug treatment (including vehicle control). Each measurement outcome will be analysed by 2-way ANOVA to determine the effect of drug treatment on growth, survival and invasion of each cancer cell line.”  [2]

  • Nguyen TT, Bazzoli C and Mentre F (2012). Design evaluation and optimisation in crossover pharmacokinetic studies analysed by nonlinear mixed effects models. Statistics in medicine . doi: 10.1002/sim.4390
  • Hill D, Chen L, Snaar-Jagalska E and Chaudhry B (2018). Embryonic zebrafish xenograft assay of human cancer metastasis [version 2; referees: 2 approved]. F1000Research . doi: 10.12688/f1000research.16659.2
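Block randomisation of the kind described in Example 2 can be scripted so that the allocation list is both balanced and reproducible. The sketch below is a minimal, hypothetical illustration; the treatment labels, block size, and seed are assumptions and are not taken from the cited zebrafish study.

```python
import random

def block_randomise(n_units, treatments, seed=42):
    """Allocate n_units to treatments using complete blocks.

    Each block contains every treatment exactly once, in shuffled order,
    so group sizes stay balanced throughout the experiment.
    """
    if n_units % len(treatments) != 0:
        raise ValueError("n_units must be a multiple of the number of treatments")
    rng = random.Random(seed)   # fixed seed makes the allocation reproducible
    allocation = []
    for _ in range(n_units // len(treatments)):
        block = list(treatments)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation

# Hypothetical example: 12 embryos, three arms including a vehicle control.
print(block_randomise(12, ["vehicle", "drug_A", "drug_B"]))
```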

Within a design, biological and technical factors will often be organised hierarchically, such as cells within animals and mitochondria within cells, or cages within rooms and animals within cages. Such hierarchies can make determining the sample size difficult (is it the number of animals, cells or mitochondria?). The sample size is the number of experimental units per group. The experimental unit is defined as the biological entity subjected to an intervention independently of all other units, such that it is possible to assign any two experimental units to different treatment groups. It is also sometimes called the unit of randomisation. In addition, the experimental units should not influence each other on the outcomes that are measured.

Commonly, the experimental unit is the individual animal, each independently allocated to a treatment group (e.g. a drug administered by injection). However, the experimental unit may be the cage or the litter (e.g. a diet administered to a whole cage, or a treatment administered to a dam and investigated in her pups), or it could be part of the animal (e.g. different drug treatments applied topically to distinct body regions of the same animal). Animals may also serve as their own controls receiving different treatments separated by washout periods; here the experimental unit is an animal for a period of time. There may also be multiple experimental units in a single experiment, such as when a treatment is given to a pregnant dam and then the weaned pups are allocated to different diets  [1] . See  [2-4]  for further guidance on identifying experimental units.  

Conflating experimental units with subsamples or repeated measurements can lead to artificial inflation of the sample size. For example, measurements from 50 individual cells from a single mouse represent  n  = 1 when the experimental unit is the mouse. The 50 measurements are subsamples and provide an estimate of measurement error so should be averaged or used in a nested analysis. Reporting n = 50 in this case is an example of pseudoreplication [5] . It underestimates the true variability in a study, which can lead to false positives and invalidate the analysis and resulting conclusions  [5,6] . If, however, each cell taken from the mouse is then randomly allocated to different treatments and assessed individually, the cell might be regarded as the experimental unit.
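To make the distinction concrete, the following sketch (with hypothetical measurements) collapses the subsamples from each mouse into a single value per experimental unit before any between-group comparison, so the analysis proceeds with n equal to the number of mice rather than the number of cells.

```python
import statistics

# Hypothetical raw data: each mouse contributes several cell-level measurements
# (in practice this might be ~50 cells per mouse).
raw = {
    "mouse_1": [2.1, 2.3, 1.9, 2.0],
    "mouse_2": [2.6, 2.4, 2.5, 2.7],
    "mouse_3": [1.8, 2.0, 1.9, 2.1],
}

# Average the subsamples so each experimental unit (mouse) yields one value.
per_mouse = {mouse: statistics.mean(values) for mouse, values in raw.items()}

print(per_mouse)                          # three values in total
print(len(per_mouse), "experimental units")   # n = 3, not n = 12
```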

Clearly indicate the experimental unit for each experiment so that the sample sizes and statistical analyses can be properly evaluated.

  • Burdge GC, Lillycrop KA, Jackson AA, Gluckman PD and Hanson MA (2008). The nature of the growth pattern and of the metabolic response to fasting in the rat are dependent upon the dietary protein and folic acid intakes of their pregnant dams and post-weaning fat consumption. Br J Nutr . doi: 10.1017/S0007114507815819
  • Bate ST and Clark RA (2014). The design and statistical analysis of animal experiments. Cambridge University Press.  https://www.cambridge.org/core/books/design-and-statistical-analysis-of-animal-experiments/BDD758F3C49CF5BEB160A9C54ED48706
  • Lazic SE, Clarke-Williams CJ and Munafò MR (2018). What exactly is ‘N’ in cell culture and animal experiments? PLOS Biology . doi: 10.1371/journal.pbio.2005282
  • NC3Rs Experimental unit. (Access Date: 21/03/2019). Available at: https://eda.nc3rs.org.uk/experimental-design-unit
  •  Lazic SE (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience . doi: 10.1186/1471-2202-11-5
  • Hurlbert SH (1984). Pseudoreplication and the design of ecological field experiments. Ecological Monographs . doi: 10.2307/1942661

Example 1

  “The present study used the tissues collected at E15.5 from dams fed the 1X choline and 4X choline diets (n = 3 dams per group, per fetal sex; total n = 12 dams). To ensure statistical independence, only one placenta (either male or female) from each dam was used for each experiment. Each placenta, therefore, was considered to be an experimental unit.”  [1]

Example 2 

“We have used data collected from high-throughput phenotyping, which is based on a pipeline concept where a mouse is characterized by a series of standardized and validated tests underpinned by standard operating procedures (SOPs)…The individual mouse was considered the experimental unit within the studies.”  [2]  

Example 3  

“Fish were divided in two groups according to weight (0.7-1.2 g and 1.3-1.7 g) and randomly stocked (at a density of 15 fish per experimental unit) in 24 plastic tanks holding 60 L of water.”  [3]  

Example 4 

“In the study, n refers to number of animals, with five acquisitions from each [ corticostriatal ] slice, with a maximum of three slices obtained from each experimental animal used for each protocol (six animals each group).”  [4]

  • Kwan S, King J, Grenier J, Yan J, Jiang X, Roberson M and Caudill M (2018). Maternal Choline Supplementation during Normal Murine Pregnancy Alters the Placental Epigenome: Results of an Exploratory Study. Nutrients . doi: 10.3390/nu10040417
  • Karp NA, Mason J, Beaudet AL, Benjamini Y, Bower L, Braun RE, Brown SDM, Chesler EJ, Dickinson ME, Flenniken AM, Fuchs H, Angelis MHd, Gao X, Guo S, Greenaway S, Heller R, Herault Y, Justice MJ, Kurbatova N, Lelliott CJ, Lloyd KCK, Mallon A-M, Mank JE, Masuya H, McKerlie C, Meehan TF, Mott RF, Murray SA, Parkinson H, Ramirez-Solis R, et al. (2017). Prevalence of sexual dimorphism in mammalian phenotypic traits. Nature communications . doi: 10.1038/ncomms15475
  • Ribeiro FdAS, Vasquez LA, Fernandes JBK and Sakomura NK (2012). Feeding level and frequency for freshwater angelfish. Revista Brasileira de Zootecnia . doi: 10.1590/S1516-35982012000600033  
  • Grasselli G, Rossi S, Musella A, Gentile A, Loizzo S, Muzio L, Di Sanza C, Errico F, Musumeci G, Haji N, Fresegna D, Sepman H, De Chiara V, Furlan R, Martino G, Usiello A, Mandolesi G and Centonze D (2013). Abnormal NMDA receptor function exacerbates experimental autoimmune encephalomyelitis. Br J Pharmacol . doi: 10.1111/j.1476-5381.2012.02178.x


The study design elements employed by researchers in preclinical animal experiments from two research domains and implications for automation of systematic reviews

Annette M. O'Connor

1 Department of Veterinary Diagnostic and Production Animal Medicine, College of Veterinary Medicine, Iowa State University, Ames, Iowa, United States of America

Sarah C. Totton

2 Independent researcher, Guelph, ON, Canada

Jonah N. Cullen

Mahmood Ramezani

3 Industrial and Manufacturing Systems Engineering, College of Engineering, Iowa State University, Ames, Iowa, United States of America

Vijay Kalivarapu

4 Virtual Reality Applications Center, Iowa State University, Ames, Iowa, United States of America

Chaohui Yuan

5 Department of Statistics, Iowa State University, Ames, Iowa, United States of America

Stephen B. Gilbert

Associated data.

The file of design elements and supporting text and the reference list will be available at the ISU digital depository at https://doi.org/10.25380/iastate.6531224.v1 .

Systematic reviews are increasingly using data from preclinical animal experiments in evidence networks. Further, there are ever-increasing efforts to automate aspects of the systematic review process. When assessing systematic bias and unit-of-analysis errors in preclinical experiments, it is critical to understand the study design elements employed by investigators. Such information can also inform prioritization of automation efforts that allow the identification of the most common issues. The aim of this study was to identify the design elements used by investigators in preclinical research in order to inform unique aspects of assessment of bias and error in preclinical research. Using 100 preclinical experiments each related to brain trauma and toxicology, we assessed design elements described by the investigators. We evaluated Methods and Materials sections of reports for descriptions of the following design elements: 1) use of comparison group, 2) unit of allocation of the interventions to study units, 3) arrangement of factors, 4) method of factor allocation to study units, 5) concealment of the factors during allocation and outcome assessment, 6) independence of study units, and 7) nature of factors. Many investigators reported using design elements that suggested the potential for unit-of-analysis errors, i.e., descriptions of repeated measurements of the outcome (94/200) and descriptions of potential for pseudo-replication (99/200). Use of complex factor arrangements was common, with 112 experiments using some form of factorial design (complete, incomplete or split-plot-like). In the toxicology dataset, 20 of the 100 experiments appeared to use a split-plot-like design, although no investigators used this term. The common use of repeated measures and factorial designs means understanding bias and error in preclinical experimental design might require greater expertise than simple parallel designs. Similarly, use of complex factor arrangements creates novel challenges for accurate automation of data extraction and bias and error assessment in preclinical experiments.

Introduction

Systematic reviews are increasingly incorporating data from preclinical animal experiments [ 1 – 5 ]. Accurate and efficient interpretation of the study design used in such experiments is an important component of that process, because a unique aspect of systematic reviews is the assessment of bias and errors in the study design, in addition to extraction of the effect sizes and effect size precision. Here we refer to "study design" as the procedural outline for conducting an investigation. Therefore, a study design is comprised of multiple "design elements," which include use (or not) of randomization, use (or not) of blinding, how often the outcome is measured, the type of control group used, and how the experimental factors are arranged [ 6 ]. To assess bias and errors and extract the study results, it is critical that the reviewers understand the study design and know which elements are reported. For a systematic reviewer, a study described as an "individually randomized, 3 by 2 factorial design blocked by sex, with repeated measures and blinded outcome assessment" immediately reveals the design element options employed by the investigators. It also conveys that the investigators used design element options that relate to risk of systematic biases (randomized and blinded) and that have the potential to create unit-of-analysis errors (repeated measures). A unit-of-analysis error occurs when the unit of allocation of the intervention is different from the unit used in the statistical analysis. Further, this description of the study ensures that the reviewer knows the results will likely contain an assessment of two main effects and an interaction (factorial design).

Assessment of the study design is a labor-intensive process, as it requires considerable time and expertise to recognize specific design elements such as split-plot designs. Automated recognition of design elements would considerably speed up this aspect of systematic reviews. However, effective systematic review automation requires knowledge of which design elements are commonly employed, as such information will enable prioritization of targets for automation efforts.

Although many studies have described the frequency with which randomization and blinding are reported by investigators in preclinical experiments [ 7 ], our focus was to extend this assessment to less commonly evaluated design elements, particularly those that relate to replicates and the arrangement of study factors. Our rationale for selecting this focus is that these are understudied yet important design elements that affect study validity and the accurate extraction of study results [ 8 – 12 ].

Our long-term goal is to develop automated tools for the recognition of design elements in research publications, as recognition of important study design elements requires considerable expertise, and automated classification of design elements will enable more accurate, rapid, and cost-effective risk-of-bias and error assessment and extraction of study results. Working towards that longer-term goal, the objective of this study was to identify and assess the frequency of design elements in preclinical animal experiments. Such information will be needed so that automation methods can focus on identifying the most commonly employed design elements and therefore maximize value to reviewers.

Materials and methods

This study is an observational survey using manuscripts describing preclinical animal experiments from systematic reviews in two broad topic areas: brain trauma/stroke and toxicology.

Data sources

Included manuscripts described primary research consisting of a single comparative animal experiment and were published in English. Only in vivo studies were eligible. If an eligible manuscript also contained an in vitro or ex vivo intervention element, the manuscript as a whole was excluded. The single-study criterion was necessary for a companion project using the same set of studies. The datasets for each topic area contained 100 manuscripts. One dataset was obtained from the CAMARADES (Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies) group and described animal models of stroke/brain trauma. The second dataset was obtained from the citation lists of four systematic reviews that evaluated animal models for toxicology. Further details of how the corpus was obtained are provided in S1 Text .

Eligible studies

Initial screening of manuscripts for the corpus was conducted using the online systematic review software DistillerSR® (Ottawa, ON, Canada, https://www.evidencepartners.com/ ). Initial eligibility assessment was performed based on the abstract, keywords, introduction, and the materials and methods sections. Studies were eligible for assessment if published in English (the full text, not just the abstract), if they were primary research of a comparative intervention or assessment of brain trauma/stroke outcomes in non-primate mammals, consisting of only one experiment, and assessing only interventions applied to the whole animal (i.e., no in vitro or ex vivo level interventions).

Two independent reviewers (JC and ST) with backgrounds in study design pilot-tested the initial screening (eligibility) form on 30 studies. Subsequent to the pilot-testing, only one reviewer (JC or ST) was required to determine study eligibility.

After eligibility assessment, 100 references, out of the 213 eligible studies in the CAMARADES dataset, were selected using a random number sequence generator ( https://www.random.org/sequences ). The rationale for the sample size of 100 studies was to enable 95% confidence of the ability to identify design elements present in at least 5% of manuscripts, assuming 100% sensitivity and 100% specificity of detection ( http://epitools.ausvet.com.au/content.php?page=Freedom ), which in the absence of prior data seemed a pragmatic goal for detection of design elements. It was decided by the first author that if a design element occurred in fewer than 5% of the papers, it was rare enough to ignore for this report. To extract the data, a PDF annotation tool (AFLEX interface) was developed that enabled pre-specified design elements to be tagged/associated with specific text within the full-text PDF [ 13 ]. This web-based tool allows the user to upload a PDF, highlight passages of text, and tag those passages with the design elements. For example, a user might select a sentence that provides evidence for the unit of analysis being the group and the arrangement of factors being parallel; that highlighted sentence can then be tagged with Group and Parallel. After tagging, the tool allows easy review of the evidentiary sentences for that article, or a review of all Group sentences across all tagged articles. Currently the tool is used internally by the authors, but it could become available for public use.
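The sample-size rationale can be checked with a short calculation. Assuming perfect detection, the probability of observing at least one manuscript carrying a design element that is present in 5% of the literature follows directly from the binomial (or, for a finite pool of 213 eligible studies, hypergeometric) distribution. The sketch below is illustrative only; it is not the epitools calculation cited by the authors.

```python
from math import comb

def p_detect_binomial(n_sampled, prevalence):
    """P(at least one positive) ignoring the finite population size."""
    return 1 - (1 - prevalence) ** n_sampled

def p_detect_hypergeometric(population, n_positive, n_sampled):
    """P(at least one positive) when sampling without replacement."""
    p_none = comb(population - n_positive, n_sampled) / comb(population, n_sampled)
    return 1 - p_none

# A design element present in at least 5% of manuscripts corresponds to
# at least 11 of the 213 eligible CAMARADES studies.
print(round(p_detect_binomial(100, 0.05), 4))            # approx. 0.994
print(round(p_detect_hypergeometric(213, 11, 100), 4))   # approx. 0.999
```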

The design element assessment extraction form was pilot-tested by two independent reviewers (JC and ST). After pilot-testing, each study was assessed and extracted by the two independent reviewers (JC and ST). To identify and resolve conflicts about design elements and supporting text, an RStudio-based Shiny [ 14 , 15 ] web interface was developed, which identified where design elements and text were not the same for both reviewers. Following conflict resolution or adjudication by a third reviewer (AOC), any necessary changes to the final dataset were made.

Identification of design elements used and supporting text collection process

The design elements sought were selected based on previous experience with identifying and extracting study design elements and consisted of a comprehensive suite of elements relevant to comparative preclinical animal experiments. As part of assessing whether the list was comprehensive, several risk-of-bias tools proposed for animal experiments were reviewed to determine which design elements would relate to systematic bias and unit-of-analysis errors [ 7 , 16 , 17 ].

The selected design elements are: 1) comparison group, 2) unit of allocation of the interventions to study units, 3) arrangement of factors, 4) method of factor allocation to study units, 5) concealment of the factors during allocation and outcome assessment, 6) independence of study units, and 7) nature of factors. For each design element, there are options that investigators might employ. For example, for the design element "arrangement of factors" investigators can choose from a parallel arrangement of factors, a single-level factorial arrangement, a split-plot-like factorial arrangement, or a cross-over arrangement. The suite of design elements and options are described in Table 1 . The suite of design elements and their associated validity and bias domains can be seen in S1 Table .
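For annotation and later tabulation, it can help to encode the element/option taxonomy explicitly. The sketch below is a hypothetical, abbreviated encoding built from the options named in the text; it is not the authors' actual data model, and the full suite of elements and options is given in Table 1.

```python
# Hypothetical, abbreviated taxonomy of design elements and allowed options.
# The full suite used in the study is described in Table 1 of the manuscript.
DESIGN_ELEMENTS = {
    "comparison_group": ["concurrent control", "sham", "none", "unclear"],
    "unit_of_allocation": ["individual", "group", "nested", "unclear"],
    "arrangement_of_factors": ["parallel", "single-level factorial",
                               "split-plot-like factorial", "cross-over", "unclear"],
    "method_of_allocation": ["randomised", "unclear"],
    "concealment": ["during allocation", "during outcome assessment",
                    "no discernable description"],
    "independence_of_study_units": ["independent", "pseudo-replication",
                                    "repeated measures", "no discernable description"],
    "nature_of_factors": ["can be randomised", "cannot be randomised (e.g. sex, genotype)"],
}

def validate_annotation(element, option):
    """Check that a tagged option is one of the allowed options for that element."""
    return option in DESIGN_ELEMENTS.get(element, [])

print(validate_annotation("arrangement_of_factors", "cross-over"))  # True
```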

The methods section of each manuscript was searched for text that indicated the design elements described by the investigators. If identified, the option employed by the investigators and a text description of the option were extracted using de novo software ( Table 1 ). In addition, text in the title, abstract, introduction, and the materials and methods section were surveyed for any investigator-identified study design label and, if present, this information was extracted.

Certain design elements must be present in an experimental study; for example, all studies must identify a unit of allocation, an arrangement of factors, and a method of allocation of factors to study units. When it was not possible to discern the options used based on the investigators' description, these design elements were referred to as "unclear". Other design elements are optional, such as concealment of factors during allocation or outcome assessment, repeated measures, or the use of pseudo-replication. If no text was found to describe these elements, this was coded as having "no discernable description (NDD)" for that design element.

To ensure a consistent approach to element and option recognition, the following standards were employed:

  • In order to determine whether a control group was concurrent, text was selected that described the division or allocation of the study population into groups.
  • In order to determine whether the unit of allocation was at the individual level, we required the investigators to provide either a dosage (e.g., mg/kg) or a route of administration (e.g., intravenous, intraperitoneal) that could only be delivered individually. Simply providing a concentration of the intervention in the water or food was not sufficient for the reviewers to determine the unit of allocation, unless the authors also explicitly described the housing as individual.
  • We differentiated language that suggested pseudo-replication from language that suggested repeated measurement of outcomes, although these approaches both refer to replicates [ 9 ]. Pseudo-replication refers to multiple measures of an outcome designed to capture random experimental noise, i.e., multiple pups within a litter when the dam had been allocated to treatment or multiple tissue sections within an animal. Repeated measurement refers to multiple outcome measurements when a factor of interest varies, such as time or decibel level. Descriptions of measures that were unlikely to be related to the outcome were not extracted, as such information did not relate to the extracted results. For example, repeatedly measuring body temperature while the animal was under anesthesia was to ensure animal health and was therefore unlikely to be reported in the results. Two approaches to recognition of repeated measures were used: 1) if the investigators described a process of repeated measurements of outcomes on a study unit, and 2) if the statistical methods described an approach to control for repeated measures, such as "a repeated measures ANOVA was conducted".
  • For the arrangement of factors, when the factors were assigned to the same level of animal and the interaction between multiple factors was of interest, this was considered a single-level factorial design . A factorial design was considered complete when every possible combination of factors was represented by an arm of the design [ 11 ].

A common feature of preclinical studies is a "sham" arm, which is often included for the purposes of quality control. This "sham" arm is often paired with a factorial design and, as a consequence, could be mistaken for part of an incomplete factorial design. The difference between these designs is based on the nature of the "single" arm. A sham arm consists of animals that received neither an intervention nor a challenge (where "challenge" was induction of stroke in the CAMARADES dataset). The sham arm is a quality control feature of the study, rather than having an outcome that is truly of interest. Data from animals in the sham arm may function as a baseline for the outcomes from control groups (which received the challenge) and treatment groups (which received both the challenge and an intervention).

A split-plot-like arrangement referred to a factorial arrangement where one factor is nested within the other.

If the arrangement of factors could not be deciphered based on the investigators' text, the portion of the text describing the overall organization of the factors was extracted and labeled "unclear" as the design element.

To describe the findings, we calculated the frequencies of design elements and options for the selected studies.
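Once the per-study annotations are in a flat table, the frequency tabulation itself is straightforward. The sketch below uses hypothetical extraction records and pandas, which may differ from the tooling the authors actually used.

```python
import pandas as pd

# Hypothetical extraction records: one row per study and design element.
records = pd.DataFrame([
    {"study": "s01", "element": "arrangement_of_factors", "option": "parallel"},
    {"study": "s02", "element": "arrangement_of_factors", "option": "factorial"},
    {"study": "s03", "element": "arrangement_of_factors", "option": "split-plot-like"},
    {"study": "s01", "element": "method_of_allocation",   "option": "randomised"},
    {"study": "s02", "element": "method_of_allocation",   "option": "unclear"},
])

# Frequency of each option within each design element.
frequencies = (records
               .groupby(["element", "option"])
               .size()
               .rename("n_studies")
               .reset_index())
print(frequencies)
```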

Study characteristics

Investigator-identified study design.

No investigator reported a specific study design name such as "2 by 2 factorial" in any of the 100 studies extracted from the CAMARADES (stroke/brain trauma) dataset. Only seven studies from the toxicology dataset contained an investigator-identified study design. All seven of these studies were described by the investigators as factorial designs. Interestingly, two of these seven studies appeared to be split-plot-like designs based on the investigators' description of the arrangement of factors. Of course, split-plot is a unique sub-group of factorial design; therefore, the description of these studies as factorial is technically correct. However, the use of the term "split-plot-like design" is preferable, as it would alert reviewers more quickly to the potential for unit-of-analysis errors in the manuscripts.

Frequency of study design elements and options

Table 2 shows the frequency of reporting of design elements in the two datasets.

* NDD = no discernable description: neither reviewer/reader was able to find text that described this element.

One of the most important findings is that, despite the absence of specific design labels, the reviewers were almost always able to confidently determine the arrangement of factors used by the investigators. This means that this information about the design element is not missing, as is often the case for other important design elements such as randomization or blinding. However, authors do not appear to routinely use stock phrases that could be matched with simple regular expressions, such as "2 by 2 factorial design" or "split-plot design"; instead, they describe these elements using more complex language than might be expected.
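A first-pass automated screen for explicit design labels could be as simple as pattern matching over the methods text. The patterns below are illustrative assumptions; as the finding above suggests, a matcher like this would miss most studies precisely because authors rarely use these stock phrases.

```python
import re

# Illustrative patterns for explicit design labels; a real corpus would need
# far richer language handling, since most authors avoid these stock phrases.
DESIGN_LABEL_PATTERNS = {
    "factorial": re.compile(r"\b\d\s*(?:by|x|×)\s*\d[\s-]*factorial\s+design\b", re.I),
    "split_plot": re.compile(r"\bsplit[- ]plot\s+design\b", re.I),
    "cross_over": re.compile(r"\bcross[- ]?over\s+(?:trial|design)\b", re.I),
    "repeated_measures": re.compile(r"\brepeated[- ]measures\s+ANOVA\b", re.I),
}

def find_design_labels(methods_text):
    """Return the design labels whose pattern appears in the methods text."""
    return [name for name, pattern in DESIGN_LABEL_PATTERNS.items()
            if pattern.search(methods_text)]

example = ("The DAV1 study is a one-way, two-period crossover trial with 16 piglets. "
           "A repeated measures ANOVA was conducted.")
print(find_design_labels(example))   # ['cross_over', 'repeated_measures']
```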

Another important finding is that more variation in the unit of allocation was observed in the toxicology dataset than in the brain trauma/stroke dataset. The toxicology dataset included more nested, group, and unclear allocations. The factors studied in our particular toxicology dataset tended to be those conducive to application in the food or water, and if animals were group-housed, it was probably more expedient for the investigators to allocate these factors at the group level by adding them to the food or water of group-caged animals. In the brain trauma/stroke dataset, the interventions of interest were usually those that could only be applied at the individual level (e.g. injectable drugs), and cross-generational effects of the intervention were not of interest to the investigators. By contrast, investigators in our toxicology dataset studies were often interested in cross-generational effects of the toxins of interest, hence we found that the factors were often applied to pregnant dams and their offspring (nested allocation).

Similarly, more variation was observed in the arrangement of factors in the toxicology studies compared with the brain trauma/stroke studies. Importantly for unit-of-analysis errors, 20% of the toxicology studies used language that suggested a split-plot-like arrangement of the factors of interest, although, as previously noted, no investigator used the term "split-plot". As with group-level allocation, the use of a split-plot-like arrangement of factors (one or more subplots nested within a whole plot) suggests that unit-of-analysis errors could occur. Reviewers would benefit from being alerted to this potential, as it enables them to verify that the study correctly adjusted for the whole-plot error term in the statistical analysis [ 12 ].

With respect to allocation to treatment group, not surprisingly, randomization was the only reported method of allocation. The studies not indicating randomization did not report which method was used to allocate the interventions to the study units. Similarly, blinding of allocation and outcome assessment were rarely described in preclinical studies.

Language that suggests the potential for unit-of-analysis concerns as a result of pseudo-replication and repeated measures was common in both datasets. Almost 50% of studies used language that described pseudo-replication and/or repeated measures [ 8 , 9 ]. Our goal with this study was not to determine whether the investigators addressed these concerns when conducting their analysis. However, it is relevant to note that sometimes, though not always, the investigators' description of the element also indicates that the unit-of-analysis errors concern was addressed. This has implications for efficient text extraction and bias or error assessment. For example, in the toxicology dataset, 53 manuscripts used language that suggested repeated measures, such as in the following text:

"Offspring were weighed at 7 day intervals and food intake over 24 hours was measured at 25 day intervals."[ 18 ]

However, only 26 of those 53 studies also provided language in the methods and materials section that suggested that this unit-of-analysis concern had been addressed. For example:

"The repeated measures ANOVA was used for the acquisition phase of the MWM and rMWM (with the repeated measure: trial block), followed by a Bonferroni post hoc to analyze possible interactions between trial block, genotype and/or diet." (emphasis added) [ 19 ]

In the CAMARADES (brain trauma/stroke) dataset, 28 of the 40 studies that used language suggesting repeated measures did not also include language that indicated this had been addressed in the statistical analysis. Similar results were found for pseudo-replication; for the toxicology dataset, 46 studies used language that suggested pseudo-replication, but 34 (74%) of these studies did not clearly indicate how this was addressed analytically. For the CAMARADES dataset, in 29 of 44 (66%) studies the investigators' description of pseudo-replication did not also contain evidence of the solution. An example of language the reviewers considered to suggest the issue and the resolution is:

"The digital reading (in Newtons) of three successive trials were obtained for each mouse, averaged and used for data analysis." [ 20 ]

Also of interest was the finding that many studies, especially in the toxicology dataset, included factors of interest that could not be randomized. This was most often seen for factors such as genotype or sex, for example:

"In order to determine the contribution of both genetic TXNIP-deletion (TKO) and the pharmacologic TXNIP inhibition with RES on outcome/recover after embolic middle cerebral artery occlusion (eMCAO) stroke, the total 64 mice (WT and TKO) were separated into following groups: WT mice subjected to sham operated control + vehicle treatment group I (sham only); WT mice subjected to eMCAO + vehicle treatment group II (WTeMCAO only); WT mice subjected to eMCAO + RES (5mg/kg) treatment group III (WTeMCAO + RES only) and TKO mice subjected to eMCAO group + vehicle treatment IV (TKO-eMCAO only)." [ 20 ]

Our interpretation of this design is that genotype was a factor of interest, but animals could not be randomized to genotype in the true sense. This has implications for automated risk-of-bias assessment, as it is not possible to assume that all factors studied in preclinical experiments can be randomized to group.

The data suggest that investigators report the use of a variety of design elements in preclinical studies. To date, much of the focus on comprehensive reporting in biomedical research has been on the design elements that relate to selection bias and detection bias. The design element "allocation to group" is related to selection bias, and incorporation of blinded outcome assessment relates to detection bias [ 1 , 6 , 17 , 21 – 25 ]. This focus is likely a function of three factors. First, in the literature on human studies the reporting of these design elements has been evaluated for years and continues to be the focus of many studies; second, there is empirical evidence of an association between reporting of these elements and the effect size of intervention studies [ 26 – 31 ]. Finally, the assessment of these factors does not require advanced understanding of study design because authors use typical expressions or keywords more commonly to describe the options for these design elements, i.e., randomization and blinding, and therefore the task of assessment of reporting is relatively simple.

Less focus has been applied to the reporting of elements that may affect the potential for unit-of-analysis errors. Interestingly, our data suggest that such elements are actually quite common in preclinical studies. For example, in the two datasets we evaluated, almost 50% of investigators opted to include a design element that suggested the potential for repeated measures or pseudo-replication, and 20% of the studies in the toxicology dataset described split-plot-like designs. Regrettably, we could not identify other reviews that evaluated design elements associated with potential unit-of-analysis errors in other sets of preclinical studies or human studies. One survey of preclinical researchers did examine investigator awareness of design elements that avoid bias and error, and it included the option of independent observations. Surprisingly, many investigators identified independent observations as an approach to avoiding attrition bias (~40%), performance bias (~50%), selective reporting (~30%), detection bias (~50%), publication bias (~35%), and selection bias (~38%) [ 32 ]. While independent observations are important, they are not related to any of these sources of bias. The survey did not ask questions about avoiding unit-of-analysis errors.

The findings also illustrated the complexity of designs that include multiple elements. For example, some reviewers might assume that all split-plot designs use a nested allocation; however, this is not the case, for several reasons. To illustrate, the text below describes a split-plot design with allocation of the diet to dams (whole plot) and then the sex of the pup is identified as a sub-plot factor.

"Once bred, pregnant dams (n = 6/group) were fed one of four diets; (1) control diet, (2) high fat (HF) diet, (3) control + methyl donor supplementation (Control + Met) and (4) high fat + methyl donor supplementation (HF + Met). …One animal per litter was used in individual experiments, to control for any litter effect. …Male and female offspring were followed longitudinally and tested at the following time points (1) 12 and 20 weeks of age-metabolic assessments, (2) 40 weeks of age- fat and sucrose preference test, and (3) 50 weeks of age-brain collection for gene expression and methylation assays." [ 33 ]

To understand the potential sources of bias and error in this design, substantial knowledge about the design and thorough interpretation of the information is needed. First, the investigator cannot randomize the sub-plot factor (sex) as it is a characteristic of the animal. Therefore, the "nested" allocation, which may be considered the default for a split-plot design, is not appropriate in this study. Only the whole-plot factor (diet) can be "allocated" to the dam. Therefore, it is only relevant to assess the risk of bias due to allocation at the whole-plot level, not the sub-plot level, because the nature of the factor (sex) means it cannot be randomized. Further, diet is a factor that could be allocated at the individual or group level, and the investigators did not specify the unit of allocation. As a result, the description above might suggest the potential for pseudo-replication at the whole-plot level if all the dams from one group were housed in the same cage and this correlation was not addressed in the design. This example illustrates why it is necessary to evaluate all design elements to fully understand the potential for systematic bias or unit-of-analysis errors in preclinical studies.
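One common way to respect the whole-plot error term in a design like the one quoted above is a mixed-effects model with the dam (the whole plot) as a random effect. The sketch below is a generic illustration using statsmodels with hypothetical column and file names; it is not a re-analysis of the cited study.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per offspring, with the dam recorded
# because diet was allocated at the dam (whole-plot) level and sex is the
# sub-plot factor that cannot be randomised.
df = pd.read_csv("offspring_outcomes.csv")   # columns: dam, diet, sex, outcome

# A random intercept for dam accounts for the whole-plot error term, so the
# diet effect is not tested against the (larger) number of offspring.
model = smf.mixedlm("outcome ~ diet * sex", data=df, groups=df["dam"])
result = model.fit()
print(result.summary())
```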

We also found that studies in the two datasets commonly used complex arrangements of factors. In the CAMARADES (brain trauma/stroke) dataset, 40% of the 100 studies utilized some form of factorial design, and in the toxicology dataset, more than 75% of the 100 studies used some form of factorial design. Further, 25% of the studies in the toxicology dataset were split-plot-like. Given that factorial designs often have interactions between main effects, reviewers and automated methods extracting data from preclinical studies will need to understand how to appropriately extract effect sizes and variance estimates from results with and without significant interactions. As a first step to assessing unit-of-analysis errors, reviewers and automated methods would need to be able to recognize a split-plot design so that the validity of the approach to analysis could be assessed. We have not previously seen the frequency of factor arrangement types assessed in preclinical animal experiments or human clinical trials. Our impression is that parallel and cross-over designs may predominate in human studies. For example, a search of trial titles for intervention studies submitted to ClinicalTrials.gov ( https://clinicaltrials.gov ) identified only 142 studies that used "factorial" in the title, yet 8146 titles included "parallel" and 6100 used the term "cross-over".
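The title-count comparison could in principle be reproduced programmatically. The sketch below queries what we assume to be the current ClinicalTrials.gov API using the requests library; the endpoint and parameter names (query.titles, countTotal) are assumptions that should be checked against the API documentation, and the counts reported in the text were obtained through the website search interface.

```python
import requests

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"  # assumed v2 endpoint

def count_title_hits(term):
    """Return the number of registered studies whose title contains `term`.

    The parameter names (query.titles, countTotal) are assumptions about the
    public API and should be verified against its documentation.
    """
    response = requests.get(
        BASE_URL,
        params={"query.titles": term, "countTotal": "true", "pageSize": 1},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("totalCount")

for term in ("factorial", "parallel", "cross-over"):
    print(term, count_title_hits(term))
```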

A limitation of this study is that it is based on only two topic areas of preclinical studies with a relatively small subsample of 200 studies. The reason for this limited number relates to resources, i.e., it takes considerable expertise and time to identify all important design elements in a manuscript. This limitation reinforces our original motivation: design elements beyond randomization and blinding are also important for understanding study design, and currently few authors clearly provide this information.

We would propose that three groups could use these findings. Although authors do write about their designs in a manner that enables experienced researchers to recognize the design elements, authors could help others understand the design by using more key terms for design elements. For example, describing a design as a 2 by 2 factorial design, or stating that it contains a repeated-measures element, improves the translation of research findings to end users. This does, however, require that authors are explicitly aware of the design elements employed and the appropriate terminology. Peer reviewers and editors could also encourage the use of common key terms for design elements. For end users, in particular systematic reviewers, the information provided suggests that they should not currently rely on authors to use key terms to identify design elements, especially those with the potential to create unit-of-analysis errors. Instead, systematic reviewers in preclinical health research should be aware that these features are common and should consider them when seeking to extract valid estimates of effect sizes and their precision for use in systematic reviews.

Conclusions

This study documents that investigators of primary research in preclinical animal experiments employ many design elements. We find it particularly interesting that many of these design elements could relate to unit-of-analysis errors (nested allocation, group allocation, split-plot-like designs, pseudo-replication, and repeated measures). However, the potential for unit-of-analysis error is rarely discussed or included in risk-of-bias assessments in preclinical animal experiments in systematic reviews. It is rare for investigators in this area of research to specifically name the study design used. Reporting of allocation concealment is also rare. The toxicology dataset described more nested, group, and unclear allocations, indicating that reviewers in this topic area need to be particularly careful when reading these studies to understand whether unit-of-analysis errors suggested by the design are properly addressed in the statistical analysis.


Acknowledgments

Thank you to the CAMARADES group, led by Malcolm R. Macleod, for providing the citations for the studies included in the CAMARADES database.

Funding Statement

This research was funded by the Iowa State University Presidential Initiative for Data Driven Science (PIDDS). The funding was internal so there is no grant number. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


  16. Study Design

    1. Study Design. For each experiment, provide brief details of study design including: 1a The groups being compared, including control groups. If no control group has been used, the rationale should be stated. 1b The experimental unit (e.g. a single animal, litter, or cage of animals). The choice of control or comparator group is dependent on ...

  17. Study Design Rigor in Animal-Experimental Research Published

    sign, including sample size calculations, blinding procedures, and randomization steps. We hypothesized that the reporting of such metrics of study design rigor has increased over time for animal-experimental research published in anesthesia journals. METHODS: PubMed was searched for animal-experimental studies published in 2005, 2010, and 2015 in primarily English-language anesthesia journals ...

  18. The study design elements employed by researchers in preclinical animal

    Data sources. Manuscripts included described primary research of a single comparative animal experiment (published in English). Only in vivo studies were eligible. If an eligible manuscript also contained an in vitro or ex vivo intervention element, the manuscript as a whole was excluded. The single-study criterion was necessary for a companion project using the same set of studies.