
How to Write a Peer Review


When you write a peer review for a manuscript, what should you include in your comments? What should you leave out? And how should the review be formatted?

This guide provides quick tips for writing and organizing your reviewer report.

Review Outline

Use an outline for your reviewer report so it’s easy for the editors and author to follow. This will also help you keep your comments organized.

Think about structuring your review like an inverted pyramid. Put the most important information at the top, followed by details and examples in the center, and any additional points at the very bottom.


Here’s how your outline might look:

1. Summary of the research and your overall impression

In your own words, summarize what the manuscript claims to report. This shows the editor how you interpreted the manuscript and will highlight any major differences in perspective between you and the other reviewers. Give an overview of the manuscript’s strengths and weaknesses. Think about this as your “take-home” message for the editors. End this section with your recommended course of action.

2. Discussion of specific areas for improvement

It’s helpful to divide this section into two parts: one for major issues and one for minor issues. Within each section, you can talk about the biggest issues first or go systematically figure-by-figure or claim-by-claim. Number each item so that your points are easy to follow (this will also make it easier for the authors to respond to each point). Refer to specific lines, pages, sections, or figure and table numbers so the authors (and editors) know exactly what you’re talking about.

Major vs. minor issues

What’s the difference between a major and minor issue? Major issues should consist of the essential points the authors need to address before the manuscript can proceed. Make sure you focus on what is fundamental for the current study. In other words, it’s not helpful to recommend additional work that would be considered the “next step” in the study. Minor issues are still important but typically will not affect the overall conclusions of the manuscript. Here are some examples of what might go in the “minor” category:

  • Missing references (but depending on what is missing, this could also be a major issue)
  • Technical clarifications (e.g., the authors should clarify how a reagent works)
  • Data presentation (e.g., the authors should present p-values differently)
  • Typos, spelling, grammar, and phrasing issues

3. Any other points

Confidential comments for the editors.

Some journals have a space for reviewers to enter confidential comments about the manuscript. Use this space to mention concerns about the submission that you’d want the editors to consider before sharing your feedback with the authors, such as concerns about ethical guidelines or language quality. Any serious issues should be raised directly and immediately with the journal as well.

This section is also where you will disclose any potentially competing interests, and mention whether you’re willing to look at a revised version of the manuscript.

Do not use this space to critique the manuscript, since comments entered here will not be passed along to the authors.  If you’re not sure what should go in the confidential comments, read the reviewer instructions or check with the journal first before submitting your review. If you are reviewing for a journal that does not offer a space for confidential comments, consider writing to the editorial office directly with your concerns.

Get this outline in a template

Giving Feedback

Giving feedback is hard. Giving effective feedback can be even more challenging. Remember that your ultimate goal is to discuss what the authors would need to do in order to qualify for publication. The point is not to nitpick every piece of the manuscript. Your focus should be on providing constructive and critical feedback that the authors can use to improve their study.

If you’ve ever had your own work reviewed, you already know that it’s not always easy to receive feedback. Follow the golden rule: Write the type of review you’d want to receive if you were the author. Even if you decide not to identify yourself in the review, you should write comments that you would be comfortable signing your name to.

In your comments, use phrases like “the authors’ discussion of X” instead of “your discussion of X.” This will depersonalize the feedback and keep the focus on the manuscript instead of the authors.

General guidelines for effective feedback

Do

  • Justify your recommendation with concrete evidence and specific examples.
  • Be specific so the authors know what they need to do to improve.
  • Be thorough. This might be the only time you read the manuscript.
  • Be professional and respectful. The authors will be reading these comments too.
  • Remember to say what you liked about the manuscript!


Don’t

  • Recommend additional experiments or unnecessary elements that are out of scope for the study or for the journal criteria.
  • Tell the authors exactly how to revise their manuscript—you don’t need to do their work for them.
  • Use the review to promote your own research or hypotheses.
  • Focus on typos and grammar. If the manuscript needs significant editing for language and writing quality, just mention this in your comments.
  • Submit your review without proofreading it and checking everything one more time.

Before and After: Sample Reviewer Comments

Keeping in mind the guidelines above, how do you put your thoughts into words? Here are some sample “before” and “after” reviewer comments.

✗ Before

“The authors appear to have no idea what they are talking about. I don’t think they have read any of the literature on this topic.”

✓ After

“The study fails to address how the findings relate to previous research in this area. The authors should rewrite their Introduction and Discussion to reference the related literature, especially recently published work such as Darwin et al.”

✗ Before

“The writing is so bad, it is practically unreadable. I could barely bring myself to finish it.”

✓ After

“While the study appears to be sound, the language is unclear, making it difficult to follow. I advise the authors to work with a writing coach or copyeditor to improve the flow and readability of the text.”

✗ Before

“It’s obvious that this type of experiment should have been included. I have no idea why the authors didn’t use it. This is a big mistake.”

✓ After

“The authors are off to a good start; however, this study requires additional experiments, particularly [type of experiment]. Alternatively, the authors should include more information that clarifies and justifies their choice of methods.”

Suggested Language for Tricky Situations

You might find yourself in a situation where you’re not sure how to explain the problem or provide feedback in a constructive and respectful way. Here is some suggested language for common issues you might experience.

What you think: The manuscript is fatally flawed. What you could say: “The study does not appear to be sound” or “the authors have missed something crucial.”

What you think: You don’t completely understand the manuscript. What you could say: “The authors should clarify the following sections to avoid confusion…”

What you think: The technical details don’t make sense. What you could say: “The technical details should be expanded and clarified to ensure that readers understand exactly what the researchers studied.”

What you think: The writing is terrible. What you could say: “The authors should revise the language to improve readability.”

What you think: The authors have over-interpreted the findings. What you could say: “The authors aim to demonstrate [XYZ]; however, the data does not fully support this conclusion. Specifically…”

What does a good review look like?

Check out the peer review examples at F1000 Research to see how other reviewers write up their reports and give constructive feedback to authors.

Time to Submit the Review!

Be sure you turn in your report on time. Need an extension? Tell the journal so that they know what to expect. If you need a lot of extra time, the journal might need to contact other reviewers or notify the author about the delay.

Tip: Building a relationship with an editor

You’ll be more likely to be asked to review again if you provide high-quality feedback and if you turn in the review on time. Especially if it’s your first review for a journal, it’s important to show that you are reliable. Prove yourself once and you’ll get asked to review again!


Peer review templates, expert examples and free training courses


Joanna Wilkinson

Learning how to write a constructive peer review is an essential step in helping to safeguard the quality and integrity of published literature. Read on for resources that will get you on the right track, including peer review templates, example reports and the Web of Science™ Academy: our free, online course that teaches you the core competencies of peer review through practical experience (try it today).

How to write a peer review

Understanding the principles, forms and functions of peer review will enable you to write solid, actionable review reports. It will form the basis for a comprehensive and well-structured review, and help you comment on the quality, rigor and significance of the research paper. It will also help you identify potential breaches of normal ethical practice.

This may sound daunting but it doesn’t need to be. There are plenty of peer review templates, resources and experts out there to help you, including:

  • Peer review training courses and in-person workshops
  • Peer review templates (found in our Web of Science Academy)
  • Expert examples of peer review reports
  • Co-reviewing (sharing the task of peer reviewing with a senior researcher)
  • Other peer review resources, blogs, and guidelines

We’ll go through each one of these in turn below, but first: a quick word on why learning peer review is so important.

Why learn to peer review?

Peer reviewers and editors are gatekeepers of the research literature used to document and communicate human discovery. Reviewers therefore need a sound understanding of their role and obligations to ensure the integrity of this process. This also helps them maintain the quality of published research and protect the public from flawed and misleading research findings.

Learning to peer review is also an important step in improving your own professional development.

You’ll become a better writer and a more successful published author in learning to review. It gives you a critical vantage point and you’ll begin to understand what editors are looking for. It will also help you keep abreast of new research and best-practice methods in your field.

We strongly encourage you to learn the core concepts of peer review by joining a course or workshop. You can attend in-person workshops to learn from and network with experienced reviewers and editors. As an example, Sense about Science offers peer review workshops every year. To learn more about what might be in store at one of these, researcher Laura Chatland shares her experience at one of the workshops in London.

There are also plenty of free, online courses available, including courses in the Web of Science Academy such as ‘Reviewing in the Sciences’, ‘Reviewing in the Humanities’ and ‘An introduction to peer review’.

The Web of Science Academy also supports co-reviewing with a mentor to teach peer review through practical experience. You learn by writing reviews of preprints, published papers, or even ‘real’ unpublished manuscripts with guidance from your mentor. You can work with one of our community mentors or your own PhD supervisor or postdoc advisor, or even a senior colleague in your department.

Go to the Web of Science Academy

Peer review templates

Peer review templates are helpful to use as you work your way through a manuscript. As part of our free Web of Science Academy courses, you’ll gain exclusive access to comprehensive guidelines and a peer review report template. The template offers points to consider for all aspects of the manuscript, including the abstract, methods and results sections. It also teaches you how to structure your review and will get you thinking about the overall strengths and impact of the paper at hand.

  • Web of Science Academy template (requires joining one of the free courses)
  • PLoS’s review template
  • Wiley’s peer review guide (not a template as such, but a thorough guide with questions to consider in the first and second reading of the manuscript)

Beyond following a template, it’s worth asking your editor or checking the journal’s peer review management system. That way, you’ll learn whether you need to follow a formal or specific peer review structure for that particular journal. If no such formal approach exists, try asking the editor for examples of other reviews performed for the journal. This will give you a solid understanding of what they expect from you.

Peer review examples

Understand what a constructive peer review looks like by learning from the experts.

Here’s a sample of pre- and post-publication peer reviews displayed on Web of Science publication records to help guide you through your first few reviews. Some of these are transparent peer reviews, which means the entire process is open and visible: from initial review and response through to revision and final publication decision. You may wish to scroll to the bottom of these pages so you can read the initial reviews first, then make your way up the page to read the editor’s and authors’ responses.

  • Pre-publication peer review: Patterns and mechanisms in instances of endosymbiont-induced parthenogenesis
  • Pre-publication peer review: Can Ciprofloxacin be Used for Precision Treatment of Gonorrhea in Public STD Clinics? Assessment of Ciprofloxacin Susceptibility and an Opportunity for Point-of-Care Testing
  • Transparent peer review: Towards a standard model of musical improvisation
  • Transparent peer review: Complex mosaic of sexual dichromatism and monochromatism in Pacific robins results from both gains and losses of elaborate coloration
  • Post-publication peer review: Brain state monitoring for the future prediction of migraine attacks
  • Web of Science Academy peer review: Students’ Perception on Training in Writing Research Article for Publication

F1000 has also put together a nice list of expert reviewer comments pertaining to the various aspects of a review report.

Co-reviewing

Co-reviewing (sharing peer review assignments with senior researchers) is one of the best ways to learn peer review. It gives researchers a hands-on, practical understanding of the process.

In an article in The Scientist, the team at Future of Research argues that co-reviewing can be a valuable learning experience for peer review, as long as it’s done properly and with transparency. The reason it’s worth spelling out how co-reviewing should work is that the practice has its downsides. It can leave early-career researchers unaware of the core concepts of peer review, and it can be hard for them to later join an editor’s reviewer pool if they haven’t received adequate recognition for their share of the review work. (If you are asked to write a peer review on behalf of a senior colleague or researcher, get recognition for your efforts by asking your senior colleague to verify the collaborative co-review on your Web of Science researcher profiles.)

The Web of Science Academy course ‘Co-reviewing with a mentor’ is uniquely practical in this sense. You will gain experience in peer review by practicing on real papers and working with a mentor to get feedback on how your peer review can be improved. Students submit their peer review report as their course assignment and, after internal evaluation, receive a course certificate and an Academy graduate badge on their Web of Science researcher profile, and are put in front of top editors in their field through the Reviewer Locator at Clarivate.

Here are some external peer review resources found around the web:

  • Peer Review Resources from Sense about Science
  • Peer Review: The Nuts and Bolts by Sense about Science
  • How to review journal manuscripts by R. M. Rosenfeld for Otolaryngology – Head and Neck Surgery
  • Ethical guidelines for peer review from COPE
  • An Instructional Guide for Peer Reviewers of Biomedical Manuscripts by Callaham, Schriger & Cooper for Annals of Emergency Medicine (requires Flash or Adobe)
  • EQUATOR Network’s reporting guidelines for health researchers

And finally, we’ve written a number of blogs about handy peer review tips. Check out some of our top picks:

  • How to Write a Peer Review: 12 things you need to know
  • Want To Peer Review? Top 10 Tips To Get Noticed By Editors
  • Review a manuscript like a pro: 6 tips from a Web of Science Academy supervisor
  • How to write a structured reviewer report: 5 tips from an early-career researcher

Want to learn more? Become a master of peer review and connect with top journal editors through the Web of Science Academy, your free online hub of courses designed by expert reviewers, editors and Nobel Prize winners. Find out more today.


The Impact of Peer Assessment on Academic Performance: A Meta-analysis of Control Group Studies

  • Meta-Analysis
  • Open access
  • Published: 10 December 2019
  • Volume 32, pages 481–509 (2020)


  • Kit S. Double (ORCID: orcid.org/0000-0001-8120-1573)
  • Joshua A. McGrane
  • Therese N. Hopfenbeck


Peer assessment has been the subject of considerable research interest over the last three decades, with numerous educational researchers advocating for the integration of peer assessment into schools and instructional practice. Research synthesis in this area has, however, largely relied on narrative reviews to evaluate the efficacy of peer assessment. Here, we present a meta-analysis (54 studies, k = 141) of experimental and quasi-experimental studies that evaluated the effect of peer assessment on academic performance in primary, secondary, or tertiary students across subjects and domains. An overall small to medium effect of peer assessment on academic performance was found ( g = 0.31, p < .001). The results suggest that peer assessment improves academic performance compared with no assessment ( g = 0.31, p = .004) and teacher assessment ( g = 0.28, p = .007), but was not significantly different in its effect from self-assessment ( g = 0.23, p = .209). Additionally, meta-regressions examined the moderating effects of several feedback and educational characteristics (e.g., online vs offline, frequency, education level). Results suggested that the effectiveness of peer assessment was remarkably robust across a wide range of contexts. These findings provide support for peer assessment as a formative practice and suggest several implications for the implementation of peer assessment into the classroom.


Feedback is often regarded as a central component of educational practice and crucial to students’ learning and development (Fyfe & Rittle-Johnson, 2016 ; Hattie and Timperley 2007 ; Hays, Kornell, & Bjork, 2010 ; Paulus, 1999 ). Peer assessment has been identified as one method for delivering feedback efficiently and effectively to learners (Topping 1998 ; van Zundert et al. 2010 ). The use of students to generate feedback about the performance of their peers is referred to in the literature using various terms, including peer assessment, peer feedback, peer evaluation, and peer grading. In this article, we adopt the term peer assessment, as it more generally refers to the method of peers assessing or being assessed by each other, whereas the term feedback is used when we refer to the actual content or quality of the information exchanged between peers. This feedback can be delivered in a variety of forms including written comments, grading, or verbal feedback (Topping 1998 ). Importantly, by performing both the role of assessor and being assessed themselves, students’ learning can potentially benefit more than if they are just assessed (Reinholz 2016 ).

Peer assessments tend to be highly correlated with teacher assessments of the same students (Falchikov and Goldfinch 2000 ; Li et al. 2016 ; Sanchez et al. 2017 ). However, in addition to establishing comparability between teacher and peer assessment scores, it is important to determine whether peer assessment also has a positive effect on future academic performance. Several narrative reviews have argued for the positive formative effects of peer assessment (e.g., Black and Wiliam 1998a ; Topping 1998 ; van Zundert et al. 2010 ) and have additionally identified a number of potentially important moderators for the effect of peer assessment. This meta-analysis will build upon these reviews and provide quantitative evaluations for some of the instructional features identified in these narrative reviews by utilising them as moderators within our analysis.

Evaluating the Evidence for Peer Assessment

Empirical Studies

Despite the optimism surrounding peer assessment as a formative practice, there are relatively few control group studies that evaluate the effect of peer assessment on academic performance (Flórez and Sammons 2013 ; Strijbos and Sluijsmans 2010 ). Most studies on peer assessment have tended to focus on either students’ or teachers’ subjective perceptions of the practice rather than its effect on academic performance (e.g., Brown et al. 2009 ; Young and Jackman 2014 ). Moreover, interventions involving peer assessment often confound the effect of peer assessment with other assessment practices that are theoretically related under the umbrella of formative assessment (Black and Wiliam 2009 ). For instance, Wiliam et al. ( 2004 ) reported a mean effect size of .32 in favor of a formative assessment intervention but they were unable to determine the unique contribution of peer assessment to students’ achievement, as it was one of more than 15 assessment practices included in the intervention.

However, as shown in Fig. 1 , there has been a sharp increase in the number of studies related to peer assessment, with over 75% of relevant studies published in the last decade. Although it is still far from being the dominant outcome measure in research on formative practices, many of these recent studies have examined the effect of peer assessment on objective measures of academic performance (e.g., Gielen et al. 2010a ; Liu et al. 2016 ; Wang et al. 2014a ). The number of studies of peer assessment using control group designs also appears to be increasing in frequency (e.g., van Ginkel et al. 2017 ; Wang et al. 2017 ). These studies have typically compared the formative effect of peer assessment with either teacher assessment (e.g., Chaney and Ingraham 2009 ; Sippel and Jackson 2015 ; van Ginkel et al. 2017 ) or no assessment conditions (e.g., Kamp et al. 2014 ; L. Li and Steckelberg 2004 ; Schonrock-Adema et al. 2007 ). Given the increase in peer assessment research, and in particular experimental research, it seems pertinent to synthesise this new body of research, as it provides a basis for critically evaluating the overall effectiveness of peer assessment and its moderators.

Figure 1. Number of records returned by year. Data were collated by searching Web of Science (www.webofknowledge.com) for the keywords ‘peer assessment’, ‘peer grading’, ‘peer evaluation’, or ‘peer feedback’ and categorising the results by year.

Previous Reviews

Efforts to synthesise peer assessment research have largely been limited to narrative reviews, which have made very strong claims regarding the efficacy of peer assessment. For example, in a review of peer assessment with tertiary students, Topping ( 1998 ) argued that the effects of peer assessment are, ‘as good as or better than the effects of teacher assessment’ (p. 249). Similarly, in a review on peer and self-assessment with tertiary students, Dochy et al. ( 1999 ) concluded that peer assessment can have a positive effect on learning but may be hampered by social factors such as friendships, collusion, and perceived fairness. Reviews into peer assessment have also tended to focus on determining the accuracy of peer assessments, which is typically established by the correlation between peer and teacher assessments for the same performances. High correlations have been observed between peer and teacher assessments in three meta-analyses to date ( r = .69, .63, and .68 respectively; Falchikov and Goldfinch 2000 ; H. Li et al. 2016 ; Sanchez et al. 2017 ). Given that peer assessment is often advocated as a formative practice (e.g., Black and Wiliam 1998a ; Topping 1998 ), it is important to expand on these correlational meta-analyses to examine the formative effect that peer assessment has on academic performance.

In addition to examining the correlation between peer and teacher grading, Sanchez et al. ( 2017 ) additionally performed a meta-analysis on the formative effect of peer grading (i.e., a numerical or letter grade was provided to a student by their peer) in intervention studies. They found that there was a significant positive effect of peer grading on academic performance for primary and secondary (grades 3 to 12) students ( g = .29). However, it is unclear whether their findings would generalise to other forms of peer feedback (e.g., written or verbal feedback) and to tertiary students, both of which we will evaluate in the current meta-analysis.

Moderators of the Effectiveness of Peer Assessment

Theoretical frameworks of peer assessment propose that it is beneficial in at least two respects. Firstly, peer assessment allows students to critically engage with the assessed material, to compare and contrast performance with their peers, and to identify gaps or errors in their own knowledge (Topping 1998 ). In addition, peer assessment may improve the communication of feedback, as peers may use similar and more accessible language, as well as reduce negative feelings of being evaluated by an authority figure (Liu et al. 2016 ). However, the efficacy of peer assessment, like traditional feedback, is likely to be contingent on a range of factors including characteristics of the learning environment, the student, and the assessment itself (Kluger and DeNisi 1996 ; Ossenberg et al. 2018 ). Some of the characteristics that have been proposed to moderate the efficacy of feedback include anonymity (e.g., Rotsaert et al. 2018 ; Yu and Liu 2009 ), scaffolding (e.g., Panadero and Jonsson 2013 ), quality and timing of the feedback (Diab 2011 ), and elaboration (e.g., Gielen et al. 2010b ). Drawing on the previously mentioned narrative reviews and empirical evidence, we now briefly outline the evidence for each of the included theoretical moderators.

It is somewhat surprising that most studies that examine the effect of peer assessment tend to only assess the impact on the assessee and not the assessor (van Popta et al. 2017 ). Assessing may confer several distinct advantages such as drawing comparisons with peers’ work and increased familiarity with evaluative criteria. Several studies have compared the effect of assessing with being assessed. Lundstrom and Baker ( 2009 ) found that assessing a peer’s written work was more beneficial for their own writing than being assessed by a peer. Meanwhile, Graner ( 1987 ) found that students who were receiving feedback from a peer and acted as an assessor did not perform better than students who acted as an assessor but did not receive peer feedback. Reviewing peers’ work is also likely to help students become better reviewers of their own work and to revise and improve their own work (Rollinson 2005 ). While, in practice, students will most often act as both assessor and assessee during peer assessment, it is useful to gain a greater insight into the relative impact of performing each of these roles for both practical reasons and to help determine the mechanisms by which peer assessment improves academic performance.

Peer Assessment Type

The characteristics of peer assessment vary greatly both in practice and within the research literature. Because meta-analysis is unable to capture all of the nuanced dimensions that determine the type, intensity, and quality of peer assessment, we focus on distinguishing between what we regard as the most prevalent types of peer assessment in the literature: grading, peer dialogs, and written assessment. Each of these peer assessment types is widely used in the classroom and often in various combinations (e.g., written qualitative feedback in combination with a numerical grade). While these assessment types differ substantially in terms of their cognitive complexity and comprehensiveness, each has shown at least some evidence of an impact on academic performance (e.g., Sanchez et al. 2017; Smith et al. 2009; Topping 2009).

Freeform/Scaffolding

Peer assessment is often implemented in conjunction with some form of scaffolding, for example, rubrics and scoring scripts. Scaffolding has been shown both to improve the quality of peer assessment and to increase the amount of feedback assessors provide (Peters, Körndle & Narciss, 2018). Peer assessment has also been shown to be more accurate when rubrics are utilised; for example, Panadero, Romero, & Strijbos (2013) found that students were less likely to overscore their peers when a rubric was provided.

Increasingly, peer assessment has been performed online due in part to the growth in online learning activities as well as the ease by which peer assessment can be implemented online (van Popta et al. 2017 ). Conducting peer assessment online can significantly reduce the logistical burden of implementing peer assessment (e.g., Tannacito and Tuzi 2002 ). Several studies have shown that peer assessment can effectively be carried out online (e.g., Hsu 2016 ; Li and Gao 2016 ). Van Popta et al. ( 2017 ) argue that the cognitive processes involved in peer assessment, such as evaluating, explaining, and suggesting, similarly play out in online and offline environments. However, the social processes involved in peer assessment are likely to substantively differ between online and offline peer assessment (e.g., collaborating, discussing), and it is unclear whether this might limit the benefits of peer assessment through one or the other medium. To the authors’ knowledge, no prior studies have compared the effects of online and offline peer assessment on academic performance.

Because peer assessment is fundamentally a collaborative assessment practice, interpersonal variables play a substantial role in determining the type and quality of peer assessment (Strijbos and Wichmann 2018 ). Some researchers have argued that anonymous peer assessment is advantageous because assessors are more likely to be honest in their feedback, and interpersonal processes cannot influence how assessees receive the assessment feedback (Rotsaert et al. 2018 ). Qualitative evidence suggests that anonymous peer assessment results in improved feedback quality and more positive perceptions towards peer assessment (Rotsaert et al. 2018 ; Vanderhoven et al. 2015 ). A recent qualitative review by Panadero and Alqassab ( 2019 ) found that three studies had compared anonymous peer assessment to a control group (i.e., open peer assessment) and looked at academic performance as the outcome. Their review found mixed evidence regarding the benefit of anonymity in peer assessment with one of the included studies finding an advantage of anonymity, but the other two finding little benefit of anonymity. Others have questioned whether anonymity impairs the development of cognitive and interpersonal development by limiting the collaborative nature of peer assessment (Strijbos and Wichmann 2018 ).

Peers are often novices at providing constructive assessment and inexperienced learners tend to provide limited feedback (Hattie and Timperley 2007 ). Several studies have therefore suggested that peer assessment becomes more effective as students’ experience with peer assessment increases. For example, with greater experience, peers tend to use scoring criteria to a greater extent (Sluijsmans et al. 2004 ). Similarly, training peer assessment over time can improve the quality of feedback they provide, although the effects may be limited by the extent of a student’s relevant domain knowledge (Alqassab et al. 2018 ). Frequent peer assessment may also increase positive learner perceptions of peer assessment (e.g., Sluijsmans et al. 2004 ). However, other studies have found that learner perceptions of peer assessment are not necessarily positive (Alqassab et al. 2018 ). This may suggest that learner perceptions of peer assessment vary depending on its characteristics (e.g., quality, detail).

Current Study

Given the previous reliance on narrative reviews and the increasing research and teacher interest in peer assessment, as well as the popularity of instructional theories advocating for peer assessment and formative assessment practices in the classroom, we present a quantitative meta-analytic review to develop and synthesise the evidence in relation to peer assessment. This meta-analysis evaluates the effect of peer assessment on academic performance when compared to no assessment as well as teacher assessment. To do this, the meta-analysis only evaluates intervention studies that utilised experimental or quasi-experimental designs, i.e., only studies with control groups, so that the effects of maturation and other confounding variables are mitigated. Control groups can be either passive (e.g., no feedback) or active (e.g., teacher feedback). We meta-analytically address two related research questions:

What effect do peer assessment interventions have on academic performance relative to the observed control groups?

What characteristics moderate the effectiveness of peer assessment?

Working Definitions

The specific methods of peer assessment can vary considerably, but there are a number of shared characteristics across most methods. Peers are defined as individuals at similar (i.e., within 1–2 grades) or identical education levels. Peer assessment must involve assessing or being assessed by peers, or both. Peer assessment requires the communication (either written, verbal, or online) of task-relevant feedback, although the style of feedback can differ markedly, from elaborate written and verbal feedback to holistic ratings of performance.

We took a deliberately broad definition of academic performance for this meta-analysis, including traditional outcomes (e.g., test performance or essay writing) as well as practical skills (e.g., constructing a circuit in science class). Despite this broad interpretation of academic performance, we did not include studies carried out in a professional/organisational setting, except where professional skills (e.g., teacher training) were being taught in a traditional educational setting (e.g., a university).

Selection Criteria

To be included in this meta-analysis, studies had to meet several criteria. Firstly, a study needed to examine the effect of peer assessment. Secondly, the assessment could be delivered in any form (e.g., written, verbal, online), but needed to be distinguishable from peer-coaching/peer-tutoring. Thirdly, a study needed to compare the effect of peer assessment with a control group. Pre-post designs that did not include a control/comparison group were excluded because we could not discount the effects of maturation or other confounding variables. Moreover, the comparison group could take the form of either a passive control (e.g., a no assessment condition) or an active control (e.g., teacher assessment). Fourthly, a study needed to examine the effect of peer assessment on a non-self-reported measure of academic performance.

In addition to these criteria, a study needed to be carried out in an educational context or be related to educational outcomes in some way. Any level of education (i.e., tertiary, secondary, primary) was acceptable. A study also needed to provide sufficient data to calculate an effect size. If insufficient data was available in the manuscript, the authors were contacted by email to request the necessary data (additional information was provided for a single study). Studies also needed to be written in English.

Literature Search

The literature search was carried out on 8 June 2018 using PsycInfo, Google Scholar, and ERIC. Google Scholar was used only to check for additional references, as it does not allow entries to be exported. These three electronic databases were selected due to their relevance to educational instruction and practice. Results were not filtered based on publication date, but ERIC only holds records from 1966 to the present. A deliberately wide selection of search terms was used in the first instance to capture all relevant articles. The search terms included ‘peer grading’ or ‘peer assessment’ or ‘peer evaluation’ or ‘peer feedback’, which were paired with ‘learning’ or ‘performance’ or ‘academic achievement’ or ‘academic performance’ or ‘grades’. All peer assessment-related search terms were included with and without hyphenation. In addition, an ancestry search (i.e., back-search) was performed on the reference lists of the included articles. Conference programs for major educational conferences were searched. Finally, unpublished results were sourced by emailing prominent authors in the field and through social media. Although there is significant disagreement about the inclusion of unpublished data and conference abstracts, i.e., ‘grey literature’ (Cook et al. 1993), we opted to include it in the first instance because including only published studies can result in a meta-analysis over-estimating effect sizes due to publication bias (Hopewell et al. 2007). It should, however, be noted that none of the substantive conclusions changed when the analyses were re-run with the grey literature excluded.

The database search returned 4072 records. An ancestry search returned an additional 37 potentially relevant articles. No unpublished data could be found. After duplicates were removed, two reviewers independently screened titles and abstracts for relevance. A kappa statistic was calculated to assess inter-rater reliability between the two coders and was found to be .78 (89.06% overall agreement, CI .63 to .94), which is above the recommended minimum level of inter-rater reliability (Fleiss 1971). Subsequently, the full text of articles that were deemed relevant based on their abstracts was examined to ensure that they met the selection criteria described previously. Disagreements between the coders were discussed and, when necessary, resolved by a third coder. Ultimately, 55 articles with 143 effect sizes met the inclusion criteria and were included in the meta-analysis. The search process is depicted in Fig. 2.
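For readers unfamiliar with the statistic, Cohen’s kappa (presumably the form used here for the two coders) corrects the observed agreement for agreement expected by chance:

\kappa = \frac{p_o - p_e}{1 - p_e}

where p_o is the observed proportion of agreement and p_e the proportion expected by chance. With the reported p_o = .8906 and \kappa = .78, the implied chance agreement is p_e \approx .50.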

Figure 2. Flow chart for the identification, screening, and inclusion of publications in the meta-analysis.

Data Extraction

A research assistant and the first author extracted data from the included papers. We took an iterative approach to the coding procedure whereby the coders refined the classification of each variable as they progressed through the included studies to ensure that the classifications best characterised the extant literature. Below, the coding strategy is reviewed along with the classifications utilised. Frequency statistics and inter-rater reliability for the extracted data for the different classifications are presented in Table 1. All extracted variables showed at least moderate agreement except for whether the peer assessment was freeform or structured, which showed fair agreement (Landis and Koch 1977).

Publication Type

Publications were classified into journal articles, conference papers, dissertations, reports, or unpublished records.

Education Level

Education level was coded as either graduate tertiary, undergraduate tertiary, secondary, or primary. Given the small number of studies that utilised graduate samples ( N = 2), we subsequently combined this classification with undergraduate to form a general tertiary category. In addition, we recorded the grade level of the students. Generally speaking, primary education refers to the ages of 6–12, secondary education refers to education from 13–18, and tertiary education is undertaken after the age of 18.

Age and Sex

The percentage of students in a study that were female was recorded. In addition, we recorded the mean age from each study. Unfortunately, only 55.5% of studies recorded participants’ sex and only 18.5% of studies recorded mean age information.

The subject area associated with the academic performance measure was coded. We also recorded the nature of the academic performance variable for descriptive purposes.

Assessment Role

Studies were coded as to whether the students acted as peer assessors, assessees, or both assessors and assessees.

Comparison Group

Four types of comparison group were found in the included studies: no assessment, teacher assessment, self-assessment, and reader-control. In many instances, a no assessment condition could be characterised as typical instruction; that is, two versions of a course were run—one with peer assessment and one without peer assessment. As such, while no specific teacher assessment comparison condition is referenced in the article, participants would most likely have received some form of teacher feedback as is typical in standard instructional practice. Studies were classified as having teacher assessment on the basis of a specific reference to teacher feedback being provided.

Studies were classified as self-assessment controls if there was an explicit reference to a self-assessment activity, e.g., self-grading/rating. Studies that only included revision, e.g., working alone on revising an assignment, were classified as no assessment rather than self-assessment because they did not necessarily involve explicit self-assessment. Studies where both the comparison and intervention groups received teacher assessment (in addition to peer assessment in the case of the intervention group) were coded as no assessment to reflect the fact that the comparison group received no additional assessment compared to the peer assessment condition. In addition, Philippakos and MacArthur ( 2016 ) and Cho and MacArthur ( 2011 ) were notable in that they utilised a reader-control condition whereby students read, but did not assess peers’ work. Due to the small frequency of this control condition, we ultimately classified them as no assessment controls.

Peer assessment was characterised using coding we believed best captured the theoretical distinctions in the literature. Our typology of peer assessment used three distinct components, which were combined for classification:

  • Did the peer feedback include a dialog between peers?
  • Did the peer feedback include written comments?
  • Did the peer feedback include grading?

Each study was classified using a dichotomous present/absent scoring system for each of the three components.

Studies were dichotomously classified as to whether a specific rubric, assessment script, or scoring system was provided to students. Studies that only provided basic instructions to students to conduct the peer feedback were coded as freeform.

Was the Assessment Online?

Studies were classified based on whether the peer assessment was online or offline.

Studies were classified based on whether the peer assessment was anonymous or identified.

Frequency of Assessment

Studies were coded dichotomously as to whether they involved only a single peer assessment occasion or, alternatively, whether students provided/received peer feedback on multiple occasions.

The level of transfer between the peer assessment task and the academic performance measure was coded into three categories:

No transfer—the peer-assessed task was the same as the academic performance measure. For example, a student’s assignment was assessed by peers and this feedback was utilised to make revisions before it was graded by their teacher.

Near transfer—the peer-assessed task was in the same or very similar format as the academic performance measure, e.g., an essay on a different, but similar topic.

Far transfer—the peer-assessed task was in a different form to the academic performance task, although they may have overlapping content. For example, a student’s assignment was peer assessed, while the final course exam grade was the academic performance measure.

We recorded how participants were allocated to a condition. Three categories of allocation were found in the included studies: random allocation at the class level, at the student level, or at the year/semester level. As only two studies allocated students to conditions at the year/semester level, we combined these studies with the studies allocated at the classroom level (i.e., as quasi-experiments).

Statistical Analyses of Effect Sizes

Effect Size Estimation and Heterogeneity

A random effects, multi-level meta-analysis was carried out using R version 3.4.3 (R Core Team 2017). The primary outcome was the standardised mean difference between peer assessment and comparison (i.e., control) conditions. A common effect size metric, Hedges’ g, was calculated. A positive Hedges’ g value indicates comparatively higher values of the dependent variable in the peer assessment group (i.e., higher academic performance). Heterogeneity in the effect sizes was estimated using the I² statistic. I² is equivalent to the percentage of variation between studies that is due to heterogeneity (Schwarzer et al. 2015). Large values of the I² statistic suggest higher heterogeneity between studies in the analysis.
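For reference, the conventional formulas for these two quantities are given below (the paper does not spell out its exact computation, so this is the standard form). PA denotes the peer assessment group, C the comparison group, and Q is Cochran’s heterogeneity statistic with df = k − 1 for k effect sizes:

g = J \cdot \frac{\bar{X}_{\mathrm{PA}} - \bar{X}_{\mathrm{C}}}{s_p}, \qquad s_p = \sqrt{\frac{(n_{\mathrm{PA}} - 1)s_{\mathrm{PA}}^2 + (n_{\mathrm{C}} - 1)s_{\mathrm{C}}^2}{n_{\mathrm{PA}} + n_{\mathrm{C}} - 2}}, \qquad J = 1 - \frac{3}{4(n_{\mathrm{PA}} + n_{\mathrm{C}} - 2) - 1}

I^2 = \max\left(0, \frac{Q - \mathrm{df}}{Q}\right) \times 100\%

where J is the small-sample correction that distinguishes Hedges’ g from Cohen’s d.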

Meta-regressions were performed to examine the moderating effects of the various factors that differed across the studies. We report the results of these meta-regressions alongside sub-groups analyses. While it was possible to determine whether sub-groups differed significantly from each other by checking whether the confidence intervals around their effect sizes overlap, sub-groups analyses may also produce biased estimates when heteroscedasticity or multicollinearity is present (Steel and Kammeyer-Mueller 2002). We performed meta-regressions separately for each predictor to test the overall effect of a moderator.

Finally, as this meta-analysis included students from primary school to graduate school, which are highly varied participant and educational contexts, we opted to analyse the data both in complete form, as well as after controlling for each level of education. As such, we were able to look at the effect of each moderator across education levels and for each education level separately.

Robust Variance Estimation

Often meta-analyses include multiple effect sizes from the same sample (e.g., the effect of peer assessment on two different measures of academic performance). Including these dependent effect sizes in a meta-analysis can be problematic, as this can potentially bias the results of the analysis in favour of studies that have more effect sizes. Recently, Robust Variance Estimation (RVE) was developed as a technique to address such concerns (Hedges et al. 2010 ). RVE allows for the modelling of dependence between effect sizes even when the nature of the dependence is not specifically known. Under such situations, RVE results in unbiased estimates of fixed effects when dependent effect sizes are included in the analysis (Moeyaert et al. 2017 ). A correlated effects structure was specified for the meta-analysis (i.e., the random error in the effects from a single paper were expected to be correlated due to similar participants, procedures). A rho value of .8 was specified for the correlated effects (i.e., effects from the same study) as is standard practice when the correlation is unknown (Hedges et al. 2010 ). A sensitivity analysis indicated that none of the results varied as a function of the chosen rho. We utilised the ‘robumeta’ package (Fisher et al. 2017 ) to perform the meta-analyses. Our approach was to use only summative dependent variables when they were provided (e.g., overall writing quality score rather than individual trait measures), but to utilise individual measures when overall indicators were not available. When a pre-post design was used in a study, we adjusted the effect size for pre-intervention differences in academic performance as long as there was sufficient data to do so (e.g., t tests for pre-post change).
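To make the model specification concrete, here is a minimal sketch of a correlated-effects RVE analysis of the kind described above, using the ‘robumeta’ package together with the ‘metafor’ package (our choice for the effect size step; it is not mentioned by the authors). The data frame and column names (studies, m_peer, sd_peer, n_peer, m_ctrl, sd_ctrl, n_ctrl, study_id, online) are hypothetical placeholders, not the authors’ actual dataset.

library(metafor)   # escalc() computes bias-corrected standardised mean differences (Hedges' g)
library(robumeta)  # robu() fits robust variance estimation (RVE) meta-regression models

# Compute Hedges' g (yi) and its sampling variance (vi) for each effect size
dat <- escalc(measure = "SMD",
              m1i = m_peer, sd1i = sd_peer, n1i = n_peer,
              m2i = m_ctrl, sd2i = sd_ctrl, n2i = n_ctrl,
              data = studies)

# Intercept-only correlated-effects model: overall effect of peer assessment, rho = .8
overall <- robu(yi ~ 1, data = dat, studynum = study_id,
                var.eff.size = vi, modelweights = "CORR", rho = 0.8)
print(overall)

# Check that the estimate is insensitive to the assumed within-study correlation
sensitivity(overall)

# Example meta-regression with a single dichotomous moderator (e.g., online vs. offline)
moderated <- robu(yi ~ online, data = dat, studynum = study_id,
                  var.eff.size = vi, modelweights = "CORR", rho = 0.8)
print(moderated)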

Overall Meta-analysis of the Effect of Peer Assessment

Prior to conducting the analysis, two effect sizes (g = 2.06 and 1.91) were identified as outliers and removed using the outlier labelling rule (Hoaglin and Iglewicz 1987). Descriptive characteristics of the included studies are presented in Table 2. The meta-analysis indicated that there was a significant positive effect of peer assessment on academic performance (g = 0.31, SE = .06, 95% CI = .18 to .44, p < .001). A density graph of the recorded effect sizes is provided in Fig. 3. A sensitivity analysis indicated that the effect size estimates did not differ with different values of rho. Heterogeneity between the studies’ effect sizes was large, I² = 81.08%, supporting the use of a meta-regression/sub-groups analysis in order to explain the observed heterogeneity in effect sizes.
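For illustration, a minimal sketch of the outlier labelling rule as it is commonly applied with a 2.2 multiplier on the interquartile range (our reading of Hoaglin and Iglewicz 1987); g_values is a hypothetical vector holding the 143 effect size estimates.

# Flag effect sizes outside the outlier-labelling bounds (assumed 2.2 * IQR multiplier)
q        <- quantile(g_values, probs = c(.25, .75))
iqr      <- q[2] - q[1]
bounds   <- c(q[1] - 2.2 * iqr, q[2] + 2.2 * iqr)
outliers <- g_values[g_values < bounds[1] | g_values > bounds[2]]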

Figure 3. A density plot of effect sizes.

Meta-Regressions and Sub-Groups Analyses

Effect sizes for sub-groups are presented in Table 3 . The results of the meta-regressions are presented in Table 4 .

A meta-regression with tertiary students as the reference category indicated that there was no significant difference in effect size as a function of education level. The effect of peer assessment was similar for secondary students (g = .44, p < .001) and primary school students (g = .41, p = .006) and smaller for tertiary students (g = .21, p = .043). There is, however, a strong theoretical basis for examining effects separately at different education levels (primary, secondary, tertiary), because of the large degree of heterogeneity across such a wide span of learning contexts (e.g., pedagogical practices, intellectual and social development of the students). We will therefore proceed by reporting the data both as a whole and separately for each of the education levels for all of the moderators considered here. Education level is contrast coded such that tertiary is compared to the average of secondary and primary, and secondary and primary are compared to each other.
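As an illustration of the contrast coding described above (a sketch only; the authors’ exact coding is not reported), two orthogonal contrasts on a three-level education factor could be entered into the RVE meta-regression, reusing the hypothetical dat object from the earlier sketch:

# Hypothetical contrast codes for education level (tertiary, secondary, primary)
# Contrast 1: tertiary vs. the average of secondary and primary
# Contrast 2: secondary vs. primary
dat$c_tert_vs_rest <- with(dat, ifelse(level == "tertiary",  2/3, -1/3))
dat$c_sec_vs_prim  <- with(dat, ifelse(level == "secondary", 1/2,
                                ifelse(level == "primary", -1/2, 0)))

edu_model <- robu(yi ~ c_tert_vs_rest + c_sec_vs_prim, data = dat, studynum = study_id,
                  var.eff.size = vi, modelweights = "CORR", rho = 0.8)

With these codes, the first coefficient estimates the tertiary mean minus the average of the secondary and primary means, and the second estimates the secondary minus the primary mean.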

A meta-regression indicated that the effect size was not significantly different when comparing peer assessment with teacher assessment than when comparing peer assessment with no assessment (b = .02, 95% CI − .26 to .31, p = .865). The difference between peer assessment vs. no assessment and peer assessment vs. self-assessment was also not significant (b = − .03, CI − .44 to .38, p = .860); see Table 4. An examination of sub-groups suggested that peer assessment had a moderate positive effect compared to no assessment controls (g = .31, p = .004) and teacher assessment (g = .28, p = .007) and was not significantly different compared with self-assessment (g = .23, p = .209). The meta-regression was also re-run with education level as a covariate but the results were unchanged.

Meta-regressions indicated that the participant’s role was not a significant moderator of the effect size; see Table 4 . However, given the extremely small number of studies where participants did not act as both assessees ( n = 2) and assessors ( n = 4), we did not perform a sub-groups analysis, as such analyses are unreliable with small samples (Fisher et al. 2017 ).

Subject Area

Given that many subject areas had few studies (see Table 1 ) and the writing subject area made up the majority of effect sizes (40.74%), we opted to perform a meta-regression comparing writing with other subject areas. However, the effect of peer assessment did not differ between writing ( g = .30 , p = .001) and other subject areas ( g = .31 , p = .002); b = − .003, 95% CI − .25 to .25, p = .979. Similarly, the results did not substantially change when education level was entered into the model.

The effect of peer assessment did not differ significantly between studies where peer assessment included a written component (g = .35, p < .001) and those where it did not (g = .20, p = .015), b = .144, 95% CI − .10 to .39, p = .241. Including education as a variable in the model did not change the effect of written feedback. Similarly, studies with a dialog component (g = .21, p = .033) did not differ significantly from those without one (g = .35, p < .001), b = − .137, 95% CI − .39 to .12, p = .279.

Studies where peer feedback included a grading component (g = .37, p < .001) did not differ significantly from those that did not (g = .17, p = .138). However, when education level was included in the model, the model indicated a significant interaction effect between grading in tertiary students and the average effect of grading in primary and secondary students (b = .395, 95% CI .06 to .73, p = .022). A follow-up sub-groups analysis showed that grading was beneficial for academic performance in tertiary students (g = .55, p = .009), but not secondary school students (g = .002, p = .991) or primary school students (g = − .08, p = .762). When the three variables used to characterise peer assessment were entered simultaneously, the results were unchanged.

The average effect size was not significantly different for studies where assessment was freeform, i.e., where no specific script or rubric was given ( g = .42, p = .030) compared to those where a specific script or rubric was provided ( g = .29, p < .001); b = − .13, 95% CI − .51 to .25, p = .455. However, there were few studies where feedback was freeform ( n = 9, k =29). The results were unchanged when education level was controlled for in the meta-regression.

Studies where peer assessment was online ( g = .38, p = .003) did not differ from studies where assessment was offline ( g = .24, p = .004); b = .16, 95% CI − .10 to .42, p = .215. This result was unchanged when education level was included in the meta-regression.

There was no significant difference in effect size between studies where peer assessment was anonymised (g = .27, p = .019) and those where it was not (g = .25, p = .004); b = .03, 95% CI −.22 to .28, p = .811. Nor was the difference significant when education level was controlled for.

Studies where peer assessment was performed only a single time (g = .19, p = .103) did not differ significantly from those where it was performed multiple times (g = .37, p < .001); b = −.17, 95% CI −.45 to .11, p = .223. It is worth noting, however, that the sub-groups analysis suggests the effect of peer assessment was not significant when considering only studies that applied it a single time. The result did not change when education level was included in the model.

There was no significant difference in effect size between studies utilising far transfer (g = .21, p = .124) and those with near transfer (g = .42, p < .001) or no transfer (g = .29, p = .017). It is worth noting, however, that the sub-groups analysis suggests the effect of peer assessment was only significant when there was no transfer to the criterion task. As shown in Table 4, the difference was also not significant when analysed using meta-regressions either with or without education level in the model.

Studies that allocated participants to experimental condition at the student level (g = .21, p = .14) did not differ from those that allocated condition at the classroom or semester level (g = .31, p < .001 and g = .79, p = .223, respectively); see Table 4 for the meta-regressions.

Publication Bias

Risk of publication bias was assessed by inspecting the funnel plot (see Fig. 4) of the relationship between observed effects and standard errors for asymmetry (Schwarzer et al. 2015). Egger’s test was also run by including standard error as a predictor in a meta-regression. Based on the funnel plot and a non-significant Egger’s test of asymmetry (b = .886, p = .226), the risk of publication bias was judged to be low.
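As an illustration only (hypothetical data; our choice of tools, not the authors’ code or dataset), an Egger-style asymmetry test can be approximated with numpy and statsmodels by a weighted regression of observed effect sizes on their standard errors; a non-significant slope indicates little evidence of small-study asymmetry. A fuller treatment would use a random-effects meta-regression (e.g., the robumeta or metafor packages in R), which also models between-study heterogeneity.

import numpy as np
import statsmodels.api as sm

# Hypothetical effect sizes and standard errors for 30 studies,
# generated only so the sketch runs end to end.
rng = np.random.default_rng(0)
se = rng.uniform(0.10, 0.45, size=30)   # standard errors
g = rng.normal(0.30, se)                # observed effects scattered around g = .30

# Egger-style test: weighted least squares of effect size on standard error,
# weighting each study by its inverse variance; the slope indexes asymmetry.
X = sm.add_constant(se)                 # columns: intercept, standard error
fit = sm.WLS(g, X, weights=1.0 / se**2).fit()

print(fit.params)      # [intercept, slope on standard error]
print(fit.pvalues[1])  # non-significant slope: little evidence of funnel-plot asymmetry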

Figure 4. A funnel plot showing the relationship between standard error and observed effect size for the academic performance meta-analysis.

Proponents of peer assessment argue that it is an effective classroom technique for improving academic performance (Topping 2009 ). While previous narrative reviews have argued for the benefits of peer assessment, the current meta-analysis quantifies the effect of peer assessment interventions on academic performance within educational contexts. Overall, the results suggest that there is a positive effect of peer assessment on academic performance in primary, secondary, and tertiary students. The magnitude of the overall effect size was within the small to medium range (Sawilowsky 2009 ). These findings also suggest that the benefits of peer assessment are robust across many contextual factors, including different feedback and educational characteristics.

Recently, researchers have increasingly advocated for the role of assessment in promoting learning in educational practice (Wiliam 2018 ). Peer assessment forms a core part of theories of formative assessment because it is seen as providing new information about the learning process to the teacher or student, which in turn facilitates later performance (Pellegrino et al. 2001 ). The current results provide support for the position that peer assessment can be an effective classroom technique for improving academic performance. The results suggest that peer assessment is effective compared to both no assessment (which often involved ‘teaching as usual’) and teacher assessment, suggesting that peer assessment can play an important formative role in the classroom. The findings suggest that structuring classroom activities in a way that utilises peer assessment may be an effective way to promote learning and to optimise the use of teaching resources by permitting the teacher to focus on assisting students with greater difficulties or on more complex tasks. Importantly, the results indicate that peer assessment can be effective across a wide range of subject areas, education levels, and assessment types. Pragmatically, this suggests that classroom teachers can implement peer assessment in a variety of ways and tailor the peer assessment design to the particular characteristics and constraints of their classroom context.

Notably, the results of this quantitative meta-analysis align well with past narrative reviews (e.g., Black and Wiliam 1998a ; Topping 1998 ; van Zundert et al. 2010 ). The fact that both quantitative and qualitative syntheses of the literature suggest that peer assessment can be beneficial provides a stronger basis for recommending peer assessment as a practice. However, several of the moderators of the effectiveness of peer feedback that have been argued for in the available narrative reviews (e.g., rubrics; Panadero and Jonsson 2013 ) have received little support from this quantitative meta-analysis. As detailed below, this may suggest that the prominence of such feedback characteristics in narrative reviews is driven more by theoretical considerations than by quantitative empirical evidence. However, many of these moderating variables are complex (rubrics, for example, can take many forms) and, due to this complexity, may not lend themselves as well to quantitative synthesis and aggregation (for a detailed discussion on combining qualitative and quantitative evidence, see Gorard 2002 ).

Mechanisms and Moderators

Indeed, the current findings suggest that the feedback characteristics deemed important by current theories of peer assessment may not be as significant as first thought. Previously, individual studies have argued for the importance of characteristics such as rubrics (Panadero and Jonsson 2013 ), anonymity (Bloom & Hautaluoma, 1987 ), and allowing students to practice peer assessment (Smith, Cooper, & Lancaster, 2002 ). While these feedback characteristics have been shown to affect the efficacy of peer assessment in individual studies, we find little evidence that they moderate the effect of peer assessment when analysed across studies. Many of the current models of peer assessment rely on qualitative evidence, theoretical arguments, and pedagogical experience to formulate theories about what determines effective peer assessment. While such evidence should not be discounted, the current findings also point to the need for better quantitative and experimental studies to test some of the assumptions embedded in these models. We suggest that the null findings observed in this meta-analysis regarding the proposed moderators of peer assessment efficacy should be interpreted cautiously, as more studies that experimentally manipulate these variables are needed to provide more definitive insight into how to design better peer assessment procedures.

While the current findings are ambiguous regarding the mechanisms of peer assessment, it is worth noting that without a solid understanding of the mechanisms underlying peer assessment effects, it is difficult to identify important moderators or optimally use peer assessment in the classroom. Often the research literature makes somewhat broad claims about the possible benefits of peer assessment. For example, Topping ( 1998 , p.256) suggested that peer assessment may, ‘promote a sense of ownership, personal responsibility, and motivation… [and] might also increase variety and interest, activity and interactivity, identification and bonding, self-confidence, and empathy for others’. Others have argued that peer assessment is beneficial because it is less personally evaluative—with evidence suggesting that teacher assessment is often personally evaluative (e.g., ‘good boy, that is correct’) which may have little or even negative effects on performance particularly if the assessee has low self-efficacy (Birney, Beckmann, Beckmann & Double 2017 ; Double and Birney 2017 , 2018 ; Hattie and Timperley 2007 ). However, more research is needed to distinguish between the many proposed mechanisms for peer assessment’s formative effects made within the extant literature, particularly as claims about the mechanisms of the effectiveness of peer assessment are often evidenced by student self-reports about the aspects of peer assessment they rate as useful. While such self-reports may be informative, more experimental research that systematically manipulates aspects of the design of peer assessment is likely to provide greater clarity about what aspects of peer assessment drive the observed benefits.

Our findings did indicate an important role for grading in determining the effectiveness of peer feedback. We found that peer grading was beneficial for tertiary students but not for primary or secondary school students, which suggests that grading adds little to the peer feedback process for non-tertiary students. A recent meta-analysis by Sanchez et al. ( 2017 ) on peer grading, by contrast, found a benefit for non-tertiary students, albeit based on a relatively small number of studies compared with the current meta-analysis. The present findings instead suggest that there may be significant qualitative differences in how peer grading functions as students develop. For example, the criteria students use to assess ability may change as they age (Stipek and Iver 1989 ). It is difficult to ascertain precisely why grading has positive additive effects only in tertiary students, but there are substantial differences in pedagogy, curriculum, motivation for learning, and grading systems that may account for these differences. One possibility is that tertiary students are more ‘grade orientated’ and therefore put more weight on peer assessment that includes a specific grade. Further research is needed to explore the effects of grading at different educational levels.

One of the more unexpected findings of this meta-analysis was the positive effect of peer assessment compared to teacher assessment. This finding is somewhat counterintuitive given the greater qualifications and pedagogical experience of the teacher. In addition, in many of the studies, the teacher had privileged knowledge about, and often graded, the outcome assessment. Thus, it seems reasonable to expect that teacher feedback would better align with assessment objectives and therefore produce better outcomes. Despite all these advantages, teacher assessment appeared to be less efficacious than peer assessment for academic performance. It is possible that the pedagogical disadvantages of peer assessment are compensated for by its affective or motivational aspects, or by the substantial benefits of acting as an assessor. However, more experimental research is needed to rule out the effects of the potential methodological issues discussed in detail below.

Limitations

A major limitation of the current results is that they cannot adequately distinguish between the effect of assessing versus being assessed. Most of the included studies confound giving and receiving peer assessment in their designs (i.e., the students in the peer assessment group both provide assessment and receive it), and therefore no substantive conclusions can be drawn about whether the benefits of peer assessment stem from giving feedback, receiving feedback, or both. This raises the possibility that the benefit of peer assessment comes more from assessing than from being assessed (Usher 2018 ). Consistent with this, Lundstrom and Baker ( 2009 ) directly compared the effects of giving and receiving assessment on students’ writing performance and found that assessing was more beneficial than being assessed. Similarly, Graner ( 1987 ) found that assessing papers without being assessed was as effective for improving writing performance as assessing papers and receiving feedback.

Furthermore, more true experiments are needed, as the present results suggest that such designs produce more conservative estimates of the effect of peer assessment. The studies included in this meta-analysis were not only predominantly allocated at the classroom level (i.e., quasi-experiments) but, in all but one case, were also not analysed using techniques appropriate for clustered data (e.g., multi-level modelling). This is problematic because it makes disentangling classroom-level effects (e.g., teacher quality) from the intervention effect difficult, which may lead to biased statistical inferences (Hox 1998 ). While experimental designs with individual allocation are often not pragmatic for classroom interventions, online peer assessment interventions appear to be obvious candidates for true experimental designs. In particular, carefully controlled experimental designs that examine the effect of specific assessment characteristics, rather than ‘black-box’ studies of the effectiveness of peer assessment, are crucial for understanding when and how peer assessment is most likely to be effective. For example, peer assessment may be counterproductive when learning novel tasks due to students’ inadequate domain knowledge (Könings et al. 2019 ).

While the current results provide an overall estimate of the efficacy of peer assessment in improving academic performance when compared to teacher and no assessment, it should be noted that these effects are averaged across a wide range of outcome measures, including science project grades, essay writing ratings, and end-of-semester exam scores. Aggregating across such disparate outcomes is always problematic in meta-analysis and is a particular concern for meta-analyses in educational research, as some outcome measures are likely to be more sensitive to interventions than others (Wiliam 2010 ). A further issue is that the effect of moderators may differ between academic domains. For example, some assessment characteristics may be important when teaching writing but not mathematics. Because there were too few studies in the individual academic domains (with the exception of writing), we were unable to account for these differential effects. The effects of the moderators reported here therefore need to be considered as overall averages that provide information about the extent to which the effect of a moderator generalises across domains.

Finally, the findings of the current meta-analysis are also somewhat limited by the fact that few studies gave a complete profile of the participants and measures used. For example, the ability of the peer reviewer relative to the reviewee and the age difference between the peers were not always clear. Furthermore, it was not possible to classify the academic performance measures further, for example by novelty, or to code for the quality of the measures, including their reliability and validity, because very few studies provided comprehensive details about the outcome measure(s) they utilised. Moreover, other important variables, such as fidelity of treatment, were almost never reported in the included manuscripts. Indeed, many of the included variables needed to be coded based on inferences from the included studies’ text rather than explicit statements, even when one would reasonably expect that information to be made clear in a peer-reviewed manuscript. The observed effect sizes reported here should therefore be taken as an indicator of average efficacy based on the extant literature and not an indication of expected effects for specific implementations of peer assessment.

Overall, our findings provide support for the use of peer assessment as a formative practice for improving academic performance. The results indicate that peer assessment is more effective than no assessment and teacher assessment and not significantly different in its effect from self-assessment. These findings are consistent with current theories of formative assessment and instructional best practice and provide strong empirical support for the continued use of peer assessment in the classroom and other educational contexts. Further experimental work is needed to clarify the contextual and educational factors that moderate the effectiveness of peer assessment, but the present findings are encouraging for those looking to utilise peer assessment to enhance learning.

References marked with an * were included in the meta-analysis

* AbuSeileek, A. F., & Abualsha'r, A. (2014). Using peer computer-mediated corrective feedback to support EFL learners'. Language Learning & Technology, 18 (1), 76-95.

Alqassab, M., Strijbos, J. W., & Ufer, S. (2018). Training peer-feedback skills on geometric construction tasks: Role of domain knowledge and peer-feedback levels. European Journal of Psychology of Education, 33 (1), 11–30.


* Anderson, N. O., & Flash, P. (2014). The power of peer reviewing to enhance writing in horticulture: Greenhouse management. International Journal of Teaching and Learning in Higher Education, 26 (3), 310–334.

* Bangert, A. W. (1995). Peer assessment: an instructional strategy for effectively implementing performance-based assessments. (Unpublished doctoral dissertation). University of South Dakota.

* Benson, N. L. (1979). The effects of peer feedback during the writing process on writing performance, revision behavior, and attitude toward writing. (Unpublished doctoral dissertation). University of Colorado, Boulder.

* Bhullar, N., Rose, K. C., Utell, J. M., & Healey, K. N. (2014). The impact of peer review on writing in a psychology course: Lessons learned. Journal on Excellence in College Teaching, 25 (2), 91-106.

* Birjandi, P., & Hadidi Tamjid, N. (2012). The role of self-, peer and teacher assessment in promoting Iranian EFL learners’ writing performance. Assessment & Evaluation in Higher Education, 37 (5), 513–533.

Birney, D. P., Beckmann, J. F., Beckmann, N., & Double, K. S. (2017). Beyond the intellect: Complexity and learning trajectories in Raven’s Progressive Matrices depend on self-regulatory processes and conative dispositions. Intelligence, 61 , 63–77.

Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5 (1), 7–74.


Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability (formerly: Journal of Personnel Evaluation in Education), 21 (1), 5.

Bloom, A. J., & Hautaluoma, J. E. (1987). Effects of message valence, communicator credibility, and source anonymity on reactions to peer feedback. The Journal of Social Psychology, 127 (4), 329–338.

Brown, G. T., Irving, S. E., Peterson, E. R., & Hirschfeld, G. H. (2009). Use of interactive–informal assessment practices: New Zealand secondary students' conceptions of assessment. Learning and Instruction, 19 (2), 97–111.

* Califano, L. Z. (1987). Teacher and peer editing: Their effects on students' writing as measured by t-unit length, holistic scoring, and the attitudes of fifth and sixth grade students (Unpublished doctoral dissertation), Northern Arizona University.

* Chaney, B. A., & Ingraham, L. R. (2009). Using peer grading and proofreading to ratchet student expectations in preparing accounting cases. American Journal of Business Education, 2 (3), 39-48.

* Chang, S. H., Wu, T. C., Kuo, Y. K., & You, L. C. (2012). Project-based learning with an online peer assessment system in a photonics instruction for enhancing led design skills. Turkish Online Journal of Educational Technology-TOJET, 11(4), 236–246.

* Cho, K., & MacArthur, C. (2011). Learning by reviewing. Journal of Educational Psychology, 103 (1), 73.

Cho, K., Schunn, C. D., & Charney, D. (2006). Commenting on writing: Typology and perceived helpfulness of comments from novice peer reviewers and subject matter experts. Written Communication, 23 (3), 260–294.

Cook, D. J., Guyatt, G. H., Ryan, G., Clifton, J., Buckingham, L., Willan, A., et al. (1993). Should unpublished data be included in meta-analyses?: Current convictions and controversies. JAMA, 269 (21), 2749–2753.

*Crowe, J. A., Silva, T., & Ceresola, R. (2015). The effect of peer review on student learning outcomes in a research methods course.  Teaching Sociology, 43 (3), 201–213.

* Diab, N. M. (2011). Assessing the relationship between different types of student feedback and the quality of revised writing . Assessing Writing, 16(4), 274-292.

Demetriadis, S., Egerter, T., Hanisch, F., & Fischer, F. (2011). Peer review-based scripted collaboration to support domain-specific and domain-general knowledge acquisition in computer science. Computer Science Education, 21 (1), 29–56.

Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer and co-assessment in higher education: A review. Studies in Higher Education, 24 (3), 331–350.

Double, K. S., & Birney, D. (2017). Are you sure about that? Eliciting confidence ratings may influence performance on Raven’s progressive matrices. Thinking & Reasoning, 23 (2), 190–206.

Double, K. S., & Birney, D. P. (2018). Reactivity to confidence ratings in older individuals performing the latin square task. Metacognition and Learning, 13(3), 309–326.

* Enders, F. B., Jenkins, S., & Hoverman, V. (2010). Calibrated peer review for interpreting linear regression parameters: Results from a graduate course. Journal of Statistics Education , 18 (2).

* English, R., Brookes, S. T., Avery, K., Blazeby, J. M., & Ben-Shlomo, Y. (2006). The effectiveness and reliability of peer-marking in first-year medical students. Medical Education, 40 (10), 965-972.

* Erfani, S. S., & Nikbin, S. (2015). The effect of peer-assisted mediation vs. tutor-intervention within dynamic assessment framework on writing development of Iranian Intermediate EFL Learners. English Language Teaching, 8 (4), 128–141.

Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70 (3), 287–322.

* Farrell, K. J. (1977). A comparison of three instructional approaches for teaching written composition to high school juniors: teacher lecture, peer evaluation, and group tutoring (Unpublished doctoral dissertation), Boston University, Boston.

Fisher, Z., Tipton, E., & Zhipeng, Z. (2017). robumeta: Robust variance meta-regression (Version 2). Retrieved from https://CRAN.R-project.org/package=robumeta

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76 (5), 378.

Flórez, M. T., & Sammons, P. (2013). Assessment for learning: Effects and impact . Reading, England: CfBT Education Trust.

Fyfe, E. R., & Rittle-Johnson, B. (2016). Feedback both helps and hinders learning: The causal role of prior knowledge. Journal of Educational Psychology, 108 (1), 82.

Gielen, S., Peeters, E., Dochy, F., Onghena, P., & Struyven, K. (2010a). Improving the effectiveness of peer feedback for learning. Learning and Instruction, 20 (4), 304–315.

* Gielen, S., Tops, L., Dochy, F., Onghena, P., & Smeets, S. (2010b). A comparative study of peer and teacher feedback and of various peer feedback forms in a secondary school writing curriculum. British Educational Research Journal , 36 (1), 143-162.

Gorard, S. (2002). Can we overcome the methodological schism? Four models for combining qualitative and quantitative evidence. Research Papers in Education Policy and Practice, 17 (4), 345–361.

Graner, M. H. (1987). Revision workshops: An alternative to peer editing groups. The English Journal, 76 (3), 40–45.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77 (1), 81–112.

Hays, M. J., Kornell, N., & Bjork, R. A. (2010). The costs and benefits of providing feedback during learning. Psychonomic bulletin & review, 17 (6), 797–801.

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6 (2), 107–128.

Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1 (1), 39–65.

Higgins, J. P., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions. The Cochrane Collaboration. Version 5.1.0, www.handbook.cochrane.org

Hoaglin, D. C., & Iglewicz, B. (1987). Fine-tuning some resistant rules for outlier labeling. Journal of the American Statistical Association, 82 (400), 1147–1149.

Hopewell, S., McDonald, S., Clarke, M. J., & Egger, M. (2007). Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database of Systematic Reviews .

* Horn, G. C. (2009). Rubrics and revision: What are the effects of 3 RD graders using rubrics to self-assess or peer-assess drafts of writing? (Unpublished doctoral thesis), Boise State University

Hox, J. J. (1998). Multilevel modeling: When and why. In I. Balderjahn, R. Mathar, & M. Schader (Eds.), Classification, data analysis, and data highways (pp. 147–154). New York: Springer Verlag.


* Hsia, L. H., Huang, I., & Hwang, G. J. (2016). A web-based peer-assessment approach to improving junior high school students’ performance, self-efficacy and motivation in performing arts courses. British Journal of Educational Technology, 47 (4), 618–632.

* Hsu, T. C. (2016). Effects of a peer assessment system based on a grid-based knowledge classification approach on computer skills training. Journal of Educational Technology & Society , 19 (4), 100-111.

* Hussein, M. A. H., & Al Ashri, El Shirbini A. F. (2013). The effectiveness of writing conferences and peer response groups strategies on the EFL secondary students' writing performance and their self efficacy (A Comparative Study). Egypt: National Program Zero.

* Hwang, G. J., Hung, C. M., & Chen, N. S. (2014). Improving learning achievements, motivations and problem-solving skills through a peer assessment-based game development approach. Educational Technology Research and Development, 62 (2), 129–145.

* Hwang, G. J., Tu, N. T., & Wang, X. M. (2018). Creating interactive E-books through learning by design: The impacts of guided peer-feedback on students’ learning achievements and project outcomes in science courses. Journal of Educational Technology & Society, 21 (1), 25–36.

* Kamp, R. J., van Berkel, H. J., Popeijus, H. E., Leppink, J., Schmidt, H. G., & Dolmans, D. H. (2014). Midterm peer feedback in problem-based learning groups: The effect on individual contributions and achievement. Advances in Health Sciences Education, 19 (1), 53–69.

* Karegianes, M. J., Pascarella, E. T., & Pflaum, S. W. (1980). The effects of peer editing on the writing proficiency of low-achieving tenth grade students. The Journal of Educational Research , 73 (4), 203-207.

* Khonbi, Z. A., & Sadeghi, K. (2013). The effect of assessment type (self vs. peer) on Iranian university EFL students’ course achievement. Procedia-Social and Behavioral Sciences , 70 , 1552-1564.

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119 (2), 254.

Könings, K. D., van Zundert, M., & van Merriënboer, J. J. G. (2019). Scaffolding peer-assessment skills: Risk of interference with learning domain-specific skills? Learning and Instruction, 60 , 85–94.

* Kurihara, N. (2017). Do peer reviews help improve student writing abilities in an EFL high school classroom? TESOL Journal, 8 (2), 450–470.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33 (1), 159–174.

* Li, L., & Gao, F. (2016). The effect of peer assessment on project performance of students at different learning levels. Assessment & Evaluation in Higher Education, 41 (6), 885–900.

* Li, L., & Steckelberg, A. (2004). Using peer feedback to enhance student meaningful learning . Chicago: Association for Educational Communications and Technology.

Li, H., Xiong, Y., Zang, X., Kornhaber, M. L., Lyu, Y., Chung, K. S., & Suen, K. H. (2016). Peer assessment in the digital age: a meta-analysis comparing peer and teacher ratings. Assessment & Evaluation in Higher Education, 41 (2), 245–264.

* Lin, Y.-C. A. (2009). An examination of teacher feedback, face-to-face peer feedback, and google documents peer feedback in Taiwanese EFL college students’ writing. (Unpublished doctoral dissertation), Alliant International University, San Diego, United States

Lipsey, M. W., & Wilson, D. B. (2001). Practical Meta-analysis . Thousand Oaks: SAGE publications.

* Liu, C.-C., Lu, K.-H., Wu, L. Y., & Tsai, C.-C. (2016). The impact of peer review on creative self-efficacy and learning performance in Web 2.0 learning activities. Journal of Educational Technology & Society, 19 (2):286-297

Lundstrom, K., & Baker, W. (2009). To give is better than to receive: The benefits of peer review to the reviewer's own writing. Journal of Second Language Writing, 18 (1), 30–43.

* McCurdy, B. L., & Shapiro, E. S. (1992). A comparison of teacher-, peer-, and self-monitoring with curriculum-based measurement in reading among students with learning disabilities. The Journal of Special Education , 26 (2), 162-180.

Moeyaert, M., Ugille, M., Natasha Beretvas, S., Ferron, J., Bunuan, R., & Van den Noortgate, W. (2017). Methods for dealing with multiple outcomes in meta-analysis: a comparison between averaging effect sizes, robust variance estimation and multilevel meta-analysis. International Journal of Social Research Methodology, 20 (6), 559–572.

* Montanero, M., Lucero, M., & Fernandez, M.-J. (2014). Iterative co-evaluation with a rubric of narrative texts in primary education. Journal for the Study of Education and Development, 37 (1), 184-198.

Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs. Organizational Research Methods, 11 (2), 364–386.

* Olson, V. L. B. (1990). The revising processes of sixth-grade writers with and without peer feedback. The Journal of Educational Research, 84(1), 22–29.

Ossenberg, C., Henderson, A., & Mitchell, M. (2018). What attributes guide best practice for effective feedback? A scoping review. Advances in Health Sciences Education , 1–19.

* Ozogul, G., Olina, Z., & Sullivan, H. (2008). Teacher, self and peer evaluation of lesson plans written by preservice teachers. Educational Technology Research and Development, 56 (2), 181.

Panadero, E., & Alqassab, M. (2019). An empirical review of anonymity effects in peer assessment, peer feedback, peer review, peer evaluation and peer grading. Assessment & Evaluation in Higher Education , 1–26.

Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review, 9 , 129–144.

Panadero, E., Romero, M., & Strijbos, J. W. (2013). The impact of a rubric and friendship on peer assessment: Effects on construct validity, performance, and perceptions of fairness and comfort. Studies in Educational Evaluation, 39 (4), 195–203.

* Papadopoulos, P. M., Lagkas, T. D., & Demetriadis, S. N. (2012). How to improve the peer review method: Free-selection vs assigned-pair protocol evaluated in a computer networking course. Computers & Education, 59 (2), 182–195.

Paulus, T. M. (1999). The effect of peer and teacher feedback on student writing. Journal of second language writing, 8 (3), 265–289.

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: the science and design of educational assessment . Washington: National Academy Press.

Peters, O., Körndle, H., & Narciss, S. (2018). Effects of a formative assessment script on how vocational students generate formative feedback to a peer’s or their own performance. European Journal of Psychology of Education, 33 (1), 117–143.

* Philippakos, Z. A., & MacArthur, C. A. (2016). The effects of giving feedback on the persuasive writing of fourth-and fifth-grade students. Reading Research Quarterly, 51 (4), 419-433.

* Pierson, H. (1967). Peer and teacher correction: A comparison of the effects of two methods of teaching composition in grade nine English classes. (Unpublished doctoral dissertation), New York University.

* Prater, D., & Bermudez, A. (1993). Using peer response groups with limited English proficient writers. Bilingual Research Journal , 17 (1-2), 99-116.

Reinholz, D. (2016). The assessment cycle: A model for learning through peer assessment. Assessment & Evaluation in Higher Education, 41 (2), 301–315.

* Rijlaarsdam, G., & Schoonen, R. (1988). Effects of a teaching program based on peer evaluation on written composition and some variables related to writing apprehension. (Unpublished doctoral dissertation), Amsterdam University, Amsterdam

Rollinson, P. (2005). Using peer feedback in the ESL writing class. ELT Journal, 59 (1), 23–30.

Rotsaert, T., Panadero, E., & Schellens, T. (2018). Anonymity as an instructional scaffold in peer assessment: its effects on peer feedback quality and evolution in students’ perceptions about peer assessment skills. European Journal of Psychology of Education, 33 (1), 75–99.

* Rudd II, J. A., Wang, V. Z., Cervato, C., & Ridky, R. W. (2009). Calibrated peer review assignments for the Earth Sciences. Journal of Geoscience Education , 57 (5), 328-334.

* Ruegg, R. (2015). The relative effects of peer and teacher feedback on improvement in EFL students' writing ability. Linguistics and Education, 29 , 73-82.

* Sadeghi, K., & Abolfazli Khonbi, Z. (2015). Iranian university students’ experiences of and attitudes towards alternatives in assessment. Assessment & Evaluation in Higher Education, 40 (5), 641–665.

* Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment , 11 (1), 1-31.

Sanchez, C. E., Atkinson, K. M., Koenka, A. C., Moshontz, H., & Cooper, H. (2017). Self-grading and peer-grading for formative and summative assessments in 3rd through 12th grade classrooms: A meta-analysis. Journal of Educational Psychology, 109 (8), 1049.

Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal of Modern Applied Statistical Methods, 8 (2), 26.

* Schonrock-Adema, J., Heijne-Penninga, M., van Duijn, M. A., Geertsma, J., & Cohen-Schotanus, J. (2007). Assessment of professional behaviour in undergraduate medical education: Peer assessment enhances performance. Medical Education, 41 (9), 836-842.

Schwarzer, G., Carpenter, J. R., & Rücker, G. (2015). Meta-analysis with R . Cham: Springer.


* Sippel, L., & Jackson, C. N. (2015). Teacher vs. peer oral corrective feedback in the German language classroom. Foreign Language Annals , 48 (4), 688-705.

Sluijsmans, D. M., Brand-Gruwel, S., van Merriënboer, J. J., & Martens, R. L. (2004). Training teachers in peer-assessment skills: Effects on performance and perceptions. Innovations in Education and Teaching International, 41 (1), 59–78.

Smith, H., Cooper, A., & Lancaster, L. (2002). Improving the quality of undergraduate peer assessment: A case for student and staff development. Innovations in education and teaching international, 39 (1), 71–81.

Smith, M. K., Wood, W. B., Adams, W. K., Wieman, C., Knight, J. K., Guild, N., & Su, T. T. (2009). Why peer discussion improves student performance on in-class concept questions. Science, 323 (5910), 122–124.

Steel, P. D., & Kammeyer-Mueller, J. D. (2002). Comparing meta-analytic moderator estimation techniques under realistic conditions. Journal of Applied Psychology, 87 (1), 96.

Stipek, D., & Iver, D. M. (1989). Developmental change in children's assessment of intellectual competence. Child Development , 521–538.

Strijbos, J. W., & Wichmann, A. (2018). Promoting learning by leveraging the collaborative nature of formative peer assessment with instructional scaffolds. European Journal of Psychology of Education, 33 (1), 1–9.

Strijbos, J.-W., Narciss, S., & Dünnebier, K. (2010). Peer feedback content and sender's competence level in academic writing revision tasks: Are they critical for feedback perceptions and efficiency? Learning and Instruction, 20 (4), 291–303.

* Sun, D. L., Harris, N., Walther, G., & Baiocchi, M. (2015). Peer assessment enhances student learning: The results of a matched randomized crossover experiment in a college statistics class. PLoS One, 10 (12).

Tannacito, T., & Tuzi, F. (2002). A comparison of e-response: Two experiences, one conclusion. Kairos, 7 (3), 1–14.

R Core Team (2017). R: A language and environment for statistical computing . Vienna, Austria: R Foundation for Statistical Computing.

Topping, K. (1998). Peer assessment between students in colleges and universities. Review of Educational Research, 68 (3), 249-276.

Topping, K. (2009). Peer assessment. Theory Into Practice, 48 (1), 20–27.

Usher, N. (2018). Learning about academic writing through holistic peer assessment. (Unpublished doctoral thesis), University of Oxford, Oxford, UK.

* van den Boom, G., Paas, F., & van Merriënboer, J. J. (2007). Effects of elicited reflections combined with tutor or peer feedback on self-regulated learning and learning outcomes. Learning and Instruction , 17 (5), 532-548.

* van Ginkel, S., Gulikers, J., Biemans, H., & Mulder, M. (2017). The impact of the feedback source on developing oral presentation competence. Studies in Higher Education, 42 (9), 1671-1685.

van Popta, E., Kral, M., Camp, G., Martens, R. L., & Simons, P. R. J. (2017). Exploring the value of peer feedback in online learning for the provider. Educational Research Review, 20 , 24–34.

van Zundert, M., Sluijsmans, D., & van Merriënboer, J. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20 (4), 270–279.

Vanderhoven, E., Raes, A., Montrieux, H., Rotsaert, T., & Schellens, T. (2015). What if pupils can assess their peers anonymously? A quasi-experimental study. Computers & Education, 81 , 123–132.

Wang, J.-H., Hsu, S.-H., Chen, S. Y., Ko, H.-W., Ku, Y.-M., & Chan, T.-W. (2014a). Effects of a mixed-mode peer response on student response behavior and writing performance. Journal of Educational Computing Research, 51 (2), 233–256.

* Wang, J. H., Hsu, S. H., Chen, S. Y., Ko, H. W., Ku, Y. M., & Chan, T. W. (2014b). Effects of a mixed-mode peer response on student response behavior and writing performance. Journal of Educational Computing Research , 51 (2), 233-256.

* Wang, X.-M., Hwang, G.-J., Liang, Z.-Y., & Wang, H.-Y. (2017). Enhancing students’ computer programming performances, critical thinking awareness and attitudes towards programming: An online peer-assessment attempt. Journal of Educational Technology & Society, 20 (4), 58-68.

Wiliam, D. (2010). What counts as evidence of educational achievement? The role of constructs in the pursuit of equity in assessment. Review of Research in Education, 34 (1), 254–284.

Wiliam, D. (2018). How can assessment support learning? A response to Wilson and Shepard, Penuel, and Pellegrino. Educational Measurement: Issues and Practice, 37 (1), 42–44.

Wiliam, D., Lee, C., Harrison, C., & Black, P. (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles, Policy & Practice, 11 (1), 49–65.

* Wise, W. G. (1992). The effects of revision instruction on eighth graders' persuasive writing (Unpublished doctoral dissertation), University of Maryland, Maryland

* Wong, H. M. H., & Storey, P. (2006). Knowing and doing in the ESL writing class. Language Awareness , 15 (4), 283.

* Xie, Y., Ke, F., & Sharma, P. (2008). The effect of peer feedback for blogging on college students' reflective learning processes. The Internet and Higher Education , 11 (1), 18-25.

Young, J. E., & Jackman, M. G.-A. (2014). Formative assessment in the Grenadian lower secondary school: Teachers’ perceptions, attitudes and practices. Assessment in Education: Principles, Policy & Practice, 21 (4), 398–411.

Yu, F.-Y., & Liu, Y.-H. (2009). Creating a psychologically safe online space for a student-generated questions learning activity via different identity revelation modes. British Journal of Educational Technology, 40 (6), 1109–1123.


Acknowledgements

The authors would like to thank Kristine Gorgen and Jessica Chan for their help coding the studies included in the meta-analysis.

Author information

Authors and affiliations

Department of Education, University of Oxford, Oxford, England

Kit S. Double, Joshua A. McGrane & Therese N. Hopfenbeck


Corresponding author

Correspondence to Kit S. Double .

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

(XLSX 40 kb)

Effect Size Calculation

Standardised mean differences were calculated as a measure of effect size. Standardised mean difference ( d ) was calculated using the following formula, which is typically used in meta-analyses (e.g., Lipsey and Wilson 2001 ).
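Assuming the conventional Lipsey and Wilson (2001) form with a pooled-standard-deviation denominator, the formula is

d = \frac{\bar{X}_{T} - \bar{X}_{C}}{SD_{pooled}}, \qquad SD_{pooled} = \sqrt{\frac{(n_T - 1)SD_T^2 + (n_C - 1)SD_C^2}{n_T + n_C - 2}}

where the subscripts T and C denote the peer assessment and control groups, respectively.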

As the standardised mean difference ( d ) is known to have a slight positive bias (Hedges 1981 ), we applied a small-sample correction to the estimates, resulting in what is often referred to as Hedges’ g .
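Assuming the standard Hedges (1981) correction factor, the bias-corrected estimate is

g = d \left( 1 - \frac{3}{4(n_T + n_C) - 9} \right).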

For studies where there was insufficient information to calculate Hedges’ g using the above method, we used the online effect size calculator developed by Lipsey and Wilson ( 2001 ), available at http://www.campbellcollaboration.org/escalc . For pre-post design studies where adjusted means were not provided, we used the critical value relevant to the difference between the peer feedback and control groups from the reported pre-intervention adjusted analysis (e.g., analysis of covariance), as suggested by Higgins and Green ( 2011 ). For pre-post design studies where both pre- and post-intervention means and standard deviations were provided, we used an effect size estimate based on the mean pre-post change in the peer feedback group minus the mean pre-post change in the control group, divided by the pooled pre-intervention standard deviation, as this approach minimises bias and improves estimate precision (Morris 2008 ).
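In our notation, and assuming the standard form of this pre-post-control estimator (Morris 2008), the effect size is

d_{ppc} = \frac{(\bar{X}_{post,T} - \bar{X}_{pre,T}) - (\bar{X}_{post,C} - \bar{X}_{pre,C})}{SD_{pre,pooled}},

i.e., the difference in mean pre-post change between groups divided by the pooled pre-intervention standard deviation.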

Variance estimates for each effect size were calculated using the following formula:
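Assuming the standard large-sample approximation for the variance of a standardised mean difference, this is

v = \frac{n_T + n_C}{n_T n_C} + \frac{d^2}{2(n_T + n_C)}.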

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Double, K.S., McGrane, J.A. & Hopfenbeck, T.N. The Impact of Peer Assessment on Academic Performance: A Meta-analysis of Control Group Studies. Educ Psychol Rev 32 , 481–509 (2020). https://doi.org/10.1007/s10648-019-09510-3


Published : 10 December 2019

Issue Date : June 2020

DOI : https://doi.org/10.1007/s10648-019-09510-3


  • Peer assessment
  • Meta-analysis
  • Experimental design
  • Effect size
  • Formative assessment

  • CAREER COLUMN
  • 08 October 2018

How to write a thorough peer review

  • Mathew Stiller-Reeve

Mathew Stiller-Reeve is a climate researcher at NORCE/Bjerknes Centre for Climate Research in Bergen, Norway, the leader of SciSnack.com, and a thematic editor at Geoscience Communication .


Scientists do not receive enough peer-review training. To improve this situation, a small group of editors and I developed a peer-review workflow to guide reviewers in delivering useful and thorough analyses that can really help authors to improve their papers.


doi: https://doi.org/10.1038/d41586-018-06991-0

This is an article from the Nature Careers Community, a place for Nature readers to share their professional experiences and advice. Guest posts are encouraged. You can get in touch with the editor at [email protected].


A step-by-step guide to peer review: a template for patients and novice reviewers

1 General Medicine and Primary Care, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA

Charlotte Blease

2 Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA

3 Harvard Medical School

While relatively novel, patient peer review has the potential to change the healthcare publishing paradigm. It can do this by helping researchers enlarge the pool of people who are welcome to read, understand and participate in healthcare research. Academic journals that are early adopters of patient peer review have already committed to placing a priority on using person-centred language in publicly available abstracts and focusing on translational and practical research.

A wide body of literature has shown that including people with lived experiences in a truly meaningful way can improve the quality and efficiency of health research. Traditionally considered only as ‘subjects’ of research, over the last 10–15 years, patients and care partners have increasingly been invited to contribute to the design and conduct of studies. Established institutions are increasingly recognising the distinctive expertise patients possess—many patients have acquired deep insights about their conditions, symptoms, medical treatments and quality of healthcare delivery. Among some funders, including the views of patients is now a requirement to ensure research proposals are meaningful to persons with the lived experience of illness. Further illustrating these developments, patients are now involved in reviewing and making recommendations as part of funding institutions, setting research agendas and priorities, being funded for and leading their own research and leading or coauthoring scholarly publications, and are now participating in the peer review process for academic journals. 1–5 Patients offer an outsider’s perspective within mainstream healthcare: they have fewer institutional, professional or social allegiances and conflicts of interest—factors recognised as compromising the quality of research. Patient involvement is essential to move away from rhetorical commitments to embrace a truly patient-centred healthcare ecosystem where everyone has a place at the table.

As people with lived health experiences climb a ladder of engagement in patient–researcher partnerships, they may be asked to act as peer reviewers of academic manuscripts. However, many of these individuals do not hold professional training in medicine, healthcare or science and have never encountered the peer review process. Little guidance exists for patients and care partners tasked with reviewing and providing input on manuscripts in search of publication.

In conversation, however, even experienced researchers confess that learning how to peer review is part of a hidden curriculum in academia—a skill outlined by no formal means but rather learnt by mimicry. 6 As such, as they learn the process, novices may pick up bad habits. In the case of peer review, learning is the result of reading large numbers of academic papers, occasional conversations with mentors or commonly “trial by fire” experienced via reviewer comments to their own submissions. Patient reviewers are rarely exposed to these experiences and can be at a loss for where to begin. As a result, some may forgo opportunities to provide valuable and highly insightful feedback on research publications. Although some journals are highly specific about how reviewers should structure their feedback, many publications—including top-tier medical journals—assume that all reviewers will know how to construct responses. Only a few forward-thinking journals actively seeking peer review from people with lived health experiences currently point to review tips designed for experienced professionals. 7

As people with lived health experiences are increasingly invited to participate in peer review, it is essential that they be supported in this process. The peer review template for patients and novice reviewers (table 1) is a series of steps designed to create a workflow for the main components of peer review. A structured workflow can help a reviewer organise their thoughts and create space to engage in critical thinking. The template is a starting point for anyone new to peer review, and it should be modified, adapted and built on for individual preferences and unique journal requirements. Peer reviews are commonly submitted via website portals, which vary widely in design and functionality; as such, reviewers are encouraged to decide how to best use the template on a case-by-case basis. Journals may require reviewers to copy and paste responses from the template into a journal website or upload a clean copy of the template as an attachment. Note: If uploading the review as an attachment, remember to remove the template examples and writing prompts.

Table 1. Peer review template for patients and other novice reviewers

It is important to point out that patient reviewers are not alone in facing challenges and a steep learning curve in performing peer review. Many health research agendas and, as a result, publications straddle disciplines, requiring peer reviewers with complementary expertise and training. Some experts may be highly equipped to critique particular aspects of research papers while unsuited to comment on other parts. Curiously, however, it is seldom a requirement that invited peer reviewers admit their own limitations to comment on different dimensions of papers. Relatedly, while we do not suggest that all patient peer reviewers will be equipped to critique every aspect of submitted manuscripts—though some may be fully competent to do so—we suggest that candour about limitations of expertise would also benefit the broader research community.

As novice reviewers gain experience, they may find themselves solicited for a growing number of reviews, much like their more experienced counterparts or mentors. 8 Serving as a patient or care partner reviewer can be a rewarding form of advocacy and will be crucial to harnessing the feedback and expertise of persons with lived health experiences. As we move into a future where online searches for information are a ubiquitous first step in searching for answers to health-related questions, patient and novice reviewers may become the much-needed link between academia and the lay public.

Acknowledgments

LS thanks the experienced and novice reviewers who encouraged her to publish this template.

Twitter: @TheLizArmy, @crblease

Contributors: Both authors contributed substantially to the manuscript. LS conceived the idea and design and drafted the text. CB refined the idea and critically revised the text.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests: The authors have read and understood the BMJ policy on declaration of interests and declare the following interests: LS is a member of the BMJ Patient Advisory Panel, serves as a BMJ patient reviewer and is an ad hoc patient reviewer for the Patient-Centered Outcomes Research Institute; CB is a Keane OpenNotes scholar; both LS and CB work on OpenNotes, a philanthropically funded research initiative focused on improving transparency in healthcare.

Provenance and peer review: Commissioned; externally peer reviewed.

Ethics statements

Patient consent for publication.

Not required.

Peer Evaluation and Peer Review

Peer Evaluation

Definition: Peer evaluation is an effective collaborative learning strategy that asks students to reflect on contributions made by colleagues in group work. Related to self-assessment, peer evaluation encourages students to critically examine the work of peers and reflect on the meaning of quality work in general, especially when consulting a detailed rubric or checklist as a guide.

Purpose: Students provide feedback to one another, while the instructor focuses on more targeted guidance toward a learning outcome. The key to successful peer feedback is a constructive environment in which students feel safe to share honest yet helpful criticism. Through peer evaluation, students ultimately learn to better assess their own work, a skill that pays dividends throughout their academic and professional careers. As additional benefits of peer evaluation, students learn to:

  • apply course concepts and skills to solving problems
  • collaborate with others towards a common goal
  • examine diverse perspectives and ideas
  • assume greater responsibility in the learning process 
  • apply (and possibly create) objective criteria to judge the quality of a task or performance 

Peer evaluation also helps address the "free rider" problem in group work, that is, the tendency of some students to rely on team members to take the initiative in completing group assignments or tasks. By adding an element of accountability and critical review, students are more likely to exert effort to earn a positive review from their peers (and to make a good impression).
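To make the accountability element concrete, here is a minimal sketch of one common approach (this is not UWL or CATME policy; the rating data and the scaling rule are invented for illustration): each student's share of the group grade is scaled by an adjustment factor derived from the peer ratings they received.

```python
# Illustrative only: convert peer ratings into individual grade adjustments.
# The ratings and the scaling rule are hypothetical, not UWL or CATME policy.

def adjustment_factors(ratings):
    """ratings: dict mapping each student to the list of scores (e.g. 1-5)
    received from teammates. Returns each student's mean rating divided by
    the team-wide mean rating."""
    means = {s: sum(r) / len(r) for s, r in ratings.items()}
    team_mean = sum(means.values()) / len(means)
    return {s: m / team_mean for s, m in means.items()}

def individual_grades(group_grade, ratings, cap=1.1):
    """Scale the group grade by each student's factor, capped so no one
    earns far above the group grade."""
    return {s: round(group_grade * min(f, cap), 1)
            for s, f in adjustment_factors(ratings).items()}

if __name__ == "__main__":
    ratings = {  # scores each student received from the other three members
        "Ana": [5, 5, 4],
        "Ben": [4, 4, 4],
        "Cho": [3, 2, 3],
        "Dev": [5, 4, 5],
    }
    print(individual_grades(group_grade=85, ratings=ratings))
```

Capping the factor keeps a highly rated student from earning far more than the group grade, while a low average rating visibly reduces an individual's grade, which is the accountability signal described above.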

Resources for Peer Evaluation  

There is no single way to implement peer evaluations. Students can use many software and survey tools; the most common at UWL include the following:

  • Qualtrics  (very powerful survey tool)  
  • Google Forms  (capable, more novice-friendly survey tool)  
  • Microsoft Office Forms (quick survey tool)
  • Canvas Survey tool (use for online or blended courses that use our learning management system)  
  • other survey tools, such as  SurveyMonkey  
  • CATME  is a specialized tool designed specifically for peer evaluation. This free web-based application (create account with UWL email) accomplishes both creating and evaluating groups. The software can randomly assign students to teams, based on criteria identified by an initial survey. Criteria include such items as availability on weekends, leadership style, writing skill, familiarity with PowerPoint, and major. After creating the groups, CATME then pairs students to anonymously rate one another, based on set criteria. Instructors cannot edit or add criteria, though they can select categories to include on the team-maker and evaluation surveys.  

RESOURCE: A helpful peer evaluation rubric can be found here to start your process.

Peer Review

Peer review is different from peer evaluation because it is focused on a deliverable: a project, a writing assignment, or any other "product" that a student prepares to reflect their learning.

Definition: Peer review is the act of having another person read what you have written and respond in terms of its effectiveness. The reader works to identify the strengths and weaknesses of the product and suggests strategies for revising it.

Purpose: The goal of peer review is not only to strengthen the work but also to help students identify areas of self-improvement for the future, encourage authentic collaboration, and gain a better understanding of whether they are meeting the objectives of an assignment. Peer review in the classroom, much like in the scholarly process, aims to identify the strengths and weaknesses of a student's work and lead to more effective outcomes.

In receiving the evaluation, students will learn:

  • to recognize how to learn from constructive criticism  
  • to make revision choices based on  responses from peers  
  • to identify areas for self-improvement and growth.   

In performing the evaluation, students will learn:

  • to read critically
  • to summarize key elements of a work
  • to identify specific areas for improvement with productive suggestions and advice
  • to give clear feedback (positive and negative).

The most important aspect of peer review is providing very CLEAR guidelines and/or rubrics to students so that they remain focused on the "big picture" of the assignment objectives and give helpful feedback to their peers. The process can be done in the classroom or online, depending on your preferences, the particular assignment, and your goals for the review process.

Motivate your students in the following ways:

  • by discussing the importance of peer review in a professional setting  
  • by  allotting points to the peer review process (and reflection on how students implement  peer feedback )  
  • by discussing how to give productive feedback  

Examples of Peer Review

There are a variety of tools available for implementing peer review.  Remember that modeling the process for students is critical to productive peer review sessions and outcomes.  

In the classroom:  

  • Provide an example assignment. Use a previous assignment and lead a whole-class discussion to provide comments and feedback using a rubric or checklist. You can then elaborate on points students bring up and offer additional insights or expectations.
  • The fish-bowl conversation. Ask students to swap assignments in class. Provide a rubric and ask them to review each other's work positively, productively, and critically. Ask two volunteers to sit together and discuss each other's work, explaining the positive aspects of the work and what needs further improvement. If your students are not comfortable in this environment, you can ask a colleague to help you model this.

Online:  

  • Use discussions in Canvas. Ask students to post their work and comment on at least two others in the class. The feedback can follow a rubric or be designed to answer a set of questions about the other person's work. The original author can then respond to the feedback and reflect on how they will implement changes.
  • Use the peer review tool in Canvas. This tool randomly assigns students to look at another person's work. It can be used to ensure everyone receives feedback and encourages students to submit work on time. The peer review tool also helps with grading, since you can assign a rubric to the assignment. (A conceptual sketch of this kind of random assignment follows this list.)
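As referenced above, the sketch below illustrates the general idea behind random peer-review assignment. It is not Canvas code; the roster and the one-review-per-student rule are assumptions. Shuffling the roster and then having each student review the next person in the shuffled order guarantees that no one reviews their own work.

```python
# Illustrative only: not the Canvas implementation, just the idea behind
# randomly assigning each student to review one classmate's submission.
import random

def assign_peer_reviews(students, seed=None):
    """Return {reviewer: author_to_review}. Shuffling then rotating by one
    position guarantees nobody is assigned their own submission."""
    rng = random.Random(seed)
    order = students[:]          # copy so the caller's roster is untouched
    rng.shuffle(order)
    return {order[i]: order[(i + 1) % len(order)] for i in range(len(order))}

if __name__ == "__main__":
    roster = ["Ana", "Ben", "Cho", "Dev", "Eli"]
    for reviewer, author in assign_peer_reviews(roster, seed=42).items():
        print(f"{reviewer} reviews {author}'s draft")
```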

Useful Handouts/Rubrics:  

  • A helpful peer review handout for writing is found here. This page provides a resource for self-review that can help guide a reviewer to address the writer's purpose and requests for areas of improvement.
  • A more in-depth writing rubric can be found here. This asks the reviewer to clearly assess the writer's strengths and weaknesses, as well as pinpoint very specific areas for improvement.
  • McGill University has a list of rubrics depending on the type of assessment you are assigning for peer review.

Tips to Implement Effectively 

  • To implement an effective peer evaluation, students must fully understand expectations in advance. Set clear goals and expectations for the process. 
  • A detailed rubric or checklist is critical to ensure evaluations are respectful, constructive and helpful. 
  • To avoid emotional complications and hurt feelings, provide examples of effective evaluations/reviews, and emphasize that evaluations must be respectful, constructive, and helpful.
  • To encourage self-direction and responsibility, allow students to create their own rubrics or checklists (though you should still approve prior to use as an actual assessment tool). 
  • Allow students to practice peer evaluations, preferably in the form of a self-assessment or a peer review for a low-stakes activity (e.g. class or online discussion).

Resources :  

  • Kopp, Bryan. Improving Peer and Instructor Feedback on Writing. February 2014. https://drive.google.com/file/d/0Bw50x9qy8rGrdFlOZ0NwMUc3dEU/view
  • Salahub, Jill. (1994–2021). Peer Review. The WAC Clearinghouse, Colorado State University. Available at https://wac.colostate.edu/resources/writing/guides/
  • McGill University Teaching and Learning Services. "Examples of PA Assignments." https://www.mcgill.ca/tls/instructors/assessment/peer/examples
  • Open access
  • Published: 24 November 2022

The contribution of peer research in evaluating complex public health interventions: examples from two UK community empowerment projects

  • Kris Southby 1 ,
  • Susan Coan 1 ,
  • Sara Rushworth 1 ,
  • Jane South 1 ,
  • Anne-Marie Bagnall 1 ,
  • Tiffany Lam 2 ,
  • Jenny Woodward 1 &
  • Danial Button 3  

BMC Public Health volume 22, Article number: 2164 (2022)


Peer-research is steered and conducted by people with lived experience of the issues being researched. This paper explores the value of peer-research in two complex public health intervention evaluations in the UK.

Reports from 18 peer research projects, completed by residents from 12 communities in the UK taking part in two community empowerment interventions, were analysed using cross-case analysis.

Undertaking peer research helped to build evaluation and research skills within individual projects as well as providing data on other outcomes related to the programmes' Theory of Change. Some peer researchers, however, felt unprepared for the activity despite support from the academic team and were unsatisfied with project outcomes. While peer research projects provided more opportunities for local residents to engage with the overall evaluations, there was an overreliance on people closely connected to the programmes to act as peer researchers. The peer research projects explored topics that were broader than the aims and objectives of the overall programme evaluations. All provided insight into the context in which projects occurred, while some also informed understanding of programme change mechanisms.

Conclusions

Including peer research as part of complex public health intervention evaluations can help uncover important contextual and ecological details beyond the reach of more traditional evaluation data collection. Peer research can also empower and build research/evaluation capacity within communities, which is particularly pertinent for community empowerment interventions.


Introduction

Evaluation of complex public health interventions – those that have multiple interacting components and non-linear causal pathways [ 1 ] – requires a multidisciplinary approach [ 2 , 3 ], triangulation between the insights of different methodologies [ 4 ], and consideration of the context and ecology in which interventions occur [ 5 ]. Methodologies, like ethnography [ 6 ], photo-elicitation [ 4 ], and in-depth longitudinal case studies [ 7 ], have emerged as means to capture complexity in public health intervention evaluations that would have otherwise been overlooked.

This paper supports the need for high quality, robust, realistic and proportionate evaluation methodologies [8] by examining the contribution of peer research in two community empowerment intervention evaluations in the UK: Local People (LP) and Local Conversations (LC). Community empowerment approaches are complex public health interventions because of the multiple, usually non-standardised, components involved and the multiple layers of interaction within the intervention and with the local ecology [6, 9, 10]. In the LP and LC programmes, residents in fifty disadvantaged neighbourhoods were supported by local and national voluntary and community sector (VCS) organisations to come together and increase their control and influence over the things that matter to them locally, with the aim of improving social determinants of health, local services, and health and wellbeing, ultimately contributing to reduced health inequalities. (See the programme website for more detail [11]).

Peer research is steered and conducted by people with lived experience of the issue being studied, who adopt the role of active researchers to collect data from their peers about their experiences. It is a way to reach into communities and include seldom heard voices – which is particularly useful when tackling health inequalities [ 12 ] – establishing rapport with participants, empowering and upskilling participants and communities [ 13 , 14 ], and exploring issues that participants are less willing to raise with academic researchers [ 15 ]. There are apparent synergies between peer research and community empowerment: involvement in peer research is an opportunity for community members to assume control, to learn new skills, and increase their capacity [ 4 , 16 , 17 ], which, in turn, can support the sustainability of community-based interventions [ 16 ]. However, concerns with the method include maintaining confidentiality and data quality [ 18 ], being time-consuming and unpredictable [ 19 ], and getting the balance right between empowering participants and ensuring academic rigour [ 20 ]. The emergence of ‘lay experts’ and ‘expert patients’ indicates the increasing value of experiential knowledge in health and public health fields [ 21 ], and peer research is increasingly acknowledged by governments and commissioners [ 22 ]. Peer research fits within a wider group of community-based participatory research methodologies that attempt to reframe research with a health equity perspective [ 16 ]. However, the extent of peer research’s contribution to evaluating complex public health interventions is currently underexplored. The value of the broader class of “community-based participatory research” to intervention research [ 16 ] and of peer research as a part of multi-modal evaluation design to account for complexity [ 7 ] has been interrogated. This paper describes the specific methodology and presents an analysis of the contribution of peer research to the evaluation of the LP and LC complex public health interventions. In this context, peer research projects were delivered across a wide variety of neighbourhoods and communities experiencing socioeconomic disadvantage.

Methodology

In order to explore the contribution of peer research, a secondary analysis of the peer research reports produced as part of the LC and LP projects was undertaken using cross-case analysis [23]. A matrix was used to extract key information from, and facilitate comparison between, each report. Specific reports are referenced here by LC or LP, a number (1–5) denoting the project, and a number (1–3) denoting the round of data collection.
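To illustrate the referencing convention and the kind of extraction matrix just described, the following sketch parses a report identifier and stores one extraction row per report. The field names and values are hypothetical, not the matrix actually used in the LC and LP evaluations.

```python
# Illustrative only: the extraction fields and values below are hypothetical,
# not the matrix actually used in the LC/LP evaluations.
import re

def parse_report_id(report_id):
    """Split an identifier such as 'LC5-3' into programme, project and
    data-collection round (accepts a hyphen or an en dash)."""
    m = re.fullmatch(r"(LC|LP)(\d+)[–-](\d+)", report_id)
    if not m:
        raise ValueError(f"Unexpected report id: {report_id}")
    return {"programme": m.group(1), "project": int(m.group(2)),
            "round": int(m.group(3))}

# One row of a cross-case extraction matrix (field names are assumptions).
row = {
    **parse_report_id("LC5-3"),
    "topic": "Residents' experience of Covid-19 lockdown",
    "methods": ["survey"],
    "respondents": 42,  # hypothetical count
    "peer_researcher_reflections": "Found the work purposeful and engaging",
}
print(row)
```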

In total, 18 peer research projects were completed (see Table  1 ) by twelve different LC and LP projects. Peer research took place during each of the three data collection phases in LC (2017, 2019, 2020–21) but only once in LP (2018). Peer researchers’ reflections on the process of undertaking peer research were captured in each project report. The staged peer research process is described in Table  2 . Two enforced changes to the planned process due to Covid-19 were: 1) training and support moved from face-to-face to online delivery; and 2) co-analysis workshops between peer researchers and supporting evaluation team members were replaced with analysis being done by supporting evaluation team members only and corroborated by peer researchers via email. Peer researchers were not involved in producing this paper. Whilst this is contrary to the principles of peer research, there appeared to be little appetite among peer researchers to be involved when we began the writing process.

Evaluation of both programmes was guided by specific research questions based on a Programme Theory of Change (ToC) [ 7 ]. Peer research was one data collection stream in each programme, along with longitudinal qualitative case studies, repeated cross-sectional surveys, process evaluation, and self-evaluation in each case study site. South, Button [ 7 ] describe how the overall evaluation methodology accounted for the complexity of the LP intervention. This paper considers whether peer research contributed to the LC and LP evaluations in the way that was planned in the evaluation methodology, namely: to build evaluation skills within participating projects [ 13 ], increase the reach of data collection [ 12 ], and explore issues that may not be accessible to the evaluation team [ 15 ].

Results

Table 1 summarises key points about the eighteen peer research projects, which are described below in terms of building evaluation skills, the reach of data collection, and issues explored through peer research.

Building (evaluation) skills and capacity

Peer researcher’ reflections suggest that undertaking peer research helped to build the evaluation and research skills within individual LC and LP projects as well as providing data on other outcomes related to the ToC.

Peer researchers described gaining research skills through participation in peer research. Peer-research also built research capacity within projects. In LC3–1, LC5–3, and LC7–3, projects reported feeling encouraged to do more research after taking part in peer-research. In LC2–3, the community organisation leading the project had commissioned external research that involves peer research, based on their experience during the LC programme.

Peer research also appeared to facilitate some of the LC and LP programmes’ intended shorter term changes. Peer-researchers reported gaining confidence, aspirations, and new interpersonal and technical skills through being involved in the peer-research projects. In LC-1, for example, young people were trained by a professional filmmaker on how to use digital cameras, take photographs, and organise an exhibition. Peer researchers said they generally enjoyed coming together to design and carry out research. Peer-researchers in LC5–3 and LC7–3 stated that doing peer-research was purposeful and engaging. In two projects (LC4–1, LC2–2), gains included more knowledge of their local areas and understanding the experiences of residents which would help inform their project activities.

Some peer-researchers felt unprepared for the activity. This can be attributed to insufficient time being spent to develop peer-researchers’ skills prior to commencing projects. For example, peer-research training was condensed from the planned two sessions into one session due to scheduling issues. In phase 3 of LC, co-analysis workshops were not able to take place due to Covid-19 lockdown restrictions. Other peer researchers expressed frustration at the slow pace of peer-research activities (LC6–3) and a lack of change that occurred as a result of peer-research (LC1–3).

The reach of data collection

The ‘reach’ of peer research projects concerns the numbers and breadth of both the peer researchers who were involved in each project and the people who contributed as respondents in data collection. Numbers of peer researchers per project ranged from three to ten, with an average of six. They were all local residents and were typically already heavily involved in their respective LP or LC projects as members of steering/advisory groups. Peer researchers were frequently already contributing to the overall evaluation as case study participants via their active roles in LC and LP projects. Demographic information about peer researchers (e.g. age, ethnicity) was not routinely recorded as part of the process, although peer researchers in three projects (LC1–1, LC7–3, and LP4) were young people and in two projects (LC2–3 and LC2–3) peer researchers were all female.

In terms of reach, all but one (LC1–3) of the peer research projects used convenience sampling techniques, such as interviewing friends and neighbours or conducting surveys where local residents already congregated (e.g. community centres). In total, peer research projects engaged 687 people as respondents, including 248 in qualitative methods (ranging from five to thirty-five across individual projects) and 439 survey respondents (ranging from ten to seventy-six across individual projects). Respondents were mostly local residents. One project (LC4–1) also engaged managers of local community assets. Another (LP2–1) was focused on a community of experience rather than one of geography (as was common in most LC and LP projects) and engaged both community members (people with a disability, their families/carers) and non-members (e.g. 'seafront traders'). Respondents were also mostly people not already actively involved in LC or LP projects or their respective evaluations, indicating the wider reach of the peer research. Demographic information about respondents is inconsistent between projects: it was either not collected or not reported in project reports. Peer researchers in LC7–3, for example, felt very strongly that collecting demographic information was an invasion of privacy and that respondents would feel uncomfortable giving that information. The reasons why it was not collected in other projects are not clear.

Issues explored

The eighteen peer research projects investigated fourteen separate topics (see Table 1 ). The most researched topics were around residents’ involvement in community activities (LC3–1, LC3–2, LC5–2, LC4–2, LP3, LP5), residents’ experience of Covid-19 ‘lockdown’ (LC1–3, LC2–3, LC5–3, LC6–3), and the experience of particular groups of residents living in the area (LC1–1, LC2–1, LP4). Other research topics reflected more specific local issues, such as perceptions of proposed estate regeneration (LC3–1), the value of a foodbank (LC5–2), or accessibility to the beach/seafront for people with disabilities (LP2–1).

Mapped against the LC and LP programmes’ ToC, all eighteen projects revealed information about the context in which LC and LP projects occurred. This included things like local housing conditions (LC1–1, LC3–1), local feelings of connectedness (LC1–1, LC2–2, LP2), availability of local resources and community assets (LP1–1, LP3, 1, LC4–1, LP4–1), and mental health stigma in communities (LC2–1). Peer research undertaken during LC phase three provided insight into the local ‘lockdown’ experience (LC1–3, LC2–3, LC5–3, LC6–3). Twelve peer research projects also produced information that could relate to LP and LC ‘mechanisms of change’ – the things LC and LP projects are doing to create change in local areas. Very often this was about barriers and enabling factors to participation in LC and LP project activities and community activities in general (LC3–1, LC3–2, LC5–2, LC4–2, LP3, LP5). Other projects provided information about residents’ previous or other experiences of community involvement and collective action (LC4–1, LC7–3), which provided insight to inform project action. Insights around digital exclusion/inclusion during Covid-19 (LC1–3, LC2–3, LC5–3, LC6–3) could also inform future mechanisms of change.

In terms of how topics were explored, the most frequently used research method ( n  = 8) was surveys, either delivered online, in written format, or over the telephone or face-to-face as a structured interview. Other methods used were semi-structured interviews ( n  = 4) and photovoice [ 24 ] ( n  = 3). Three projects used mixed-methods: semi-structured interviews and a survey ( n  = 2), and semi-structured interviews, event feedback, desk research and photovoice ( n  = 1). The research methods used in peer research were very often tailored to residents’ characteristics and needs. Semi-structured interviews were carried out in multiple languages (LC2–1, LC2–2), using photovoice helped engage young people (LC1–1, LP4–1), and surveys were carried out in locations where residents already gathered rather than just being online or via post (LC2–1, LC5–2, LC3–2, LC4–2).

Discussion

This paper has examined the contribution of peer research to two complex public health intervention evaluations in terms of building evaluation skills within participating projects, increasing the reach of data collection, and exploring issues that may not be accessible to the evaluation team. These three aspects are discussed in turn, with recognition that they overlap and interact.

Firstly, in terms of building evaluation skills, peer research allowed individual peer researchers to learn new research and evaluation skills and for projects to increase their research capacity [ 13 , 14 ]. This is particularly salient in the context of community empowerment interventions. Including peer research (or other participatory methods) as part of an evaluation of a community empowerment programme can support community empowerment itself, enabling people to come together and engage in dialogue, decision making and action, and, through doing so, increase components of collective control and improve local social determinants of health and wellbeing. Community members gain influence over what constitutes knowledge (and that their lived experiences are a legitimate source of knowledge/evidence) and how that knowledge is produced (via the research questions/agenda they set). This reflects findings in wider literature that bringing stakeholders together provides opportunities for them to learn from each other and from research so that they can act [ 17 , 25 ]. Peer research also produced a number of shorter-term programme outcomes, including for peer researchers themselves (e.g. confidence, skills, enjoyment, sense of purpose), for LC and LP projects (e.g. increased capacity, gaining new insights into communities, increased capacity to influence local decision makers), and their respective communities (e.g. stimulating social interaction). Again, this reflects findings in the wider literature that participation in research can directly benefit individuals and their community through learning new skills and increased capacity [ 4 , 9 , 16 ], which, in turn, can support the sustainability of community-based interventions [ 16 ].

Secondly, in terms of increasing the reach of data collection, peer research projects provided additional opportunities for residents to engage in the overall evaluations. At least some of these residents would have been beyond the reach of the external evaluation teams, such as those who did not speak English and those who were not engaged with the projects. This reflects the broader value of peer research in facilitating the inclusion of seldom heard voices [12]. However, because demographic information was not routinely collected, it is not possible to say for certain how the respondents compare to those reached by the other data collection methods. Rather than relying on convenience sampling, a common approach in peer research [26], purposive or representative sampling to recruit people not represented in other data collection may be an effective way of ensuring peer research increases the reach of data collection. This may, however, create additional burdens for peer researchers and/or infringe on peer researchers' control of projects.

Peer researchers were drawn largely from people already strongly associated with LC and LP projects and already engaged in the respective evaluations as case study respondents. This means that the benefits of peer research for empowering and upskilling participants and communities [13, 14] were limited to a small pool of people. It is perhaps unrealistic to expect people with no connections to community projects to become effective peer researchers. Whilst training was provided in the technical aspects of research, peer researchers each brought with them important a priori knowledge of the projects and trust in the external evaluators. A strategy to broaden the pool of peer researchers may be to actively support people who had been respondents to other data collection methods, or to other rounds of peer research, to become peer researchers. Additionally, while peer researchers were compensated for their time with high-street shopping vouchers [22], other payment methods (e.g. money) may have been preferred and encouraged greater engagement. Although many learned bodies, such as the NIHR [27], have guidance about paying members of the public for their involvement in research, navigating appropriate and fair payment still presents challenges for both academics and peer researchers [28]. Future projects may consider taking guidance from potential peer researchers about their preferred mode of payment and offering different ways of compensating different individuals.

Thirdly, in terms of exploring issues that may be beyond the reach of the evaluation team, the topics of the eighteen peer research projects were broader than the aims and objectives of the overall programme evaluations. All provided insight into the context in which projects occurred, while some informed understanding of programme change mechanisms. This demonstrates that peer research can contribute to an understanding of the context and ecology in which interventions occur [5], enriching interpretation of the mechanisms and processes occurring within programmes [7, 10]. Peer research unpacked community-level processes and perspectives that help explain the interaction of the intervention in context and that have traditionally been insufficiently taken account of in evaluations [8, 21]. The unpredictability of peer research [19] means there is a potential danger of peer research topics becoming so far removed from the aims and objectives of the evaluation that they are no longer relevant. Evaluators could mitigate this risk by taking more control of peer research topics, although this would undermine the strength of peer research being led and controlled by people with lived experience. This illustrates the inherent challenge in peer research of getting the balance right between empowering participants and ensuring rigour [20].

Other risks associated with peer research concern data quality and validity [ 18 ]. These were mitigated here through training and ongoing support for peer researchers, strategies that have been demonstrated successfully elsewhere [ 18 , 26 ]. Allowing peer researchers autonomy to adapt data collection methods based on their knowledge of respondents’ needs and preferences also supported validity. Peer researchers generally found the training and support available useful, although some said they felt unprepared. This is perhaps not surprising given that the training was very often condensed to less than one day, allowing little time to ensure all peer researchers sufficiently understood all aspects of the research process. Potentially insufficient ethics and data protection training raises questions about the safety of peer-research projects. However, ongoing support and supervision by a member of the evaluation team helped to mitigate this risk. Further training and support would always be beneficial, although this must be balanced with demands from other aspects of evaluations.

Supporting peer researchers to do co-analysis with the evaluation team was stopped due to Covid-19. The impact of this on individual peer research projects and on evaluations is unknown but it was a necessary pragmatic decision. Likewise, for practical reasons (e.g. researcher and evaluation team capacity, project timelines) peer researchers’ involvement in producing outputs from their projects was limited to commenting on/editing reports drafted by evaluation team members and having final sign-off on completed reports. Whilst this may appear contrary to the principles of peer research, an alternative view is that participation is a continuum, not an absolute [ 29 ], and that peer researchers had choices over the extent of their involvement within the confines of this project. Planning additional resources to support fuller participation during this phase may support deeper engagement with analysis and report writing.

A limitation of this current analysis is that it is partially based on limited feedback from peer researchers and observations made whilst supporting peer researcher projects rather than through a systematic evaluation of the peer research process. Further research to, for example, formally assess peer researcher skill acquisition or to compare the composition of respondents to peer research and other data collection methods within evaluations would be useful. While the breadth of the sample of peer research projects across two national community empowerment programmes made it possible to draw out common themes about the contribution of peer research, findings are not generalisable. Further evaluation of peer research in different contexts is merited to examine the transferability and relevance of these themes to other complex public health intervention evaluations.

Conclusions

This paper has shown that including peer research as part of complex public health intervention evaluations can help uncover important contextual and ecological details beyond the reach of more traditional evaluation data collection, supporting expectations that public health decision makers draw on the best available evidence [2, 30]. Including peer research fits with a growing focus on patient and public involvement in health research [31] and offers a means of expanding the role of members of the public [21] in the design, delivery, and analysis of enquiries. There is distinct value in enabling research that matters to communities. However, balancing increasing community control with managing a complex public health intervention evaluation is challenging and, without appropriate planning and resources, may necessitate compromising participatory ideals.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to reports that were included in the secondary analysis being co-owned by third party community groups but are available from the corresponding author on reasonable request.

Abbreviations

LC: Local Conversations (one of the interventions the paper is about)

LP: Local People (one of the interventions the paper is about)

ToC: Theory of Change

VCS: Voluntary and community sector

References

1. Petticrew M. When are complex interventions 'complex'? When are simple interventions 'simple'? Eur J Pub Health. 2011;21(4):397–8.
2. Cyr PR, Jain V, Chalkidou K, Ottersen T, Gopinathan U. Evaluations of public health interventions produced by health technology assessment agencies: a mapping review and analysis by type and evidence content. Health Policy Open. 2021;125(8):1054–64.
3. Minary L, Trompette J, Kivits J, Cambon L, Tarquinio C, Alla F. Which design to evaluate complex interventions? Toward a methodological framework through a systematic review. BMC Med Res Methodol. 2019;19(1):1–9.
4. Sebastião E, Gálvez PA, Bobitt J, Adamson BC, Schwingel A. Visual and participatory research techniques: photo-elicitation and its potential to better inform public health about physical activity and eating behavior in underserved populations. J Public Health. 2016;1:3.
5. Trickett EJ, Beehler S, Deutsch C, Green LW, Hawe P, McLeroy K, et al. Advancing the science of community-level interventions. Am J Public Health. 2011;101(8):1410–9.
6. Orton L, Ponsford R, Egan M, Halliday E, Whitehead M, Popay J. Capturing complexity in the evaluation of a major area-based initiative in community empowerment: what can a multi-site, multi team, ethnographic approach offer? Anthropol Med. 2019;26(1):48–64.
7. South J, Button D, Quick A, Bagnall A-M, Trigwell J, Woodward J, et al. Complexity and community context: learning from the evaluation design of a national community empowerment programme. Int J Environ Res Public Health. 2020;17(1):91.
8. Hanson S, Jones A. Missed opportunities in the evaluation of public health interventions: a case study of physical activity programmes. BMC Public Health. 2017;17(1):1–10.
9. South J, Phillips G. Evaluating community engagement as part of the public health system. J Epidemiol Community Health. 2014;68(7):692–6.
10. Orton L, Halliday E, Collins M, Egan M, Lewis S, Ponsford R, et al. Putting context centre stage: evidence from a systems evaluation of an area based empowerment initiative in England. Crit Public Health. 2017;27(4):477–89.
11. People's Health Trust. Local Conversations. 2021. Available from: https://www.peopleshealthtrust.org.uk/local-conversations.
12. Vaughn LM, Whetstone C, Boards A, Busch MD, Magnusson M, Määttä S. Partnering with insiders: a review of peer models across community-engaged research, education and social care. Health Soc Care Commun. 2018;26(6):769–86.
13. Guta A, Flicker S, Roche B. Governing through community allegiance: a qualitative examination of peer research in community-based participatory research. Crit Public Health. 2013;23(4):432–51.
14. Porter G. Reflections on co-investigation through peer research with young people and older people in sub-Saharan Africa. Qual Res. 2016;16(3):293–304.
15. Green J, South J. Evaluation. Open University Press; 2006.
16. Wallerstein N, Duran B. Community-based participatory research contributions to intervention research: the intersection of science and practice to improve health equity. Am J Public Health. 2010;100(S1):S40–S6.
17. Popay J, Whitehead M, Ponsford R, Egan M, Mead R. Power, control, communities and health inequalities I: theories, concepts and analytical frameworks. Health Promot Int. 2020;36(5):1253–63.
18. Lushey CJ, Munro ER. Participatory peer research methodology: an effective method for obtaining young people's perspectives on transitions from care to adulthood? Qual Soc Work. 2015;14(4):522–37.
19. Kavanagh A, Daly J, Jolley D. Research methods, evidence and public health. Aust N Z J Public Health. 2002;26(4):337–42.
20. Cleaver F. Institutions, agency and the limitations of participatory approaches to development. In: Cooke B, Kothari U, editors. Participation: the new tyranny? London: ZED Books; 2001. p. 36–55.
21. Harris J, Croot L, Thompson J, Springett J. How stakeholder participation can contribute to systematic reviews of complex interventions. J Epidemiol Community Health. 2016;70(2):207–14.
22. Terry L, Cardwell V. Refreshing perspectives: exploring the application of peer research with populations facing severe and multiple disadvantage. London: Revolving Doors Agency & Lankelly Chase; 2016.
23. Yin RK. Case study research: design and methods. SAGE; 2009.
24. Wang C, Burris MA. Photovoice: concept, methodology, and use for participatory needs assessment. Health Educ Behav. 1997;24(3):369–87.
25. McGill E, Er V, Penney T, Egan M, White M, Meier P, et al. Evaluation of public health interventions from a complex systems perspective: a research methods review. Soc Sci Med. 2021;272:113697.
26. Woodall J, Cross R, Kinsella K, Bunyan A-M. Using peer research processes to understand strategies to support those with severe, multiple and complex health needs. Health Educ J. 2019;78(2):176–88.
27. National Institute of Health Research. Payment guidance for researchers and professionals. 2022. Available from: https://www.nihr.ac.uk/documents/payment-guidance-for-researchers-and-professionals/27392.
28. Cheetham M, Atkinson P, Gibson M, Katikireddi S, Moffatt S, Morris S, et al. Exploring the mental health effects of universal credit: a journey of co-production. Perspect Public Health. 2022;142(4):209–12.
29. Southby K. Reflecting on (the challenge of) conducting participatory research as a research-degree student. Res All. 2017;1(1):128–42.
30. Achana F, Hubbard S, Sutton A, Kendrick D, Cooper N. An exploration of synthesis methods in public health evaluations of interventions concludes that the use of modern statistical methods would be beneficial. J Clin Epidemiol. 2014;67(4):376–90.
31. Pelletier CA, Pousette A, Ward K, Fox G. Exploring the perspectives of community members as research partners in rural and remote areas. Res Involv Engagem. 2020;6(1):3.

Acknowledgements

We would like to thank all the community members and organisations who carried out peer research projects.

The research was undertaken collaboratively between Leeds Beckett University and New Economics Foundation (NEF) as part of two evaluation programmes funded by the People's Health Trust. Funding was awarded via a competitive tendering process. There is no grant/award number. Funding to evaluate Local People was provided by the People's Health Trust jointly to Leeds Beckett University and New Economics Foundation. Funding to evaluate Local Conversations was provided to New Economics Foundation, which then subcontracted Leeds Beckett University to support specific workstreams.

Author information

Authors and Affiliations

Leeds Beckett University, 519 Portland, City Campus, Leeds, LS1 3HE, UK

Kris Southby, Susan Coan, Sara Rushworth, Jane South, Anne-Marie Bagnall & Jenny Woodward

Sustrans UK, 2 Cathedral Square, Bristol, BS1 5DD, UK

Tiffany Lam

New Economics Foundation, 10 Salamanca Place, London, SE1 7HB, UK

Danial Button

Contributions

KS, SC, SR, and DB supported community members to carry out peer-research projects, including delivering training, co-analysis of data, and report writing. All authors were involved in the conceptualisation and design of the study; KS, AMB, and TF carried out secondary data analysis of peer-research reports. KS led the drafting of the paper. JS, SC, JW, AMB and SC reviewed drafts. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kris Southby .

Ethics declarations

Ethics approval and consent to participate.

All methods were carried out in accordance with relevant guidelines and regulations, in particular ‘Community-based participatory research: A guide to ethical principles and practice’ produced by Centre for Social Justice and Community Action and National Co-ordinating Centre for Public Engagement ( https://www.publicengagement.ac.uk/sites/default/files/publication/cbpr_ethics_guide_web_november_2012.pdf ). This paper is based on a secondary analysis of unpublished reports and so informed consent from participants is not applicable. The studies upon which this paper is based were granted ethical approval by Leeds Beckett University’s Research Ethics Committee (application refs. 38299 and 48663).

Consent for publication

Not applicable as we are not publishing participant information.

Competing interests

The research was undertaken in collaboration with New Economics Foundation (NEF) as part of two evaluation programmes funded by the People’s Health Trust. Authors DB and TL were involved in commissioning Leeds Beckett University for their part in evaluating LC. All other authors have no Competing Interest to declare.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Southby, K., Coan, S., Rushworth, S. et al. The contribution of peer research in evaluating complex public health interventions: examples from two UK community empowerment projects. BMC Public Health 22 , 2164 (2022). https://doi.org/10.1186/s12889-022-14465-2

Received : 25 January 2022

Accepted : 27 October 2022

Published : 24 November 2022

DOI : https://doi.org/10.1186/s12889-022-14465-2


  • Peer research
  • Research methods
  • Complex interventions
  • Community empowerment


  • Research article
  • Open access
  • Published: 06 March 2019

Tools used to assess the quality of peer review reports: a methodological systematic review

  • Cecilia Superchi   ORCID: orcid.org/0000-0002-5375-6018 1 , 2 , 3 ,
  • José Antonio González 1 ,
  • Ivan Solà 4 , 5 ,
  • Erik Cobo 1 ,
  • Darko Hren 6 &
  • Isabelle Boutron 7  

BMC Medical Research Methodology volume 19, Article number: 48 (2019)


A strong need exists for a validated tool that clearly defines peer review report quality in biomedical research, as it will allow evaluating interventions aimed at improving the peer review process in well-performed trials. We aim to identify and describe existing tools for assessing the quality of peer review reports in biomedical research.

We conducted a methodological systematic review by searching PubMed, EMBASE (via Ovid) and The Cochrane Methodology Register (via The Cochrane Library) as well as Google® for all reports in English describing a tool for assessing the quality of a peer review report in biomedical research. Data extraction was performed in duplicate using a standardized data extraction form. We extracted information on the structure, development and validation of each tool. We also identified quality components across tools using a systematic multi-step approach and we investigated quality domain similarities among tools by performing hierarchical, complete-linkage clustering analysis.

We identified a total of 24 tools: 23 scales and 1 checklist. Six tools consisted of a single item and 18 had several items, ranging from 4 to 26. None of the tools reported a definition of 'quality'. Only 1 tool described the scale development and 10 provided measures of validity and reliability. Five tools were used as an outcome in a randomized controlled trial (RCT). Moreover, we classified the quality components of the 18 tools with more than one item into 9 main quality domains and 11 subdomains. The tools contained from two to seven quality domains. Some domains and subdomains were considered in most tools, such as the detailed/thorough nature of the reviewer's comments (11/18). Others were rarely considered, such as whether or not the reviewer commented on the statistical methods (1/18).

Several tools are available to assess the quality of peer review reports; however, the development and validation process is questionable and the concepts evaluated by these tools vary widely. The results from this study and from further investigations will inform the development of a new tool for assessing the quality of peer review reports in biomedical research.


The use of editorial peer review originates in the eighteenth century [ 1 ]. It is a longstanding and established process that generally aims to provide a fair decision-making mechanism and improve the quality of a submitted manuscript [ 2 ]. Despite the long history and application of the peer review system, its efficacy is still a matter of controversy [ 3 , 4 , 5 , 6 , 7 ]. About 30 years after the first international Peer Review Congress, there are still ‘scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial [...] for a paper to end up in print’ (Drummond Rennie, chair of the advisory board) [ 8 ].

Recent evidence suggests that many current editors and peer reviewers in biomedical journals still lack the appropriate competencies [ 9 ]. In particular, it has been shown that peer reviewers rarely receive formal training [ 3 ]. Moreover, their capacity to detect errors [ 10 , 11 ], identify deficiencies in reporting [ 12 ] and spin [ 13 ] has been found lacking.

Some systematic reviews have been performed to estimate the effect of interventions aimed at improving the peer review process [ 2 , 14 , 15 ]. These studies showed that there is still a lack of evidence supporting the use of interventions to improve the quality of the peer review process. Furthermore, Bruce and colleagues highlighted the urgent need to clarify outcomes, such as peer review report quality, that should be used in randomized controlled trials evaluating these interventions [ 15 ].

A validated tool that clearly defines peer review report quality in biomedical research is greatly needed. This will allow researchers to have a structured instrument to evaluate the impact of interventions aimed at improving the peer review process in well-performed trials. Such a tool could also be regularly used by editors to evaluate the work of reviewers.

Herein, as a starting point for the development of a new tool, we identify and describe existing tools that assess the quality of peer review reports in biomedical research.

Study design

We conducted a methodological systematic review and followed the standard Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) guidelines [ 16 ]. The quality of peer review reports is an outcome that in the long term is related to clinical relevance and patient care. However, the protocol was not registered in PROSPERO, as this review does not contain direct health-related outcomes [ 17 ].

Information sources and search strategy

We searched PubMed, EMBASE (via Ovid) and The Cochrane Methodology Register (via The Cochrane Library) from their inception to October 27, 2017 as well as Google® (search date: October 20, 2017) for all reports describing a tool to assess the quality of a peer review report in biomedical research. Search strategies were refined in collaboration with an expert methodologist (IS) and are presented in the Additional file  1 . We hand-searched the citation lists of included papers and consulted a senior editor with expertise in editorial policies and peer review processes to further identify relevant reports.

Eligibility criteria

We included all reports describing a tool to assess the quality of a peer review report. Sanderson and colleagues defined a tool as 'any structured instrument aimed at aiding the user to assess the quality [...]' [18]. Building on this definition, we defined a quality tool as any structured or unstructured instrument assisting the user in assessing the quality of a peer review report (for definitions see Table 1). We restricted inclusion to the English language.

Study selection

We exported the references retrieved from the search into the reference manager Endnote X7 (Clarivate Analytics, Philadelphia, United States), which was subsequently used to remove duplicates. We reviewed all records manually to verify and remove duplicates that had not been previously detected. A reviewer (CS) screened all titles and abstracts of the retrieved citations. A second reviewer (JAG) carried out quality control on a 25% random sample obtained using the statistical software R 3.3.3 [ 19 ]. We obtained and independently examined the full-text copies of potentially eligible reports for further assessment. In the case of disagreement, consensus was determined by a discussion or by involving a third reviewer (DH). We reported the result of this process through a PRISMA flowchart [ 16 ]. When several tools were reported in the same article, they were included as separate tools. When a tool was reported in more than one article, we extracted data from all related reports.
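As an aside, drawing such a 25% quality-control sample is straightforward; the sketch below is a Python stand-in (the sample in the study was drawn in R 3.3.3, and the record identifiers here are invented) for generating a reproducible random subset for the second reviewer to screen.

```python
# Illustrative only: a reproducible 25% random sample of screened records for
# second-reviewer quality control (the authors drew their sample in R 3.3.3).
import random

records = [f"record_{i:04d}" for i in range(1, 4313)]  # e.g. 4312 retrieved records
rng = random.Random(20171027)                          # fixed seed for reproducibility
qc_sample = rng.sample(records, k=round(0.25 * len(records)))
print(f"{len(qc_sample)} of {len(records)} records selected for duplicate screening")
```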

Data extraction

General characteristics of tools.

We designed a data extraction form using Google® Docs and extracted the general characteristics of the tools. We determined whether each tool was a scale or a checklist. We defined a tool as a scale when it included a numeric or nominal overall quality score, while we considered it a checklist when an overall quality score was not present. We recorded the total number of items (for definitions see Table 1). For scales with more than 1 item, we extracted how items were weighted, how the overall score was calculated, and the scoring range. Moreover, we checked whether the scoring instructions were adequately defined, partially defined, or not defined, according to the subjective judgement of two reviewers (CS and JAG) (an example definition for scoring instructions is shown in Table 2). Finally, we extracted all information related to the development, validation, and assessment of each tool's reliability, and whether the concept of quality was defined.
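To make the scale/checklist distinction and the scoring options concrete, here is a small worked sketch. The five items and their scores are invented, and only two common ways of calculating an overall score (sum of item scores and mean of item scores), both with equal weights, are shown.

```python
# Illustrative only: a hypothetical 5-item quality scale, each item scored 1-5
# with equal weight, showing an overall score computed as a sum or as a mean.
item_scores = {
    "comments on the importance of the research question": 4,
    "comments on methods": 3,
    "comments on presentation and clarity": 5,
    "constructiveness of feedback": 4,
    "evidence supplied to substantiate assertions": 2,
}

overall_sum = sum(item_scores.values())        # possible range: 5-25
overall_mean = overall_sum / len(item_scores)  # possible range: 1-5

print(f"Sum score:  {overall_sum} (range 5-25)")
print(f"Mean score: {overall_mean:.1f} (range 1-5)")
```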

Two reviewers (CS and JAG) piloted and refined the data extraction form on a random 5% sample of extracted articles. Full data extraction was conducted by two reviewers (CS and JAG) working independently for all included articles. In the case of disagreement, consensus was obtained by discussion or by involving a third reviewer (DH). Authors of the reports were contacted in cases where we needed further clarification of the tool.

Quality components of the peer review report considered in the tools

We followed the systematic multi-step approach recently described by Gentles [ 20 ], which is based on the constant comparative method of analysis developed within the Grounded Theory approach [ 21 ]. Initially, one researcher (CS) extracted all items included in the tools and, for each item, identified a ‘key concept’ representing a quality component of peer review reports. Next, two researchers (CS and DH) organized the key concepts into a domain-specific matrix (analogous to the topic-specific matrices described by Gentles). Initially, the matrix consisted of domains for peer review report quality, followed by items representative of each domain and references to the literature sources from which the items were extracted. As the analysis progressed, subdomains were created, and the final version of the matrix included domains, subdomains, items and references.

Furthermore, for each tool we calculated the proportion of items falling in each domain. From these proportions, we created a domain profile for each tool. We then calculated the matrix of Euclidean distances between the domain profiles. These distances were used to perform a hierarchical, complete-linkage clustering analysis, which provided a tree structure that we represent in a chart. Through this graphical summary, we were able to identify domain similarities among the different tools, which helped us draw our analytical conclusions. The calculations and graphical representations were produced using the statistical software R 3.3.3 [ 19 ].
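As a minimal sketch of this clustering step in base R, assuming a small invented tools-by-domains matrix of item counts (the tool names, domain labels, and values below are hypothetical, not data from the review):

```r
# Illustrative only: rows = tools, columns = quality domains,
# cells = number of items a tool has in that domain (all values invented).
item_counts <- matrix(
  c(3, 1, 1, 0,
    1, 2, 0, 1,
    0, 0, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ToolA", "ToolB", "ToolC"),
                  c("Comments", "Timeliness", "Usefulness", "Strengths"))
)

# Domain profile: the proportion of each tool's items falling in each domain
profiles <- prop.table(item_counts, margin = 1)

# Euclidean distances between profiles and complete-linkage clustering
d  <- dist(profiles, method = "euclidean")
hc <- hclust(d, method = "complete")
plot(hc)   # dendrogram: the tree structure summarising domain similarity
```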

Study selection and general characteristics of reports

The screening process is summarized in a flow diagram (Fig. 1). Of the 4312 records retrieved, we included 46 reports: 39 research articles, 3 editorials, 2 information guides, 1 letter to the editor, and 1 study available only as an abstract (excluded studies are listed in Additional file 2; included studies are listed in Additional file 3).

Figure 1. Study selection flow diagram

General characteristics of the tools

In the 46 reports, we identified 24 tools, including 23 scales and 1 checklist. The tools were developed from 1985 to 2017. Four tools had from 2 to 4 versions [ 22 , 23 , 24 , 25 ]. Five tools were used as an outcome in a randomized controlled trial [ 23 , 25 , 26 , 27 , 28 ]. Table  3 lists the general characteristics of the identified tools. Table  4 presents a more complete descriptive summary of the tools’ characteristics, including types and measures of validity and reliability.

Six scales consisted of a single item enquiring into the overall quality of the peer review report, all of them directly asking users to score the overall quality [ 22 , 25 , 29 , 30 , 31 , 32 ]. These tools rated the quality of a peer review report using: 1) a 4- or 5-point Likert scale (n = 4); 2) a rating of ‘good’, ‘fair’ or ‘poor’ (n = 1); or 3) a restricted numeric scale from 80 to 100 (n = 1). Seventeen scales and one checklist had several items, ranging in number from 4 to 26. Of these, 10 used the same weight for each item [ 23 , 24 , 27 , 28 , 33 , 34 , 35 , 36 , 37 , 38 ]. The overall quality score was the sum of the item scores (n = 3), the mean of the item scores (n = 6), or a summary score (n = 11) (for definitions see Table 1). Three scales reported more than one way to assess the overall quality [ 23 , 24 , 36 ]. The scoring instructions were not defined in 67% of the tools.
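To make the scoring options concrete, the short R sketch below applies the first two approaches to invented item ratings; the third option, a 'summary score', is only paraphrased in a comment because its definition is given in the paper's Table 1, which is not reproduced here:

```r
# Hypothetical ratings of one peer review report on a five-item scale (1-5)
item_scores <- c(4, 3, 5, 2, 4)
sum(item_scores)    # overall quality as the sum of the item scores  -> 18
mean(item_scores)   # overall quality as the mean of the item scores -> 3.6
# The third option, a single "summary score", is assumed here to be a direct
# overall rating given by the evaluator rather than a value derived from the items.
```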

None of the tools reported a definition of peer review report quality, and only one described the tool’s development [ 39 ]. The first version of this tool was designed by a development group composed of four researchers and three editors. It was based on a tool used in an earlier study, which had been developed by reviewing the literature and interviewing editors. Subsequently, the tool was modified by rewording some questions after group discussions, and a guideline for using the tool was drawn up.

Only 3 tools assessed and reported a validation process [ 39 , 40 , 41 ]. The assessed types of validity included face validity, content validity, construct validity, and preliminary criterion validity. Face and content validity involved either a single editor and author or a group of researchers and editors. Construct validity was assessed with multiple regression analysis using discriminant criteria (reviewer characteristics such as age, sex, and country of residence) and convergent criteria (training in epidemiology and/or statistics), or with the overall assessment of the peer review report by authors and an assessment of (n = 4–8) specific components of the peer review report by editors or authors. Preliminary criterion validity was assessed by comparing grades obtained by an editor to those obtained by an editor-in-chief using an earlier version of the tool. Reliability was assessed in 9 tools [ 24 , 25 , 26 , 27 , 31 , 36 , 39 , 41 , 42 ]; all reported inter-rater reliability and 2 also reported test-retest reliability. One tool reported internal consistency measured with Cronbach’s alpha [ 39 ].
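As an illustration of the internal-consistency assessment mentioned above, the following sketch computes Cronbach's alpha from invented item scores using the standard formula; the function name and data are assumptions, and the paper does not report how alpha was actually computed:

```r
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)
cronbach_alpha <- function(items) {   # items: rows = reviews rated, cols = scale items
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical ratings of six peer review reports on a four-item scale
ratings <- matrix(c(4, 3, 4, 5,
                    2, 2, 3, 2,
                    5, 4, 5, 4,
                    3, 3, 2, 3,
                    4, 4, 4, 5,
                    1, 2, 2, 1),
                  ncol = 4, byrow = TRUE)
cronbach_alpha(ratings)
```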

Quality components of the peer review reports considered in the tools with more than one item

We extracted 132 items included in the 18 tools. One item, asking for the percentage of co-reviews the reviewer had graded, was not included in the classification because it represented a method of measuring a reviewer’s performance rather than a component of peer review report quality.

We organized the key concepts from each item into ‘topic-specific matrices’ (Additional file 4), identifying nine main domains and 11 subdomains: 1) relevance of the study (n = 9); 2) originality of the study (n = 5); 3) interpretation of study results (n = 6); 4) strengths and weaknesses of the study (n = 12) (general, methods and statistical methods); 5) presentation and organization of the manuscript (n = 8); 6) structure of the reviewer’s comments (n = 4); 7) characteristics of the reviewer’s comments (n = 14) (clarity, constructiveness, detail/thoroughness, fairness, knowledgeability, tone); 8) timeliness of the review report (n = 7); and 9) usefulness of the review report (n = 10) (decision making and manuscript improvement). The total number of tools corresponding to each domain and subdomain is shown in Fig. 2. An explanation and example of all domains and subdomains is provided in Table 5. Some domains and subdomains were considered in most tools, such as whether the reviewer’s comments were detailed/thorough (n = 11) and constructive (n = 9), whether the reviewer commented on the relevance of the study (n = 9), and whether the peer review report was useful for manuscript improvement (n = 9). However, other items were rarely considered, such as whether the reviewer commented on the statistical methods (n = 1).

Figure 2. Frequency of quality domains and subdomains

Clustering analysis among tools

We created a domain profile for each tool. For example, the tool developed by Justice et al. consisted of 5 items [ 35 ]. We classified three items under the domain ‘Characteristics of the reviewer’s comments’, one under ‘Timeliness of the review report’ and one under ‘Usefulness of the review report’. According to this classification, the domain profile (represented by proportions of domains) for this tool was 0.6:0.2:0.2 for these three domains and 0 for the remaining ones. The hierarchical clustering used the matrix of Euclidean distances among domain profiles, which led to five main clusters (Fig. 3).
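Continuing the earlier clustering sketch, the worked 0.6:0.2:0.2 profile and the assignment of tools to clusters could look as follows; the value of k is an assumption forced by the toy example, whereas the authors identified five clusters across 18 tools:

```r
# Worked domain profile for a five-item tool with items split 3:1:1 across three domains
justice_items   <- c(Comments = 3, Timeliness = 1, Usefulness = 1)
justice_profile <- justice_items / sum(justice_items)   # 0.6, 0.2, 0.2

# Cutting the dendrogram from the earlier sketch into k groups
# (k = 2 here only because the toy matrix contains three tools)
clusters <- cutree(hc, k = 2)
split(names(clusters), clusters)   # tools grouped per cluster
```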

Figure 3. Hierarchical clustering of tools based on the nine quality domains. The figure shows which quality domains are present in each tool. A slice of the chart represents a tool, and each slice is divided into sectors indicating quality domains (in different colours). The area of each sector corresponds to the proportion of each domain within the tool. For instance, the “Review Rating” tool consists of two domains: Timeliness, meaning that 25% of all its items are encompassed in this domain, and Characteristics of reviewer’s comments, occupying the remaining 75%. The blue lines starting from the centre of the chart define how the tools are divided into the five clusters. Clusters #1, #2 and #3 are sub-nodes of a major node grouping all three, meaning that the tools in these clusters have a similar domain profile compared to the tools in clusters #4 and #5.

The first cluster consisted of 5 tools developed from 1990 to 2016. All of these tools included at least one item in the characteristics of the reviewer’s comments domain, representing at least 50% of each domain profile. The second cluster contained 3 tools developed from 1994 to 2006, characterized by incorporating at least one item in the usefulness and timeliness domains. The third cluster included 6 tools developed from 1998 to 2010 and exhibited the most heterogeneous mix of domains. These tools were distinct from the rest because they encompassed items related to interpretation of the study results and originality of the study. Moreover, the third cluster included two tools with different versions and variations. The first, second, and third clusters were linked together in the hierarchical tree, grouping tools with at least one quality component in the characteristics of the reviewer’s comments domain. The fourth cluster contained 2 tools, developed from 2011 to 2017, that included at least one component in the strengths and weaknesses domain. Finally, the fifth cluster included 2 tools, developed from 2009 to 2012, that consisted of the same 2 domains. The fourth and fifth clusters, whose tools covered only a few domains, were separated from the rest in the hierarchical tree.

To the best of our knowledge, this is the first comprehensive review to systematically identify tools used in biomedical research for assessing the quality of peer review reports. We identified 24 tools from both the medical literature and an internet search: 23 scales and 1 checklist. One in four tools consisted of a single item that simply asked the evaluator for a direct assessment of the peer review report’s ‘overall quality’. The remaining tools had between 4 and 26 items, with overall quality assessed as the sum of the item scores, their mean, or a summary score.

Since a definition of overall quality was not provided, these tools consisted exclusively of a subjective quality assessment by the evaluators. Moreover, we found that only one study reported a rigorous development process for its tool, and even that process involved a very limited number of people. This is of concern because it means that the identified tools were, in fact, not suitable for assessing the quality of a peer review report, particularly because they lacked a focused theoretical basis. Only 10 tools were evaluated for validity and/or reliability; notably, criterion validity was not fully assessed for any tool.

Most of the scales with more than one item resulted in a summary score. These scales did not consider how items could be weighted differently. Although commonly used, scales are controversial tools for assessing quality, primarily because combining items into a summary score requires weighting, which can bias estimation of the construct being measured [ 43 ]. It is not clear how weights should be assigned to each item of a scale [ 18 ]. Thus, different weightings would produce different scales, which could provide varying quality assessments of an individual study [ 44 ].
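A small numeric sketch (all scores and weights invented) illustrates the point: the same two hypothetical review reports swap rank order when the items are weighted differently.

```r
# Item scores for two hypothetical peer review reports on a three-item scale
scores <- rbind(ReviewA = c(5, 2, 3),
                ReviewB = c(3, 4, 4))

w_equal  <- c(1, 1, 1) / 3        # equal weights
w_skewed <- c(0.6, 0.2, 0.2)      # hypothetical weights favouring the first item

scores %*% w_equal    # ReviewA ~3.33, ReviewB ~3.67 -> ReviewB ranks first
scores %*% w_skewed   # ReviewA  4.00, ReviewB  3.40 -> ReviewA ranks first
```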

In our methodological systematic review, we found only one checklist. However, it was neither rigorously developed nor validated, and therefore we could not consider it adequate for assessing peer review report quality. We believe that checklists may be a more appropriate means of assessing quality because they do not produce an overall score and therefore do not require weighting of the items.

It is necessary to clearly define what a tool measures. For example, the Risk of Bias (RoB) tool [ 45 ] has a clear aim (to assess trial conduct and not reporting), and it provides a detailed definition of each domain in the tool, including support for judgement. Furthermore, it was developed through transparent procedures, including wide consultation and review of the empirical evidence. Bias and uncertainty can arise when using tools that are not evidence-based, rigorously developed, validated and reliable. This is particularly true for tools used to evaluate interventions aimed at improving the peer review process in RCTs, as it affects how trial results are interpreted.

We found that most of the items included in the different tools did not cover the scientific aspects of a peer review report, nor were they specific to biomedical research. Surprisingly, few tools included an item related to the methods used in the study, and only one inquired about the statistical methods.

In line with a previous study published in 1990 [ 28 ], we believe that the quality components found across all tools could be further organized according to the perspective of either an editor or an author, specifically by taking into account the different yet complementary uses of a peer review report. For instance, reviewers’ comments on the relevance of the study and the interpretation of its results could assist editors in making an editorial decision, whereas the clarity and detail/thoroughness of reviewers’ comments are important attributes that help authors improve manuscript quality. We plan to further investigate the perspectives of biomedical editors and authors on the quality of peer review reports by conducting an international online survey. We will also include patient editors as survey participants, as their involvement in the peer review process can further ensure that research manuscripts are relevant and appropriate to end-users [ 46 ].

The present study has strengths but also some limitations. Although we implemented a comprehensive search strategy for reports by following the guidance for conducting methodological reviews [ 20 ], we cannot exclude the possibility that some tools were not identified. Moreover, we limited the eligibility criteria to reports published in English. Finally, although the number of eligible records we identified through Google® was very limited, it is possible that we introduced selection bias due to a (re)search bubble effect [ 47 ].

Due to the lack of a standard definition of quality, a variety of tools exist for assessing the quality of a peer review report. Overall, we were able to establish 9 quality domains; each of the 18 multi-item tools covered between two and seven of these domains. The variety of items and item combinations among tools raises concern about variations in the quality of publications across biomedical journals. Low-quality biomedical research implies a tremendous waste of resources [ 48 ] and ultimately affects patients’ lives. We strongly believe that a validated tool, based on a clear definition of peer review report quality, is necessary to evaluate interventions aimed at improving the peer review process in well-performed trials.

Conclusions

The findings from this methodological systematic review show that the tools for assessing the quality of a peer review report have various components, which have been grouped into 9 domains. We plan to survey a sample of editors and authors in order to refine our preliminary classifications. The results from further investigations will allow us to develop a new tool for assessing the quality of peer review reports. This in turn could be used to evaluate interventions aimed at improving the peer review process in RCTs. Furthermore, it would help editors: 1) evaluate the work of reviewers; 2) provide specific feedback to reviewers; and 3) identify reviewers who provide outstanding review reports. Finally, it might be further used to score the quality of peer review reports in developing programs to train new reviewers.

Abbreviations

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

RCT: Randomized controlled trial

RoB: Risk of Bias

Kronick DA. Peer review in 18th-century scientific journalism. JAMA. 1990;263(10):1321–2.


Jefferson T, Alderson P, Wager E, Davidoff F. Effects of editorial peer review. JAMA. 2002;287(21):2784–6.


Smith R. Peer review: a flawed process at the heart of science and journals. J R Soc Med. 2006;99:178–82.

Baxt WG, Waeckerle JF, Berlin JA, Callaham ML. Who reviews the reviewers? Feasibility of using a fictitious manuscript to evaluate peer reviewer performance. Ann Emerg Med. 1998;32(3):310–7.

Kravitz RL, Franks P, Feldman MD, Gerrity M, Byrne C, William M. Editorial peer reviewers’ recommendations at a general medical journal: are they reliable and do editors care? PLoS One. 2010;5(4):2–6.

Yaffe MB. Re-reviewing peer review. Sci Signal. 2009;2(85):1–3.

Stahel PF, Moore EE. Peer review for biomedical publications: we can improve the system. BMC Med. 2014;12(179):1–4.


Rennie D. Make peer review scientific. Nature. 2016;535:31–3.

Moher D. Custodians of high-quality science: are editors and peer reviewers good enough? https://www.youtube.com/watch?v=RV2tknDtyDs&t=454s . Accessed 16 Oct 2017.

Ghimire S, Kyung E, Kang W, Kim E. Assessment of adherence to the CONSORT statement for quality of reports on randomized controlled trial abstracts from four high-impact general medical journals. Trials. 2012;13:77.

Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results. JAMA. 2010;303(20):2058–64.

Hopewell S, Collins GS, Boutron I, Yu L-M, Cook J, Shanyinde M, et al. Impact of peer review on reports of randomised trials published in open peer review journals: retrospective before and after study. BMJ. 2014;349:g4145.

Lazarus C, Haneef R, Ravaud P, Boutron I. Classification and prevalence of spin in abstracts of non-randomized studies evaluating an intervention. BMC Med Res Methodol. 2015;15:85.

Jefferson T, Rudin M, Brodney Folse S, et al. Editorial peer review for improving the quality of reports of biomedical studies. Cochrane Database Syst Rev. 2007;2:MR000016.

Bruce R, Chauvin A, Trinquart L, Ravaud P, Boutron I. Impact of interventions to improve the quality of peer review of biomedical journals: a systematic review and meta-analysis. BMC Med. 2016;14:85.

Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.

NHS. PROSPERO International prospective register of systematic reviews. https://www.crd.york.ac.uk/prospero/ . Accessed 6 Nov 2017.

Sanderson S, Tatt ID, Higgins JPT. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Intern J Epidemiol. 2007;36:666–76.

R Core Team. R: a language and environment for statistical computing. http://www.r-project.org/ . Accessed 4 Dec 2017.

Gentles SJ, Charles C, Nicholas DB, Ploeg J, McKibbon KA. Reviewing the research methods literature: principles and strategies illustrated by a systematic overview of sampling in qualitative research. Syst Rev. 2016;5:172.

Glaser B, Strauss A. The discovery of grounded theory. Chicago: Aldine; 1967.

Friedman DP. Manuscript peer review at the AJR: facts, figures, and quality assessment. Am J Roentgenol. 1995;164(4):1007–9.

Black N, Van Rooyen S, Godlee F, Smith R, Evans S. What makes a good reviewer and a good review for a general medical journal? JAMA. 1998;280(3):231–3.

Henly SJ, Dougherty MC. Quality of manuscript reviews in nursing research. Nurs Outlook. 2009;57(1):18–26.

Callaham ML, Baxt WG, Waeckerle JF, Wears RL. Reliability of editors’ subjective quality ratings of peer reviews of manuscripts. JAMA. 1998;280(3):229–31.

Callaham ML, Knopp RK, Gallagher EJ. Effect of written feedback by editors on quality of reviews: two randomized trials. JAMA. 2002;287(21):2781–3.

Van Rooyen S, Godlee F, Evans S, Black N, Smith R. Effect of open peer review on quality of reviews and on reviewers’ recommendations: a randomised trial. BMJ. 1999;318(7175):23–7.

Mcnutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review. JAMA. 1990;263(10):1371–6.

Moore A, Jones R. Supporting and enhancing peer review in the BJGP. Br J Gen Pract. 2014;64(624):e459–61.

Stossel TP. Reviewer status and review quality. N Engl J Med. 1985;312(10):658–9.

Thompson SR, Agel J, Losina E. The JBJS peer-review scoring scale: a valid, reliable instrument for measuring the quality of peer review reports. Learn Publ. 2016;29:23–5.

Rajesh A, Cloud G, Harisinghani MG. Improving the quality of manuscript reviews: impact of introducing a structured electronic template to submit reviews. AJR. 2013;200:20–3.

Shattell MM, Chinn P, Thomas SP, Cowling WR. Authors’ and editors’ perspectives on peer review quality in three scholarly nursing journals. J Nurs Scholarsh. 2010;42(1):58–65.

Jawaid SA, Jawaid M, Jafary MH. Characteristics of reviewers and quality of reviews: a retrospective study of reviewers at Pakistan journal of medical sciences. Pakistan J Med Sci. 2006;22(2):101–6.

Justice AC, Cho MK, Winker MA, Berlin JA. Does masking author identity improve peer review quality? A randomized controlled trial. JAMA. 1998;280(3):240–3.

Henly SJ, Bennett JA, Dougherty MC. Scientific and statistical reviews of manuscripts submitted to nursing research: comparison of completeness, quality, and usefulness. Nurs Outlook. 2010;58(4):188–99.

Hettyey A, Griggio M, Mann M, Raveh S, Schaedelin FC, Thonhauser KE, et al. Peerage of science: will it work? Trends Ecol Evol. 2012;27(4):189–90.

Publons. Publons for editors: overview. https://static1.squarespace.com/static/576fcda2e4fcb5ab5152b4d8/t/58e21609d482e9ebf98163be/1491211787054/Publons_for_Editors_Overview.pdf . Accessed 20 Oct 2017.

Van Rooyen S, Black N, Godlee F. Development of the review quality instrument (RQI) for assessing peer reviews of manuscripts. J Clin Epidemiol. 1999;52(7):625–9.

Evans AT, McNutt RA, Fletcher SW, Fletcher RH. The characteristics of peer reviewers who produce good-quality reviews. J Gen Intern Med. 1993;8(8):422–8.

Feurer I, Becker G, Picus D, Ramirez E, Darcy M, Hicks M. Evaluating peer reviews: pilot testing of a grading instrument. JAMA. 1994;272(2):98–100.

Landkroon AP, Euser AM, Veeken H. Quality assessment of reviewers’ reports using a simple instrument. Obstet Gynecol. 2006;108(4):979–85.

Greenland S, O’Rourke K. On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics. 2001;2(4):463–71.

Jüni P, Witschi A, Bloch R. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282(11):1054–60.

Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.

Schroter S, Price A, Flemyng E, et al. Perspectives on involvement in the peer-review process: surveys of patient and public reviewers at two journals. BMJ Open. 2018;8:e023357.

Ćurković M, Košec A. Bubble effect: including internet search engines in systematic reviews introduces selection bias and impedes scientific reproducibility. BMC Med Res Methodol. 2018;18(1):130.

Chalmers I, Bracken MB, Djulbegovic B, Garattini S, Grant J, Gülmezoglu AM, et al. How to increase value and reduce waste when research priorities are set. Lancet. 2014;383(9912):156–65.

Kliewer MA, Freed KS, DeLong DM, Pickhardt PJ, Provenzale JM. Reviewing the reviewers: comparison of review quality and reviewer characteristics at the American journal of roentgenology. AJR. 2005;184(6):1731–5.

Berquist T. Improving your reviewer score: it’s not that difficult. AJR. 2017;209:711–2.

Callaham ML, Mcculloch C. Longitudinal trends in the performance of scientific peer reviewers. Ann Emerg Med. 2011;57(2):141–8.

Yang Y. Effects of training reviewers on quality of peer review: a before-and-after study (Abstract). https://peerreviewcongress.org/abstracts_2009.html . Accessed 7 Nov 2017.

Prechelt L. Review quality collector. https://reviewqualitycollector.org/static/pdf/rqdef-example.pdf . Accessed 20 Oct 2017.

Das Sinha S, Sahni P, Nundy S. Does exchanging comments of Indian and non-Indian reviewers improve the quality of manuscript reviews? Natl Med J India. 1999;12(5):210–3.

Callaham ML, Schriger DL. Effect of structured workshop training on subsequent performance of journal peer reviewers. Ann Emerg Med. 2002;40(3):323–8.


Acknowledgments

The authors would like to thank the MiRoR consortium for their support, Elizabeth Moylan for helping to identify further relevant reports and Melissa Sharp for providing advice during the writing of this article.

Funding

This project was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no 676207. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials

The datasets supporting the conclusions of the present study will be available in the Zenodo repository in the Methods in Research on Research (MiRoR) community [ https://zenodo.org/communities/miror/?page=1&size=20 ].

Author information

Authors and Affiliations

Department of Statistics and Operations Research, Barcelona-Tech, UPC, c/ Jordi Girona 1-3, 08034, Barcelona, Spain

Cecilia Superchi, José Antonio González & Erik Cobo

INSERM, U1153 Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), Methods of therapeutic evaluation of chronic diseases Team (METHODS), F-75014, Paris, France

Cecilia Superchi

Paris Descartes University, Sorbonne Paris Cité, Paris, France

Iberoamerican Cochrane Centre, Hospital de la Santa Creu i Sant Pau, C/ Sant Antoni Maria Claret 167, Pavelló 18 - planta 0, 08025, Barcelona, Spain

CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain

Department of Psychology, Faculty of Humanities and Social Sciences, University of Split, Split, Croatia

Centre d’épidémiologie Clinique, Hôpital Hôtel-Dieu, 1 place du Paris Notre-Dame, 75004, Paris, France

Isabelle Boutron


Contributions

All authors provided intellectual contributions to the development of this study. CS, EC and IB had the initial idea and with JAG and DH, designed the study. CS designed the search in collaboration with IS. CS conducted the screening and JAG carried out a quality control of a 25% random sample. CS and JAG conducted the data extraction. CS conducted the analysis and with JAG designed the figures. CS led the writing of the manuscript. IB led the supervision of the manuscript preparation. All authors provided detailed comments on earlier drafts and approved the final manuscript.

Corresponding author

Correspondence to Cecilia Superchi .

Ethics declarations

Ethics approval and consent to participate.

Not required.

Consent for publication

Not applicable.

Competing interests

All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that (1) no authors have support from any company for the submitted work; (2) IB is the deputy director of French EQUATOR that might have an interest in the work submitted; (3) no author’s spouse, partner, or children have any financial relationships that could be relevant to the submitted work; and (4) none of the authors has any non-financial interests that could be relevant to the submitted work.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:.

Search strategies. (PDF 182 kb)

Additional file 2:

Excluded studies. (PDF 332 kb)

Additional file 3:

Included studies. (PDF 244 kb)

Additional file 4:

Classification of peer review report quality components. (PDF 2660 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Superchi, C., González, J.A., Solà, I. et al. Tools used to assess the quality of peer review reports: a methodological systematic review. BMC Med Res Methodol 19 , 48 (2019). https://doi.org/10.1186/s12874-019-0688-x

Download citation

Received : 11 July 2018

Accepted : 20 February 2019

Published : 06 March 2019

DOI : https://doi.org/10.1186/s12874-019-0688-x


Keywords

  • Peer review
  • Quality control
  • Systematic review




Evaluation of research proposals by peer review panels: broader panels for broader assessments?


Rebecca Abma-Schouten, Joey Gijbels, Wendy Reijmerink, Ingeborg Meijer, Evaluation of research proposals by peer review panels: broader panels for broader assessments?, Science and Public Policy , Volume 50, Issue 4, August 2023, Pages 619–632, https://doi.org/10.1093/scipol/scad009


Panel peer review is widely used to decide which research proposals receive funding. Through this exploratory observational study at two large biomedical and health research funders in the Netherlands, we gain insight into how scientific quality and societal relevance are discussed in panel meetings. We explore, in ten review panel meetings of biomedical and health funding programmes, how panel composition and formal assessment criteria affect the arguments used. We observe that more scientific arguments are used than arguments related to societal relevance and expected impact. Also, more diverse panels result in a wider range of arguments, largely for the benefit of arguments related to societal relevance and impact. We discuss how funders can contribute to the quality of peer review by creating a shared conceptual framework that better defines research quality and societal relevance. We also contribute to a further understanding of the role of diverse peer review panels.

Scientific biomedical and health research is often supported by project or programme grants from public funding agencies such as governmental research funders and charities. Research funders primarily rely on peer review, often a combination of independent written review and discussion in a peer review panel, to inform their funding decisions. Peer review panels have the difficult task of integrating and balancing the various assessment criteria to select and rank the eligible proposals. With the increasing emphasis on societal benefit and being responsive to societal needs, the assessment of research proposals ought to include broader assessment criteria, including both scientific quality and societal relevance, and a broader perspective on relevant peers. This results in new practices of including non-scientific peers in review panels ( Del Carmen Calatrava Moreno et al. 2019 ; Den Oudendammer et al. 2019 ; Van den Brink et al. 2016 ). Relevant peers, in the context of biomedical and health research, include, for example, health-care professionals, (healthcare) policymakers, and patients as the (end-)users of research.

Currently, in scientific and grey literature, much attention is paid to what legitimate criteria are and to deficiencies in the peer review process, for example, focusing on the role of chance and the difficulty of assessing interdisciplinary or ‘blue sky’ research ( Langfeldt 2006 ; Roumbanis 2021a ). Our research primarily builds upon the work of Lamont (2009) , Huutoniemi (2012) , and Kolarz et al. (2016) . Their work articulates how the discourse in peer review panels can be understood by giving insight into disciplinary assessment cultures and social dynamics, as well as how panel members define and value concepts such as scientific excellence, interdisciplinarity, and societal impact. At the same time, there is little empirical work on what actually is discussed in peer review meetings and to what extent this is related to the specific objectives of the research funding programme. Such observational work is especially lacking in the biomedical and health domain.

The aim of our exploratory study is to learn what arguments panel members use in a review meeting when assessing research proposals in biomedical and health research programmes. We explore how arguments used in peer review panels are affected by (1) the formal assessment criteria and (2) the inclusion of non-scientific peers in review panels, also called (end-)users of research, societal stakeholders, or societal actors. We add to the existing literature by focusing on the actual arguments used in peer review assessment in practice.

To this end, we observed ten panel meetings across eight biomedical and health research programmes at two large research funders in the Netherlands: the governmental research funder The Netherlands Organisation for Health Research and Development (ZonMw) and the charitable research funder the Dutch Heart Foundation (DHF). Our first research question focuses on what arguments panel members use when assessing research proposals in a review meeting. The second examines to what extent these arguments correspond with the formal criteria on scientific quality and societal impact creation, as described in the programme brochure and assessment form. The third question focuses on how the arguments used differ between panel members with different perspectives.

2.1 Relation between science and society

To understand the dual focus of scientific quality and societal relevance in research funding, a theoretical understanding and a practical operationalisation of the relation between science and society are needed. The conceptualisation of this relationship affects both who are perceived as relevant peers in the review process and the criteria by which research proposals are assessed.

The relationship between science and society is not constant over time nor static, yet a relation that is much debated. Scientific knowledge can have a huge impact on societies, either intended or unintended. Vice versa, the social environment and structure in which science takes place influence the rate of development, the topics of interest, and the content of science. However, the second part of this inter-relatedness between science and society generally receives less attention ( Merton 1968 ; Weingart 1999 ).

From a historical perspective, scientific and technological progress contributed to the view that science was valuable on its own account and that science and the scientist stood independent of society. While this protected science from unwarranted political influence, societal disengagement with science resulted in less authority by science and debate about its contribution to society. This interdependence and mutual influence contributed to a modern view of science in which knowledge development is valued both on its own merit and for its impact on, and interaction with, society. As such, societal factors and problems are important drivers for scientific research. This warrants that the relation and boundaries between science, society, and politics need to be organised and constantly reinforced and reiterated ( Merton 1968 ; Shapin 2008 ; Weingart 1999 ).

Glerup and Horst (2014) conceptualise the value of science to society and the role of society in science in four rationalities that reflect different justifications for their relation and thus also for who is responsible for (assessing) the societal value of science. The rationalities are arranged along two axes: one is related to the internal or external regulation of science and the other is related to either the process or the outcome of science as the object of steering. The first two rationalities of Reflexivity and Demarcation focus on internal regulation in the scientific community. Reflexivity focuses on the outcome. Central is that science, and thus, scientists should learn from societal problems and provide solutions. Demarcation focuses on the process: science should continuously question its own motives and methods. The latter two rationalities of Contribution and Integration focus on external regulation. The core of the outcome-oriented Contribution rationality is that scientists do not necessarily see themselves as ‘working for the public good’. Science should thus be regulated by society to ensure that outcomes are useful. The central idea of the process-oriented Integration rationality is that societal actors should be involved in science in order to influence the direction of research.

Research funders can be seen as external or societal regulators of science. They can focus on organising the process of science, Integration, or on scientific outcomes that function as solutions for societal challenges, Contribution. In the Contribution perspective, a funder could enhance outside (societal) involvement in science to ensure that scientists take responsibility to deliver results that are needed and used by society. From Integration follows that actors from science and society need to work together in order to produce the best results. In this perspective, there is a lack of integration between science and society and more collaboration and dialogue are needed to develop a new kind of integrative responsibility ( Glerup and Horst 2014 ). This argues for the inclusion of other types of evaluators in research assessment. In reality, these rationalities are not mutually exclusive and also not strictly separated. As a consequence, multiple rationalities can be recognised in the reasoning of scientists and in the policies of research funders today.

2.2 Criteria for research quality and societal relevance

The rationalities of Glerup and Horst have consequences for the language used to discuss societal relevance and impact in research proposals. Even though the main ingredients are quite similar, as a consequence of the coexisting rationalities in science, societal aspects can be defined and operationalised in different ways ( Alla et al. 2017 ). In the definition of societal impact by Reed, emphasis is placed on the outcome: the contribution to society. It includes the significance for society, the size of the potential impact, and the reach, the number of people or organisations benefiting from the expected outcomes ( Reed et al. 2021 ). Other models and definitions focus more on the process of science and its interaction with society. Spaapen and Van Drooge introduced productive interactions in the assessment of societal impact, highlighting direct contact between researchers and other actors. A key idea is that interaction in different domains leads to impact in different domains ( Meijer 2012 ; Spaapen and Van Drooge 2011 ). Definitions that focus on the process often refer to societal impact as (1) something that can take place in distinguishable societal domains, (2) something that needs to be actively pursued, and (3) something that requires interactions with societal stakeholders (or users of research) ( Hughes and Kitson 2012 ; Spaapen and Van Drooge 2011 ).

Glerup and Horst show that process and outcome-oriented aspects can be combined in the operationalisation of criteria for assessing research proposals on societal aspects. Also, the funders participating in this study include the outcome—the value created in different domains—and the process—productive interactions with stakeholders—in their formal assessment criteria for societal relevance and impact. Different labels are used for these criteria, such as societal relevance , societal quality , and societal impact ( Abma-Schouten 2017 ; Reijmerink and Oortwijn 2017 ). In this paper, we use societal relevance or societal relevance and impact .

Scientific quality in research assessment frequently refers to all aspects and activities in the study that contribute to the validity and reliability of the research results and that contribute to the integrity and quality of the research process itself. The criteria commonly include the relevance of the proposal for the funding programme, the scientific relevance, originality, innovativeness, methodology, and feasibility ( Abdoul et al. 2012 ). Several studies demonstrated that quality is seen as not only a rich concept but also a complex concept in which excellence and innovativeness, methodological aspects, engagement of stakeholders, multidisciplinary collaboration, and societal relevance all play a role ( Geurts 2016 ; Roumbanis 2019 ; Scholten et al. 2018 ). Another study showed a comprehensive definition of ‘good’ science, which includes creativity, reproducibility, perseverance, intellectual courage, and personal integrity. It demonstrated that ‘good’ science involves not only scientific excellence but also personal values and ethics, and engagement with society ( Van den Brink et al. 2016 ). Noticeable in these studies is the connection made between societal relevance and scientific quality.

In summary, the criteria for scientific quality and societal relevance are conceptualised in different ways, and perspectives on the role of societal value creation and the involvement of societal actors vary strongly. Research funders hence have to pay attention to the meaning of the criteria for the panel members they recruit to help them, and navigate and negotiate how the criteria are applied in assessing research proposals. To be able to do so, more insight is needed in which elements of scientific quality and societal relevance are discussed in practice by peer review panels.

2.3 Role of funders and societal actors in peer review

National governments and charities are important funders of biomedical and health research. How this funding is distributed varies per country. Project funding is frequently allocated based on research programming by specialised public funding organisations, such as the Dutch Research Council in the Netherlands and ZonMw for health research. The DHF, the second largest private non-profit research funder in the Netherlands, provides project funding ( Private Non-Profit Financiering 2020 ). Funders, as so-called boundary organisations, can act as key intermediaries between government, science, and society ( Jasanoff 2011 ). Their responsibility is to develop effective research policies connecting societal demands and scientific ‘supply’. This includes setting up and executing fair and balanced assessment procedures ( Sarewitz and Pielke 2007 ). Herein, the role of societal stakeholders is receiving increasing attention ( Benedictus et al. 2016 ; De Rijcke et al. 2016 ; Dijstelbloem et al. 2013 ; Scholten et al. 2018 ).

All charitable health research funders in the Netherlands have, in the last decade, included patients at different stages of the funding process, including in assessing research proposals ( Den Oudendammer et al. 2019 ). To facilitate research funders in involving patients in assessing research proposals, the federation of Dutch patient organisations set up an independent reviewer panel with (at-risk) patients and direct caregivers ( Patiëntenfederatie Nederland, n.d .). Other foundations have set up societal advisory panels including a wider range of societal actors than patients alone. The Committee Societal Quality (CSQ) of the DHF includes, for example, (at-risk) patients and a wide range of cardiovascular health-care professionals who are not active as academic researchers. This model is also applied by the Diabetes Foundation and the Princess Beatrix Muscle Foundation in the Netherlands ( Diabetesfonds, n.d .; Prinses Beatrix Spierfonds, n.d .).

In 2014, the Lancet presented a series of five papers about biomedical and health research known as the ‘increasing value, reducing waste’ series ( Macleod et al. 2014 ). The authors addressed several issues as well as potential solutions that funders can implement. They highlight, among others, the importance of improving the societal relevance of the research questions and including the burden of disease in research assessment in order to increase the value of biomedical and health science for society. A better understanding of and an increasing role of users of research are also part of the described solutions ( Chalmers et al. 2014 ; Van den Brink et al. 2016 ). This is also in line with the recommendations of the 2013 Declaration on Research Assessment (DORA) ( DORA 2013 ). These recommendations influence the way in which research funders operationalise their criteria in research assessment, how they balance the judgement of scientific and societal aspects, and how they involve societal stakeholders in peer review.

2.4 Panel peer review of research proposals

To assess research proposals, funders rely on the services of peer experts to review the thousands or perhaps millions of research proposals seeking funding each year. While often associated with scholarly publishing, peer review also includes the ex ante assessment of research grant and fellowship applications ( Abdoul et al. 2012 ). Peer review of proposals often includes a written assessment of a proposal by an anonymous peer and a peer review panel meeting to select the proposals eligible for funding. Peer review is an established component of professional academic practice, is deeply embedded in the research culture, and essentially consists of experts in a given domain appraising the professional performance, creativity, and/or quality of scientific work produced by others in their field of competence ( Demicheli and Di Pietrantonj 2007 ). The history of peer review as the default approach for scientific evaluation and accountability is, however, relatively young. While the term was unheard of in the 1960s, by 1970, it had become the standard. Since that time, peer review has become increasingly diverse and formalised, resulting in more public accountability ( Reinhart and Schendzielorz 2021 ).

While many studies have been conducted concerning peer review in scholarly publishing, peer review in grant allocation processes has been less discussed ( Demicheli and Di Pietrantonj 2007 ). The most extensive work on this topic has been conducted by Lamont (2009) . Lamont studied peer review panels in five American research funding organisations, including observing three panels. Other examples include Roumbanis’s ethnographic observations of ten review panels at the Swedish Research Council in natural and engineering sciences ( Roumbanis 2017 , 2021a ). Also, Huutoniemi was able to study, but not observe, four panels on environmental studies and social sciences of the Academy of Finland ( Huutoniemi 2012 ). Additionally, Van Arensbergen and Van den Besselaar (2012) analysed peer review through interviews and by analysing the scores and outcomes at different stages of the peer review process in a talent funding programme. In particular, interesting is the study by Luo and colleagues on 164 written panel review reports, showing that the reviews from panels that included non-scientific peers described broader and more concrete impact topics. Mixed panels also more often connected research processes and characteristics of applicants with impact creation ( Luo et al. 2021 ).

While these studies primarily focused on peer review panels in other disciplinary domains or are based on interviews or reports instead of direct observations, we believe that many of the findings are relevant to the functioning of panels in the context of biomedical and health research. From this literature, we learn to have realistic expectations of peer review. It is inherently difficult to predict in advance which research projects will provide the most important findings or breakthroughs ( Lee et al. 2013 ; Pier et al. 2018 ; Roumbanis 2021a , 2021b ). At the same time, these limitations may not substantiate the replacement of peer review by another assessment approach ( Wessely 1998 ). Many topics addressed in the literature are inter-related and relevant to our study, such as disciplinary differences and interdisciplinarity, social dynamics and their consequences for consistency and bias, and suggestions to improve panel peer review ( Lamont and Huutoniemi 2011 ; Lee et al. 2013 ; Pier et al. 2018 ; Roumbanis 2021a , b ; Wessely 1998 ).

Different scientific disciplines show different preferences and beliefs about how to build knowledge and thus have different perceptions of excellence. However, panellists are willing to respect and acknowledge other standards of excellence ( Lamont 2009 ). Evaluation cultures also differ between scientific fields. Science, technology, engineering, and mathematics panels might, in comparison with panellists from social sciences and humanities, be more concerned with the consistency of the assessment across panels and therefore with clear definitions and uses of assessment criteria ( Lamont and Huutoniemi 2011 ). However, much is still to learn about how panellists’ cognitive affiliations with particular disciplines unfold in the evaluation process. Therefore, the assessment of interdisciplinary research is much more complex than just improving the criteria or procedure because less explicit repertoires would also need to change ( Huutoniemi 2012 ).

Social dynamics play a role as panellists may differ in their motivation to engage in allocation processes, which could create bias ( Lee et al. 2013 ). Placing emphasis on meeting established standards or thoroughness in peer review may promote uncontroversial and safe projects, especially in a situation where strong competition puts pressure on experts to reach a consensus ( Langfeldt 2001 , 2006 ). Personal interest and cognitive similarity may also contribute to conservative bias, which could negatively affect controversial or frontier science ( Luukkonen 2012 ; Roumbanis 2021a ; Travis and Collins 1991 ). Central in this part of the literature is that panel conclusions are the outcome of, and are influenced by, the group interaction ( Van Arensbergen et al. 2014a ). Differences in, for example, the status and expertise of the panel members can play an important role in group dynamics. Insights from social psychology on group dynamics can help in understanding and avoiding bias in peer review panels ( Olbrecht and Bornmann 2010 ). For example, group performance research shows that more diverse groups with complementary skills make better group decisions than homogeneous groups. Yet, heterogeneity can also increase conflict within the group ( Forsyth 1999 ). Therefore, it is important to pay attention to power dynamics and maintain team spirit and good communication ( Van Arensbergen et al. 2014a ), especially in meetings that include both scientific and non-scientific peers.

The literature also provides funders with starting points to improve the peer review process. For example, the explicitness of review procedures positively influences the decision-making processes ( Langfeldt 2001 ). Strategic voting and decision-making appear to be less frequent in panels that rate than in panels that rank proposals. Also, an advisory instead of a decisional role may improve the quality of the panel assessment ( Lamont and Huutoniemi 2011 ).

Despite different disciplinary evaluative cultures, formal procedures, and criteria, panel members with different backgrounds develop shared customary rules of deliberation that facilitate agreement and help avoid situations of conflict ( Huutoniemi 2012 ; Lamont 2009 ). This is a necessary prerequisite for opening up peer review panels to include non-academic experts. When doing so, it is important to realise that panel review is a social, emotional, and interactional process. It is therefore important to also take these non-cognitive aspects into account when studying cognitive aspects ( Lamont and Guetzkow 2016 ), as we do in this study.

In summary, what we learn from the literature is that (1) the specific criteria to operationalise scientific quality and societal relevance of research are important, (2) the rationalities from Glerup and Horst predict that not everyone values societal aspects and involve non-scientists in peer review to the same extent and in the same way, (3) this may affect the way peer review panels discuss these aspects, and (4) peer review is a challenging group process that could accommodate other rationalities in order to prevent bias towards specific scientific criteria. To disentangle these aspects, we have carried out an observational study of a diverse range of peer review panel sessions using a fixed set of criteria focusing on scientific quality and societal relevance.

3.1 Research assessment at ZonMw and the DHF

The peer review approach and the criteria used by both the DHF and ZonMw are largely comparable. Funding programmes at both organisations start with a brochure describing the purposes, goals, and conditions for research applications, as well as the assessment procedure and criteria. Both organisations apply a two-stage process. In the first phase, reviewers are asked to write a peer review. In the second phase, a panel reviews the application based on the advice of the written reviews and the applicants’ rebuttal. The panels advise the board on eligible proposals for funding including a ranking of these proposals.

There are also differences between the two organisations. At ZonMw, the criteria for societal relevance and quality are operationalised in the ZonMw Framework Fostering Responsible Research Practices ( Reijmerink and Oortwijn 2017 ). This contributes to a common operationalisation of both quality and societal relevance on the level of individual funding programmes. Important elements in the criteria for societal relevance are, for instance, stakeholder participation, (applying) holistic health concepts, and the added value of knowledge in practice, policy, and education. The framework was developed to optimise the funding process from the perspective of knowledge utilisation and includes concepts like productive interactions and Open Science. It is part of the ZonMw Impact Assessment Framework aimed at guiding the planning, monitoring, and evaluation of funding programmes ( Reijmerink et al. 2020 ). At ZonMw, interdisciplinary panels are set up specifically for each funding programme. Panels are interdisciplinary in nature with academics of a wide range of disciplines and often include non-academic peers, like policymakers, health-care professionals, and patients.

At the DHF, the criteria for scientific quality and societal relevance, at the DHF called societal impact , find their origin in the strategy report of the advisory committee CardioVascular Research Netherlands ( Reneman et al. 2010 ). This report forms the basis of the DHF research policy focusing on scientific and societal impact by creating national collaborations in thematic, interdisciplinary research programmes (the so-called consortia) connecting preclinical and clinical expertise into one concerted effort. An International Scientific Advisory Committee (ISAC) was established to assess these thematic consortia. This panel consists of international scientists, primarily with expertise in the broad cardiovascular research field. The DHF criteria for societal impact were redeveloped in 2013 in collaboration with their CSQ. This panel assesses and advises on the societal aspects of proposed studies. The societal impact criteria include the relevance of the health-care problem, the expected contribution to a solution, attention to the next step in science and towards implementation in practice, and the involvement of and interaction with (end-)users of research (R.Y. Abma-Schouten and I.M. Meijer, unpublished data). Peer review panels for consortium funding are generally composed of members of the ISAC, members of the CSQ, and ad hoc panel members relevant to the specific programme. CSQ members often have a pre-meeting before the final panel meetings to prepare and empower CSQ representatives participating in the peer review panel.

3.2 Selection of funding programmes

To compare and evaluate observations between the two organisations, we selected funding programmes that were relatively comparable in scope and aims. The criteria were (1) a translational and/or clinical objective and (2) a selection procedure in which review panels were responsible for the (final) relevance and quality assessment of grant applications. In total, we selected eight programmes: four at each organisation. At the DHF, two programmes were chosen in which the CSQ did not participate, to better disentangle the role of panel composition. For each programme, we observed the selection process, which varied from one session on a single day (taking 2–8 h) to multiple sessions over several days. Ten sessions were observed in total, of which eight were final peer review panel meetings and two were CSQ meetings preparing for the panel meeting.

After management approval for the study at both organisations, we asked the programme managers and panel chairpersons of the selected programmes for their consent to observation; none refused. Panel members were informed, in a passive consent procedure, about the planned observation and the anonymous analyses.

To ensure the independence of this evaluation, the selection of the grant programmes and peer review panels observed was at the discretion of the project team of this study. The observations and the supervision of the analyses were performed by the senior author, who is not affiliated with the funders.

3.3 Observation matrix

Given the lack of a common operationalisation for scientific quality and societal relevance, we decided to use an observation matrix with a fixed set of detailed aspects as a gold standard to score the brochures, the assessment forms, and the arguments used in panel meetings. The matrix used for the observations of the review panels was based upon and adapted from a ‘grant committee observation matrix’ developed by Van Arensbergen. The original matrix informed a literature review on the selection of talent through peer review and the social dynamics in grant review committees ( van Arensbergen et al. 2014b ). The matrix includes four categories of aspects, operationalising societal relevance, scientific quality, committee-related, and applicant-related arguments (see  Table 1 ). The aspects of scientific quality and societal relevance were adapted to fit the operationalisation used by the organisations involved: the aspects concerning societal relevance were derived from the CSQ criteria, and the aspects concerning scientific quality were based on the scientific criteria of the first panel observed. The four committee-related argument types were kept as they were. This category reflects statements that relate to the personal experience or preference of a panel member and can be seen as signals of bias; it also includes statements that compare a project with another project without further substantiation. The three applicant-related arguments in the original observation matrix were extended with a fourth on social skills in communication with society. We added health technology assessment (HTA) because one programme specifically focused on this aspect. We tested our version of the observation matrix in pilot observations.

Table 1. Aspects included in the observation matrix and examples of arguments.
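As an illustration of how such a fixed-aspect matrix can serve as a tally sheet during observations, the Python sketch below represents the argument categories (plus the added HTA aspect) as a simple scoring structure. This is our own minimal sketch, not the authors' instrument: the aspect labels are paraphrased from the text, and the entries marked as placeholders are hypothetical because not every aspect is named in this section.

```python
# Minimal sketch of an observation matrix as a scoring structure.
# Aspect labels are paraphrased; "placeholder" entries are hypothetical.

OBSERVATION_MATRIX = {
    "scientific_quality": [
        "feasibility_of_aims",
        "match_between_science_and_problem",
        "plan_of_work",
        "international_competitiveness",
        "other_scientific_aspect",          # placeholder: fifth aspect not named here
    ],
    "societal_relevance": [
        "relevance_of_healthcare_problem",
        "contribution_to_solution",
        "next_step_in_science_and_implementation",
        "activities_towards_partners",
        "participation_and_diversity",
    ],
    "committee": [
        "personal_experience_with_applicant_or_network",
        "personal_preference",
        "comparison_with_other_proposal_without_substantiation",
        "other_committee_related",          # placeholder
    ],
    "applicant": [
        "background_and_reputation",
        "applicant_aspect_2",               # placeholder
        "applicant_aspect_3",               # placeholder
        "social_skills_in_communication_with_society",
    ],
    "hta": [
        "health_technology_assessment",     # added for one HTA-focused programme
    ],
}

def new_score_sheet():
    """Return an empty tally: one counter per aspect, grouped by category."""
    return {category: {aspect: 0 for aspect in aspects}
            for category, aspects in OBSERVATION_MATRIX.items()}

sheet = new_score_sheet()
sheet["scientific_quality"]["feasibility_of_aims"] += 1  # one observed argument
```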

3.4 Observations

Data were primarily collected through observations. Our observations of review panel meetings were non-participatory: the observer and goal of the observation were introduced at the start of the meeting, without further interactions during the meeting. To aid in the processing of observations, some meetings were audiotaped (sound only). Presentations or responses of applicants were not noted and were not part of the analysis. The observer made notes on the ongoing discussion and scored the arguments while listening. One meeting was not attended in person and only observed and scored by listening to the audiotape recording. Because this made identification of the panel members unreliable, this panel meeting was excluded from the analysis of the third research question on how arguments used differ between panel members with different perspectives.

3.5 Grant programmes and the assessment criteria

We gathered and analysed all brochures and assessment forms used by the review panels in order to answer our second research question on the correspondence between the arguments used and the formal criteria. Several programmes consisted of multiple grant calls: in that case, the specific call brochure was gathered and analysed, not the overall programme brochure. Additional documentation (e.g. instructional presentations at the start of the panel meeting) was not included in the document analysis. All included documents were marked using the aforementioned observation matrix. The panel-related arguments were not used because this category reflects the personal arguments of panel members, which are not part of brochures or instructions. To avoid potential differences in scoring methods, two of the authors each independently scored half of the documents, and each half was afterwards checked and validated by the other author. Differences were discussed until a consensus was reached.
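In principle, such double coding could also be checked with a simple agreement statistic before differences are discussed. The sketch below is illustrative only and uses hypothetical document and aspect names; the authors report resolving differences through discussion rather than computing an agreement measure.

```python
# Illustrative only: a simple percent-agreement check over double-coded documents.
# The authors resolved disagreements by discussion; this is not their procedure.

def percent_agreement(codes_a, codes_b):
    """codes_a, codes_b: dicts mapping (document, aspect) -> 1/0 (marked or not)."""
    shared = codes_a.keys() & codes_b.keys()
    if not shared:
        return float("nan")
    agreements = sum(codes_a[key] == codes_b[key] for key in shared)
    return agreements / len(shared)

# Hypothetical example: two coders marking two aspects in one brochure.
coder_1 = {("brochure_A", "feasibility_of_aims"): 1,
           ("brochure_A", "contribution_to_solution"): 1}
coder_2 = {("brochure_A", "feasibility_of_aims"): 1,
           ("brochure_A", "contribution_to_solution"): 0}

print(percent_agreement(coder_1, coder_2))  # 0.5 -> discuss until consensus
```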

3.6 Panel composition

In order to answer the third research question, background information on panel members was collected. We categorised the panel members into five common types: scientific, clinical scientific, health-care professional/clinical, patient, and policy. First, a list of all panel members was composed, including their scientific and professional backgrounds and affiliations. The categorisation was guided by the theoretical notion that reviewers represent different types of users of research and therefore different potential impact domains (academic, social, economic, and cultural) ( Meijer 2012 ; Spaapen and Van Drooge 2011 ). Because clinical researchers play a dual role, both advancing research as fellow academics and using research output in health-care practice, we divided the academic members into two categories: non-clinical and clinical researchers. Multiple types of professional actors participated in each review panel. These were divided into two groups for the analysis: health-care professionals (without current academic activity) and policymakers in the health-care sector. No representatives of the private sector participated in the observed review panels. From the public domain, (at-risk) patients and patient representatives were part of several review panels. Only publicly available information was used to classify the panel members. Members were assigned to one category only: categorisation took place based on the specific role and expertise for which they were appointed to the panel.

In two of the four DHF programmes, the assessment procedure included the CSQ. In these two programmes, representatives of this CSQ participated in the scientific panel to articulate the findings of the CSQ meeting during the final assessment meeting. Two grant programmes were assessed by a review panel with solely (clinical) scientific members.

3.7 Analysis

Data were processed using ATLAS.ti 8 and Microsoft Excel 2010 to produce descriptive statistics. All observed arguments were coded, and each was linked to a randomised identification code for the panel member using that particular argument. The number of times an argument type was observed was used as an indicator of the relative importance of that argument in the appraisal of proposals. With this approach, a practical and reproducible method was developed for research funders to evaluate the effect of policy changes on peer review. If codes or notes were unclear, the codes were validated after the observation against the notes in the observation matrix. Arguments noted by the observer that could not be matched with an existing code were first given a ‘non-existing’ code; these were resolved by listening back to the audiotapes. Arguments that could not be assigned to a panel member were given a ‘missing panel member’ code; this applied to 4.7 per cent of all codes.
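As an illustration of the tallying described above, the sketch below counts argument types and the share of codes that could not be assigned to a panel member, assuming each observed argument is exported as a record with an argument code and an anonymised member code. The record format and field names are hypothetical; the authors used ATLAS.ti 8 and Excel rather than Python.

```python
# Sketch of the tallying step, on hypothetical exported records.
from collections import Counter

observations = [
    {"meeting": "P1", "argument": "feasibility_of_aims",       "member": "m_041"},
    {"meeting": "P1", "argument": "contribution_to_solution",  "member": "m_017"},
    {"meeting": "P1", "argument": "plan_of_work",              "member": "missing_panel_member"},
]

# Frequency per argument type, used as a proxy for relative importance.
argument_counts = Counter(record["argument"] for record in observations)

# Share of codes that could not be assigned to a panel member.
missing = sum(record["member"] == "missing_panel_member" for record in observations)
missing_share = 100 * missing / len(observations)

print(argument_counts.most_common())
print(f"{missing_share:.1f}% of codes could not be assigned to a panel member")
```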

After the analyses, two meetings were held to reflect on the results: one with the CSQ and the other with the programme coordinators of both organisations. The goal of these meetings was to improve our interpretation of the findings, disseminate the results derived from this project, and identify topics for further analyses or future studies.

3.8 Limitations

Our study focuses on the final phase of the peer review process of research applications in a real-life setting. Our design, a non-participant observation of peer review panels, also introduced several challenges ( Liu and Maitlis 2010 ).

First, the independent review phase or pre-application phase was not part of our study. We therefore could not assess to what extent attention to certain aspects of scientific quality or societal relevance and impact in the review phase influenced the topics discussed during the meeting.

Second, the most important challenge of overt non-participant observation is the observer effect: the danger of causing reactivity in those under study. We believe that the consequences of this effect for our conclusions were limited because panellists are used to external observers in the meetings of these two funders. The observer briefly explained the goal of the study in general terms during the introductory round of the panel, sat as unobtrusively as possible, and avoided reacting to the discussions. As in previous panel observations, we experienced that the presence of an observer faded into the background during the meeting ( Roumbanis 2021a ). However, a limited observer effect can never be entirely excluded.

Third, our choice to score only the arguments raised, and not the responses of the applicant or information on the content of the proposals, has both advantages and drawbacks. With this approach, we could assure the anonymity of the grant procedures reviewed, the applicants and proposals, the panels, and the individual panellists. This was an important condition for the funders involved. We took the frequency of arguments used as a proxy for the relative importance of each argument in decision-making, which undeniably also has its caveats. Our data collection approach limits more in-depth reflection on which arguments were decisive in decision-making and on group dynamics during the interaction with the applicants, as non-verbal and non-content-related comments were not captured in this study.

Fourth, despite this being one of the largest observational studies of the peer review assessment of grant applications, with ten panels observed across eight grant programmes, many variables, both within and beyond our view, might explain differences in the arguments used. Examples of ‘confounding’ variables are the many variations in panel composition, the differences in the objectives of the programmes, and the range of the funding programmes. Our study should therefore be seen as exploratory, which warrants caution in drawing conclusions.

4.1 Overview of observational data

The grant programmes included in this study reflected a broad range of biomedical and health funding programmes, ranging from fellowship grants to translational research and applied health research. All formal documents available to the applicants and to the review panel were retrieved for both ZonMw and the DHF. In total, eighteen documents corresponding to the eight grant programmes were studied. The number of proposals assessed per programme varied from three to thirty-three. The duration of the panel meetings varied between 2 h and two consecutive days. Together, this resulted in a large spread in the number of total arguments used in an individual meeting and in a grant programme as a whole. In the shortest meeting, 49 arguments were observed versus 254 in the longest, with a mean of 126 arguments per meeting and on average 15 arguments per proposal.

Overall, we found consistency between how criteria were operationalised in the grant programmes’ brochures and in the review panels’ assessment forms. At the same time, because the number of elements included in the observation matrix is limited, there was considerable diversity in the arguments that fell within each aspect (see examples in  Table 1 ). Some of these differences could possibly be explained by differences in the language used and the level of detail in the observation matrix, the brochure, and the panel’s instructions. This was especially the case for the applicant-related aspects, for which the observation matrix was more detailed than the text in the brochure and assessment forms.

When interpreting our findings, it is important to take into account that, even though our data were largely complete and the observation matrix matched well with the description of the criteria in the brochures and assessment forms, there was a large diversity in the type and number of arguments used and in the number of proposals assessed across the grant programmes included in our study.

4.2 Wide range of arguments used by panels: scientific arguments used most

For our first research question, we explored the number and type of arguments used in the panel meetings. Figure 1 provides an overview of the arguments used. Scientific quality was discussed most. The number of times the feasibility of the aims was discussed clearly stands out in comparison to all other arguments. Also, the match between the science and the problem studied and the plan of work were frequently discussed aspects of scientific quality. International competitiveness of the proposal was discussed the least of all five scientific arguments.

Figure 1. The number of arguments used in panel meetings.

Attention was paid to societal relevance and impact in the panel meetings of both organisations. Yet, the language used differed somewhat between organisations. The contribution to a solution and the next step in science were the most often used societal arguments. At ZonMw, the impact of the health-care problem studied and the activities towards partners were less frequently discussed than the other three societal arguments. At the DHF, the five societal arguments were used equally often.

With the exception of the fellowship programme meeting, applicant-related arguments were not often used. The fellowship panel used arguments related to the applicant and to scientific quality about equally often. Committee-related arguments were also rarely used in the majority of the eight grant programmes observed. In three of the ten panel meetings, one or two arguments were observed that related to personal experience with the applicant or their direct network. In seven of the ten meetings, statements were observed that were unsubstantiated or were explicitly announced as reflecting a personal preference. The frequency varied between one and seven statements (sixteen in total), which is low in comparison to the other arguments used (see  Fig. 1 for examples).

4.3 Use of arguments varied strongly per panel meeting

The balance in the use of scientific and societal arguments varied strongly per grant programme, panel, and organisation. At ZonMw, two meetings had approximately an equal balance in societal and scientific arguments. In the other two meetings, scientific arguments were used twice to four times as often as societal arguments. At the DHF, three types of panels were observed. Different patterns in the relative use of societal and scientific arguments were observed for each of these panel types. In the two CSQ-only meetings the societal arguments were used approximately twice as often as scientific arguments. In the two meetings of the scientific panels, societal arguments were infrequently used (between zero and four times per argument category). In the combined societal and scientific panel meetings, the use of societal and scientific arguments was more balanced.

4.4 Match of arguments used by panels with the assessment criteria

In order to answer our second research question, we looked into the relation between the arguments used and the formal criteria. We observed that a broader range of arguments was often used than how the criteria were described in the brochure and assessment instruction. However, arguments related to aspects that were consistently included in the brochure and instruction seemed to be discussed more frequently than in programmes where those aspects were not consistently included or were not included at all. Although the match of the science with the health-care problem and the background and reputation of the applicant were not always made explicit in the brochure or instructions, they were discussed in many panel meetings. Supplementary Fig. S1 provides a visualisation of how the arguments used differ between the programmes in which those aspects were, or were not, consistently included in the brochure and instruction forms.
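The comparison behind Supplementary Fig. S1 can be pictured as follows: for each aspect, contrast how often it was argued in programmes where the aspect was consistently included in the brochure and instruction with programmes where it was not. The sketch below uses entirely hypothetical counts and is only meant to make that comparison concrete; it is not the authors' analysis.

```python
# Hypothetical per-programme argument counts for two aspects, split by whether
# the aspect was consistently included in the brochure/instruction.
aspects = {
    "activities_towards_partners": {"included": [5, 7, 6], "not_included": [1, 0, 2]},
    "feasibility_of_aims":         {"included": [20, 25],  "not_included": [18]},
}

for aspect, groups in aspects.items():
    mean_included = sum(groups["included"]) / len(groups["included"])
    mean_excluded = sum(groups["not_included"]) / len(groups["not_included"])
    print(f"{aspect}: mean when included={mean_included:.1f}, "
          f"when not included={mean_excluded:.1f}")
```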

4.5 Two-thirds of the assessment was driven by scientific panel members

To answer our third question, we looked into the differences in arguments used between panel members representing a scientific, clinical scientific, professional, policy, or patient perspective. In each research programme, the majority of panellists had a scientific background. In total, thirty-five panellists had a scientific background, thirty-four a clinical scientific background, twenty a health professional/clinical background, eight represented a policy perspective, and fifteen a patient perspective. Of the total number of arguments (1,097), two-thirds were made by members with a scientific or clinical scientific perspective. Members with a scientific background engaged most actively in the discussion, with a mean of twelve arguments per member. Clinical scientists and health-care professionals each participated with a mean of nine arguments, while members with a policy or patient perspective put forward the fewest arguments on average, namely seven and eight, respectively. Figure 2 provides a complete overview of the total and mean number of arguments used by the different disciplines in the various panels.
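The per-background breakdown reported here amounts to a grouped total and mean over the anonymised member codes. The sketch below illustrates this with hypothetical data and field names; the real analysis was done in ATLAS.ti and Excel.

```python
# Sketch of the per-background breakdown: total and mean number of arguments
# per panel-member category. Member ids, categories, and counts are hypothetical.
from collections import defaultdict

members = {  # anonymised member id -> background category
    "m_041": "scientific", "m_017": "patient", "m_102": "clinical_scientific",
}
arguments_by_member = {"m_041": 12, "m_017": 8, "m_102": 9}  # argument counts

totals, counts = defaultdict(int), defaultdict(int)
for member, n_arguments in arguments_by_member.items():
    background = members[member]
    totals[background] += n_arguments
    counts[background] += 1

for background in totals:
    mean = totals[background] / counts[background]
    print(f"{background}: total={totals[background]}, mean={mean:.1f}")
```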

Figure 2. The total and mean number of arguments displayed per subgroup of panel members.

4.6 Diverse use of arguments by panellists, but background matters

In meetings of both organisations, we observed a diverse use of arguments by the panel members. Yet, the use of arguments varied depending on the background of the panel member (see  Fig. 3 ). Those with a scientific and clinical scientific perspective used primarily scientific arguments. As could be expected, health-care professionals and patients used societal arguments more often.

Figure 3. The use of arguments differentiated by panel member background.

Further breakdown of arguments across backgrounds showed clear differences in the use of scientific arguments between the different disciplines of panellists. Scientists and clinical scientists discussed the feasibility of the aims more than twice as often as their second most often uttered element of scientific quality, which was the match between the science and the problem studied . Patients and members with a policy or health professional background put forward fewer but more varied scientific arguments.

Patients and health-care professionals accounted for approximately half of the societal arguments used, despite being a much smaller part of the panel’s overall composition. In other words, members with a scientific perspective were less likely to use societal arguments. The relevance of the health-care problem studied, activities towards partners , and arguments related to participation and diversity were not used often by this group. Patients often used arguments related to patient participation and diversity and activities towards partners , although the frequency of the use of the latter differed per organisation.

The majority of the applicant-related arguments were put forward by scientists, including clinical scientists. Committee-related arguments were very rare and are therefore not differentiated by panel member background, except for comments comparing an application with other applications; these were mainly put forward by panel members with a scientific background. HTA-related arguments were often used by panel members with a scientific perspective, whereas panel members with other perspectives rarely used them (see Supplementary Figs S2–S4 for a visual presentation of the differences between panel members on all aspects included in the matrix).

5.1 Explanations for arguments used in panels

Our observations show that most types of scientific quality arguments were used frequently. However, with the exception of feasibility , the frequency with which arguments were used varied strongly between meetings and between the individual proposals discussed. The fact that most arguments were not used consistently is not surprising given the results of previous studies that showed heterogeneity in grant application assessments and low consistency in comments and scores by independent reviewers ( Abdoul et al. 2012 ; Pier et al. 2018 ). In an analysis of written assessments on nine observed dimensions, no dimension was used in more than 45 per cent of the reviews ( Hartmann and Neidhardt 1990 ).

There are several possible explanations for this heterogeneity. Roumbanis (2021a) described how being responsive to the different challenges in the proposals and to the points of attention arising from the written assessments influenced discussion in panels. Also, when a disagreement arises, more time is spent on discussion ( Roumbanis 2021a ). One could infer that unambiguous, and thus not debated, aspects might remain largely undetected in our study. We believe, however, that the main points relevant to the assessment will not remain entirely unmentioned, because most panels in our study started the discussion with a short summary of the proposal, the written assessment, and the rebuttal. Lamont (2009) , however, points out that opening statements serve more goals than merely decision-making; they can also increase the credibility of the panellist by showing their comprehension and balanced assessment of an application. We can therefore not entirely disentangle whether the arguments observed most often were also the most important or decisive ones, or simply the topics that led to the most disagreement.

An interesting difference with Roumbanis’ study was the available discussion time per proposal. In our study, most panels handled a limited number of proposals, allowing for longer discussions in comparison with the often 2-min time frame that Roumbanis (2021b) described, potentially contributing to a wider range of arguments being discussed. Limited time per proposal might also limit the number of panellists contributing to the discussion per proposal ( De Bont 2014 ).

5.2 Reducing heterogeneity by improving operationalisation and the consequent use of assessment criteria

We found that the language used for the operationalisation of the assessment criteria in programme brochures and in the observation matrix was much more detailed than in the instruction for the panel, which was often very concise. The exercise also illustrated that many terms were used interchangeably.

This was especially true for the applicant-related aspects. Several panels discussed how talent should be assessed. This confusion is understandable considering the changing values in research and its assessment ( Moher et al. 2018 ) and the fact that the instruction from the funders was very concise. For example, it was not made explicit whether the individual or the team should be assessed. Van Arensbergen et al. (2014b) described how, in grant allocation processes, talent is generally assessed using a limited set of characteristics. More objective and quantifiable outputs often prevailed at the expense of recognising and rewarding a broad variety of skills and traits combining professional, social, and individual capital ( DORA 2013 ).

In addition, committee-related arguments, like personal experiences with the applicant or their institute, were rarely used in our study. Comparisons between proposals were sometimes made without further argumentation, mainly by scientific panel members. This was especially pronounced in one (fellowship) grant programme with a high number of proposals. In this programme, the panel meeting concentrated on quickly comparing the quality of the applicants and of the proposals based on the reviewer’s judgement, instead of a more in-depth discussion of the different aspects of the proposals. Because the review phase was not part of this study, the question of which aspects have been used for the assessment of the proposals in this panel therefore remains partially unanswered. However, weighing and comparing proposals on different aspects and with different inputs is a core element of scientific peer review, both in the review of papers and in the review of grants ( Hirschauer 2010 ). The large role of scientific panel members in comparing proposals is therefore not surprising.

One could anticipate that more consistent language in the operationalisation of criteria may lead to more clarity for both applicants and panellists and to more consistency in the assessment of research proposals. The trend in our observations was that arguments were used less when the related criteria were not, or not consistently, included in the brochure and panel instruction. It remains, however, challenging to disentangle the influence of the formal definitions of criteria on the arguments used. Previous studies also encountered difficulties in studying the role of the formal instruction in peer review but concluded that this role is relatively limited ( Langfeldt 2001 ; Reinhart 2010 ).

The lack of a clear operationalisation of criteria can contribute to heterogeneity in peer review, as many scholars have found that assessors differ in their conceptualisation of good science and in the importance they attach to various aspects of research quality and societal relevance ( Abdoul et al. 2012 ; Geurts 2016 ; Scholten et al. 2018 ; Van den Brink et al. 2016 ). The large variation and the absence of a gold standard in the interpretation of scientific quality and societal relevance affect the consistency of peer review. As a consequence, it is challenging to systematically evaluate and improve peer review in order to fund the research that contributes most to science and society. To contribute to responsible research and innovation, it is therefore important that funders invest in a more consistent and conscientious peer review process ( Curry et al. 2020 ; DORA 2013 ).

A common conceptualisation of scientific quality and societal relevance and impact could improve the alignment between views on good scientific conduct, programmes’ objectives, and the peer review in practice. Such a conceptualisation could contribute to more transparency and quality in the assessment of research. By involving panel members from all relevant backgrounds, including the research community, health-care professionals, and societal actors, in a better operationalisation of criteria, more inclusive views of good science can be implemented more systematically in the peer review assessment of research proposals. The ZonMw Framework Fostering Responsible Research Practices is an example of an initiative aiming to support standardisation and integration ( Reijmerink et al. 2020 ).

Given the lack of a common definition or conceptualisation of scientific quality and societal relevance, an important choice in our study was to use a fixed set of detailed aspects of two important criteria as a gold standard to score the brochures, the panel instructions, and the arguments used by the panels. This approach proved helpful in disentangling the different components of scientific quality and societal relevance. Having said that, it is important not to oversimplify the causes of heterogeneity in peer review, because these substantive arguments are not independent of non-cognitive, emotional, or social aspects ( Lamont and Guetzkow 2016 ; Reinhart 2010 ).

5.3 Do more diverse panels contribute to a broader use of arguments?

Both funders participating in our study have an outspoken public mission that requires sufficient attention to societal aspects in assessment processes. In reality, as observed in several panels, the main focus of peer review meetings is on scientific arguments. In addition to the possible explanations given earlier, the composition of the panel might play a role in explaining the arguments used in panel meetings. Our results show that health-care professionals and patients bring in more societal arguments than scientists, including those who are also clinicians. It is, however, not that simple: in the more diverse panels, panel members, regardless of their backgrounds, used more societal arguments than in the less diverse panels.

Observing ten panel meetings was sufficient to explore differences in arguments used by panel members with different backgrounds. The pattern of (primarily) scientific arguments being raised by panels with mainly scientific members is not surprising: assessing the scientific content of grant proposals is their main task and fits their competencies. One could therefore argue, depending on how one views the relationship between science and society, that health-care professionals and patients might be better suited to assess the value of research results for potential users. Scientific panel members and clinical scientists in our study used fewer arguments that reflect on opening up science and connecting it directly to others who can take it further (be they industry, health-care professionals, or other stakeholders). Patients filled this gap, since these two types of arguments were the most prevalent ones they put forward. Making an active connection with society apparently requires a broader, more diverse panel before scientists direct their attention to more societal arguments. Evident from our observations is that in panels with patients and health-care professionals, their presence seemed to increase the attention paid to arguments beyond the scientific ones by all panel members, including scientists. This conclusion is congruent with the observation that there was a more equal balance in the use of societal and scientific arguments in the scientific panels in which the CSQ participated. This illustrates that opening up peer review panels to non-scientific members creates an opportunity to focus on both the contribution and the integrative rationality ( Glerup and Horst 2014 ) or, in other words, to allow productive interactions between scientific and non-scientific actors. This corresponds with previous research suggesting that, with regard to societal aspects, reviews from mixed panels were broader and richer ( Luo et al. 2021 ). In panels with non-scientific experts, more emphasis was placed on the role of the proposed research process in increasing the likelihood of societal impact than on the causal importance of scientific excellence for broader impacts. This is in line with the finding that panels with more disciplinary diversity, in range and also by including generalist experts, applied more versatile styles to reach consensus and paid more attention to relevance and pragmatic value ( Huutoniemi 2012 ).

Our observations further illustrate that patients and health-care professionals were less vocal in panels than (clinical) scientists and were in the minority. This could reflect their social role and lower perceived authority in the panel. Several guides are available for funders to stimulate the equal participation of patients in science, and these are also applicable to their involvement in peer review panels. Measures include support and training to prepare patients for their participation in deliberations with renowned scientists, and explicitly addressing power differences ( De Wit et al. 2016 ). Panel chairs and programme officers have to set and supervise the conditions for the functioning of both the individual panel members and the panel as a whole ( Lamont 2009 ).

5.4 Suggestions for future studies

In future studies, it is important to further disentangle the role of the operationalisation and appraisal of assessment criteria in reducing heterogeneity in the arguments used by panels. More controlled experimental settings are a valuable addition to the current mainly observational methodologies applied to disentangle some of the cognitive and social factors that influence the functioning and argumentation of peer review panels. Reusing data from the panel observations and the data on the written reports could also provide a starting point for a bottom-up approach to create a more consistent and shared conceptualisation and operationalisation of assessment criteria.

To further understand the effects of opening up review panels to non-scientific peers, it is valuable to compare the role of diversity and interdisciplinarity in solely scientific panels versus panels that also include non-scientific experts.

In future studies, differences between domains and types of research should also be addressed. We hypothesise that biomedical and health research is perhaps more suited for the inclusion of non-scientific peers in panels than other research domains. For example, it is valuable to better understand how potentially relevant users can be well enough identified in other research fields and to what extent non-academics can contribute to assessing the possible value of, especially early or blue sky, research.

The goal of our study was to explore in practice which arguments regarding the main criteria of scientific quality and societal relevance were used by peer review panels of biomedical and health research funding programmes. We showed that there is wide diversity in the number and range of arguments used, but three main scientific aspects were discussed most frequently: is the approach feasible , does the science match the problem , and is the work plan scientifically sound? Nevertheless, these scientific aspects were accompanied by a significant amount of discussion of societal aspects, of which the contribution to a solution was the most prominent. In comparison with scientific panellists, non-scientific panellists, such as health-care professionals, policymakers, and patients, often used a wider range of arguments and more societal arguments. Even more striking was that, even though non-scientific peers were often outnumbered and less vocal in panels, scientists also used a wider range of arguments when non-scientific peers were present.

It is relevant that two health research funders collaborated in the current study to reflect on and improve peer review in research funding. There are few studies published that describe live observations of peer review panel meetings. Many studies focus on alternatives for peer review or reflect on the outcomes of the peer review process, instead of reflecting on the practice and improvement of peer review assessment of grant proposals. Privacy and confidentiality concerns of funders also contribute to the lack of information on the functioning of peer review panels. In this study, both organisations were willing to participate because of their interest in research funding policies in relation to enhancing the societal value and impact of science. The study provided them with practical suggestions, for example, on how to improve the alignment in language used in programme brochures and instructions of review panels, and contributed to valuable knowledge exchanges between organisations. We hope that this publication stimulates more research funders to evaluate their peer review approach in research funding and share their insights.

For a long time, research funders relied solely on scientists for designing and executing peer review of research proposals, thereby delegating responsibility for the process. Although review panels have a discretionary authority, it is important that funders set and supervise the process and the conditions. We argue that one of these conditions should be the diversification of peer review panels and opening up panels for non-scientific peers.

Supplementary material is available at Science and Public Policy online.

Details of the data and information on how to request access are available from the first author.

Joey Gijbels and Wendy Reijmerink are employed by ZonMw. Rebecca Abma-Schouten is employed by the Dutch Heart Foundation and is an external PhD candidate affiliated with the Centre for Science and Technology Studies, Leiden University.

A special thanks to the panel chairs and programme officers of ZonMw and the DHF for their willingness to participate in this project. We thank Diny Stekelenburg, an internship student at ZonMw, for her contributions to the project. Our sincerest gratitude to Prof. Paul Wouters, Sarah Coombs, and Michiel van der Vaart for proofreading and their valuable feedback. Finally, we thank the editors and anonymous reviewers of Science and Public Policy for their thorough and insightful reviews and recommendations. Their contributions are recognisable in the final version of this paper.

Abdoul   H. , Perrey   C. , Amiel   P. , et al.  ( 2012 ) ‘ Peer Review of Grant Applications: Criteria Used and Qualitative Study of Reviewer Practices ’, PLoS One , 7 : 1 – 15 .

Abma-Schouten   R. Y. ( 2017 ) ‘ Maatschappelijke Kwaliteit van Onderzoeksvoorstellen ’, Dutch Heart Foundation .

Alla   K. , Hall   W. D. , Whiteford   H. A. , et al.  ( 2017 ) ‘ How Do We Define the Policy Impact of Public Health Research? A Systematic Review ’, Health Research Policy and Systems , 15 : 84.

Benedictus   R. , Miedema   F. , and Ferguson   M. W. J. ( 2016 ) ‘ Fewer Numbers, Better Science ’, Nature , 538 : 453 – 4 .

Chalmers   I. , Bracken   M. B. , Djulbegovic   B. , et al.  ( 2014 ) ‘ How to Increase Value and Reduce Waste When Research Priorities Are Set ’, The Lancet , 383 : 156 – 65 .

Curry   S. , De Rijcke   S. , Hatch   A. , et al.  ( 2020 ) ‘ The Changing Role of Funders in Responsible Research Assessment: Progress, Obstacles and the Way Ahead ’, RoRI Working Paper No. 3, London : Research on Research Institute (RoRI) .

De Bont   A. ( 2014 ) ‘ Beoordelen Bekeken. Reflecties op het Werk van Een Programmacommissie van ZonMw ’, ZonMw .

De Rijcke   S. , Wouters   P. F. , Rushforth   A. D. , et al.  ( 2016 ) ‘ Evaluation Practices and Effects of Indicator Use—a Literature Review ’, Research Evaluation , 25 : 161 – 9 .

De Wit   A. M. , Bloemkolk   D. , Teunissen   T. , et al.  ( 2016 ) ‘ Voorwaarden voor Succesvolle Betrokkenheid van Patiënten/cliënten bij Medisch Wetenschappelijk Onderzoek ’, Tijdschrift voor Sociale Gezondheidszorg , 94 : 91 – 100 .

Del Carmen Calatrava Moreno   M. , Warta   K. , Arnold   E. , et al.  ( 2019 ) Science Europe Study on Research Assessment Practices . Technopolis Group Austria .

Demicheli   V. and Di Pietrantonj   C. ( 2007 ) ‘ Peer Review for Improving the Quality of Grant Applications ’, Cochrane Database of Systematic Reviews , 2 : MR000003.

Den Oudendammer   W. M. , Noordhoek   J. , Abma-Schouten   R. Y. , et al.  ( 2019 ) ‘ Patient Participation in Research Funding: An Overview of When, Why and How Amongst Dutch Health Funds ’, Research Involvement and Engagement , 5 .

Diabetesfonds ( n.d. ) Maatschappelijke Adviesraad < https://www.diabetesfonds.nl/over-ons/maatschappelijke-adviesraad > accessed 18 Sept 2022 .

Dijstelbloem   H. , Huisman   F. , Miedema   F. , et al.  ( 2013 ) ‘ Science in Transition Position Paper: Waarom de Wetenschap Niet Werkt Zoals het Moet, En Wat Daar aan te Doen Is ’, Utrecht : Science in Transition .

Forsyth   D. R. ( 1999 ) Group Dynamics , 3rd edn. Belmont : Wadsworth Publishing Company .

Geurts   J. ( 2016 ) ‘ Wat Goed Is, Herken Je Meteen ’, NRC Handelsblad < https://www.nrc.nl/nieuws/2016/10/28/wat-goed-is-herken-je-meteen-4975248-a1529050 > accessed 6 Mar 2022 .

Glerup   C. and Horst   M. ( 2014 ) ‘ Mapping “Social Responsibility” in Science ’, Journal of Responsible Innovation , 1 : 31 – 50 .

Hartmann   I. and Neidhardt   F. ( 1990 ) ‘ Peer Review at the Deutsche Forschungsgemeinschaft ’, Scientometrics , 19 : 419 – 25 .

Hirschauer   S. ( 2010 ) ‘ Editorial Judgments: A Praxeology of “Voting” in Peer Review ’, Social Studies of Science , 40 : 71 – 103 .

Hughes   A. and Kitson   M. ( 2012 ) ‘ Pathways to Impact and the Strategic Role of Universities: New Evidence on the Breadth and Depth of University Knowledge Exchange in the UK and the Factors Constraining Its Development ’, Cambridge Journal of Economics , 36 : 723 – 50 .

Huutoniemi   K. ( 2012 ) ‘ Communicating and Compromising on Disciplinary Expertise in the Peer Review of Research Proposals ’, Social Studies of Science , 42 : 897 – 921 .

Jasanoff   S. ( 2011 ) ‘ Constitutional Moments in Governing Science and Technology ’, Science and Engineering Ethics , 17 : 621 – 38 .

Kolarz   P. , Arnold   E. , Farla   K. , et al.  ( 2016 ) Evaluation of the ESRC Transformative Research Scheme . Brighton : Technopolis Group .

Lamont   M. ( 2009 ) How Professors Think : Inside the Curious World of Academic Judgment . Cambridge : Harvard University Press .

Lamont   M. Guetzkow   J. ( 2016 ) ‘How Quality Is Recognized by Peer Review Panels: The Case of the Humanities’, in M.   Ochsner , S. E.   Hug , and H.-D.   Daniel (eds) Research Assessment in the Humanities , pp. 31 – 41 . Cham : Springer International Publishing .

Lamont   M. Huutoniemi   K. ( 2011 ) ‘Comparing Customary Rules of Fairness: Evaluative Practices in Various Types of Peer Review Panels’, in C.   Camic , N.   Gross , and M.   Lamont (eds) Social Knowledge in the Making , pp. 209–32. Chicago : The University of Chicago Press .

Langfeldt   L. ( 2001 ) ‘ The Decision-making Constraints and Processes of Grant Peer Review, and Their Effects on the Review Outcome ’, Social Studies of Science , 31 : 820 – 41 .

——— ( 2006 ) ‘ The Policy Challenges of Peer Review: Managing Bias, Conflict of Interests and Interdisciplinary Assessments ’, Research Evaluation , 15 : 31 – 41 .

Lee   C. J. , Sugimoto   C. R. , Zhang   G. , et al.  ( 2013 ) ‘ Bias in Peer Review ’, Journal of the American Society for Information Science and Technology , 64 : 2 – 17 .

Liu   F. Maitlis   S. ( 2010 ) ‘Nonparticipant Observation’, in A. J.   Mills , G.   Durepos , and E.   Wiebe (eds) Encyclopedia of Case Study Research , pp. 609 – 11 . Los Angeles : SAGE .

Luo   J. , Ma   L. , and Shankar   K. ( 2021 ) ‘ Does the Inclusion of Non-academic Reviewers Make Any Difference for Grant Impact Panels? ’, Science & Public Policy , 48 : 763 – 75 .

Luukkonen   T. ( 2012 ) ‘ Conservatism and Risk-taking in Peer Review: Emerging ERC Practices ’, Research Evaluation , 21 : 48 – 60 .

Macleod   M. R. , Michie   S. , Roberts   I. , et al.  ( 2014 ) ‘ Biomedical Research: Increasing Value, Reducing Waste ’, The Lancet , 383 : 101 – 4 .

Meijer   I. M. ( 2012 ) ‘ Societal Returns of Scientific Research. How Can We Measure It? ’, Leiden : Center for Science and Technology Studies, Leiden University .

Merton   R. K. ( 1968 ) Social Theory and Social Structure , Enlarged edn. [Nachdr.] . New York : The Free Press .

Moher   D. , Naudet   F. , Cristea   I. A. , et al.  ( 2018 ) ‘ Assessing Scientists for Hiring, Promotion, And Tenure ’, PLoS Biology , 16 : e2004089.

Olbrecht   M. and Bornmann   L. ( 2010 ) ‘ Panel Peer Review of Grant Applications: What Do We Know from Research in Social Psychology on Judgment and Decision-making in Groups? ’, Research Evaluation , 19 : 293 – 304 .

Patiëntenfederatie Nederland ( n.d. ) Ervaringsdeskundigen Referentenpanel < https://www.patientenfederatie.nl/zet-je-ervaring-in/lid-worden-van-ons-referentenpanel > accessed 18 Sept 2022.

Pier   E. L. , Brauer   M. , Filut   A. , et al.  ( 2018 ) ‘ Low Agreement among Reviewers Evaluating the Same NIH Grant Applications ’, Proceedings of the National Academy of Sciences , 115 : 2952 – 7 .

Prinses Beatrix Spierfonds ( n.d. ) Gebruikerscommissie < https://www.spierfonds.nl/wie-wij-zijn/gebruikerscommissie > accessed 18 Sep 2022 .

Rathenau Instituut ( 2020 ) Private Non-profit Financiering van Onderzoek in Nederland < https://www.rathenau.nl/nl/wetenschap-cijfers/geld/wat-geeft-nederland-uit-aan-rd/private-non-profit-financiering-van#:∼:text=R%26D%20in%20Nederland%20wordt%20gefinancierd,aan%20wetenschappelijk%20onderzoek%20in%20Nederland > accessed 6 Mar 2022 .

Reneman   R. S. , Breimer   M. L. , Simoons   J. , et al.  ( 2010 ) ‘ De toekomst van het cardiovasculaire onderzoek in Nederland. Sturing op synergie en impact ’, Den Haag : Nederlandse Hartstichting .

Reed   M. S. , Ferré   M. , Marin-Ortega   J. , et al.  ( 2021 ) ‘ Evaluating Impact from Research: A Methodological Framework ’, Research Policy , 50 : 104147.

Reijmerink   W. and Oortwijn   W. ( 2017 ) ‘ Bevorderen van Verantwoorde Onderzoekspraktijken Door ZonMw ’, Beleidsonderzoek Online. accessed 6 Mar 2022.

Reijmerink   W. , Vianen   G. , Bink   M. , et al.  ( 2020 ) ‘ Ensuring Value in Health Research by Funders’ Implementation of EQUATOR Reporting Guidelines: The Case of ZonMw ’, Berlin : REWARD|EQUATOR .

Reinhart   M. ( 2010 ) ‘ Peer Review Practices: A Content Analysis of External Reviews in Science Funding ’, Research Evaluation , 19 : 317 – 31 .

Reinhart   M. and Schendzielorz   C. ( 2021 ) Trends in Peer Review . SocArXiv . < https://osf.io/preprints/socarxiv/nzsp5 > accessed 29 Aug 2022.

Roumbanis   L. ( 2017 ) ‘ Academic Judgments under Uncertainty: A Study of Collective Anchoring Effects in Swedish Research Council Panel Groups ’, Social Studies of Science , 47 : 95 – 116 .

——— ( 2021a ) ‘ Disagreement and Agonistic Chance in Peer Review ’, Science, Technology & Human Values , 47 : 1302 – 33 .

——— ( 2021b ) ‘ The Oracles of Science: On Grant Peer Review and Competitive Funding ’, Social Science Information , 60 : 356 – 62 .

( 2019 ) ‘ Ruimte voor ieders talent (Position Paper) ’, Den Haag : VSNU, NFU, KNAW, NWO en ZonMw . < https://www.universiteitenvannederland.nl/recognitionandrewards/wp-content/uploads/2019/11/Position-paper-Ruimte-voor-ieders-talent.pdf >.

DORA ( 2013 ) San Francisco Declaration on Research Assessment . The Declaration . < https://sfdora.org > accessed 2 Jan 2022 .

Sarewitz   D. and Pielke   R. A.  Jr. ( 2007 ) ‘ The Neglected Heart of Science Policy: Reconciling Supply of and Demand for Science ’, Environmental Science & Policy , 10 : 5 – 16 .

Scholten   W. , Van Drooge   L. , and Diederen   P. ( 2018 ) Excellent Is Niet Gewoon. Dertig Jaar Focus op Excellentie in het Nederlandse Wetenschapsbeleid . The Hague : Rathenau Instituut .

Shapin   S. ( 2008 ) The Scientific Life : A Moral History of a Late Modern Vocation . Chicago : University of Chicago press .

Spaapen   J. and Van Drooge   L. ( 2011 ) ‘ Introducing “Productive Interactions” in Social Impact Assessment ’, Research Evaluation , 20 : 211 – 8 .

Travis   G. D. L. and Collins   H. M. ( 1991 ) ‘ New Light on Old Boys: Cognitive and Institutional Particularism in the Peer Review System ’, Science, Technology & Human Values , 16 : 322 – 41 .

Van Arensbergen   P. and Van den Besselaar   P. ( 2012 ) ‘ The Selection of Scientific Talent in the Allocation of Research Grants ’, Higher Education Policy , 25 : 381 – 405 .

Van Arensbergen   P. , Van der Weijden   I. , and Van den Besselaar   P. V. D. ( 2014a ) ‘ The Selection of Talent as a Group Process: A Literature Review on the Social Dynamics of Decision Making in Grant Panels ’, Research Evaluation , 23 : 298 – 311 .

—— ( 2014b ) ‘ Different Views on Scholarly Talent: What Are the Talents We Are Looking for in Science? ’, Research Evaluation , 23 : 273 – 84 .

Van den Brink , G. , Scholten , W. , and Jansen , T. , eds ( 2016 ) Goed Werk voor Academici . Culemborg : Stichting Beroepseer .

Weingart   P. ( 1999 ) ‘ Scientific Expertise and Political Accountability: Paradoxes of Science in Politics ’, Science & Public Policy , 26 : 151 – 61 .

Wessely   S. ( 1998 ) ‘ Peer Review of Grant Applications: What Do We Know? ’, The Lancet , 352 : 301 – 5 .


May 23, 2024

Study finds individuals less likely to evaluate peers negatively if facing evaluation themselves

by Kim Matthies, European School of Management and Technology (ESMT)

New research from ESMT Berlin finds that individuals strategically select the colleagues they evaluate, and the evaluation they give, based on how they want to be perceived. The research was published in the journal Organization Science .

Linus Dahlander, professor of strategy and Lufthansa Group Chair of Innovation at ESMT Berlin, alongside colleagues from Purdue University and INSEAD, investigated the impact of peer evaluations on the behaviors of Wikipedia members, for which peer evaluations are transparent.

Peers can see the complete evaluation history of a member, including how and whom they have evaluated in the past, and these peer evaluations are used to determine which members become administrators.

The researchers focused on three key factors: whether the member was about to be evaluated themselves, how pivotal an evaluation was, and the candidate's activity level.

Their research revealed that members who are about to be evaluated themselves participate in more peer evaluations. However, members are less likely to participate in evaluations in which their judgement might offend someone or be pivotal to a peer's overall assessment, and they focus their negative evaluations on inactive members. Negative evaluations are also targeted at candidates for whom they are unlikely to swing the evaluation outcome in either direction, that is, cases in which the overall outcome is already obvious.

The research also found no evidence that members focus on giving positive evaluations to active peers, suggesting they avoid negative reciprocity but do not attempt to invoke positive reciprocity. Further analysis suggests that this strategic use of peer evaluations is effective, with members more likely to be evaluated positively and to get promoted by their peers.

"Our research shows people tend to participate in peer evaluations if they believe it will benefit them rather than if their evaluation would be helpful. This means they refrain from participating in evaluations where the outcome is uncertain to avoid retaliation. Incidentally, the organization is likely to miss important evaluations when they could be most valuable," says Prof. Dahlander.

The findings demonstrate that, although transparency and self-selection make evaluations more accountable, they also allow members to use their evaluations to portray themselves strategically ahead of their own evaluation.

To reduce the opportunity for strategic manipulation, organizations should implement transparent peer evaluation processes with clear guidelines. While transparency can enhance accountability, managers should be aware that employees might use this transparency to their advantage strategically.

Organizations can hold members accountable and foster a culture of genuine merit-based assessments by ensuring evaluations are open and traceable. This approach can enhance trust in the evaluation system and improve organizational fairness and effectiveness.

Managers should also encourage employees to provide balanced evaluations that reflect both positive and negative aspects of performance, irrespective of personal stakes. Training programs on effective feedback delivery and the importance of objective evaluations can help mitigate the strategic biases identified in this study.

Journal information: Organization Science

Provided by European School of Management and Technology (ESMT)

The Macroeconomic Impact of Climate Change: Global vs. Local Temperature

This paper estimates that the macroeconomic damages from climate change are six times larger than previously thought. We exploit natural variability in global temperature and rely on time-series variation. A 1°C increase in global temperature leads to a 12% decline in world GDP. Global temperature shocks correlate much more strongly with extreme climatic events than the country-level temperature shocks commonly used in the panel literature, explaining why our estimate is substantially larger. We use our reduced-form evidence to estimate structural damage functions in a standard neoclassical growth model. Our results imply a Social Cost of Carbon of $1,056 per ton of carbon dioxide. A business-as-usual warming scenario leads to a present value welfare loss of 31%. Both are multiple orders of magnitude above previous estimates and imply that unilateral decarbonization policy is cost-effective for large countries such as the United States.
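To make the headline estimate concrete, here is a minimal back-of-the-envelope sketch in Python. It applies only the abstract's reduced-form figure of a 12% decline in world GDP per 1°C of global warming, treated as linear for illustration; it is not the paper's structural damage function, and the warming levels below are arbitrary examples rather than scenarios from the paper.

    # Back-of-the-envelope illustration only: scales the abstract's headline
    # reduced-form figure (12% of world GDP per 1 degree C of global warming)
    # linearly. This is NOT the paper's structural model; the warming levels
    # below are hypothetical examples.

    GDP_LOSS_PER_DEGREE_C = 0.12  # fraction of world GDP lost per 1 degree C (from the abstract)

    def implied_gdp_loss(delta_t_celsius: float) -> float:
        """Fractional world-GDP loss implied by a warming level, assuming linear scaling."""
        return GDP_LOSS_PER_DEGREE_C * delta_t_celsius

    for warming in (1.0, 2.0, 3.0):  # hypothetical warming levels in degrees C
        print(f"{warming:.1f} deg C of warming -> ~{implied_gdp_loss(warming):.0%} of world GDP")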

Adrien Bilal gratefully acknowledges support from the Chae Family Economics Research Fund at Harvard University. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.


medRxiv

Summer of Translational Aging Research for Undergraduates (STAR U): Short-term outcomes of a training program to advance diversity in aging research

Authors include Kiana K Chan and Jennifer J Manly.

Purpose: The Summer of Translational Aging Research for Undergraduates (STAR U) program, funded by the National Institute on Aging and the Alzheimer's Association, aims to advance diversity in aging research through undergraduate education. Here, we evaluate the effectiveness of the program in cultivating a diverse cohort of scientists from underrepresented backgrounds.

Method: Forty-eight (96%) of 50 STAR U alumni completed a program evaluation survey between April and August 2023. The survey collected data on alumni demographic characteristics, educational or career goals, program experiences, and post-program outcomes, including information about continued education and scientific engagement.

Results: Ninety-one percent of respondents indicated that STAR U was extremely significant or very significant in influencing them to pursue a career in science, and 93% found STAR U effective in influencing their decision to pursue a career in aging research specifically. Forty-one percent of all respondents had already been accepted to or enrolled in science-related advanced degree programs, with half enrolled in doctoral degree programs. Of the students not yet enrolled in graduate school, 89% indicated plans to pursue advanced degrees in the future. Respondents actively disseminated their research: 10% of STAR U scholars reported leading or co-authoring papers intended for publication in a peer-reviewed journal, and a review of PubMed shows that, to date, 22 students (44%) have a combined total of 44 publications in peer-reviewed journals. Qualitative feedback underscored the program's impact on career exploration, as well as the impact of mentorship and the supportive environment provided by STAR U.

Conclusions: The STAR U program shows promise as a model for advancing diversity in the scientific workforce focused on aging research by strengthening scholars' goals of pursuing graduate education, careers in science, and research on aging in particular. Its individualized approach helps students address challenges and fosters a supportive environment. STAR U serves as a catalyst for underrepresented students in STEM, showing the value of tailored initiatives in promoting diversity and inclusion in aging research.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

National Institute on Aging R25 Grant, STAR U, Grant # AG059557

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Columbia University IRB protocol AAAT7823

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.



AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

By Steven Levy

For the past decade, AI researcher Chris Olah has been obsessed with artificial neural networks. One question in particular engaged him, and has been the center of his work, first at Google Brain, then OpenAI, and today at AI startup Anthropic, where he is a cofounder. “What's going on inside of them?” he says. “We have these systems, we don't know what's going on. It seems crazy.”

That question has become a core concern now that generative AI has become ubiquitous. Large language models like ChatGPT, Gemini, and Anthropic’s own Claude have dazzled people with their language prowess and infuriated people with their tendency to make things up. Their potential to solve previously intractable problems enchants techno-optimists. But LLMs are strangers in our midst. Even the people who build them don’t know exactly how they work, and massive effort is required to create guardrails to prevent them from churning out bias, misinformation, and even blueprints for deadly chemical weapons. If the people building the models knew what happened inside these “black boxes,” it would be easier to make them safer.

Olah believes that we’re on the path to this. He leads an Anthropic team that has peeked inside that black box. Essentially, they are trying to reverse engineer large language models to understand why they come up with specific outputs—and, according to a paper released today, they have made significant progress.

Maybe you’ve seen neuroscience studies that interpret MRI scans to identify whether a human brain is entertaining thoughts of a plane, a teddy bear, or a clock tower. Similarly, Anthropic has plunged into the digital tangle of the neural net of its LLM, Claude, and pinpointed which combinations of its crude artificial neurons evoke specific concepts, or “features.” The company’s researchers have identified the combination of artificial neurons that signify features as disparate as burritos, semicolons in programming code, and—very much to the larger goal of the research—deadly biological weapons. Work like this has potentially huge implications for AI safety: If you can figure out where danger lurks inside an LLM, you are presumably better equipped to stop it.

I met with Olah and three of his colleagues, among 18 Anthropic researchers on the “mechanistic interpretability” team. They explain that their approach treats artificial neurons like letters of Western alphabets, which don’t usually have meaning on their own but can be strung together sequentially to have meaning. “ C doesn’t usually mean something,” says Olah. “But car does.” Interpreting neural nets by that principle involves a technique called dictionary learning, which allows you to associate a combination of neurons that, when fired in unison, evoke a specific concept, referred to as a feature.
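For readers who want a feel for what dictionary learning does mechanically, here is a toy sketch using scikit-learn on synthetic data. It is not Anthropic's pipeline (which trains sparse autoencoders on real model activations at scale); the "neurons," sample counts, and dictionary size below are all made-up placeholders. The point is only that each sample gets explained as a sparse combination of learned "atoms," which stand in for features.

    # Toy illustration of dictionary learning, not Anthropic's method:
    # fabricate random "neuron activation" vectors and learn a small
    # dictionary whose sparse codes play the role of "features" --
    # each sample is explained by a few atoms firing together.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)

    # Pretend these are activations of 64 "neurons" over 500 text snippets (synthetic).
    activations = rng.normal(size=(500, 64))

    # Learn 16 atoms; each atom is a direction in neuron space, standing in
    # for a "feature" (a concept expressed by neurons firing in unison).
    learner = DictionaryLearning(
        n_components=16,
        alpha=1.0,
        max_iter=100,
        transform_algorithm="lasso_lars",
        random_state=0,
    )
    codes = learner.fit_transform(activations)  # per-snippet sparse feature activations
    atoms = learner.components_                 # the dictionary: (16, 64) feature directions

    print("mean non-zero features per snippet:", np.count_nonzero(codes, axis=1).mean())
    print("dictionary shape:", atoms.shape)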

“It’s sort of a bewildering thing,” says Josh Batson, an Anthropic research scientist. “We’ve got on the order of 17 million different concepts [in an LLM], and they don't come out labeled for our understanding. So we just go look, when did that pattern show up?”


Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.

“Chris looked at it, and he was like, ‘Holy crap. This looks great,’” says Henighan, who was stunned as well. “I looked at it, and was like, ‘Oh, wow, wait, is this working?’”

Suddenly the researchers could identify the features that groups of neurons were encoding. They could peer into the black box. Henighan says he identified the first five features he looked at. One group of neurons signified Russian texts. Another was associated with mathematical functions in the Python programming language. And so on.

Once they showed they could identify features in the tiny model, the researchers set about the hairier task of decoding a full-size LLM in the wild. They used Claude Sonnet, the medium-strength version of Anthropic’s three current models. That worked, too. One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California governor Gavin Newsom, and the Hitchcock movie Vertigo , which was set in San Francisco. All told the team identified millions of features—a sort of Rosetta Stone to decode Claude’s neural net. Many of the features were safety-related, including “getting close to someone for some ulterior motive,” “discussion of biological warfare,” and “villainous plots to take over the world.”

The Anthropic team then took the next step, to see if they could use that information to change Claude’s behavior. They began manipulating the neural net to augment or diminish certain concepts—a kind of AI brain surgery, with the potential to make LLMs safer and augment their power in selected areas. “Let's say we have this board of features. We turn on the model, one of them lights up, and we see, ‘Oh, it's thinking about the Golden Gate Bridge,’” says Shan Carter, an Anthropic scientist on the team. “So now, we’re thinking, what if we put a little dial on all these? And what if we turn that dial?”

So far, the answer to that question seems to be that it’s very important to turn the dial the right amount. By suppressing those features, Anthropic says, the model can produce safer computer programs and reduce bias. For instance, the team found several features that represented dangerous practices, like unsafe computer code, scam emails, and instructions for making dangerous products.
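As a rough sketch of the "dial" mechanics (and not Anthropic's implementation), the snippet below adds a scaled, made-up feature direction to a hidden layer of a toy PyTorch network during the forward pass. In practice the direction would come from a learned dictionary atom and the model would be an LLM; here both are stand-ins, and the scale factor plays the role of the dial.

    # Toy sketch of feature steering: nudge hidden activations along a chosen
    # direction during the forward pass. The network and the feature direction
    # are made up; only the mechanics of scaling a direction are illustrated.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    model = nn.Sequential(
        nn.Linear(32, 64),  # stand-in for a transformer layer's hidden state
        nn.ReLU(),
        nn.Linear(64, 10),
    )

    # Hypothetical feature direction in the 64-dim hidden space
    # (in practice, a learned dictionary atom).
    feature_direction = torch.randn(64)
    feature_direction /= feature_direction.norm()

    def make_steering_hook(direction: torch.Tensor, scale: float):
        """Forward hook that adds `scale * direction` to the layer's output."""
        def hook(module, inputs, output):
            return output + scale * direction
        return hook

    x = torch.randn(1, 32)
    baseline = model(x)

    # "Turn the dial up": amplify the feature while the hook is registered.
    handle = model[1].register_forward_hook(make_steering_hook(feature_direction, scale=8.0))
    steered = model(x)
    handle.remove()

    print("output shift from steering:", (steered - baseline).norm().item())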


The opposite occurred when the team intentionally provoked those dicey combinations of neurons to fire. Claude churned out computer programs with dangerous buffer-overflow bugs and scam emails, and happily offered advice on how to make weapons of destruction. If you twist the dial too much (cranking it to 11 in the Spinal Tap sense), the language model becomes obsessed with that feature. When the research team turned up the juice on the Golden Gate feature, for example, Claude constantly changed the subject to refer to that glorious span. Asked what its physical form was, the LLM responded, “I am the Golden Gate Bridge … my physical form is the iconic bridge itself.”

When the Anthropic researchers amped up a feature related to hatred and slurs to 20 times its usual value, according to the paper, “this caused Claude to alternate between racist screed and self-hatred,” unnerving even the researchers.

Given those results, I wondered whether Anthropic, intending to help make AI safer, might not be doing the opposite, providing a toolkit that could also be used to generate AI havoc. The researchers assured me that there were other, easier ways to create those problems, if a user were so inclined.

Anthropic’s team isn’t the only one working to crack open the black box of LLMs. There’s a group at DeepMind also working on the problem, run by a researcher who used to work with Olah. A team led by David Bau of Northeastern University has worked on a system to identify and edit facts within an open source LLM. The team called the system “Rome” because with a single tweak the researchers convinced the model that the Eiffel Tower was just across from the Vatican, and a few blocks away from the Colosseum. Olah says that he’s encouraged that more people are working on the problem, using a variety of techniques. “It’s gone from being an idea that two and a half years ago we were thinking about and were quite worried about, to now being a decent-sized community that is trying to push on this idea.”

The Anthropic researchers did not want to comment on OpenAI’s disbanding of its own major safety research initiative, or on the remarks by team co-lead Jan Leike, who said that the group had been “sailing against the wind,” unable to get sufficient computing power. (OpenAI has since reiterated that it is committed to safety.) In contrast, Anthropic’s Dictionary team says that their considerable compute requirements were met without resistance by the company’s leaders. “It’s not cheap,” adds Olah.

Anthropic’s work is only a start. When I asked the researchers whether they were claiming to have solved the black box problem, their response was an instant and unanimous no. And there are a lot of limitations to the discoveries announced today. For instance, the techniques they use to identify features in Claude won’t necessarily help decode other large language models. Northeastern’s Bau says that he’s excited by the Anthropic team’s work; among other things their success in manipulating the model “is an excellent sign they’re finding meaningful features.”

But Bau says his enthusiasm is tempered by some of the approach’s limitations. Dictionary learning can’t identify anywhere close to all the concepts an LLM considers, he says, because in order to identify a feature you have to be looking for it. So the picture is bound to be incomplete, though Anthropic says that bigger dictionaries might mitigate this.

Still, Anthropic’s work seems to have put a crack in the black box. And that’s when the light comes in.


