Assessment and Evaluation in Nursing Education: A Simulation Perspective

  • First Online: 29 February 2024

  • Loretta Garvey &
  • Debra Kiegaldie

Part of the book series: Comprehensive Healthcare Simulation (CHS)

Assessment and evaluation are used extensively in nursing education. The two terms are often used interchangeably, which can create confusion, yet there are key differences between them.

Assessment in undergraduate nursing education is designed to ascertain whether students have achieved their potential and have acquired the knowledge, skills, and abilities set out within their course. Assessment aims to understand and improve student learning and must be at the forefront of curriculum planning to ensure assessments are well aligned with learning outcomes. In the past, the focus of assessment has often been on a single assessment. However, it is now understood that we must examine the whole system or program of assessment within a course of study to ensure integration and recognition of all assessment elements to holistically achieve overall course aims and objectives. Simulation is emerging as a safe and effective assessment tool that is increasingly used in undergraduate nursing.

Evaluation, however, is more summative in that it evaluates student attainment of course outcomes and their views on the learning process to achieve those outcomes. Program evaluation takes assessment of learning a step further in that it is a systematic method to assess the design, implementation, improvement, or outcomes of a program. According to Frye and Hemmer, student assessments (measurements) can be important to the evaluation process, but evaluation measurements come from various sources (Frye and Hemmer. Med Teach 34:e288–e99, 2012). Essentially, program evaluation is concerned with the utility of its process and results (Alkin and King. Am J Eval 37:568–79, 2016). The evaluation of simulation as a distinct program of learning is an important consideration when designing and implementing simulation into undergraduate nursing. This chapter will examine assessment and program evaluation from the simulation perspective in undergraduate nursing to explain the important principles, components, best practice approaches, and practical applications that must be considered.

References

Masters GN. Reforming educational assessment: imperatives, principles and challenges. Camberwell: ACER Press; 2013.

MacLellan E. Assessment for Learning: the differing perceptions of tutors and students. Assess Eval High Educ. 2001;26(4):307–18.

Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9):S63–7.

Alinier G. Nursing students’ and lecturers’ perspectives of objective structured clinical examination incorporating simulation. Nurse Educ Today. 2003;23(6):419–26.

Norcini J, Anderson MB, Bollela V, Burch V, Costa MJ, Duvivier R, et al. 2018 Consensus framework for good assessment. Med Teach. 2018;40(11):1102–9.

Biggs J. Constructive alignment in university teaching. HERDSA Rev High Educ. 2014;1:5–22.

Hamdy H. Blueprinting for the assessment of health care professionals. Clin Teach. 2006;3(3):175–9.

Welch S. Program evaluation: a concept analysis. Teach Learn Nurs. 2021;16(1):81–4.

Frye AW, Hemmer PA. Program evaluation models and related theories: AMEE Guide No. 67. Med Teach. 2012;34(5):e288–e99.

Johnston S, Coyer FM, Nash R. Kirkpatrick's evaluation of simulation and debriefing in health care education: a systematic review. J Nurs Educ. 2018;57(7):393–8.

ACGME. Glossary of terms. Accreditation Council for Graduate Medical Education; 2020. https://www.acgme.org/globalassets/pdfs/ab_acgmeglossary.pdf

Shadish WR, Luellen JK. History of evaluation. In: Mathison S, editor. Encyclopedia of evaluation. Sage; 2005. p. 183–6.

Lewallen LP. Practical strategies for nursing education program evaluation. J Prof Nurs. 2015;31(2):133–40.

Kirkpatrick DL. Evaluation of training. In: Craig RL, Bittel LR, editors. Training and development handbook. New York: McGraw-Hill; 1967.

Cahapay M. Kirkpatrick model: its limitations as used in higher education evaluation. Int J Assess Tools Educ. 2021;8(1):135–44.

Yardley S, Dornan T. Kirkpatrick's levels and education 'evidence'. Med Educ. 2012;46(1):97–106.

Kirkpatrick J, Kirkpatrick W. An introduction to the new world Kirkpatrick model. Kirkpatrick Partners; 2021.

Bhatia M, Stewart AE, Wallace A, Kumar A, Malhotra A. Evaluation of an in-situ neonatal resuscitation simulation program using the new world Kirkpatrick model. Clin Simul Nurs. 2021;50:27–37.

Lippe M, Carter P. Using the CIPP model to assess nursing education program quality and merit. Teach Learn Nurs. 2018;13(1):9–13.

Kardong-Edgren S, Adamson KA, Fitzgerald C. A review of currently published evaluation instruments for human patient simulation. Clin Simul Nurs. 2010;6(1):e25–35.

Statistics Solutions. Reliability and validity; 2022.

Rauta S, Salanterä S, Vahlberg T, Junttila K. The criterion validity, reliability, and feasibility of an instrument for assessing the nursing intensity in perioperative settings. Nurs Res Pract. 2017;2017:1048052.

Jeffries PR, Rizzolo MA. Designing and implementing models for the innovative use of simulation to teach nursing care of ill adults and children: a national, multi-site, multi-method study (summary report). Sci Res. 2006;

Unver V, Basak T, Watts P, Gaioso V, Moss J, Tastan S, et al. The reliability and validity of three questionnaires: The Student Satisfaction and Self-Confidence in Learning Scale, Simulation Design Scale, and Educational Practices Questionnaire. Contemp Nurse. 2017;53(1):60–74.

Franklin AE, Burns P, Lee CS. Psychometric testing on the NLN Student Satisfaction and Self-Confidence in Learning, Simulation Design Scale, and Educational Practices Questionnaire using a sample of pre-licensure novice nurses. Nurse Educ Today. 2014;34(10):1298–304.

Guise J-M, Deering SH, Kanki BG, Osterweil P, Li H, Mori M, et al. Validation of a tool to measure and promote clinical teamwork. Simul Healthc. 2008;3(4)

Millward LJ, Jeffries N. The team survey: a tool for health care team development. J Adv Nurs. 2001;35(2):276–87.

Author information

Authors and Affiliations

Federation University Australia, University Dr, Mount Helen, VIC, Australia

Loretta Garvey

Holmesglen Institute, Healthscope Hospitals, Monash University, Mount Helen, VIC, Australia

Debra Kiegaldie

Corresponding author

Correspondence to Loretta Garvey.

Editor information

Editors and Affiliations

Emergency Medicine, Icahn School of Medicine at Mount Sinai, Director of Emergency Medicine Simulation, Mount Sinai Hospital, New York, NY, USA

Jared M. Kutzin

School of Nursing, University of California San Francisco, San Francisco, CA, USA

Perinatal Patient Safety, Kaiser Permanente, Pleasanton, CA, USA

Connie M. Lopez

Eastern Health Clinical School, Faculty of Medicine, Nursing & Health Sciences, Monash University, Melbourne, VIC, Australia

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Garvey, L., Kiegaldie, D. (2023). Assessment and Evaluation in Nursing Education: A Simulation Perspective. In: Kutzin, J.M., Waxman, K., Lopez, C.M., Kiegaldie, D. (eds) Comprehensive Healthcare Simulation: Nursing. Comprehensive Healthcare Simulation. Springer, Cham. https://doi.org/10.1007/978-3-031-31090-4_14

DOI: https://doi.org/10.1007/978-3-031-31090-4_14

Published: 29 February 2024

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-31089-8

Online ISBN: 978-3-031-31090-4

eBook Packages: Medicine (R0)

  • Advancing simulation practice
  • Open access
  • Published: 28 December 2022

Simulation-based summative assessment in healthcare: an overview of key principles for practice

  • Clément Buléon   ORCID: orcid.org/0000-0003-4550-3827 1 , 2 , 3 ,
  • Laurent Mattatia 4 ,
  • Rebecca D. Minehart 3 , 5 , 6 ,
  • Jenny W. Rudolph 3 , 5 , 6 ,
  • Fernande J. Lois 7 ,
  • Erwan Guillouet 1 , 2 ,
  • Anne-Laure Philippon 8 ,
  • Olivier Brissaud 9 ,
  • Antoine Lefevre-Scelles 10 ,
  • Dan Benhamou 11 ,
  • François Lecomte 12 ,
  • the SoFraSimS Assessment with simulation group ,
  • Anne Bellot ,
  • Isabelle Crublé ,
  • Guillaume Philippot ,
  • Thierry Vanderlinden ,
  • Sébastien Batrancourt ,
  • Claire Boithias-Guerot ,
  • Jean Bréaud ,
  • Philine de Vries ,
  • Louis Sibert ,
  • Thierry Sécheresse ,
  • Virginie Boulant ,
  • Louis Delamarre ,
  • Laurent Grillet ,
  • Marianne Jund ,
  • Christophe Mathurin ,
  • Jacques Berthod ,
  • Blaise Debien ,
  • Olivier Gacia ,
  • Guillaume Der Sahakian ,
  • Sylvain Boet ,
  • Denis Oriot &
  • Jean-Michel Chabot  

Advances in Simulation, volume 7, Article number: 42 (2022)

Healthcare curricula need summative assessments relevant to and representative of clinical situations to best select and train learners. Simulation provides multiple benefits with a growing literature base proving its utility for training in a formative context. Advancing to the next step, “the use of simulation for summative assessment” requires rigorous and evidence-based development because any summative assessment is high stakes for participants, trainers, and programs. The first step of this process is to identify the baseline from which we can start.

First, using a modified nominal group technique, a task force of 34 panelists defined topics to clarify the why, how, what, when, and who of using simulation-based summative assessment (SBSA). Second, each topic was explored by a group of panelists through state-of-the-art literature reviews, expanded with a snowball method to identify further references. Our goal was to identify current knowledge and potential recommendations for future directions. Results were cross-checked among groups and reviewed by an independent expert committee.

Seven topics were selected by the task force: “What can be assessed in simulation?”, “Assessment tools for SBSA”, “Consequences of undergoing the SBSA process”, “Scenarios for SBSA”, “Debriefing, video, and research for SBSA”, “Trainers for SBSA”, and “Implementation of SBSA in healthcare”. Together, these seven explorations provide an overview of what is known and can be done with relative certainty, and what is unknown and probably needs further investigation. Based on this work, we highlighted the trustworthiness of different summative assessment-related conclusions, the remaining important problems and questions, and their consequences for participants and institutions regarding how SBSA is conducted.

Our results identified among the seven topics one area with robust evidence in the literature (“What can be assessed in simulation?”), three areas with evidence that require guidance by expert opinion (“Assessment tools for SBSA”, “Scenarios for SBSA”, “Implementation of SBSA in healthcare”), and three areas with weak or emerging evidence (“Consequences of undergoing the SBSA process”, “Debriefing for SBSA”, “Trainers for SBSA”). Using SBSA holds much promise, with increasing demand for this application. Due to the important stakes involved, it must be rigorously conducted and supervised. Guidelines for good practice should be formalized to help with conduct and implementation. We believe this baseline can direct future investigation and the development of guidelines.

There is a critical need for summative assessment in healthcare education [ 1 ]. Summative assessment is high stakes, both for graduation certification and for recertification in continuing medical education [ 2 , 3 , 4 , 5 ]. Knowing the consequences, the decision to validate or not validate the competencies must be reliable, based on rigorous processes, and supported by data [ 6 ]. Current methods of summative assessment such as written or oral exams are imperfect and need to be improved to better benefit programs, learners, and ultimately patients [ 7 ]. A good summative assessment should sufficiently reflect clinical practice to provide a meaningful assessment of competencies [ 1 , 8 ]. While some could argue that oral exams are a form of verbal simulation, hands-on simulation can be seen as a solution to complement current summative assessments and enhance their accuracy by bringing these tools closer to assessing the necessary competencies [ 1 , 2 ].

Simulation is now well established in the healthcare curriculum as part of a modern, comprehensive approach to medical education (e.g., competency-based medical education) [ 9 , 10 , 11 ]. Rich in various modalities, simulation provides training in a wide range of technical and non-technical skills across all disciplines. Simulation adds value to the educational training process particularly with feedback and formative assessment [ 9 ]. With the widespread use of simulation in the formative setting, the next logical step is using simulation for summative assessment.

The shift from formative to summative assessment using simulation in healthcare must be thoughtful, evidence-based, and rigorous. Program directors and educators may find it challenging to move from formative to summative use of simulation. There are currently limited experiences (e.g., OSCE [12, 13]) but no established guidelines on how to proceed. The evidence needed for the feasibility, the validity, and the definition of the requirements for simulation-based summative assessment (SBSA) in healthcare education has not yet been formally gathered. With this evidence, we can hope to build a rigorous and fair pathway to SBSA.

The purpose of this work is to review current knowledge of SBSA by clarifying the guidance on why, how, what, when, and who. We aim to identify areas (i) with robust evidence in the literature, (ii) with evidence that requires guidance by expert opinion, and (iii) with weak or emerging evidence. This may serve as a basis for future research and guideline development for the safe and effective use of SBSA (Fig. 1).

Fig. 1 Study question and topic level of evidence

First, we performed a modified Nominal Group Technique (NGT) to define the further questions to be explored in order to have the most comprehensive understanding of SBSA. We followed recommendations on NGT for conducting and reporting this research [ 14 ]. Second, we conducted state-of-the-art literature reviews to assess the current knowledge on the questions/topics identified by the modified NGT. This work did not require Institutional Review Board involvement.

A discussion on the use of SBSA was led by executive committee members of the Société Francophone de Simulation en Santé (SoFraSimS) in a plenary session and involved congress participants in May 2018 at the SoFraSimS annual meeting in Strasbourg, France. Key points addressed during this meeting were the growing interest in using SBSA, its informal uses, and its inclusion in some formal healthcare curricula. The discussion identified that these important topics lacked current guidelines. To reduce knowledge gaps, the SoFraSimS executive committee assigned one of its members (FL, one of the authors) to lead and act as an NGT facilitator for a task force on SBSA. The task force’s mission was to map the current landscape of SBSA, including current knowledge and gaps, and potentially to identify expert recommendations.

Task force characteristics

The task force panelists were recruited among volunteer healthcare simulation trainers in French-speaking countries after a call for applications by SoFraSimS in May 2019. Recruitment criteria were a minimum of 5 years of experience in simulation and direct involvement in the development or running of simulation programs. Thirty-four panelists (12 women and 22 men) from three countries (Belgium, France, Switzerland) were included. Twenty-three were physicians and 11 were nurses; 12 held academic positions. All were experienced simulation trainers with more than 7 years of experience who were involved in or responsible for initial training or continuing education programs with simulation. The task force leader (FL) was responsible for recruiting panelists, organizing and coordinating the modified NGT, synthesizing responses, and writing the final report. A facilitator (CB) assisted the task force leader with the modified NGT, the synthesis of responses, and the writing of the final report. Both NGT facilitators (FL and CB) had more than 14 years of experience in simulation, had experience in simulation research, and were responsible for the development and running of simulation programs.

First part: initial question and modified nominal group technique (NGT)

To answer the challenging question of “What do we need to know for a safe and effective SBSA practice?”, following the French Haute Autorité de Santé guidelines [ 15 ], we applied a modified nominal group technique (NGT) approach [ 16 ] between September and October 2019. The goal of our modified NGT was to define the further questions to be explored to have the most comprehensive understanding of the current SBSA (Fig.  2 ). The modifications to NGT included interactions that were not in-person and were asynchronous for some. Those modifications were introduced as a result of the geographic dispersion of the panelists across multiple countries and the context of the COVID-19 pandemic.

Fig. 2 Study flowchart

The first two steps of the NGT (generation of ideas and round robin), facilitated by the task force leader (FL), were conducted online, simultaneously and asynchronously, via email exchanges and online surveys over a 6-week period. To initiate the first step (generation of ideas), the task force leader (FL) sent an initial non-exhaustive literature review of 95 articles and proposed the following initial items for reflection: definition of assessment, educational principles of simulation, place of summative assessment and its implementation, and assessment of technical and non-technical skills in initial training, continuing education, and interprofessional training. The task force leader (FL) asked the panelists to formulate topics or questions to propose for exploration in Part 2, based on their knowledge and the literature provided. Panelists independently elaborated proposals and sent them back to the task force leader (FL), who regularly synthesized them and sent the status of the questions/topics to the whole task force, preserving the anonymity of the contributors and asking them to check the accuracy of the synthesized elements (second step, the “round robin”).

The third step of the NGT (clarification) was carried out during a 2-h video conference session. All panelists were able to discuss the proposed ideas, group the ideas into topics, and make the necessary clarifications. As a result of this step, 24 preliminary questions were defined for the fourth step (Supplemental Digital Content 1).

The fourth step of the NGT (vote) consisted of four distinct asynchronous and anonymous online vote rounds that led to a final set of topics with related sub-questions (Supplemental Digital Content 2). Panelists were asked to vote to regroup, separate, keep, or discard questions/topics. All vote rounds followed similar validation rules. We [NGT facilitators (FL and CB)] kept items (either questions or topics) with more than 70% approval by panelists. We reworded and resubmitted in the next round all items with 30–70% approval. We discarded items with less than 30% approval. The task force discussed discrepancies and achieved final ratings with complete agreement for all items. For each round, we sent reminders to reach a minimum participation rate of 80% of the panelists. Then, we split the task force into 7 groups, one for each of the 7 topics defined at the end of the vote (step 4).
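
To make the voting rule concrete, here is a minimal sketch in Python (not part of the original study; the item names and approval figures are hypothetical) of the thresholds described above: keep items above 70% approval, reword and resubmit items between 30% and 70%, and discard items below 30%.

```python
# Minimal sketch of the vote-classification rule described above.
# Item names and approval figures are hypothetical.

def classify_items(approval: dict[str, float]) -> dict[str, list[str]]:
    """Sort proposed questions/topics into keep / resubmit / discard buckets."""
    buckets: dict[str, list[str]] = {"keep": [], "resubmit": [], "discard": []}
    for item, share in approval.items():
        if share > 0.70:
            buckets["keep"].append(item)
        elif share >= 0.30:
            buckets["resubmit"].append(item)  # reworded for the next round
        else:
            buckets["discard"].append(item)
    return buckets

if __name__ == "__main__":
    round_1 = {
        "Assessment tools for SBSA": 0.85,
        "Scenario duration for SBSA": 0.45,
        "Use of participants' own devices": 0.10,
    }
    print(classify_items(round_1))
    # {'keep': ['Assessment tools for SBSA'],
    #  'resubmit': ['Scenario duration for SBSA'],
    #  'discard': ["Use of participants' own devices"]}
```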

Second part: literature review

From November 2019 to October 2020, each group identified existing literature containing the current knowledge and potential recommendations on the topic it was to address. This identification was based on a non-systematic review of the existing literature. To identify existing literature, the groups conducted state-of-the-art reviews [17] and expanded their reviews with a snowballing literature review technique [18] based on the articles’ references. The literature selected by each group was added to the task force’s common library on SBSA in healthcare as the search was conducted.

For references, we searched electronic databases (MEDLINE), gray literature databases (including digital theses), simulation societies’ and centers’ websites, generic web searches (e.g., Google Scholar), and reference lists from articles. We selected publications related to simulation in healthcare with the keywords “summative assessment” and “summative evaluation,” as well as specific keywords related to each topic. The search was iterative, seeking all available data until saturation was achieved. Ninety-five references were initially provided to the task force by the task force leader (FL). At the end of the work, the task force common library contained a total of 261 references.
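
As an illustration of the snowball expansion described above, the sketch below (hypothetical article identifiers; the `references_of` mapping stands in for the manual screening of reference lists that the panelists performed) grows a library from seed articles until no new references appear, i.e., saturation.

```python
# Illustrative sketch of snowballing: follow reference lists outward from seed
# articles until no new relevant references are found (saturation). In the
# actual review this screening was done manually by the groups, not by code.

def snowball(seeds: set[str], references_of: dict[str, set[str]]) -> set[str]:
    """Grow a library from seed articles by following their reference lists."""
    library = set(seeds)
    frontier = set(seeds)
    while frontier:
        next_frontier: set[str] = set()
        for article in frontier:
            for ref in references_of.get(article, set()):
                if ref not in library:
                    library.add(ref)
                    next_frontier.add(ref)
        frontier = next_frontier
    return library

if __name__ == "__main__":
    refs = {
        "seed_review_2018": {"osce_harden_1975", "validity_2006"},
        "validity_2006": {"messick_1989"},
    }
    print(sorted(snowball({"seed_review_2018"}, refs)))
    # ['messick_1989', 'osce_harden_1975', 'seed_review_2018', 'validity_2006']
```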

Techniques to enhance trustworthiness from primary reports to the final report

The groups’ primary reports were reviewed and critiqued by other groups. After group cross-reviewing, primary reports were compiled by NGT facilitators (FL and CB) in a single report. This report, responding to the 7 topics, was drafted in December 2020 and submitted as a single report to an external review committee composed of 4 senior experts in education, training, and research from 3 countries (Belgium, Canada, France) with at least 15 years of experience in simulation. NGT facilitators (FL and CB) responded directly to reviewers when possible and sought assistance from the groups when necessary. The final version of the report was approved by the SoFraSimS executive committee in January 2021.

First part: modified nominal group technique (NGT)

The first two steps of the NGT, by their nature (generation of ideas and “round robin”), did not produce results of their own. The third step (clarification phase) identified 24 preliminary questions (Supplemental Digital Content 1) to be submitted to the fourth step (vote). The 4 rounds of voting (step 4) resulted in 7 topics with sub-questions (Supplemental Digital Content 2): (1) “What can be assessed in simulation?”, (2) “Assessment tools for SBSA,” (3) “Consequences of undergoing the SBSA process,” (4) “Simulation scenarios for SBSA,” (5) “Debriefing, video, research and SBSA strategies,” (6) “Trainers for SBSA,” and (7) “Implementation of SBSA in healthcare”. These 7 topics and their sub-questions were the starting point for each group’s state-of-the-art literature review in the second part.

For each of the 7 topics, the groups highlighted what appears to be validated in the literature, the remaining important problems and questions, and their consequences for participants and institutions regarding how SBSA is conducted. The results in this section present the major ideas and principles from the literature review, including their nuances where necessary.

What can be assessed in simulation?

Healthcare faculty and institutions must ensure that each graduate is practice ready. Readiness to practice implies mastering certain competencies, which is dependent on learning them appropriately. The competency approach involves explicit definitions of the acquired core competencies necessary to be a “good professional.” Professional competency could be defined as the ability of a professional to use judgment, knowledge, skills, and attitudes associated with their profession to solve complex problems [19, 20, 21]. Competency is a complex “knowing how to act” based on the effective mobilization and combination of a variety of internal and external resources in a range of situations [19]. Competency is not directly observable; it is the performance in a situation that can be observed [19]. Performance can vary depending on human factors such as stress and fatigue. During simulation, competencies can be assessed by observing “key” actions using assessment tools [22]. Simulation’s limitations must be considered when defining the assessable competencies. Not all simulation methods are equivalent for assessing specific competencies [22].

Most healthcare competencies can be assessed with simulation, throughout a curriculum, if certain conditions are met. First, the competency being assessed summatively must have already been assessed formatively with simulation [23, 24]. Second, validated assessment tools must be available to conduct this summative assessment [25, 26]. These tools must be reliable, objective, reproducible, acceptable, and practical [27, 28, 29, 30]. The small number of currently validated tools limits the use of simulation for competency certification [31]. Third, it is not necessary or desirable to certify all competencies [32]. The situations chosen must be sufficiently frequent in the student’s future professional practice (or potentially impactful for the patient) and must be hard or impossible to assess and validate in other circumstances (e.g., clinical internships) [2]. Fourth, simulation can be used for certification throughout the curriculum [33, 34, 35]. Finally, a limitation of the use of simulation throughout the curriculum may be a lack of logistical resources [36]. Based on our findings in the literature, we have summarized in Table 1 the educational considerations when implementing an SBSA.

Assessment tools for simulation-based summative assessment

One of the challenges of assessing competency lies in the quality of the measurement tools [31]. A tool that allows the raters to collect data must also allow them to give meaning to their assessment, while ensuring that it really measures what it aims to measure [25, 37]. A tool must be valid, capable of measuring the assessed competency with fidelity and reliability, and able to provide reproducible data [38]. Since a competency is not directly measurable, it is analyzed on the basis of learning expectations, the most “concrete” and observable form of a competency [19]. Several authors have described definitions of the concept of validity and the steps to achieve it [38, 39, 40, 41]. Despite different validation approaches, the objectives are similar: to ensure that the tool is valid, that the scoring items reflect the assessed competency, and that the contents are appropriate for the targeted learners and raters [20, 39, 42, 43]. A tool should have psychometric characteristics that allow users to be confident of its reproducibility, discriminatory nature, reliability, and external consistency [44]. One way to ensure that a tool has acceptable validity is to compare it to existing, validated tools that assess the same skills for the same learners. Finally, it is important to consider the consequences of the test to determine whether it best discriminates competent students from others [38, 43].

Like a diagnostic score, a relevant assessment tool must be specific [30, 39, 41]. It is not inherently good or bad; it becomes valid through a rigorous validation process [39, 41, 42]. This validation process determines whether the tool measures what it is supposed to measure and whether this measurement is reproducible at different times (test–retest) or with two observers simultaneously. It also determines whether the tool’s results are correlated with another measure of the same ability or competency and whether the consequences of the tool’s results are related to the learners’ actual competency [38].
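
One common way to quantify the agreement between two observers rating the same candidates, as mentioned above, is Cohen’s kappa; the sketch below (in Python, with hypothetical pass/fail ratings, not data from the cited studies) computes it from scratch.

```python
# Minimal sketch: chance-corrected agreement (Cohen's kappa) between two raters
# scoring the same candidates on a pass/fail item. Ratings are hypothetical.

from collections import Counter

def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Return Cohen's kappa for two raters over the same candidates."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    a = ["pass", "pass", "fail", "pass", "fail", "pass"]
    b = ["pass", "fail", "fail", "pass", "fail", "pass"]
    print(round(cohen_kappa(a, b), 2))  # 0.67: good but imperfect agreement
```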

Following Messick’s framework, which aimed to gather different sources of validity into one global concept (unified validity), Downing describes five sources of validity, which must be assessed through the validation process [38, 45, 46]. Table 2 presents an illustration of the development used in SBSA according to the unified validity framework for a technical task [38, 45, 46]. An alternative framework using three sources of validity for teamwork’s non-technical skills is presented in Table 3.

A tool is validated in a language. Theoretically, this tool can only be used in that language, given the nuances involved in interpretation [49]. In certain circumstances, a tool that has been translated, but not translated and validated in the specific language, can lead to semantic biases that affect the meaning of the content and its representation [49, 50, 51, 52, 53, 54, 55]. For each assessment sequence, validity criteria consist of using different tools in different assessment situations and integrating them into a comprehensive program that considers all aspects of competency. The rating made with a validated tool for one situation must be combined with other assessment situations, since there is no “ideal” tool [28, 56]. A given tool can be used with different professions, with participants at different levels of expertise, or in different languages if it is validated for these situations [57, 58]. In a summative context, a tool must have demonstrated a high level of validity to be used, because of the high stakes for the participants [56]. Finally, the use or creation of an assessment tool requires trainers to question its various aspects, from how it was created to its reproducibility and the meaning of the results generated [59, 60].

Two types of assessment tools should be distinguished: tools that can be adapted to different situations and tools that are specific to a situation [61]. Thus, technical skills may have a dedicated assessment tool (e.g., intraosseous access) [47] or an assessment grid generated from a list of pre-established and validated items (e.g., TAPAS scale) [62]. Non-technical skills can be observed using scales that are not situation-specific (e.g., ANTS, NOTECHS) [63, 64] or that are situation-specific (e.g., TEAM scale for resuscitation) [57, 65]. Assessment tools should be provided to participants and should be included in the scenario framework, at least as a reference [66, 67, 68, 69]. In the summative assessment of a procedure, structured assessment tools, such as an objective structured assessment form for technical skills, should probably be used [70]. The use of a scale seems essential when assessing a technical gesture. As with other tools, any scale must be validated beforehand [47, 70, 71, 72].

Consequences of undergoing the simulation-based summative assessment process

Summative assessment has two notable consequences for learning strategies. First, it may drive the learner’s behavior during the assessment, whereas it is essential to assess the targeted competencies, not the participant’s ability to adapt to the assessment tool [6]. Second, the key pedagogical concept of “pedagogical alignment” must be respected [23, 73]: assessment methods must be coherent with the pedagogical activities and objectives. For this to happen, participants must have had formative simulation training focusing on the assessed competencies prior to the SBSA [24].

Participants have commonly been reported to experience mild (e.g., appearing slightly upset, distracted, teary-eyed, quiet, or resistant to participating in the debriefing) or moderate (e.g., crying, or making loud and frustrated comments) psychological events during simulation [74]. While voluntary recruitment for formative simulation is commonplace, all students are required to take summative assessments in training. This required participation in high-stakes assessment may have a more consequential psychological impact [26, 75]. This impact can be modulated by training and assessment conditions [75]. First, the repetition of formative simulations reduces the psychological impact of SBSA on participants [76]. Second, transparency about the objectives and methods of assessment limits detrimental psychological impact [77, 78]. Finally, detrimental psychological impacts are increased by abnormally high physiological or emotional stress, such as fatigue or stressful events in the 36 h preceding the assessment, and students with a history of post-traumatic stress disorder or psychological disorder may be strongly and negatively impacted by the simulation [76, 79, 80, 81].

It is necessary to optimize SBSA implementation to limit its negative pedagogical and psychological impacts. Ideally, it has been proposed that the summative assessment take into account the formative assessment that has already been carried out [1, 20, 21]. Similarly, in continuing education, the professional context of the person assessed should be considered. In the event of failure, it is necessary to ensure sympathetic feedback and to propose a new assessment if needed [21].

Scenarios for simulation-based summative assessment

Some authors argue that there are differences between summative and formative assessment scenarios [76, 79, 80, 81]. The development of an SBSA scenario begins with the choice of a theme, which is most often agreed upon by experts at the local level [66]. The themes are most often chosen based on the participants’ competencies to be assessed and included in the competency requirements for initial [82] and continuing education [35, 83]. A literature review even suggests the need to choose themes covering all the competencies to be assessed [41]. These choices of themes and objectives also depend on the simulation tools technically available: “The themes were chosen if and only if the simulation tools were capable of reproducing ‘a realistic simulation’ of the case” [84].

The main quality criterion for SBSA is that the cases selected and developed are guided by the assessment objectives [85]. It is necessary to be clear about the assessment objectives of each scenario to select the right assessment tool [86]. Scenarios should meet four main principles: predictability, programmability, standardizability, and reproducibility [25]. Scenario writing should include a specific script, cues, timing, and events to practice and assess the targeted competencies [87]. The implementation of variable scenarios remains a challenge [88]. Indeed, most authors develop only one scenario per topic and skill to be assessed [85]. There are no recommendations for setting a predictable duration for a scenario [89]. Based on our findings, we suggest some key elements for structuring an SBSA scenario in Table 4. For technical skill assessment, scenarios are short and the assessment is based on an analytical score [82, 89]. For non-technical skill assessment, scenarios are longer and the assessment is based on analytical and holistic scores [82, 89].
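
As a purely illustrative sketch (field names and the example station are hypothetical, not taken from Table 4), the structure below shows how the elements discussed above (theme, objectives, script, cues, timing, events, and the analytical vs. holistic scoring choice) might be captured so that a scenario stays predictable, standardizable, and reproducible across candidates.

```python
# Hypothetical sketch of an SBSA scenario blueprint; field names are illustrative.

from dataclasses import dataclass, field

@dataclass
class SBSAScenario:
    theme: str
    assessed_competencies: list[str]
    assessment_tools: list[str]           # validated tools matched to the objectives
    script: str                           # fixed storyline so every candidate sees the same case
    cues: list[str]                       # standardized prompts delivered at set points
    duration_min: int                     # short for technical skills, longer for non-technical
    scoring: str = "analytical"           # "analytical", "holistic", or "both"
    events: list[tuple[int, str]] = field(default_factory=list)  # (minute, scripted event)

# Example: a short technical station scored analytically.
intraosseous_station = SBSAScenario(
    theme="Intraosseous access in a deteriorating child",
    assessed_competencies=["intraosseous insertion (technical skill)"],
    assessment_tools=["validated procedure checklist"],
    script="Identical prebriefing and manikin setup for all candidates",
    cues=["Nurse reports failed IV access"],
    duration_min=10,
    scoring="analytical",
    events=[(2, "failed IV access reported")],
)
```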

Debriefing, video, and research for simulation-based summative assessment

Studies have shown that debriefings are essential in formative assessment [90, 91]. No such studies are available for summative assessment. Good practice requires debriefing in both formative and summative simulation-based assessments [92, 93]. In SBSA, debriefing is often brief feedback given at the end of the simulation session, in groups [85, 94, 95] or individually [83]. Debriefing can also be done later with a trainer and the help of video, or via written reports [96]. These debriefings make it possible to assess clinical skills for summative assessment purposes [97]. Some tools have been developed to facilitate this assessment of clinical reasoning [97].

Video can be used for four purposes: session preparation, simulation improvement, debriefing, and rating (Table 5) [95, 98]. In SBSA sessions, video can be used during the prebriefing to provide participants with standardized and reproducible information [99]. Video can also increase the realism of the situation during the simulation, for example with ultrasound loops and laparoscopy footage. Simulation recordings can be reviewed either for debriefing or for rating purposes [34, 71, 100, 101]. Video is very useful for training raters (e.g., for calibration and recalibration) [102]. It enables raters to rate the participants’ performance offline and to obtain an external review if necessary [34, 71, 101]. Despite the technical difficulties to be considered [42, 103], it can be expected that video-based automated scoring assistance will facilitate assessments in the future.

The constraints associated with the use of video include obtaining the participants’ agreement, complying with local rules, and ensuring that the organization in charge of the video-based assessment protects the rights of individuals and data safety, at both the national and the higher (e.g., European GDPR) level [104, 105].

In Table 5 , we list the main uses of video during simulation sessions found in the literature.

Research in SBSA can focus, as in formative assessment, on the optimization of simulation processes (programs, structures, human resources). Research can also explore the development and validation of summative assessment tools, the automation and assistance of assessment resources, and the pedagogical and clinical consequences of SBSA.

Trainers for simulation-based summative assessment

Trainers for SBSA probably need specific skills because of the high number of potential errors or biases in SBSA, despite the quest for objectivity (Table 6 ) [ 106 ]. The difficulty in ensuring objectivity is likely the reason why the use of self or peer assessment in the context of SBSA is not well documented and the literature does not yet support it [ 59 , 60 , 107 , 108 ].

SBSA requires the development of specific scenarios, staged in a reproducible way, and the mastery of assessment tools to avoid assessment bias [ 111 , 112 , 113 , 114 ]. Fulfilling those requirements calls for specific abilities to fit with the different roles of the trainer. These different roles of trainers would require specific initial and ongoing training tailored to their tasks [ 111 , 113 ]. In the future, concepts of the roles and tasks of these trainers should be integrated into any “training of trainers” in simulation.

Implementation of simulation-based summative assessment in healthcare

The use of SBSA was described by Harden in 1975 with the Objective Structured Clinical Examination (OSCE) for medical students [115]. The summative use of simulation has been introduced in different ways depending on the professional field and the country [116]. There is more literature on certification at the undergraduate and graduate levels than on recertification at the postgraduate level. The use of SBSA in recertification is currently more limited [83, 117]. Participation is often mandated, and it does not provide a formal assessment of competency [83]. Some countries are defining processes for the maintenance of certification in which simulation is likely to play a role (e.g., in the USA [118] and France [116]). Recommendations regarding the development of SBSA for OSCEs were issued by the AMEE (Association for Medical Education in Europe) in 2013 [12, 119]. Combined with other recommendations that address the organization of examinations using other immersive simulation modalities, in particular full-scale sessions using complex mannequins [22, 85], they give us a solid foundation for the implementation of SBSA.

The overall process to ensure a high-quality examination by simulation is therefore defined but particularly demanding. It mobilizes many material and human resources (administrative staff, trainers, standardized patients, and healthcare professionals) and requires a long development time (several months to years depending on the stakes) [36]. We believe that the steps to be completed during the implementation of SBSA range from setting up a coordination team to supervising the writers, the raters, and the standardized patients, as well as taking into account the logistical and practical pitfalls.

The development of a competency framework valid for an entire curriculum (e.g., medical studies) satisfies a fundamental need [7, 120]. This development allows identification of the competencies to be assessed with simulation, those to be assessed by other methods, and those requiring triangulation by several assessment methods. This identification then guides scenario writing and examination development with good content validity. Scenarios and examinations will form a bank of reproducible assessment exercises. The examination quality process, including psychometric analyses, is part of the development process from the beginning [85].
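
The triage described above can be pictured with a small sketch (competency names and method assignments are hypothetical): map each competency in the framework to the assessment method(s) covering it, then flag what simulation alone will assess, what needs triangulation, and what is not yet covered.

```python
# Illustrative sketch of mapping a competency framework to assessment methods.
# Competency names and method assignments are hypothetical.

competency_map: dict[str, set[str]] = {
    "recognize and manage anaphylaxis": {"simulation"},
    "interpret a 12-lead ECG": {"written exam"},
    "lead a resuscitation team": {"simulation", "workplace-based assessment"},
    "break bad news": {"OSCE", "workplace-based assessment"},
    "prescribe safely in renal impairment": set(),   # gap still to be resolved
}

simulation_only = [c for c, methods in competency_map.items() if methods == {"simulation"}]
triangulated = [c for c, methods in competency_map.items() if len(methods) > 1]
uncovered = [c for c, methods in competency_map.items() if not methods]

print("Assessed with simulation alone:", simulation_only)
print("Requiring triangulation:", triangulated)
print("Not yet covered by any method:", uncovered)
```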

We have summarized in Table 7 the different steps in the implementation of SBSA.

Recertification

Recertification programs for various healthcare domains are currently being implemented or planned in many countries (e.g., in the USA [118] and France [116]). This is a continuation of the movement to promote the maintenance of competencies. Examples can be cited in France, with the creation of an agency for continuing professional development, and in the USA, with the Maintenance of Certification [83, 126]. The certification of healthcare facilities and even teams is also being studied [116]. Simulation is regularly integrated into these processes (e.g., in the USA [118] and France [116]). Although we found some common ground between the certification and recertification processes, there are many differences (Table 8).

Currently, when simulation-based training is mandatory (e.g., within the American Board of Anesthesiology’s “Maintenance of Certification in Anesthesiology,” or MOCA 2.0®, in the USA), it is most often a formative process [34, 83]. SBSA has a place in the recertification process, but there are many pitfalls to avoid. In the short term, we believe that it will be easier to incorporate formative sessions as a first step. The current consensus seems to be that there should be no pass/fail recertification simulation without personalized, global professional support that goes beyond a binary aptitude/inaptitude judgment [21, 116].

Many important issues and questions remain regarding the field of SBSA. This discussion returns to the 7 topics we identified and highlights these points, their implications for the future, and some possible leads for future research and guideline development for the safe and effective use of SBSA.

SBSA is currently mainly used in initial training in uni-professional and individual settings via standardized patients or task trainers (OSCE) [12, 13]. In the future, SBSA will also be used in continuing education for professionals who will be assessed throughout their career (recertification) as well as in interprofessional settings [83]. When certifying competencies, it is important to keep in mind the differences between the desired competencies and the observed performances [128]. Indeed, “what is a competency” must be specifically defined [6, 19, 21]. Competencies are what we wish to evaluate during the summative assessment to validate or revalidate a professional for his/her practice. Performance is what can be observed during an assessment [20, 21]. In this context, we consider three unresolved issues. The first issue is that an assessment only gives access to a performance at a given moment (“performance is a snapshot of a competency”), whereas one would like to assess a competency more generally [128]. The second issue is: how does an observed performance, especially in simulation, reveal a real competency in real life? [129] In other words, does the success or failure of a single SBSA really reflect actual real-life competency? [130] The third issue is the assessment of a team performance/competency [131, 132, 133]. Until now, SBSA has come from the academic field and has been an individual assessment (e.g., OSCE). Future SBSA could involve teams, driven by governing bodies, institutions, or insurers [134, 135]. The competency of a team is not the sum of the competencies of the individuals who compose it. How can we proceed to assess teams as a specific entity, both composed of individuals and independent of them? To make progress on these three issues, we believe it is probably necessary to consider the approximation between observed and assessed performance and competency as acceptable, but only by specifying the scope of validity. Research in these areas is needed to define it and answer these questions.

Our exploration of the consequences of undergoing SBSA focused on the psychological aspects and set aside more usual consequences such as achieving (or not) the minimum passing score. Future research should embrace the broader field of SBSA consequences, including how reliable SBSA is at determining whether someone is competent.

Rigor and method in the development and selection of assessment tools are paramount to the quality of the summative assessment [136]. The literature shows that it is necessary that assessment tools be specific to their intended use, that their intrinsic characteristics be described, and that they be validated [38, 40, 41, 137]. These specific characteristics must be respected to avoid two common issues [1, 6]. The first issue is that of a poorly designed or constructed assessment tool. Such a tool can only give poor assessments because it will be unable to capture performance correctly and therefore to approach the skill to be assessed in a satisfactory way [56]. The second issue is related to poor or incomplete tool evaluation or inadequate tool selection. If the tool is poorly evaluated, its quality is unknown [56]. The scope of the assessment that is done with it is limited by the imprecision of the tool’s quality. If the tool is poorly selected, it will not accurately capture the performance being assessed. Again, summative assessment will be compromised. It is currently difficult to find tools that meet all the required quality and validation criteria [56]. On the one hand, this requires complex and rigorous work; on the other hand, the potential number of tools required is large. Thus, the overall volume of work to rigorously produce assessment tools is substantial. However, the literature provides the characteristics of validity (content, response process, internal structure, comparison with other variables, and consequences) and the process of developing qualitative and reliable assessment tools [38, 39, 40, 41, 45]. It therefore seems important to systematize the use of these guidelines for the selection, development, and validation of assessment tools [137]. Work in this area is needed, and network collaboration could be a solution to move forward more quickly toward a bank of valid and validated assessment tools [39].
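
To illustrate how the five sources of validity evidence listed above could be systematized when selecting or developing a tool, here is a small hypothetical checklist sketch (the tool and its evidence entries are invented for the example).

```python
# Hypothetical validity-evidence checklist following the five sources named
# above (content, response process, internal structure, relations with other
# variables, consequences). The evidence entries are invented for illustration.

VALIDITY_SOURCES = (
    "content",
    "response process",
    "internal structure",
    "relations with other variables",
    "consequences",
)

def missing_evidence(evidence: dict[str, str]) -> list[str]:
    """Return the validity sources for which no evidence has been documented."""
    return [source for source in VALIDITY_SOURCES if not evidence.get(source)]

tool_evidence = {
    "content": "blueprint and items reviewed by an expert panel",
    "internal structure": "inter-rater reliability study on pilot data",
}

print(missing_evidence(tool_evidence))
# ['response process', 'relations with other variables', 'consequences']
```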

We focused our discussion on the consequences of SBSA, excluding the determination of competencies and passing rates. Establishing and maintaining psychological safety is mandatory in simulation [138]. Considering the psychological and physiological consequences of SBSA is fundamental to controlling and limiting negative impacts. Summative assessment has consequences for both the participants and the trainers [139]. These consequences are often ignored or underestimated. However, they can have an impact on the conduct or results of the summative assessment. The consequences can be positive or negative. The “testing effect” can have a positive impact on long-term memory [139]. On the other hand, negative psychological (e.g., stress or post-traumatic stress disorder) and physiological (e.g., sleep) consequences can occur or degrade a fragile state [139, 140]. These negative consequences can lead to questioning the tools used and the assessments made. They must therefore be logically considered when designing and conducting the SBSA. We believe that strategies to mitigate their impact must be put in place. The trainers and the participants must be aware of these difficulties to better anticipate them. It is a real duality for the trainer: he/she has to carry out the assessment in order to determine a mark and at the same time guarantee the psychological safety of the participants. It seems fundamental to us that trainers master all aspects of SBSA as well as the concept of the safe container [138] to maximize the chances of a good experience for all. We believe that ensuring a fluid pedagogical continuum, from training to (re)certification in both initial and continuing education, using modern pedagogical techniques (e.g., mastery learning, rapid cycle deliberate practice) [141, 142, 143, 144], could help maximize the psychological and physiological safety of participants.

The structure and use of scenarios in a summative setting are unique and therefore require specific development and skills [83, 88]. SBSA scenarios differ from formative assessment scenarios in the educational objectives that guide their development. Summative scenarios are designed to assess a skill through observation of performance, while formative scenarios are designed to learn and progress in mastering this same skill. Although there may be a continuum between the two, they remain distinct. SBSA scenarios must be predictable, programmable, standardizable, and reproducible [25] to ensure fairly assessed performances among participants. This is even more crucial when standardized patients are involved (e.g., OSCE) [119, 145]. In this case, a specific script with expectations and training is needed for the standardized patient. The problem is that currently there are many formative scenarios but few summative scenarios. The rigor and expertise required to develop them are time-consuming and require expert trainer resources. We believe that a goal should be to homogenize the scenarios, along with preparing the human resources who will implement them (trainers and standardized patients) and their application. We believe one solution would be to develop a methodology for converting formative scenarios into summative ones in order to create a structuring model for summative scenarios. This would reinvest the time and expertise already used for developing formative scenarios.

Debriefing for simulation-based summative assessment

The place of debriefing in SBSA is currently undefined and raises important questions that need exploration [77, 90, 146, 147, 148]. Debriefing for formative assessment promotes knowledge retention and helps to anchor good behaviors while correcting less ideal ones [149, 150, 151]. In general, taking an exam promotes learning of the topic [139, 152]. Formative assessment without a debriefing has been shown to be detrimental, so it is reasonable to assume that the same is true in summative assessment [91]. The ideal modalities for debriefing in SBSA are currently unknown [77, 90, 146, 147, 148]. Integrating debriefing into SBSA raises a number of organizational, pedagogical, cognitive, and ethical issues that need to be clarified. From an organizational perspective, we consider that debriefing is time- and human-resource-consuming. The extent of the organizational impact varies according to whether the feedback is automated, standardized, personalized, and collective or individual. From an educational perspective, debriefing ensures pedagogical continuity and continued learning. We believe this notion is nuanced, depending on whether the debriefing is integrated into the summative assessment or instead follows the assessment while focusing on formative assessment elements. We believe that if the debriefing is part of the SBSA, it is no longer a “teaching moment.” This must be factored into the instructional strategy. How should the trainer prioritize debriefing points between those established in advance for the summative assessment and those that emerge from an individual’s performance? From a cognitive perspective, whether the debriefing is integrated into the summative assessment may alter the interactions between the trainer and the participants. We believe that if the debriefing is integrated into the SBSA, the participant will sometimes be faced with the cognitive dilemma of whether to express his/her “true” opinions or instead attempt to provide the expected answers. The trainer then becomes uncertain of what he/she is actually assessing. Finally, from an ethical perspective, in the case of a mediocre or substandard clinical performance, there is a question of how the trainer resolves discrepancies between observed behavior and what the participant reveals during the debriefing. What weight should be given to the simulation and to the debriefing for the final rating? We believe there is probably no single solution to how and when the debriefing is conducted during a summative assessment; rather, we promote the idea of adapting debriefing approaches (e.g., group or individualized debriefing) to various conditions (e.g., success or failure in the summative assessment). These questions need to be explored to provide answers as to how debriefing should ideally be conducted in SBSA. We believe a balance must be found that is ethically and pedagogically satisfactory, does not induce a cognitive dilemma for the trainer, and is practically manageable.

The skills and training of trainers required for SBSA are crucial and must be defined [136, 153]. We consider that the skills and training for SBSA closely mirror the skills and training needed for formative assessment in simulation. This continuity is part of the pedagogical alignment. These different steps have common characteristics (e.g., rules in simulation, scenario flow) and specific ones (e.g., using assessment tools, validating competence). To ensure pedagogical continuity, the trainers who supervise these courses must be trained in and be masterful in simulation, adhering to pedagogical theories. We believe training for SBSA represents new skills and a potentially greater cognitive load for the trainers. It is necessary to provide solutions to both of these issues. For the new skills of trainers, we consider it necessary to adapt or complete the training of trainers by integrating the knowledge and skills needed for properly conducting SBSA: good assessment practices, assessment bias mitigation, rater calibration, mastery of assessment tools, etc. [154]. To optimize the cognitive load induced by the tasks and challenges of SBSA, we suggest that it could be helpful to divide the tasks between different trainer roles. We believe that conducting an SBSA therefore requires three types of trainers whose training is adapted to their specific role. First, there are the trainer-designers, who are responsible for designing the assessment situation, selecting the assessment tool(s), training the trainer-rater(s), and supervising the SBSA sessions. Second, there are the trainer-operators, responsible for running the simulation conditions that support the assessment. Third, there are the trainer-raters, who conduct the assessment using the assessment tool(s) selected by the trainer-designer(s) and for which they have been specifically trained. The high-stakes nature of SBSA requires a high level of rigor and professionalism from all three types of trainers, which implies they have a working definition of the skills and the necessary training to be up to the task.

Implementing simulation-based summative assessment in healthcare

Implementing SBSA is a delicate process that requires rigor, respect for each step, and an evidence-based approach. While OSCEs are simulation-based, simulation is not limited to OSCEs. Summative assessment with OSCEs has been used and studied for many years [12, 13]. This literature is therefore a valuable source of lessons about summative assessment applied to simulation as a whole [22, 85, 155]. Knowledge from OSCE summative assessment needs to be supplemented so that simulation more broadly can support summative assessment according to good evidence-based practices. Given the high stakes of SBSA, we believe it necessary to rigorously and methodically adapt what is already validated during implementation (e.g., scenarios, tools) and to proceed with caution for what has not yet been validated. As described above, many steps and prerequisites are necessary for optimal implementation, including (but not limited to) identifying objectives; identifying and validating assessment tools; preparing simulation scenarios, trainers, and raters; and planning a global strategy, from integrating the summative assessment into the curriculum to managing the consequences of this assessment. SBSA must be conducted within a strict framework for its own sake and that of the people involved. Poor implementation would be detrimental to all participants, to trainers, and to the practice of SBSA. This risk is greater for recertification than for certification [156]: while initial training can accommodate SBSA easily because it is familiar (e.g., trainees engage in OSCEs at some point in their education), including SBSA in the recertification of practicing professionals is less straightforward and may be context-dependent [157]. We understand that the consequences of a failed recertification are potentially more impactful, both psychologically and for professional practice. We believe that solutions must be developed, tested, and validated that both fill these gaps and protect professionals and patients. Implementing SBSA must therefore be progressive, rigorous, and evidence-based to be accepted and successful [158].

Limitations

This work has some limitations that should be acknowledged. First, it covers only a limited number of issues related to SBSA; the topic is not necessarily covered in full, and other questions of interest may not have been explored. Nevertheless, the NGT methodology allowed this work to focus on the issues that were most relevant and challenging to the panel. Second, the literature review method (state-of-the-art reviews expanded with a snowball technique) does not guarantee exhaustiveness, and publications on the topic may have escaped the screening phase. However, it is likely that we identified the key articles focused on the questions explored; potentially unidentified articles would therefore either not be central to the topic or would address questions not selected by the NGT. Third, this work was done by a French-speaking group, and a Francophone-specific approach to simulation, although not described to our knowledge, cannot be ruled out. This risk is reduced by the fact that the work is based on international literature from different specialties in different countries and that the panelists and reviewers were from different countries. Fourth, the analysis and discussion of the consequences of SBSA focused on psychological consequences and does not cover the full range of consequences, including the impact on subsequent curricula or career pathways; data on this subject exist in the literature and probably deserve a dedicated body of work. Despite these limitations, we believe this work is valuable because it raises questions and offers some leads toward solutions.

Conclusions

The use of SBSA is very promising, with growing demand for its application. Indeed, SBSA is a logical extension of simulation-based formative assessment and of the development of competency-based medical education. It is probably wise to anticipate and plan for approaches to SBSA, as many important moving parts, questions, and consequences are emerging. Clearly identifying these elements and their interactions will aid in developing reliable, accurate, and reproducible models. All this requires a meticulous and rigorous approach to preparation commensurate with the challenges of certifying or recertifying the skills of healthcare professionals. We have explored the current knowledge on SBSA and have shared an initial mapping of the topic. Among the seven topics investigated, we have identified (i) areas with robust evidence (what can be assessed with simulation?); (ii) areas with limited evidence that can be assisted by expert opinion and research (assessment tools, scenarios, and implementation); and (iii) areas with weak or emerging evidence requiring guidance by expert opinion and research (consequences, debriefing, and trainers) (Fig. 1). We modestly hope that this work can support reflection on SBSA, inform future investigations, and drive guideline development for SBSA.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.

Abbreviations

GDPR: General Data Protection Regulation

NGT: Nominal group technique

OSCE: Objective structured clinical examination

SBSA: Simulation-based summative assessment

van der Vleuten CPM, Schuwirth LWT. Assessment in the context of problem-based learning. Adv Health Sci Educ Theory Pract. 2019;24:903–14.

Boulet JR. Summative assessment in medicine: the promise of simulation for high-stakes evaluation. Acad Emerg Med. 2008;15:1017–24.

Green M, Tariq R, Green P. Improving patient safety through simulation training in anesthesiology: where are we? Anesthesiol Res Pract. 2016;2016:4237523.

Krage R, Erwteman M. State-of-the-art usage of simulation in anesthesia: skills and teamwork. Curr Opin Anaesthesiol. 2015;28:727–34.

Askew K, Manthey DE, Potisek NM, Hu Y, Goforth J, McDonough K, et al. Practical application of assessment principles in the development of an innovative clinical performance evaluation in the entrustable professional activity era. Med Sci Educ. 2020;30:499–504.

Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet. 2001;357:945–9.

Boulet JR, Murray D. Review article: assessment in anesthesiology education. Can J Anaesth. 2012;59:182–92.

Bauer D, Lahner F-M, Schmitz FM, Guttormsen S, Huwendiek S. An overview of and approach to selecting appropriate patient representations in teaching and summative assessment in medical education. Swiss Med Wkly. 2020;150: w20382.

Park CS. Simulation and quality improvement in anesthesiology. Anesthesiol Clin. 2011;29:13–28.

Higham H, Baxendale B. To err is human: use of simulation to enhance training and patient safety in anaesthesia. British Journal of Anaesthesia [Internet]. 2017 [cited 2021 Sep 16];119:i106–14. Available from: https://www.sciencedirect.com/science/article/pii/S0007091217541215 .

Mann S, Truelove AH, Beesley T, Howden S, Egan R. Resident perceptions of competency-based medical education. Can Med Educ J. 2020;11:e31-43.

Khan KZ, Ramachandran S, Gaunt K, Pushkar P. The objective structured clinical examination (OSCE): AMEE Guide No. 81. Part I: an historical and theoretical perspective. Med Teach. 2013;35(9):e1437-1446.

Daniels VJ, Pugh D. Twelve tips for developing an OSCE that measures what you want. Med Teach. 2018;40:1208–13.

Humphrey-Murto S, Varpio L, Gonsalves C, Wood TJ. Using consensus group methods such as Delphi and Nominal Group in medical education research. Med Teach. 2017;39:14–9.

Haute Autorité de Santé. Recommandations par consensus formalisé (RCF) [Internet]. Haute Autorité de Santé. 2011 [cited 2020 Oct 29]. Available from: https://www.has-sante.fr/jcms/c_272505/fr/recommandations-par-consensus-formalise-rcf .

Humphrey-Murto S, Varpio L, Wood TJ, Gonsalves C, Ufholz L-A, Mascioli K, et al. The use of the delphi and other consensus group methods in medical education research: a review. Academic Medicine [Internet]. 2017 [cited 2021 Jul 20];92:1491–8. Available from: https://journals.lww.com/academicmedicine/Fulltext/2017/10000/The_Use_of_the_Delphi_and_Other_Consensus_Group.38.aspx .

Booth A, Sutton A, Papaioannou D. Systematic approaches to a successful literature review [Internet]. Second edition. Los Angeles: Sage; 2016. Available from: https://uk.sagepub.com/sites/default/files/upm-assets/78595_book_item_78595.pdf .

Morgan DL. Snowball Sampling. In: Given LM, editor. The Sage encyclopedia of qualitative research methods [Internet]. Los Angeles, Calif: Sage Publications; 2008. p. 815–6. Available from: http://www.yanchukvladimir.com/docs/Library/Sage%20Encyclopedia%20of%20Qualitative%20Research%20Methods-%202008.pdf .

ten Cate O, Scheele F. Competency-based postgraduate training: can we bridge the gap between theory and clinical practice? Acad Med. 2007;82:542–7.

Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65:S63-67.

Epstein RM. Assessment in medical education. N Engl J Med. 2007;356:387–96.

Boulet JR, Murray DJ. Simulation-based assessment in anesthesiology: requirements for practical implementation. Anesthesiology. 2010;112:1041–52.

Bédard D, Béchard JP. L’innovation pédagogique dans le supérieur : un vaste chantier. Innover dans l’enseignement supérieur. Paris: Presses Universitaires de France; 2009. p. 29–43.

Biggs J. Enhancing teaching through constructive alignment. High Educ [Internet]. 1996 [cited 2020 Oct 25];32:347–64. Available from: https://doi.org/10.1007/BF00138871 .

Wong AK. Full scale computer simulators in anesthesia training and evaluation. Can J Anaesth. 2004;51:455–64.

Messick S. Evidence and ethics in the evaluation of tests. Educational Researcher [Internet]. 1981 [cited 2020 Mar 19];10:9–20. Available from: http://journals.sagepub.com/doi/ https://doi.org/10.3102/0013189X010009009 .

Bould MD, Crabtree NA, Naik VN. Assessment of procedural skills in anaesthesia. Br J Anaesth. 2009;103:472–83.

Schuwirth LWT, van der Vleuten CPM. Programmatic assessment and Kane’s validity perspective. Med Educ. 2012;46:38–48.

Brailovsky C, Charlin B, Beausoleil S, Coté S, Van der Vleuten C. Measurement of clinical reflective capacity early in training as a predictor of clinical reasoning performance at the end of residency: an experimental study on the script concordance test. Med Educ. 2001;35:430–6.

van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. Med Educ. 2005;39:309–17.

Gordon M, Farnan J, Grafton-Clarke C, Ahmed R, Gurbutt D, McLachlan J, et al. Non-technical skills assessments in undergraduate medical education: a focused BEME systematic review: BEME Guide No. 54. Med Teach. 2019;41(7):732–45.

Jouquan J. L’évaluation des apprentissages des étudiants en formation médicale initiale. Pédagogie Médicale [Internet]. 2002 [cited 2020 Feb 2];3:38–52. Available from: http://www.pedagogie-medicale.org/ https://doi.org/10.1051/pmed:2002006 .

Gale TCE, Roberts MJ, Sice PJ, Langton JA, Patterson FC, Carr AS, et al. Predictive validity of a selection centre testing non-technical skills for recruitment to training in anaesthesia. Br J Anaesth. 2010;105:603–9.

Gallagher CJ, Tan JM. The current status of simulation in the maintenance of certification in anesthesia. Int Anesthesiol Clin. 2010;48:83–99.

DeMaria S Jr, Samuelson ST, Schwartz AD, Sim AJ, Levine AI. Simulation-based assessment and retraining for the anesthesiologist seeking reentry to clinical practice: a case series. Anesthesiology. 2013;119:206–17. Available from: https://doi.org/10.1097/ALN.0b013e31829761c8 .

Amin Z, Boulet JR, Cook DA, Ellaway R, Fahal A, Kneebone R, et al. Technology-enabled assessment of health professions education: consensus statement and recommendations from the Ottawa 2010 conference. Medical Teacher [Internet]. 2011 [cited 2021 Jul 7];33:364–9. Available from: http://www.tandfonline.com/doi/full/ https://doi.org/10.3109/0142159X.2011.565832 .

Scallon G. L’évaluation des apprentissages dans une approche par compétences. Bruxelles: De Boeck Université-Bruxelles; 2007.

Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ. 2003;37:830–7.

Cook DA, Hatala R. Validation of educational assessments: a primer for simulation and beyond. Adv Simul [Internet]. 2016 [cited 2021 Aug 24];1:31. Available from: http://advancesinsimulation.biomedcentral.com/articles/ https://doi.org/10.1186/s41077-016-0033-y .

Kane MT. Validating the interpretations and uses of test scores. Journal of Educational Measurement [Internet]. 2013 [cited 2020 Sep 9];50:1–73. Available from: https://onlinelibrary.wiley.com/doi/abs/ https://doi.org/10.1111/jedm.12000 .

Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane’s framework. Med Educ. 2015;49:560–75.

Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ. 2014;19:233–50. Available from: https://doi.org/10.1007/s10459-013-9458-4 .

Cook DA, Lineberry M. Consequences validity evidence: evaluating the impact of educational assessments. Acad Med [Internet]. 2016 [cited 2020 Oct 24];91:785–95. Available from: http://journals.lww.com/00001888-201606000-00018 .

Tavakol M, Dennick R. Post-examination analysis of objective tests. Med Teach. 2011;33:447–58.

Messick S. The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher [Internet]. 1994 [cited 2021 Feb 15];23:13–23. Available from: http://journals.sagepub.com/doi/ https://doi.org/10.3102/0013189X023002013 .

Messick S. Validity. In: Linn RL, editor. Educational measurement. 3rd ed. New York: Macmillan; 1989. p. 13–103.

Oriot D, Darrieux E, Boureau-Voultoury A, Ragot S, Scépi M. Validation of a performance assessment scale for simulated intraosseous access. Simul Healthc. 2012;7:171–5.

Guise J-M, Deering SH, Kanki BG, Osterweil P, Li H, Mori M, et al. Validation of a tool to measure and promote clinical teamwork. Simul Healthc. 2008;3:217–23.

Sousa VD, Rojjanasrirat W. Translation, adaptation and validation of instruments or scales for use in cross-cultural health care research: a clear and user-friendly guideline: Validation of instruments or scales. Journal of Evaluation in Clinical Practice . 2011 [cited 2022 Jul 22];17:268–74. Available from: https://onlinelibrary.wiley.com/doi/ https://doi.org/10.1111/j.1365-2753.2010.01434.x .

Stoyanova-Piroth G, Milanov I, Stambolieva K. Translation, adaptation and validation of the Bulgarian version of the King’s Parkinson’s Disease Pain Scale. BMC Neurol [Internet]. 2021 [cited 2022 Jul 22];21:357. Available from: https://bmcneurol.biomedcentral.com/articles/ https://doi.org/10.1186/s12883-021-02392-5 .

Behari M, Srivastava A, Achtani R, Nandal N, Dutta R. Pain assessment in Indian Parkinson’s disease patients using King’s Parkinson’s disease pain scale. Ann Indian Acad Neurol [Internet]. 2020 [cited 2022 Jul 22];0:0. Available from: http://www.annalsofian.org/preprintarticle.asp?id=300170;type=0 .

Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. Journal of Clinical Epidemiology [Internet]. 1993 [cited 2022 Jul 22];46:1417–32. Available from: https://linkinghub.elsevier.com/retrieve/pii/089543569390142N .

Franc JM, Verde M, Gallardo AR, Carenzo L, Ingrassia PL. An Italian version of the Ottawa crisis resource management global rating scale: a reliable and valid tool for assessment of simulation performance. Intern Emerg Med. 2017;12:651–6.

Gosselin É, Marceau M, Vincelette C, Daneau C-O, Lavoie S, Ledoux I. French translation and validation of the Mayo High Performance Teamwork Scale for nursing students in a high-fidelity simulation context. Clinical Simulation in Nursing [Internet]. 2019 [cited 2022 Jul 25];30:25–33. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1876139918301890 .

Sánchez-Marco M, Escribano S, Cabañero-Martínez M-J, Espinosa-Ramírez S, José Muñoz-Reig M, Juliá-Sanchis R. Cross-cultural adaptation and validation of two crisis resource management scales. International Emergency Nursing [Internet]. 2021 [cited 2022 Jul 25];57:101016. Available from: https://www.sciencedirect.com/science/article/pii/S1755599X21000549 .

Schuwirth LWT, Van der Vleuten CPM. Programmatic assessment: from assessment of learning to assessment for learning. Medical Teacher [Internet]. 2011 [cited 2021 Sep 6];33:478–85. Available from: http://www.tandfonline.com/doi/full/ https://doi.org/10.3109/0142159X.2011.565828 .

Maignan M, Koch F-X, Chaix J, Phellouzat P, Binauld G, Collomb Muret R, et al. Team Emergency Assessment Measure (TEAM) for the assessment of non-technical skills during resuscitation: validation of the French version. Resuscitation [Internet]. 2016 [cited 2019 Mar 12];101:115–20. Available from: http://www.sciencedirect.com/science/article/pii/S0300957215008989 .

Pires S, Monteiro S, Pereira A, Chaló D, Melo E, Rodrigues A. Non-technical skills assessment for prelicensure nursing students: an integrative review. Nurse Educ Today. 2017;58:19–24.

Khan R, Payne MWC, Chahine S. Peer assessment in the objective structured clinical examination: a scoping review. Med Teach. 2017;39:745–56.

Hegg RM, Ivan KF, Tone J, Morten A. Comparison of peer assessment and faculty assessment in an interprofessional simulation-based team training program. Nurse Educ Pract. 2019;42: 102666.

Scavone BM, Sproviero MT, McCarthy RJ, Wong CA, Sullivan JT, Siddall VJ, et al. Development of an objective scoring system for measurement of resident performance on the human patient simulator. Anesthesiology. 2006;105:260–6.

Oriot D, Bridier A, Ghazali DA. Development and assessment of an evaluation tool for team clinical performance: the Team Average Performance Assessment Scale (TAPAS). Health Care : Current Reviews [Internet]. 2016 [cited 2018 Jan 17];4:1–7. Available from: https://www.omicsonline.org/open-access/development-and-assessment-of-an-evaluation-tool-for-team-clinicalperformance-the-team-average-performance-assessment-scale-tapas-2375-4273-1000164.php?aid=72394 .

Flin R, Patey R, Glavin R, Maran N. Anaesthetists’ non-technical skills. Br J Anaesth. 2010;105:38–44.

Mishra A, Catchpole K, McCulloch P. The Oxford NOTECHS System: reliability and validity of a tool for measuring teamwork behaviour in the operating theatre. Quality and Safety in Health Care [Internet]. 2009 [cited 2021 Jul 6];18:104–8. Available from: https://qualitysafety.bmj.com/lookup/doi/ https://doi.org/10.1136/qshc.2007.024760 .

Cooper S, Cant R, Porter J, Sellick K, Somers G, Kinsman L, et al. Rating medical emergency teamwork performance: development of the Team Emergency Assessment Measure (TEAM). Resuscitation. 2010;81:446–52.

Adler MD, Trainor JL, Siddall VJ, McGaghie WC. Development and evaluation of high-fidelity simulation case scenarios for pediatric resident education. Ambul Pediatr. 2007;7:182–6.

Brydges R, Hatala R, Zendejas B, Erwin PJ, Cook DA. Linking simulation-based educational assessments and patient-related outcomes: a systematic review and meta-analysis. Acad Med. 2015;90:246–56.

Cazzell M, Howe C. Using Objective Structured Clinical Evaluation for Simulation Evaluation: Checklist Considerations for Interrater Reliability. Clinical Simulation In Nursing [Internet]. 2012;8(6):e219–25. [cited 2019 Dec 14] Available from: https://www.nursingsimulation.org/article/S1876-1399(11)00249-0/abstract .

Maignan M, Viglino D, Collomb Muret R, Vejux N, Wiel E, Jacquin L, et al. Intensity of care delivered by prehospital emergency medical service physicians to patients with deliberate self-poisoning: results from a 2-day cross-sectional study in France. Intern Emerg Med. 2019;14:981–8.

Alcaraz-Mateos E, Jiang X “Sara,” Mohammed AAR, Turic I, Hernández-Sabater L, Caballero-Alemán F, et al. A novel simulator model and standardized assessment tools for fine needle aspiration cytology training. Diagn Cytopathol [Internet]. 2019 [cited 2020 Feb 3];47:297–301. Available from: http://doi.wiley.com/ https://doi.org/10.1002/dc.24105 .

Ghaderi I, Vaillancourt M, Sroka G, Kaneva PA, Vassiliou MC, Choy I, et al. Evaluation of surgical performance during laparoscopic incisional hernia repair: a multicenter study. Surg Endosc. 2011;25:2555–63. Available from: https://doi.org/10.1007/s00464-011-1586-4 .

IJgosse WM, Leijte E, Ganni S, Luursema J-M, Francis NK, Jakimowicz JJ, et al. Competency assessment tool for laparoscopic suturing: development and reliability evaluation. Surg Endosc. 2020;34(7):2947–53.

Pelaccia T, Tardif J. In: Comment [mieux] former et évaluer les étudiants en médecine et en sciences de la santé? 1ère. Louvain-la-Neuve: De Boeck supérieur; 2016. p. 343–56. (Guides pratiques).

Henricksen JW, Altenburg C, Reeder RW. Operationalizing healthcare simulation psychological safety: a descriptive analysis of an intervention. Simul Healthc. 2017;12:289–97.

Gaba DM. Simulations that are challenging to the psyche of participants: how much should we worry and about what? Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare [Internet]. 2013 [cited 2020 Mar 17];8:4–7. Available from: http://journals.lww.com/01266021-201302000-00002 .

Ghazali DA, Breque C, Sosner P, Lesbordes M, Chavagnat J-J, Ragot S, et al. Stress response in the daily lives of simulation repeaters. A randomized controlled trial assessing stress evolution over one year of repetitive immersive simulations. PLoS One. 2019;14(7):e0220111.

Rudolph JW, Simon R, Raemer DB, Eppich WJ. Debriefing as formative assessment: closing performance gaps in medical education. Acad Emerg Med. 2008;15:1010–6.

Kang SJ, Min HY. Psychological safety in nursing simulation. Nurse Educ. 2019;44:E6-9.

Howard SK, Gaba DM, Smith BE, Weinger MB, Herndon C, Keshavacharya S, et al. Simulation study of rested versus sleep-deprived anesthesiologists. Anesthesiology. 2003;98(6):1345–55.

Neuschwander A, Job A, Younes A, Mignon A, Delgoulet C, Cabon P, et al. Impact of sleep deprivation on anaesthesia residents’ non-technical skills: a pilot simulation-based prospective randomized trial. Br J Anaesth. 2017;119:125–31.

Eastridge BJ, Hamilton EC, O’Keefe GE, Rege RV, Valentine RJ, Jones DJ, et al. Effect of sleep deprivation on the performance of simulated laparoscopic surgical skill. Am J Surg. 2003;186:169–74.

Boulet JR, Murray D, Kras J, Woodhouse J, McAllister J, Ziv A. Reliability and validity of a simulation-based acute care skills assessment for medical students and residents. Anesthesiology. 2003;99:1270–80.

Levine AI, Flynn BC, Bryson EO, Demaria S. Simulation-based Maintenance of Certification in Anesthesiology (MOCA) course optimization: use of multi-modality educational activities. J Clin Anesth. 2012;24:68–74.

Boulet JR, Murray D, Kras J, Woodhouse J. Setting performance standards for mannequin-based acute-care scenarios: an examinee-centered approach. Simul Healthc. 2008;3:72–81.

Furman GE, Smee S, Wilson C. Quality assurance best practices for simulation-based examinations. Simul Healthc. 2010;5:226–31.

Kane MT. The assessment of professional competence. Eval Health Prof [Internet]. 1992 [cited 2022 Jul 22];15:163–82. Available from: http://journals.sagepub.com/doi/ https://doi.org/10.1177/016327879201500203 .

Blum RH, Boulet JR, Cooper JB, Muret-Wagstaff SL; Harvard Assessment of Anesthesia Resident Performance Research Group. Simulation-based assessment to identify critical gaps in safe anesthesia resident performance. Anesthesiology. 2014;120(1):129–41.

Rizzolo MA, Kardong-Edgren S, Oermann MH, Jeffries PR. The National League for Nursing project to explore the use of simulation for high-stakes assessment: process, outcomes, and recommendations. Nurs Educ Perspect. 2015;36:299–303. Available from: http://Insights.ovid.com/crossref?an=00024776-201509000-00006 .

Mudumbai SC, Gaba DM, Boulet JR, Howard SK, Davies MF. External validation of simulation-based assessments with other performance measures of third-year anesthesiology residents. Simul Healthc. 2012;7:73–80.

Fanning RM, Gaba DM. The role of debriefing in simulation-based learning. Simul Healthc. 2007;2:115–25.

Savoldelli GL, Naik VN, Park J, Joo HS, Chow R, Hamstra SJ. Value of debriefing during simulated crisis management: oral versus video-assisted oral feedback. Anesthesiology. 2006;105:279–85. Available from: https://pubs.asahq.org/anesthesiology/article/105/2/279/6669/Value-of-Debriefing-during-Simulated-Crisis .

Haute Autorité de Santé. Guide de bonnes pratiques en simulation en santé . 2012 [cited 2020 Feb 2]. Available from: https://www.has-sante.fr/upload/docs/application/pdf/2013-01/guide_bonnes_pratiques_simulation_sante_guide.pdf .

INACSL Standards Committee. INACSL Standards of best practice: simulation. Simulation design. Clinical Simulation In Nursing . 2016 [cited 2020 Feb 2];12:S5–12. Available from: https://www.nursingsimulation.org/article/S1876-1399(16)30126-8/abstract .

Norcini J, Anderson B, Bollela V, Burch V, Costa MJ, Duvivier R, et al. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference. Med Teach. 2011;33:206–14.

Gantt LT. The effect of preparation on anxiety and performance in summative simulations. Clinical Simulation in Nursing. 2013 [cited 2020 Feb 2];9:e25–33. Available from: http://www.sciencedirect.com/science/article/pii/S1876139911001277 .

Frey-Vogel AS, Scott-Vernaglia SE, Carter LP, Huang GC. Simulation for milestone assessment: use of a longitudinal curriculum for pediatric residents. Simul Healthc. 2016;11:286–92.

Durning SJ, Artino A, Boulet J, La Rochelle J, Van der Vleuten C, Arze B, et al. The feasibility, reliability, and validity of a post-encounter form for evaluating clinical reasoning. Med Teach. 2012;34:30–7.

Stone J. Moving interprofessional learning forward through formal assessment. Medical Education. 2010 [cited 2020 Feb 12];44:396–403. Available from: http://doi.wiley.com/ https://doi.org/10.1111/j.1365-2923.2009.03607.x .

Manser T, Dieckmann P, Wehner T, Rallf M. Comparison of anaesthetists’ activity patterns in the operating room and during simulation. Ergonomics. 2007;50:246–60.

Perrenoud P. Évaluation formative et évaluation certificative : postures contradictoires ou complémentaires ? Formation Professionnelle suisse . 2001 [cited 2020 Oct 29];4:25–8. Available from: https://www.unige.ch/fapse/SSE/teachers/perrenoud/php_main/php_2001/2001_13.html .

Atesok K, Hurwitz S, Anderson DD, Satava R, Thomas GW, Tufescu T, et al. Advancing simulation-based orthopaedic surgical skills training: an analysis of the challenges to implementation. Adv Orthop. 2019;2019:1–7.

Chiu M, Tarshis J, Antoniou A, Bosma TL, Burjorjee JE, Cowie N, et al. Simulation-based assessment of anesthesiology residents’ competence: development and implementation of the Canadian National Anesthesiology Simulation Curriculum (CanNASC). Can J Anesth/J Can Anesth. 2016 [cited 2020 Feb 2];63:1357–63. Available from: https://doi.org/10.1007/s12630-016-0733-8 .

Everett TC, McKinnon RJ, Ng E, Kulkarni P, Borges BCR, Letal M, et al. Simulation-based assessment in anesthesia: an international multicentre validation study. Can J Anesth/J Can Anesth. 2019;66:1440–9. Available from: https://doi.org/10.1007/s12630-019-01488-4 .

Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). May 4, 2016. Available from: http://data.europa.eu/eli/reg/2016/679/2016-05-04/eng .

Commission Nationale de l’Informatique et des Libertés. RGPD : passer à l’action. 2021 [cited 2021 Jul 8]. Available from: https://www.cnil.fr/fr/rgpd-passer-a-laction .

Ten Cate O, Regehr G. The power of subjectivity in the assessment of medical trainees. Acad Med. 2019;94:333–7.

Weller JM, Robinson BJ, Jolly B, Watterson LM, Joseph M, Bajenov S, et al. Psychometric characteristics of simulation-based assessment in anaesthesia and accuracy of self-assessed scores. Anaesthesia. 2005;60:245–50.

Wikander L, Bouchoucha SL. Facilitating peer based learning through summative assessment - an adaptation of the objective structured clinical assessment tool for the blended learning environment. Nurse Educ Pract. 2018;28:40–5.

Gaugler BB, Rudolph AS. The influence of assessee performance variation on assessors’ judgments. Pers Psychol. 1992;45:77–98.

Feldman M, Lazzara EH, Vanderbilt AA, DiazGranados D. Rater training to support high-stakes simulation-based assessments. J Contin Educ Health Prof. 2012 [cited 2019 Dec 14];32:279–86. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3646087/ .

Pelgrim E a. M, Kramer AWM, Mokkink HGA, van den Elsen L, Grol RPTM, van der Vleuten CPM. In-training assessment using direct observation of single-patient encounters: a literature review. Adv Health Sci Educ Theory Pract. 2011;16(1):131–42.

Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med. 2006;18:50–7.

Berkenstadt H, Ziv A, Gafni N, Sidi A. Incorporating simulation-based objective structured clinical examination into the Israeli National Board Examination in Anesthesiology. Anesth Analg. 2006;102:853–8.

Hedge JW, Kavanagh MJ. Improving the accuracy of performance evaluations: comparison of three methods of performance appraiser training. J Appl Psychol. 1988;73:68–73.

Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. Br Med J. 1975;1:447–51.

Uzan S. Mission de recertification des médecins - Exercer une médecine de qualité. Ministère des Solidarités et de la Santé - Ministère de l’Enseignement supérieur, de la Recherche et de l’Innovation; 2018 Nov. Available from: https://www.vie-publique.fr/rapport/37741-mission-de-recertification-des-medecins-exercer-une-medecine-de-qualit .

Mann KV, MacDonald AC, Norcini JJ. Reliability of objective structured clinical examinations: four years of experience in a surgical clerkship. Teaching and Learning in Medicine. 1990 [cited 2021 May 1];2:219–24. Available from: http://www.tandfonline.com/doi/abs/ https://doi.org/10.1080/10401339009539464 .

Maintenance Of Certification in Anesthesiology (MOCA) 2.0. [cited 2021 Sep 18]. Available from: https://theaba.org/about%20moca%202.0.html .

Khan KZ, Gaunt K, Ramachandran S, Pushkar P. The Objective Structured Clinical Examination (OSCE): AMEE Guide No. 81. Part II: Organisation & Administration. Med Teach. 2013 [cited 2020 Oct 29];35:e1447–63. Available from: http://www.tandfonline.com/doi/full/ https://doi.org/10.3109/0142159X.2013.818635 .

Coderre S, Woloschuk W, McLaughlin K. Twelve tips for blueprinting. Med Teach. 2009;31:322–4.

Murray DJ, Boulet JR. Anesthesiology board certification changes: a real-time example of “assessment drives learning.” Anesthesiology. 2018;128:704–6.

Roberts C, Newble D, Jolly B, Reed M, Hampton K. Assuring the quality of high-stakes undergraduate assessments of clinical competence. Med Teach. 2006;28:535–43.

Newble D. Techniques for measuring clinical competence: objective structured clinical examinations. Med Educ. 2004;38:199–203.

Der Sahakian G, Lecomte F, Buléon C, Guevara F, Jaffrelot M, Alinier G. Référentiel sur l’élaboration de scénarios de simulation en immersion clinique.  Paris: Société Francophone de Simulation en Santé; 2017 p. 22. Available from: https://sofrasims.org/wp-content/uploads/2019/10/R%C3%A9f%C3%A9rentiel-Scenario-Simulation-Sofrasims.pdf .

Lewis KL, Bohnert CA, Gammon WL, Hölzer H, Lyman L, Smith C, et al. The Association of Standardized Patient Educators (ASPE) Standards of Best Practice (SOBP). Adv Simul. 2017;2:10.

Board of Directors of the American Board of Medical Specialties (ABMS). Standards for the ABMS Program for Maintenance of Certification (MOC). American Board of Medical Specialties; 2014 Jan p. 13. Available from: https://www.abms.org/media/1109/standards-for-the-abms-program-for-moc-final.pdf .

Hodges B, McNaughton N, Regehr G, Tiberius R, Hanson M. The challenge of creating new OSCE measures to capture the characteristics of expertise. Med Educ. 2002;36:742–8.

Hays RB, Davies HA, Beard JD, Caldon LJM, Farmer EA, Finucane PM, et al. Selecting performance assessment methods for experienced physicians. Med Educ. 2002;36:910–7.

Ram P, Grol R, Rethans JJ, Schouten B, van der Vleuten C, Kester A. Assessment of general practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and feasibility. Med Educ. 1999;33:447–54.

Weersink K, Hall AK, Rich J, Szulewski A, Dagnone JD. Simulation versus real-world performance: a direct comparison of emergency medicine resident resuscitation entrustment scoring. Adv Simul. 2019 [cited 2020 Feb 12];4:9. Available from: https://advancesinsimulation.biomedcentral.com/articles/ https://doi.org/10.1186/s41077-019-0099-4 .

Buljac-Samardzic M, Doekhie KD, van Wijngaarden JDH. Interventions to improve team effectiveness within health care: a systematic review of the past decade. Hum Resour Health. 2020;18:2.

Eddy K, Jordan Z, Stephenson M. Health professionals’ experience of teamwork education in acute hospital settings: a systematic review of qualitative literature. JBI Database System Rev Implement Rep. 2016;14:96–137.

Leblanc VR. Review article: simulation in anesthesia: state of the science and looking forward. Can J Anaesth. 2012;59:193–202.

Hanscom R. Medical simulation from an insurer’s perspective. Acad Emerg Med. 2008;15:984–7.

McCarthy J, Cooper JB. Malpractice insurance carrier provides premium incentive for simulation-based training and believes it has made a difference. Anesth Patient Saf Found Newsl. 2007 [cited 2021 Sep 17];17. Available from: https://www.apsf.org/article/malpractice-insurance-carrier-provides-premium-incentive-for-simulation-based-training-and-believes-it-has-made-a-difference/ .

Edler AA, Fanning RG, Chen MichaelI, Claure R, Almazan D, Struyk B, et al. Patient simulation: a literary synthesis of assessment tools in anesthesiology. J Educ Eval Health Prof. 2009 [cited 2021 Sep 17];6:3. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2796725/ .

Borgersen NJ, Naur TMH, Sørensen SMD, Bjerrum F, Konge L, Subhi Y, et al. Gathering validity evidence for surgical simulation: a systematic review. Annals of Surgery. 2018 [cited 2022 Sep 25];267:1063–8. Available from: https://journals.lww.com/00000658-201806000-00014 .

Rudolph JW, Raemer DB, Simon R. Establishing a safe container for learning in simulation: the role of the presimulation briefing. Simul Healthc. 2014;9:339–49.

Cilliers FJ, Schuwirth LW, Adendorff HJ, Herman N, van der Vleuten CP. The mechanism of impact of summative assessment on medical students’ learning. Adv Health Sci Educ Theory Pract. 2010;15:695–715.

Hadi MA, Ali M, Haseeb A, Mohamed MMA, Elrggal ME, Cheema E. Impact of test anxiety on pharmacy students’ performance in Objective Structured Clinical Examination: a cross-sectional survey. Int J Pharm Pract. 2018;26:191–4.

Dunn W, Dong Y, Zendejas B, Ruparel R, Farley D. Simulation, mastery learning and healthcare. Am J Med Sci. 2017;353:158–65.

McGaghie WC. Mastery learning: it is time for medical education to join the 21st century. Acad Med. 2015;90:1438–41.

Ng C, Primiani N, Orchanian-Cheff A. Rapid cycle deliberate practice in healthcare simulation: a scoping review. Med Sci Educ. 2021;31:2105–20.

Taras J, Everett T. Rapid cycle deliberate practice in medical education - a systematic review. Cureus. 2017;9: e1180.

Cleland JA, Abe K, Rethans J-J. The use of simulated patients in medical education: AMEE Guide No 42. Med Teach. 2009;31:477–86.

Garden AL, Le Fevre DM, Waddington HL, Weller JM. Debriefing after simulation-based non-technical skill training in healthcare: a systematic review of effective practice. Anaesth Intensive Care. 2015;43:300–8.

Sawyer T, Eppich W, Brett-Fleegler M, Grant V, Cheng A. More than one way to debrief: a critical review of healthcare simulation debriefing methods. Simul Healthc. 2016;11:209–17.

Rudolph JW, Simon R, Dufresne RL, Raemer DB. There’s no such thing as “nonjudgmental” debriefing: a theory and method for debriefing with good judgment. Simul Healthc. 2006;1:49–55.

Levett-Jones T, Lapkin S. A systematic review of the effectiveness of simulation debriefing in health professional education. Nurse Educ Today. 2014;34:e58-63.

Palaganas JC, Fey M, Simon R. Structured debriefing in simulation-based education. AACN Adv Crit Care. 2016;27:78–85.

Rudolph JW, Foldy EG, Robinson T, Kendall S, Taylor SS, Simon R. Helping without harming: the instructor’s feedback dilemma in debriefing–a case study. Simul Healthc. 2013;8:304–16.

Larsen DP, Butler AC, Roediger III HL. Test-enhanced learning in medical education. Medical Education. 2008 [cited 2021 Aug 25];42:959–66. Available from: https://onlinelibrary.wiley.com/doi/ https://doi.org/10.1111/j.1365-2923.2008.03124.x .

Koster MA, Soffler M. Navigate the challenges of simulation for assessment: a faculty development workshop. MedEdPORTAL. 2021;17:11114.

Devitt JH, Kurrek MM, Cohen MM, Fish K, Fish P, Murphy PM, et al. Testing the raters: inter-rater reliability of standardized anaesthesia simulator performance. Can J Anaesth. 1997;44:924–8.

Kelly MA, Mitchell ML, Henderson A, Jeffrey CA, Groves M, Nulty DD, et al. OSCE best practice guidelines—applicability for nursing simulations. Adv Simul. 2016 [cited 2020 Feb 3];1:10. Available from: http://advancesinsimulation.biomedcentral.com/articles/ https://doi.org/10.1186/s41077-016-0014-1 .

Weinger MB, Banerjee A, Burden AR, McIvor WR, Boulet J, Cooper JB, et al. Simulation-based assessment of the management of critical events by board-certified anesthesiologists. Anesthesiology. 2017;127:475–89.

Sinz E, Banerjee A, Steadman R, Shotwell MS, Slagle J, McIvor WR, et al. Reliability of simulation-based assessment for practicing physicians: performance is context-specific. BMC Med Educ. 2021;21:207.

Ryall T, Judd BK, Gordon CJ. Simulation-based assessments in health professional education: a systematic review. J Multidiscip Healthc. 2016;9:69–82.

Acknowledgements

The authors thank the SoFraSimS “Assessment with simulation” group members: Anne Bellot, Isabelle Crublé, Guillaume Philippot, Thierry Vanderlinden, Sébastien Batrancourt, Claire Boithias-Guerot, Jean Bréaud, Philine de Vries, Louis Sibert, Thierry Sécheresse, Virginie Boulant, Louis Delamarre, Laurent Grillet, Marianne Jund, Christophe Mathurin, Jacques Berthod, Blaise Debien, and Olivier Gacia, who contributed to this work. The authors also thank the external expert committee members, Guillaume Der Sahakian, Sylvain Boet, Denis Oriot, and Jean-Michel Chabot, and the SoFraSimS Executive Committee for their review and feedback.

This work has been supported by the French Speaking Society for Simulation in Healthcare (SoFraSimS).

This work is part of CB’s PhD, which has been supported by grants from the French Society for Anesthesiology and Intensive Care (SFAR), the Arthur Sachs-Harvard Foundation, the University Hospital of Caen, the North-West University Hospitals Group (G4), and the Charles Nicolle Foundation. The funding bodies did not have any role in the design of the study; the collection, analysis, and interpretation of the data; or the writing of the manuscript.

Author information

Authors and Affiliations

Department of Anesthesiology, Intensive Care and Perioperative Medicine, Caen Normandy University Hospital, 6th Floor, Caen, France

Clément Buléon & Erwan Guillouet

Medical School, University of Caen Normandy, Caen, France

Center for Medical Simulation, Boston, MA, USA

Clément Buléon, Rebecca D. Minehart & Jenny W. Rudolph

Department of Anesthesiology, Intensive Care and Perioperative Medicine, Nîmes University Hospital, Nîmes, France

Laurent Mattatia

Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA, USA

Rebecca D. Minehart & Jenny W. Rudolph

Harvard Medical School, Boston, MA, USA

Department of Anesthesiology, Intensive Care and Perioperative Medicine, Liège University Hospital, Liège, Belgique

Fernande J. Lois

Department of Emergency Medicine, Pitié Salpêtrière University Hospital, APHP, Paris, France

Anne-Laure Philippon

Department of Pediatric Intensive Care, Pellegrin University Hospital, Bordeaux, France

Olivier Brissaud

Department of Emergency Medicine, Rouen University Hospital, Rouen, France

Antoine Lefevre-Scelles

Department of Anesthesiology, Intensive Care and Perioperative Medicine, Kremlin Bicêtre University Hospital, APHP, Paris, France

Dan Benhamou

Department of Emergency Medicine, Cochin University Hospital, APHP, Paris, France

François Lecomte

Contributions

CB helped with the study conception and design, data contribution, data analysis, data interpretation, writing, visualization, review, and editing. FL helped with the study conception and design, data contribution, data analysis, data interpretation, writing, review, and editing. RDM, JWR, and DB helped with the study writing, and review and editing. JWR and DB helped with the data interpretation, writing, and review and editing. LM, FJL, EG, ALP, OB, and ALS helped with the data contribution, data analysis, data interpretation, and review. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Clément Buléon .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Buléon, C., Mattatia, L., Minehart, R.D. et al. Simulation-based summative assessment in healthcare: an overview of key principles for practice. Adv Simul 7 , 42 (2022). https://doi.org/10.1186/s41077-022-00238-9

Received : 02 March 2022

Accepted : 30 November 2022

Published : 28 December 2022

DOI : https://doi.org/10.1186/s41077-022-00238-9

  • Medical education
  • Competency-based education


  • Open access
  • Published: 02 May 2024

The Ottawa resident observation form for nurses (O-RON): evaluation of an assessment tool’s psychometric properties in different specialties

  • Hedva Chiu 1 ,
  • Timothy J. Wood 2 ,
  • Adam Garber 3 ,
  • Samantha Halman 4 ,
  • Janelle Rekman 5 ,
  • Wade Gofton 6 &
  • Nancy Dudek 7  

BMC Medical Education volume 24, Article number: 487 (2024)

Workplace-based assessment (WBA) used in post-graduate medical education relies on physician supervisors’ feedback. However, in a training environment where supervisors are unavailable to assess certain aspects of a resident’s performance, nurses are well positioned to do so. The Ottawa Resident Observation Form for Nurses (O-RON) was developed to capture nurses’ assessment of trainee performance, and results have demonstrated strong evidence for validity in Orthopedic Surgery. However, different clinical settings may impact a tool’s performance. This project studied the use of the O-RON in three different specialties at the University of Ottawa.

O-RON forms were distributed on Internal Medicine, General Surgery, and Obstetrical wards at the University of Ottawa over nine months. Validity evidence related to quantitative data was collected. Exit interviews with nurse managers were performed and content was thematically analyzed.

179 O-RONs were completed on 30 residents. With four forms per resident, the O-RON’s reliability was 0.82. The global judgement response and the frequency of concerns were correlated ( r  = 0.627, P  < 0.001).

Conclusions

Consistent with the original study, the findings demonstrated strong evidence for validity. However, the number of forms collected was less than expected. Exit interviews identified factors impacting form completion, which included clinical workloads and interprofessional dynamics.

As the practice of medicine evolves, medical educators strive to refine the teaching curriculum and find innovative ways to train physicians who can adapt to and thrive within this changing landscape. In 2015, The Royal College of Physicians & Surgeons of Canada published the updated CanMEDS competency framework [ 1 ], which emphasizes the importance of intrinsic roles in addition to the skills needed to be a medical expert. These intrinsic roles are important in developing well-rounded physicians, but are less tangible and can be challenging to integrate into traditional assessment formats [ 2 , 3 , 4 ]. Knowing this, medical educators are given the task of developing new ways to assess these skills in resident physicians.

Another innovation in medical education is the shift from a traditional time-based curriculum to a competency-based curriculum (or competency-based medical education, “CBME”). This shift allows for an increased focus on a resident’s learning needs and achievements. It encourages a culture of frequent observed formative assessments [ 5 ]. This shift calls for assessment tools that accurately reflect a resident’s competence and can be feasibly administered in the training environment.

Workplace-based assessments (WBA) are considered one of the best methods to assess professional competence in the post-graduate medical education curriculum because they can be feasibly administered in the clinical setting [6, 7]. Most WBA relies on physician supervisors making observations of residents. However, the constraints of a complex and busy training environment mean that supervisors are not always available to observe some aspects of a resident’s performance. For example, when a resident rounds on patients independently or attends to on-call scenarios in the middle of the night, the physician supervisor may not be present. Physician supervisors may also not be present during multi-disciplinary team meetings where residents participate in the co-management of patients with other health professionals.

On a hospital ward, the health professional who most often interacts with a resident is a nurse. Given this, it makes sense to consider obtaining assessment information from a nurse’s viewpoint, which has the potential to be valuable for several reasons. First, nurses may provide authentic information about resident performance, because residents may perform differently when they know they are not being directly observed by their physician supervisors [8]. Second, nurses play an integral role in patient care and often serve as a liaison between patients, their families, and physicians regarding daily care needs and changes in clinical condition. This liaison role gives nurses a unique perspective on intrinsic elements of physician competence, such as patient management, communication, and leadership skills, which are also central to collaboration between nurses and physicians [9]. As such, using a WBA tool that incorporates nursing-identified elements of physician competence to assess a resident’s ability to demonstrate those elements in the workplace is important in training future physicians.

Although assessment of resident performance by nurses is captured with multi-source feedback (MSF) tools, there are concerns about relying solely on this approach, as MSF tools generally present the data as an aggregate score regardless of individual rater roles. This convergence of ratings may not be helpful in feedback settings because it disregards how behaviour can change in different contexts (i.e., the specific situation and the relationship of the rater with the person being rated) [10]. Furthermore, there is evidence that different groups of health professionals rate the same individuals differently; more specifically, nursing perspectives often differ from those of other health professionals and physician supervisors [11, 12, 13, 14, 15, 16]. When the groups are combined, the perspective of one group can be lost. It is not a weakness that different groups have different perspectives, but it needs to be documented to provide more useful formative feedback. Therefore, there is a need for a tool that uniquely captures the nurses’ perspective of resident performance.

To address this issue, Dudek et al. (2021) developed The Ottawa Resident Observation Form for Nurses (O-RON), a tool that captures nurses’ assessment of resident performance in a hospital ward environment (Fig.  1 ). This tool allows nurses to identify concerning behaviours in resident performance. The tool was implemented and studied in the Orthopedic Surgery Residency Program at the University of Ottawa, Canada. Nurses voluntarily completed the O-RON and indicated that it was easy to use. Validity evidence related to internal processes was gathered by calculating the reliability of the scale using a generalizability analysis and decision study. The results showed that with eight forms per resident the reliability of the O-RON was 0.8 and with three forms per resident, the reliability was 0.59. A reliability of 0.8 is considered acceptable for summative assessments [ 17 ]. These results suggest that the O-RON could be a promising WBA tool that provides residents and training programs with important feedback on aspects of residents’ performance on a hospital ward through the eyes of the nurses.
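As a rough consistency check of our own (the published decision study estimated variance components rather than using this shortcut), the two reported reliabilities behave approximately like a Spearman–Brown projection from an implied single-form reliability:

```latex
% Spearman–Brown style projection: r_k = k r_1 / (1 + (k-1) r_1)
% Implied single-form reliability from the reported 3-form value r_3 = 0.59:
\[
r_1 = \frac{r_3}{3 - 2 r_3} = \frac{0.59}{1.82} \approx 0.32
\]
% Projection to eight forms, close to the reported 0.80:
\[
r_8 = \frac{8 r_1}{1 + 7 r_1} \approx \frac{2.59}{3.27} \approx 0.79
\]
```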

Fig. 1 The Ottawa Resident Observation Form for Nurses (O-RON)

The O-RON garnered international interest. Busch et al. translated the O-RON into Spanish and implemented it in two cardiology centres in Buenos Aires [ 18 ]. Their findings also demonstrated strong evidence for validity, although they required a higher number of forms ( n  = 60) to achieve high reliability (G coefficient = 0.72).

The demonstrated psychometric characteristics of the tool for these two studies were determined in single specialties. Local assessment culture, clinical setting, interprofessional dynamics and rater experience are some of the factors that can affect how a nurse may complete the O-RON [ 19 , 20 , 21 , 22 , 23 ]. These external factors can lead to measurement errors, which in turn would impact the generalizability and validity of the O-RON. Therefore, further testing is vital to determine whether the O-RON will perform consistently in other environments [ 24 , 25 ].

The primary objective of this project was to collect additional validity evidence for the O-RON by implementing it in multiple residency programs, including both surgical and medical specialties, which represent different assessment cultures and clinical contexts. However, it became evident throughout the data collection period that the number of completed forms was lower than anticipated. As such, there needed to be a shift in focus to also explore the challenges surrounding implementation of a new assessment tool in different programs. Therefore, the secondary objective of this study was to better understand the barriers to implementation of the O-RON.

This study sought to assess the psychometric properties of the O-RON in three specialties at the University of Ottawa, Canada, using modern validity theory as a framework to guide the evaluation of the O-RON [ 25 ]. The O-RON was used in the Core Internal Medicine, General Surgery, and Obstetrics and Gynecology residency programs at the University of Ottawa. These programs did not have an assessment tool completed exclusively by nurses to evaluate their residents prior to the start of the project. They agreed to provide the research team with the anonymized data from this tool to study its psychometric properties. Ethics approval was granted by the Ottawa Health Science Network Research Ethics Board.

Dudek et al. (2021) developed the O-RON through a nominal group technique in which nurses identified dimensions of performance that they perceived as reflective of high-quality physician performance on a hospital ward. These were included as the 15 items of the O-RON. Each item is rated on a 3-point frequency scale (no concerns, minor concerns, major concerns) with a fourth option of “unable to assess”. There is an additional “yes/no” question regarding whether the nurse would want to work with the resident as a team member (the “global assessment question”) and a space for comments.
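To make this structure concrete, a completed O-RON record could be represented as in the sketch below. This is our illustration only, not the published instrument or the study's code; the item keys and identifier codes are hypothetical placeholders (the actual item wording appears in Fig. 1).

```python
# Minimal sketch of a completed O-RON record: 15 items on a 3-point concern scale
# plus "unable to assess", a yes/no global question, and free-text comments.
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    NO_CONCERNS = "no concerns"
    MINOR_CONCERNS = "minor concerns"
    MAJOR_CONCERNS = "major concerns"
    UNABLE_TO_ASSESS = "unable to assess"


@dataclass
class OronForm:
    resident_code: str            # anonymized resident identifier (placeholder)
    rater_code: str               # anonymized nurse identifier (placeholder)
    items: dict[str, Rating]      # the 15 O-RON items keyed by placeholder names
    would_work_with_again: bool   # global "yes/no" assessment question
    comments: str = ""


# Example: a hypothetical form with no concerns on any item.
form = OronForm(
    resident_code="R-07",
    rater_code="N-12",
    items={f"item_{i:02d}": Rating.NO_CONCERNS for i in range(1, 16)},
    would_work_with_again=True,
)
print(len(form.items))  # 15
```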

Residents from the three residency programs were provided a briefing by their program director on the use of the O-RON prior to the start of the project. Nurses on the internal medicine, general surgery, and obstetrics wards at two hospital campuses were asked to complete the O-RON for the residents on rotation. Nurse managers reviewed the form with the nurses at the start of the project and were available for questions. This was consistent with how the tool was used in the original study. At the end of each four-week rotation, 10 O-RON forms per resident were distributed to the nurse manager, who then distributed them to their nurses. Nurses were assigned a code by the nurse manager so that they could complete the forms anonymously. Any nurse who wished to provide an assessment of a resident received a form, completed it, and returned it to the nurse manager within two weeks. The completed forms were collected by the research assistant at the two-week mark, who collated the data for each resident and provided a summary sheet to their program director. The research assistant assigned a code to each resident and recorded the anonymized O-RON data for the study analysis.

Sample size

In the original study of the O-RON [26], the results demonstrated a strong reliability coefficient (0.80) with a sample of eight forms per resident. Using the procedure described by Streiner and Norman [24], an estimate of 256 forms in total was needed to achieve a desired reliability of 0.80 with a 95% confidence interval of +/- 10%. Typically, there were 16 residents ranging from PGY1-3 participating in a general internal medicine ward, 16 residents ranging from PGY1-5 participating in a general surgery ward, and eight residents ranging from PGY1-5 participating in a labour and delivery ward at any time. To obtain at least 256 forms per specialty, and considering that nurses were unlikely to complete 10 forms on each resident each rotation and that fluctuations in resident numbers between rotations were expected, a collection period of six months was established.

Response to low participation rate

The completion rate was closely monitored throughout the collection period. After six rounds of collection, the participation rate was low. In response, we initiated improvement processes including (a) displaying photos of the residents with their names in the nursing office, (b) displaying a poster about the project in the nursing office as a reminder for the nurses, and (c) reaching out to nurse managers to review the project. We also extended the collection period by three rotations, for a total of nine rotations, to allow time for the improvement processes to work.

At the end of the extended collection period, we conducted individual semi-structured interviews with each nurse manager at each of the O-RON collection sites to further explore the reasons behind the low participation rate.

Quantitative analyses

Analyses were conducted using SPSS v27 statistical software. Rating response frequencies were calculated across scale items, and “yes/no” frequencies were calculated for the global assessment question. Chi-square tests were conducted on each item against the global assessment response to determine the effect of concerns on the global assessment. For the purposes of data analysis, a total O-RON score was calculated by counting the number of items with a minor or major concern rating and dividing by the number of items with a valid rating; a higher score indicated more concerns. Items rated “unable to assess” or left blank were treated as invalid and excluded from this calculation. Tests of between-subjects effects were conducted between the total O-RON score and the global assessment rating.
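To make the scoring rule concrete, the following is a minimal sketch of the total-score calculation and the per-item comparisons described above. It is illustrative only (the study analyses were run in SPSS), and the file name and column names (item_1 … item_15, global_yes) are assumptions rather than the study's actual variable names.

```python
# Illustrative sketch only; the study analyses were run in SPSS.
# Column names ("item_1"..."item_15", "global_yes") are assumed for illustration.
import pandas as pd
from scipy.stats import chi2_contingency

VALID = ["no_concerns", "minor_concerns", "major_concerns"]
CONCERNS = ["minor_concerns", "major_concerns"]

def total_oron_score(row: pd.Series, item_cols: list) -> float:
    """Proportion of validly rated items flagged with minor or major concerns."""
    valid = row[item_cols].isin(VALID)
    if valid.sum() == 0:
        return float("nan")  # form has no valid ratings
    return row[item_cols].isin(CONCERNS).sum() / valid.sum()  # higher = more concerns

forms = pd.read_csv("oron_forms.csv")            # hypothetical anonymized data set
item_cols = [f"item_{i}" for i in range(1, 16)]  # the 15 O-RON items
forms["total_score"] = forms.apply(total_oron_score, axis=1, item_cols=item_cols)

# Per-item 2 x 2 chi-square: "any concern" (minor + major combined) versus the
# yes/no global assessment question, mirroring the collapsed comparison reported
# in the Results.
for col in item_cols:
    rated = forms[forms[col].isin(VALID)]
    any_concern = rated[col].isin(CONCERNS)
    table = pd.crosstab(any_concern, rated["global_yes"])
    if table.shape != (2, 2):
        continue  # skip items with no variation in concerns or global responses
    chi2, p, _, _ = chi2_contingency(table)
    print(f"{col}: chi2={chi2:.2f}, p={p:.4f}")
```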

The reliability of the O-RON was calculated using a generalizability analysis (g-study) and the number of forms required for an acceptable level of reliability was determined through a decision study. These outcomes contributed to validity evidence related to internal processes.

A g-study calculates variance components, which can be used to derive the reliability of the O-RON. Variance components are associated with each facet used in the analysis and reflect the degree to which the overall variance in scores is attributable to that facet. For this study, they were calculated from the mean total scores, which were analyzed using a between-subjects ANOVA with round as a grouping facet and people and forms as nested facets. Using the results of the generalizability analysis, a decision study derives estimates of reliability based on varying the facets used in the analysis. For our study, we varied the number of forms per resident to understand its impact on the reliability of the O-RON.
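As a minimal sketch of the decision-study logic, assuming a design in which forms are nested within residents, the projected reliability of a mean over n forms is the person variance divided by the sum of the person variance and the form-within-person variance divided by n. The variance components below are illustrative values loosely based on the reported 54% person variance, with the remainder treated as residual; they are not the study's actual estimates.

```python
# Decision-study sketch: projected reliability of a mean over n_forms forms,
# assuming forms nested within residents. Variance components are illustrative
# (person variance ~54% as reported, remainder treated as residual), not the
# study's actual estimates.

def d_study_reliability(var_person: float, var_form_in_person: float, n_forms: int) -> float:
    """Generalizability coefficient for the mean of n_forms forms per resident."""
    return var_person / (var_person + var_form_in_person / n_forms)

var_person = 0.54          # variance attributed to residents (object of measurement)
var_form_in_person = 0.46  # residual form-within-resident variance (assumed)

for n in (1, 3, 4, 8):
    print(f"{n} forms: reliability = {d_study_reliability(var_person, var_form_in_person, n):.2f}")
```

With these illustrative components, three forms yield a projected reliability of roughly 0.78 and four forms roughly 0.82, in line with the decision-study results reported below.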

Qualitative analyses

Semi-structured exit interviews were conducted by the study principal investigator (HC) with each nurse manager. Interviews were voice-recorded and transcribed into text documents. Using conventional content analysis, two of the study’s co-investigators (HC and ND) independently coded and thematically analyzed the interview content. The codes were compared between the two researchers and consensus was reached. This coding structure was then used to code all six interviews.

Quantitative

A total of 180 O-RONs were completed for 30 residents over the study period, with an average of six forms per resident (range = 1–34). The large range reflects that some residents were assessed on more than one rotation. One form was excluded from analysis because it had a rating of “unable to assess” for every item, leaving 179 O-RONs for analysis.

The Obstetrics units had the highest frequency of O-RONs completed (74.3%), followed by General Surgery (16.2%), and Internal Medicine (9.5%). Due to the small numbers within each specialty, subsequent analysis was done on the aggregate data.

Across forms and items, the frequencies of reported ratings in descending order were “no concerns” (80.7%), “minor concerns” (11.5%), “unable to assess” (3.0%), and “major concerns” (1.9%). Blank items accounted for 2.9% of responses. For the global assessment rating, 92.3% of valid responses were “yes” to wanting this physician on their team (Table 1).

In terms of item-level analysis, nurses reported the least concern for item 13 (“acts with honesty and integrity”) (90.5% - no concerns). They reported the most major concerns for item 1 (“basic medical knowledge is appropriate to his/her stage of training”) (4.5% - major concerns), and the most overall concerns for item 8 (“Accepts feedback/expertise from nurses appropriately”) (21.8% - minor + major concerns). The raters were most frequently unable to assess item 15 (“advocates for patients without disrupting, discrediting, other HCP”) at 7.8%.

Chi-square (2 × 2) comparisons were used to assess the presence of concern as a function of the response to the global assessment question (Table 2). Because there was only a small number of major concerns for each item, minor and major concerns were combined (“any concerns”). All items except four (items 10, 12, 13 and 14) showed a statistically significant difference (P < 0.01). Tests of between-subjects effects were used to compare the total O-RON score with the response to the global assessment question, which showed a correlation between the global response and the frequency of concerns (r = 0.627, P < 0.001).

The g-study results showed that people (the object of measurement) accounted for 54% of the variance. Rotation did not account for any variance, indicating that ratings were similar across all nine rotations. The decision study showed that with three forms per resident the reliability was 0.78, and with four forms it was 0.82.

Qualitative

Factors impacting the implementation of the O-RON

Five themes were identified as factors that had an impact, whether positive or negative, on the implementation of the O-RON (Table 3).

Strong project lead on the unit

Units where the clinical manager described strong involvement of a lead person (usually themselves) who was persistent in reminding nurses to complete O-RONs and was passionate about using the tool had higher O-RON completion rates. Conversely, where there was no such lead, the O-RON completion rate was much lower.

“If I was to step away from this position and it was a different manager coming in, would they do the same that I would do in this process, I don’t know. So[…]I know it works okay for me because […] I don’t see it as a huge investment of time[…]but if I’m off or I’m not here[…]it’s finding a nurse who would be responsible to do it.” (Participant 2) . “[…]from the leadership perspective, we talk about it, but we don’t own it […] The feedback doesn’t change anything to me as a leader, as a manager. […] Not that I don’t concentrate on the O-RON, I do talk about it, but I’m not passionate about it.” (Participant 4) .

Familiarity with residents

Clinical managers expressed the importance of having collegial relationships with the residents. This was usually facilitated by having a smaller number of residents or having in-person ward rounds. Because of this, the nurses knew the residents better, had more time to work with them personally, and were able to match their faces to their names more frequently. Conversely, if a unit employed virtual rounds, had a lot of residents, or mainly used technology to communicate with residents, the nurses were unfamiliar with the residents and felt they were not able to comment as easily on resident performance.

“So with our group, […] our […] residents, is tiny. There’s two of them on at a time in a month. Maybe only one. So, […] they’re here 24 hours, with our nurses, working, they get to know each other quite well, so, that could be a contributing factor potentially.” (Participant 2) . “Where before we used to have rounds and the residents would come and the staff would come, so we could have that connection with the resident. We could put a face to them, a name to them. We knew who they were. Where, with EPIC [electronic medical record system], first of all the nurses don’t attend EPIC rounds. We don’t see the residents, we don’t see the staff. Like I have no idea, who […] is because I don’t see him. So, it’s very difficult for me to do an evaluation on someone I have not met, not seen, and only see through EPIC. A lot of the conversations the nurses have are also through EPIC, they’ll send an EPIC chat. The resident will email back. So, you know, it’s missing that piece.” (Participant 1) .

Nursing workload

Clinical managers mentioned that completing the O-RON was an additional task on top of an already full workload. This was largely driven by an overall shortage of staff and a large number of new nurses joining the units. The new nurses were trying to learn new protocols and clinical skills and had little capacity for extra work.

“I mean every day we are working short, right? We’re missing one or two nurses. I have nurses from other units, I have nurses that have never been here. So yes, I could see how that would have contributed to having a lower response.” (Participant 1) .

“I’m going to say about 60% of our staff have less than one year experience and we’ve also re-introduced RPNs to the unit. And so the unit right now is really burdened with new staff. But it’s not only new staff, but it’s new staff whose skillset are not as advanced as what they potentially would have been five years ago. And so the staff are really concentrating on beefing up their skillset, just really integrating into the unit. And so, there is really not a lot of thought or concentration necessarily on trying to do the extras, such as doing the surveys.” (Participant 4) .

Work experience of nurses

In addition to new nursing staff having less time for non-essential tasks, clinical managers also pointed out that newer nurses tended to be more hesitant to comment on a resident's performance than more experienced nurses.

“A lot of junior staff that I don’t know if they would take that initiative to […] put some feedback on a piece of paper for a resident even though it’s almost untraceable to them. You know, a little bit more timid and shy.” (Participant 6) .

“Most of them [those who filled out the form] were the […]mid-career nurses. So, right now, my mid-career nurses have been around for five to ten years. […] And so those nurses are the ones who are still very engaged, wanting to do different projects. Those were the nurses that were doing it, it was not the newer hires, and it was not the nurses who have been here for, you know, 20 + years.” (Participant 4) .

Culture of assessment

All clinical managers interviewed noted that there was not a strong culture of nurses providing any feedback or assessment of residents prior to the implementation of the O-RON. There may have been informal discussions and feedback, but there was no formal process or tool.

Suggestions for improvement

Four areas for improving the implementation of the O-RON were suggested (Table 3).

Mixed leadership roles

Clinical managers suggested that it might be helpful if physicians, in addition to the managers themselves, promoted the O-RON.

“But I’m even thinking, like if it didn’t just come from me, if the staff [doctor] would come around and say, “Hey guys, I would really appreciate it.” […] say if it came just from me, from oh the manager is asking for us to fill out another sheet, or something to that effect. It may help a little bit.” (Participant 1) . “I think at the huddle, if one of you can come (Staff physician), although we mention it, but I think it would be important, even if it’s only once a month, you know. […] Or you know, come on the unit anytime and just you know, remind the nurses.” (Participant 3) .

Increase familiarity between nurses and residents

Clinical managers suggested increasing familiarity between nurses and residents by having more in-person rounds that residents regularly attend and by involving the residents in the distribution of O-RONs.

“My recommendation would be to bring back rounds, in-person rounds. Also, it would be nice if we would have like an introduction. ‘This is the resident for Team C,’ you know something to that effect. I know they come around and they sit, and they look at EPIC and they chat, but we sometimes don’t make the connection of who is this resident, you know, what team is he part of.” (Participant 1) .

“I guess maybe a suggestion would be to have the residents go around, and not every single day, but maybe once a week, prioritise 30 minutes and take their own surveys and go up to the nursing staff and say, “Hey, I’m looking for your feedback, will you complete this survey for me?” And then hand the nurse the survey that relates directly to that particular resident.” (Participant 4) .

Transparent feedback procedure

Clinical managers highlighted the importance of having a clear loop-back procedure so that nurses know their feedback is being reviewed and shared with the residents. They felt this was very important for maintaining nursing participation in resident assessment.

“I guess the one question is, they fill this in, but now we’re getting to a point of, how do we know that information or how is that information getting to the residents? What sort of structure is that? So that at least I can have a conversation explaining that yeah, when you fill this in, this is the next steps that happen of how it loops back with the individuals. So I think the further along we get into this and not having that closed loop on it, we may start to lose some engagement because then their maybe not going to see a worth or value to doing it.” (Participant 2) .

Format of the O-RON

Some clinical managers felt that offering the O-RON in different formats (paper and digital) might increase engagement. Some pointed out that their nurses liked the digital versions of surveys used in other projects, whereas others noted that some of their staff preferred a paper form.

WBA that relies on observation by physician supervisors is a predominant method used to assess professional competency in the post-graduate medical education curriculum [7]. However, in a complex training environment where supervisors are unavailable to observe certain aspects of a trainee's performance, nurses are well positioned to do so. The O-RON was developed to capture nurses' feedback, which is critical in identifying and fostering the development of physician characteristics that improve collaboration between nurses and physicians [9]. Our study assessed the use of the O-RON in three different residency programs at the University of Ottawa to gather more validity evidence and allow the results to be generalized to multiple contexts.

As in the original study, our findings provided strong validity evidence related to internal processes, demonstrated by the reliability estimates from the generalizability analysis and decision study. With only four forms per resident the O-RON had a reliability of 0.82, and with three forms a reliability of 0.78. A reliability of 0.8–0.89 is considered acceptable for moderate-stakes summative assessments, and 0.7–0.79 is considered acceptable for formative assessments [17]. The results of the 2 × 2 comparisons highlighted the correlation between the global assessment and the presence of concern, reflecting that nurses were more likely to want to work with a physician who showed no concerning behaviour on the O-RON items. This further supports the consistency of the tool in identifying concerning behaviour through the eyes of nurses.

However, substantially fewer forms were completed in our study than in the original study (180 forms for 30 residents over nine months versus 1079 forms for 38 residents over 11 months) and fewer than the intended sample size of 256 forms per specialty. Because of this, we were only able to analyze the data in aggregate rather than per specialty and could not make comparisons between specialty groups. Nonetheless, there were enough submitted forms to perform the generalizability analysis, and the decision study allowed us to estimate the reliability of the O-RON across a range of forms per resident. Furthermore, the resulting reliability was higher than that obtained in the original study [26].

To better understand the reasons behind this difference, we conducted individual semi-structured interviews with the clinical managers on each unit. Five major themes were identified that had an impact on the implementation of the O-RON. Implementation was better when there was strong leadership for the tool, a higher proportion of experienced nurses, and nurses who knew the residents. When these factors were absent, uptake of the tool was limited. Additionally, heavy clinical workloads related to staffing shortages, caused both by the COVID-19 pandemic and the current nursing staffing crisis in Canada, had a significant negative impact. Furthermore, certain COVID-19 protocols and the implementation of the electronic health record shifted much of the nurse-resident interaction from in-person to virtual. It is also worth noting that the wards in the original study had an established culture of feedback, in which clinical managers regularly reported to the residency program director using their own form. This may also have contributed to the more successful implementation of the O-RON in the original study.

The barriers to implementation we identified are consistent with the literature on the challenges of implementing new assessment tools. Local assessment culture, clinical setting, interprofessional dynamics, leadership engagement, and time constraints have all been previously identified [27, 28, 29]. Our study additionally highlights nursing suggestions to address these barriers, including mixed leadership roles, ways to improve collegial familiarity, and feedback transparency (Table 3).

Despite the challenges identified, clinical managers appreciated the O-RON as an avenue for nurses to act as assessors and felt that it was a valuable tool. This, combined with its growing validity evidence, suggests that future work should target these barriers before implementing the O-RON. Our study participants offered several suggestions for doing so. They also emphasized the importance of ensuring that nurses are made aware of how their assessments will be shared with residents and followed up on.

Our study has limitations. First, fewer O-RONs were completed than we had anticipated. Because of that, we needed to aggregate the data across all specialties for further analysis rather than analyze each specialty separately. This also led us to pursue the qualitative portion of our study, which characterized why this was the case; this new information may be beneficial for future work. Second, this study was performed at a single university and in three specific specialties. To generate further validity evidence for the O-RON as an assessment tool, implementing it at different institutions and in other specialties should be considered.

The O-RON is a useful tool for capturing nurses' assessments of resident performance. Our findings demonstrated reliable results in various clinical settings, adding to the validity evidence for the tool. However, understanding the assessment environment and ensuring it has the capacity to support this assessment is crucial for successful implementation. Future research should focus on how to create conditions under which implementing this tool is feasible from the perspective of nurses.

Data availability

The datasets used and/or analysed during the study are available from the corresponding author on reasonable request.

Abbreviations

CBME: Competency-based medical education

O-RON: Ottawa Resident Observation Form for Nurses

WBA: Workplace-based assessment

Snell L, Frank JR, Sherbino J. CanMEDS 2015 Physician Competency Framework. Royal College of Physicians & Surgeons of Canada; 2015. https://books.google.ca/books?id=1-iAjgEACAAJ .

McConnell M, Gu A, Arshad A, Mokhtari A, Azzam K. An innovative approach to identifying learning needs for intrinsic CanMEDS roles in continuing professional development. Med Educ Online. 2018;23(1):1497374.


Binnendyk J, Pack R, Field E, Watling C. Not wanted on the voyage: highlighting intrinsic CanMEDS gaps in competence by design curricula. Can Med Educ J. 2021;12(4):39–47.


Rida TZ, Dubois D, Hui Y, Ghatalia J, McConnell M, LaDonna K. Assessment of CanMEDS Competencies in Work-Based Assessment: Challenges and Lessons Learned. In: 2020 CAS Annual Meeting. 2020. p. 4.

Hawkins RE, Welcher CM, Holmboe ES, Kirk LM, Norcini JJ, Simons KB, et al. Implementation of competency-based medical education: are we addressing the concerns and challenges? Med Educ. 2015;49(11):1086–102.

Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA. 2002;287(2):226–35.

Prentice S, Benson J, Kirkpatrick E, Schuwirth L. Workplace-based assessments in postgraduate medical education: a hermeneutic review. Med Educ. 2020;54(11):981–92.

LaDonna KA, Hatala R, Lingard L, Voyer S, Watling C. Staging a performance: learners’ perceptions about direct observation during residency. Med Educ. 2017;51(5):498–510.

Bhat C, LaDonna KA, Dewhirst S, Halman S, Scowcroft K, Bhat S, et al. Unobserved observers: nurses’ perspectives about sharing feedback on the performance of Resident Physicians. Acad Med. 2022;97(2):271.

Batista-Foguet JM, Saris W, Boyatzis RE, Serlavós R, Velasco Moreno F. Multisource assessment for development purposes: revisiting the methodology of data analysis. Front Psychol. 2019;9. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6328456/ .

Allerup P, Aspegren K, Ejlersen E, Jørgensen G, Malchow-Møller A, Møller MK, et al. Use of 360-degree assessment of residents in internal medicine in a Danish setting: a feasibility study. Med Teach. 2007;29(2–3):166–70.

Ogunyemi D, Gonzalez G, Fong A, Alexander C, Finke D, Donnon T, et al. From the eye of the nurses: 360-degree evaluation of residents. J Contin Educ Health Prof. 2009;29(2):105–10.

Bullock AD, Hassell A, Markham WA, Wall DW, Whitehouse AB. How ratings vary by staff group in multi-source feedback assessment of junior doctors. Med Educ. 2009;43(6):516–20.

Castonguay V, Lavoie P, Karazivan P, Morris J, Gagnon R. P030: Multisource feedback for emergency medicine residents: different, relevant and useful information. Can J Emerg Med. 2017;19(S1):S88–88.

Jong M, Elliott N, Nguyen M, Goyke T, Johnson S, Cook M, et al. Assessment of Emergency Medicine Resident performance in an adult Simulation using a Multisource Feedback Approach. West J Emerg Med. 2019;20(1):64–70.

Bharwani A, Swystun D, Paolucci EO, Ball CG, Mack LA, Kassam A. Assessing leadership in junior resident physicians: using a new multisource feedback tool to measure Learning by Evaluation from All-inclusive 360 Degree Engagement of Residents (LEADER). BMJ Leader. 2020.

Downing SM. Reliability: on the reproducibility of assessment data. Med Educ. 2004;38(9):1006–12.

Busch G, Rodríguez Borda MV, Morales PI, Weiss M, Ciambrone G, Costabel JP, et al. Validation of a form for assessing the professional performance of residents in cardiology by nurses. J Educ Health Promot. 2023;12:127.

Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR. The role of assessment in competency-based medical education. Med Teach. 2010;32(8):676–82.

Govaerts MJB, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: effects of rater expertise. Adv Health Sci Educ Theory Pract. 2011;16(2):151–65.

Yeates P, O’Neill P, Mann K, Eva K. Seeing the same thing differently: mechanisms that contribute to assessor differences in directly-observed performance assessments. Adv Health Sci Educ Theory Pract. 2013;18(3):325–41.

Briesch AM, Swaminathan H, Welsh M, Chafouleas SM. Generalizability theory: a practical guide to study design, implementation, and interpretation. J Sch Psychol. 2014;52(1):13–35.

Kogan JR, Conforti LN, Iobst WF, Holmboe ES. Reconceptualizing variable rater assessments as both an educational and clinical care problem. Acad Med. 2014;89(5):721–7.

Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press; 2008. https://oxford.universitypressscholarship.com/view/10.1093/acprof:oso/9780199231881.001.0001/acprof-9780199231881 .

American Educational Research Association. Standards for educational and psychological testing. Washington, DC: American Educational Research Association; 2014.

Dudek N, Duffy MC, Wood TJ, Gofton W. The Ottawa Resident Observation Form for Nurses (O-RON): assessment of resident performance through the eyes of the nurses. J Surg Educ. 2021. https://www.sciencedirect.com/science/article/pii/S1931720421000672 .

Dudek NL, Papp S, Gofton WT. Going Paperless? Issues in converting a Surgical Assessment Tool to an Electronic Version. Teach Learn Med. 2015;27(3):274–9.

Hess LM, Foradori DM, Singhal G, Hicks PJ, Turner TL. PLEASE complete your evaluations! Strategies to Engage Faculty in Competency-based assessments. Acad Pediatr. 2021;21(2):196–200.

Young JQ, Sugarman R, Schwartz J, O’Sullivan PS. Faculty and Resident Engagement with a workplace-based Assessment Tool: use of implementation science to explore enablers and barriers. Acad Med. 2020;95(12):1937–44.


Acknowledgements

The authors would like to acknowledge Katherine Scowcroft and Amanda Pace for their assistance in managing this project.

Funding

This study was funded by a Physicians’ Services Incorporated Foundation Resident Research Grant (number R22-09) and the University of Ottawa Department of Innovation in Medical Education Health Professions Education Research Grant (number 603978-152399-2001).

Author information

Authors and affiliations

Department of Medicine, Division of Physical Medicine & Rehabilitation, University of Ottawa, Ottawa, Canada

Hedva Chiu

Department of Innovation in Medical Education, University of Ottawa, Ottawa, Canada

Timothy J. Wood

Department of Obstetrics and Gynecology, University of Ottawa, Ottawa, Canada

Adam Garber

Department of Medicine, Division of General Internal Medicine, University of Ottawa, Ottawa, Canada

Samantha Halman

Department of Surgery, Division of General Surgery, University of Ottawa, Ottawa, Canada

Janelle Rekman

Department of Surgery, Division of Orthopedic Surgery, University of Ottawa, Ottawa, Canada

Wade Gofton

Department of Medicine, Division of Physical Medicine & Rehabilitation, The Ottawa Hospital, University of Ottawa, Ottawa, ON, Canada

Nancy Dudek


Contributions

All authors made contributions to the conception of this project. H.C., T.W. and N.D. made contributions to the data analysis and interpretation. All authors made contributions to the critical revision of the article and final approval of the version to be published.

Corresponding author

Correspondence to Hedva Chiu.

Ethics declarations

Ethics approval and consent to participate

Ethics approval was granted by the Ottawa Health Science Network Research Ethics Board. All research was carried out in accordance with the Declaration of Helsinki. Informed consent was obtained from all participants of the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Chiu, H., Wood, T.J., Garber, A. et al. The Ottawa resident observation form for nurses (O-RON): evaluation of an assessment tool’s psychometric properties in different specialties. BMC Med Educ 24 , 487 (2024). https://doi.org/10.1186/s12909-024-05476-1


Received : 18 October 2023

Accepted : 26 April 2024

Published : 02 May 2024

DOI : https://doi.org/10.1186/s12909-024-05476-1


Keywords

  • Post-graduate medical education
  • Inter-professional assessment
  • Professionalism

