Yes, We Can Define, Teach, and Assess Critical Thinking Skills

Jeff Heyck-Williams (He, His, Him), Director of the Two Rivers Learning Institute in Washington, DC

Today’s learners face an uncertain present and a rapidly changing future that demand far different skills and knowledge than were needed in the 20th century. We also know so much more about enabling deep, powerful learning than we ever did before. Our collective future depends on how well young people prepare for the challenges and opportunities of 21st-century life.

While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the advent of the term “21st century skills” and discussions of deeper learning. There is increasing agreement among education reformers that critical thinking is an essential ingredient for long-term success for all of our students.

However, there are still those in the education establishment and in the media who argue that critical thinking isn’t really a thing, or that these skills aren’t well defined and, even if they could be defined, they can’t be taught or assessed.

With those naysayers, I have to disagree. Critical thinking is a thing. We can define it; we can teach it; and we can assess it. In fact, as part of a multi-year Assessment for Learning Project, Two Rivers Public Charter School in Washington, D.C., has done just that.

Before I dive into what we have done, I want to acknowledge that some of the criticism has merit.

First, some argue that critical thinking can only exist when students have a vast fund of knowledge; that is, a student cannot think critically without something substantive to think about. I agree. Students do need a robust foundation of core content knowledge to think critically and effectively. Schools still have a responsibility for building students’ content knowledge.

However, I would argue that students don’t need to wait to think critically until after they have mastered some arbitrary amount of knowledge. They can start building critical thinking skills when they walk in the door. All students come to school with experience and knowledge which they can immediately think critically about. In fact, some of the thinking that they learn to do helps augment and solidify the discipline-specific academic knowledge that they are learning.

The second criticism is that critical thinking skills are always highly contextual. In this argument, the critics make the point that the types of thinking that students do in history are categorically different from the types of thinking students do in science or math. Thus, the idea of teaching broadly defined, content-neutral critical thinking skills is impossible. I agree that there are domain-specific thinking skills that students should learn in each discipline. However, I also believe that there are several generalizable skills that elementary school students can learn that have broad applicability to their academic and social lives. That is what we have done at Two Rivers.

Defining Critical Thinking Skills

We began this work by first defining what we mean by critical thinking. After a review of the literature and looking at the practice at other schools, we identified five constructs that encompass a set of broadly applicable skills: schema development and activation; effective reasoning; creativity and innovation; problem solving; and decision making.

We then created rubrics to provide a concrete vision of what each of these constructs looks like in practice. Working with the Stanford Center for Assessment, Learning and Equity (SCALE), we refined these rubrics to capture clear and discrete skills.

For example, we defined effective reasoning as the skill of creating an evidence-based claim: students need to construct a claim, identify relevant support, link their support to their claim, and identify possible questions or counter claims. Rubrics provide an explicit vision of the skill of effective reasoning for students and teachers. By breaking the rubrics down for different grade bands, we have been able not only to describe what reasoning is but also to delineate how the skills develop in students from preschool through 8th grade.
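The four components of effective reasoning named above (claim, support, link, and questions or counterclaims) can be sketched as a small scoring record. This is only an illustration: the 0 to 3 scale, the field names, and the `ReasoningScore` class are hypothetical and are not taken from the Two Rivers or SCALE rubrics.

```python
from dataclasses import dataclass

# The four components come from the definition of effective reasoning in the text.
COMPONENTS = ("claim", "support", "link", "questions")
MAX_LEVEL = 3  # hypothetical scale: 0 (not yet evident) to 3 (fully evident)


@dataclass
class ReasoningScore:
    """One student's rubric levels on a single task (hypothetical structure)."""
    claim: int      # constructs a claim
    support: int    # identifies relevant support
    link: int       # links the support to the claim
    questions: int  # identifies possible questions or counterclaims

    def __post_init__(self) -> None:
        # Reject levels outside the assumed 0..MAX_LEVEL scale.
        for name in COMPONENTS:
            level = getattr(self, name)
            if not 0 <= level <= MAX_LEVEL:
                raise ValueError(f"{name} must be between 0 and {MAX_LEVEL}")

    def total(self) -> int:
        """Sum of the four component levels."""
        return sum(getattr(self, name) for name in COMPONENTS)

    def weakest(self) -> str:
        """First component with the lowest level, as a candidate for feedback."""
        return min(COMPONENTS, key=lambda name: getattr(self, name))


score = ReasoningScore(claim=3, support=2, link=1, questions=2)
print(score.total(), score.weakest())
```

Keeping the components separate, rather than recording a single overall mark, mirrors the formative purpose described in the text: the record points to the specific part of the routine a student should practice next.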

Before moving on, I want to freely acknowledge that in narrowly defining reasoning as the construction of evidence-based claims we have disregarded some elements of reasoning that students can and should learn. For example, the difference between constructing claims through deductive versus inductive means is not highlighted in our definition. However, by privileging a definition that has broad applicability across disciplines, we are able to gain traction in developing the roots of critical thinking: in this case, the ability to formulate well-supported claims or arguments.

Teaching Critical Thinking Skills

The definitions of critical thinking constructs were only useful to us inasmuch as they translated into practical skills that teachers could teach and students could learn and use. Consequently, we found that to teach a set of cognitive skills, we needed thinking routines that defined the regular application of these critical thinking and problem-solving skills across domains. Building on Harvard’s Project Zero Visible Thinking work, we have named routines aligned with each of our constructs.

For example, with the construct of effective reasoning, we aligned the Claim-Support-Question thinking routine to our rubric. Teachers then were able to teach students that whenever they were making an argument, the norm in the class was to use the routine in constructing their claim and support. The flexibility of the routine has allowed us to apply it from preschool through 8th grade and across disciplines from science to economics and from math to literacy.

Kathryn Mancino, a 5th grade teacher at Two Rivers, has deliberately taught three of our thinking routines to students using anchor charts. Her charts name the components of each routine and have a place for students to record when they’ve used it and what they have figured out about the routine. By using this structure with a chart that can be added to throughout the year, students see the routines as broadly applicable across disciplines and are able to refine their application over time.

Assessing Critical Thinking Skills

By defining specific constructs of critical thinking and building thinking routines that support their implementation in classrooms, we have operated under the assumption that students are developing skills that they will be able to transfer to other settings. However, we recognized both the importance and the challenge of gathering reliable data to confirm this.

With this in mind, we have developed a series of short performance tasks around novel discipline-neutral contexts in which students can apply the constructs of thinking. Through these tasks, we have been able to provide an opportunity for students to demonstrate their ability to transfer the types of thinking beyond the original classroom setting. Once again, we have worked with SCALE to define tasks where students easily access the content but where the cognitive lift requires them to demonstrate their thinking abilities.

These assessments demonstrate that it is possible to capture meaningful data on students’ critical thinking abilities. They are not intended to be high-stakes accountability measures. Instead, they are designed to give students, teachers, and school leaders discrete formative data on hard-to-measure skills.

While it is clearly difficult, and we have not solved all of the challenges to scaling assessments of critical thinking, we can define, teach, and assess these skills. In fact, knowing how important they are for the economy of the future and our democracy, it is essential that we do.

Jeff Heyck-Williams is the director of the Two Rivers Learning Institute and a founder of Two Rivers Public Charter School. He has led work around creating school-wide cultures of mathematics, developing assessments of critical thinking and problem-solving, and supporting project-based learning.

Consequential Validity: Using Assessment to Drive Instruction

Critical Thinking Testing and Assessment

The purpose of assessment in instruction is improvement. The purpose of assessing instruction for critical thinking is improving the teaching of discipline-based thinking (historical, biological, sociological, mathematical, etc.). It is to improve students’ abilities to think their way through content using disciplined skill in reasoning. The more particular we can be about what we want students to learn about critical thinking, the better we can devise instruction with that particular end in view.

The Foundation for Critical Thinking offers assessment instruments which share in the same general goal: to enable educators to gather evidence relevant to determining the extent to which instruction is teaching students to think critically (in the process of learning content). To this end, the Fellows of the Foundation recommend:

  • that academic institutions and units establish an oversight committee for critical thinking, and
  • that this oversight committee utilize a combination of assessment instruments (the more the better) to generate incentives for faculty, by providing them with as much evidence as feasible of the actual state of instruction for critical thinking.

The following instruments are available to generate evidence relevant to critical thinking teaching and learning:

Course Evaluation Form: Provides evidence of whether, and to what extent, students perceive faculty as fostering critical thinking in instruction (course by course). Machine-scorable.

Online Critical Thinking Basic Concepts Test: Provides evidence of whether, and to what extent, students understand the fundamental concepts embedded in critical thinking (and hence tests student readiness to think critically). Machine-scorable.

Critical Thinking Reading and Writing Test: Provides evidence of whether, and to what extent, students can read closely and write substantively (and hence tests students' abilities to read and write critically). Short-answer.

International Critical Thinking Essay Test: Provides evidence of whether, and to what extent, students are able to analyze and assess excerpts from textbooks or professional writing. Short-answer.

Commission Study Protocol for Interviewing Faculty Regarding Critical Thinking: Provides evidence of whether, and to what extent, critical thinking is being taught at a college or university. Can be adapted for high school. Based on the California Commission Study. Short-answer.

Protocol for Interviewing Faculty Regarding Critical Thinking: Provides evidence of whether, and to what extent, critical thinking is being taught at a college or university. Can be adapted for high school. Short-answer.

Protocol for Interviewing Students Regarding Critical Thinking: Provides evidence of whether, and to what extent, students are learning to think critically at a college or university. Can be adapted for high school. Short-answer.

Criteria for Critical Thinking Assignments: Can be used by faculty in designing classroom assignments, or by administrators in assessing the extent to which faculty are fostering critical thinking.

Rubrics for Assessing Student Reasoning Abilities: A useful tool in assessing the extent to which students are reasoning well through course content.

All of the above assessment instruments can be used as part of pre- and post-assessment strategies to gauge development over various time periods.
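As a rough illustration of a pre- and post-assessment comparison, the sketch below computes the average change in score for a group of students who took the same instrument twice. The function name `mean_gain` and the sample scores are hypothetical; none of the instruments above prescribes this calculation, and a real analysis would also need to consider sample size and score reliability.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average per-student change from pre- to post-assessment.

    Both lists must be in the same student order (hypothetical layout).
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each student needs both a pre and a post score")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Invented scores for three students on the same instrument, taken twice.
pre = [2, 3, 1]
post = [3, 3, 3]
print(mean_gain(pre, post))
```

Pairing each student's two scores, rather than comparing two class averages, keeps the comparison meaningful only for students who completed both assessments.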

Consequential Validity

All of the above assessment instruments, when used appropriately and graded accurately, should lead to a high degree of consequential validity. In other words, the use of the instruments should cause teachers to teach in such a way as to foster critical thinking in their various subjects. In this light, for students to perform well on the various instruments, teachers will need to design instruction so that students can perform well on them. Students cannot become skilled in critical thinking without learning (first) the concepts and principles that underlie critical thinking and (second) applying them in a variety of forms of thinking: historical thinking, sociological thinking, biological thinking, etc. Students cannot become skilled in analyzing and assessing reasoning without practicing it. However, when they have routine practice in paraphrasing, summarizing, analyzing, and assessing, they will develop skills of mind requisite to the art of thinking well within any subject or discipline, not to mention thinking well within the various domains of human life.

Promoting and Assessing Critical Thinking

Critical thinking is a high-priority outcome of higher education: critical thinking skills are crucial for independent thinking and problem solving in both our students’ professional and personal lives. But what does it mean to be a critical thinker, and how do we promote and assess it in our students? Critical thinking can be defined as being able to examine an issue by breaking it down and evaluating it in a conscious manner, while providing arguments/evidence to support the evaluation. Below are some suggestions for promoting and assessing critical thinking in our students.

Thinking through inquiry

Asking questions and using the answers to understand the world around us is what drives critical thinking. In inquiry-based instruction, the teacher asks students leading questions to draw from them information, inferences, and predictions about a topic. Below are some example generic question stems that can serve as prompts to aid in generating critical thinking questions. Consider providing prompts such as these to students to facilitate their ability to also ask these questions of themselves and others. If we want students to generate good questions on their own, we need to teach them how to do so by providing them with the structure and guidance of example questions, whether in written form, or by our use of questions in the classroom.

Generic question stems

  • What are the strengths and weaknesses of …?
  • What is the difference between … and …?
  • Explain why/how …?
  • What would happen if …?
  • What is the nature of …?
  • Why is … happening?
  • What is a new example of …?
  • How could … be used to …?
  • What are the implications of …?
  • What is … analogous to?
  • What do we already know about …?
  • How does … affect …?
  • How does … tie in with what we have learned before?
  • What does … mean?
  • Why is … important?
  • How are … and … similar/different?
  • How does … apply to everyday life?
  • What is a counterargument for …?
  • What is the best … and why?
  • What is a solution to the problem of …?
  • Compare … and … with regard to …?
  • What do you think causes …? Why?
  • Do you agree or disagree with this statement? What evidence is there to support your answer?
  • What is another way to look at …?

Critical thinking through writing

Another essential ingredient in critical thinking instruction is the use of writing. Writing converts students from passive to active learners and requires them to identify issues and formulate hypotheses and arguments. The act of writing requires students to focus and clarify their thoughts before putting them down on paper, hence taking them through the critical thinking process. Writing requires that students make important critical choices and ask themselves (Gocsik, 2002):

  • What information is most important?
  • What might be left out?
  • What is it that I think about this subject?
  • How did I arrive at what I think?
  • What are my assumptions? Are they valid?
  • How can I work with facts, observations, and so on, in order to convince others of what I think?
  • What do I not yet understand?

Consider providing the above questions to students so that they can evaluate their own writing as well. Some suggestions for critical thinking writing activities include:

  • Give students raw data and ask them to write an argument or analysis based on the data.
  • Have students explore and write about unfamiliar points of view or “what if” situations.
  • Think of a controversy in your field, and have the students write a dialogue between characters with different points of view.
  • Select important articles in your field and ask the students to write summaries or abstracts of them. Alternatively, you could ask students to write an abstract of your lecture.
  • Develop a scenario that places students in realistic situations relevant to your discipline, where they must reach a decision to resolve a conflict.

See the Centre for Teaching Excellence (CTE) teaching tip “Low-Stakes Writing Assignments” for critical thinking writing assignments.

Critical thinking through group collaboration

Opportunities for group collaboration could include discussions, case studies, task-related group work, peer review, or debates. Group collaboration is effective for promoting critical thought because:

  • An effective team has the potential to produce better results than any individual,
  • Students are exposed to different perspectives while clarifying their own ideas,
  • Collaborating on a project or studying with a group for an exam generally stimulates interest and increases the understanding and knowledge of the topic.

See the CTE teaching tip “Group Work in the Classroom: Types of Small Groups” for suggestions for forming small groups in your classroom.

Assessing critical thinking skills

You can also use the students’ responses from the activities that promote critical thinking to assess whether they are, indeed, reaching your critical thinking goals. It is important to establish clear criteria for evaluating critical thinking. Even though many of us may be able to identify critical thinking when we see it, explicitly stated criteria help both students and teachers know the goal toward which they are working. An effective criterion measures which skills are present, to what extent, and which skills require further development. The following are characteristics of work that may demonstrate effective critical thinking:

  • Accurately and thoroughly interprets evidence, statements, graphics, questions, literary elements, etc.
  • Asks relevant questions.
  • Analyses and evaluates key information and alternative points of view clearly and precisely.
  • Fair-mindedly examines beliefs, assumptions, and opinions and weighs them against facts.
  • Draws insightful, reasonable conclusions.
  • Justifies inferences and opinions.
  • Thoughtfully addresses and evaluates major alternative points of view.
  • Thoroughly explains assumptions and reasons.

It is also important to note that assessment is a tool that can be used throughout a course, not just at the end. It is more useful to assess students throughout a course, so you can see if criteria require further clarification and students can test out their understanding of your criteria and receive feedback. Also consider distributing your criteria with your assignments so that students receive guidance about your expectations. This will help them to reflect on their own work and improve the quality of their thinking and writing.

See the CTE teaching tip sheets “Rubrics” and “Responding to Writing Assignments: Managing the Paper Load” for more information on rubrics.

If you would like support applying these tips to your own teaching, CTE staff members are here to help. View the CTE Support page to find the most relevant staff member to contact.

  • Gocsik, K. (2002). Teaching Critical Thinking Skills. UTS Newsletter, 11(2): 1-4.
  • Facione, P.A. and Facione, N.C. (1994). Holistic Critical Thinking Scoring Rubric. Millbrae, CA: California Academic Press. www.calpress.com/rubric.html (retrieved September 2003).
  • King, A. (1995). Inquiring minds really do want to know: Using questioning to teach critical thinking. Teaching of Psychology, 22(1): 13-17.
  • Wade, C. and Tavris, C. (1987). Psychology (1st ed.). New York: Harper. In: Wade, C. (1995). Using Writing to Develop and Assess Critical Thinking. Teaching of Psychology, 22(1): 24-28.

A Brief Guide for Teaching and Assessing Critical Thinking in Psychology

In my first year of college teaching, a student approached me one day after class and politely asked, “What did you mean by the word ‘evidence’?” I tried to hide my shock at what I took to be a very naive question. Upon further reflection, however, I realized that this was actually a good question, for which the usual approaches to teaching psychology provided too few answers. During the next several years, I developed lessons and techniques to help psychology students learn how to evaluate the strengths and weaknesses of scientific and nonscientific kinds of evidence and to help them draw sound conclusions. It seemed to me that learning about the quality of evidence and drawing appropriate conclusions from scientific research were central to teaching critical thinking (CT) in psychology.

In this article, I have attempted to provide guidelines to psychology instructors on how to teach CT, describing techniques I developed over 20 years of teaching. More importantly, the techniques and approach described below are ones that are supported by scientific research. Classroom examples illustrate the use of the guidelines and how assessment can be integrated into CT skill instruction.

Overview of the Guidelines

Confusion about the definition of CT has been a major obstacle to teaching and assessing it (Halonen, 1995; Williams, 1999). To deal with this problem, we have defined CT as reflective thinking involved in the evaluation of evidence relevant to a claim so that a sound or good conclusion can be drawn from the evidence (Bensley, 1998). One virtue of this definition is it can be applied to many thinking tasks in psychology. The claims and conclusions psychological scientists make include hypotheses, theoretical statements, interpretation of research findings, or diagnoses of mental disorders. Evidence can be the results of an experiment, case study, naturalistic observation study, or psychological test. Less formally, evidence can be anecdotes, introspective reports, commonsense beliefs, or statements of authority. Evaluating evidence and drawing appropriate conclusions along with other skills, such as distinguishing arguments from nonarguments and finding assumptions, are collectively called argument analysis skills. Many CT experts take argument analysis skills to be fundamental CT skills (e.g., Ennis, 1987; Halpern, 1998). Psychology students need argument analysis skills to evaluate psychological claims in their work and in everyday discourse.

Some instructors expect that simply immersing students in challenging course work will improve CT skills such as argument analysis. Others expect improvement because they use a textbook with special CT questions or modules, give lectures that critically review the literature, or have students complete written assignments. While these and other traditional techniques may help, a growing body of research suggests they are not sufficient to efficiently produce measurable changes in CT skills. Our research on acquisition of argument analysis skills in psychology (Bensley, Crowe, Bernhardt, Buchner, & Allman, in press) and on critical reading skills (Bensley & Haynes, 1995; Spero & Bensley, 2009) suggests that more explicit, direct instruction of CT skills is necessary. These results concur with results of an earlier review of CT programs by Chance (1986) and a recent meta-analysis by Abrami et al. (2008).

Based on these and other findings, the following guidelines describe an approach to explicit instruction in which instructors can directly infuse CT skills and assessment into their courses. With infusion, instructors can use relevant content to teach CT rules and concepts along with the subject matter. Directly infusing CT skills into course work involves targeting specific CT skills, making CT rules, criteria, and methods explicit, providing guided practice in the form of exercises focused on assessing skills, and giving feedback on practice and assessments. These components are similar to ones found in effective, direct instruction approaches (Walberg, 2006). They also resemble approaches to teaching CT proposed by Angelo (1995), Beyer (1997), and Halpern (1998). Importantly, this approach has been successful in teaching CT skills in psychology (e.g., Bensley et al., in press; Bensley & Haynes, 1995; Nieto & Saiz, 2008; Penningroth, Despain, & Gray, 2007). Directly infusing CT skill instruction can also enrich content instruction without sacrificing learning of subject matter (Solon, 2003). The following seven guidelines, illustrated by CT lessons and assessments, explicate this process.

Seven Guidelines for Teaching and Assessing Critical Thinking

1. Motivate your students to think critically

Critical thinking takes effort. Without proper motivation, students are less inclined to engage in it. Therefore, it is good to arouse interest right away and foster commitment to improving CT throughout a course. One motivational strategy is to explain why CT is important to effective, professional behavior. Often, telling a compelling story that illustrates the consequences of failing to think critically can motivate students. For example, the tragic death of 10-year-old Candace Newmaker at the hands of her therapists practicing attachment therapy illustrates the perils of using a therapy that has not been supported by good empirical evidence (Lilienfeld, 2007).

Instructors can also pique interest by taking a class poll posing an interesting question on which students are likely to have an opinion. For example, asking students how many think that the full moon can lead to increases in abnormal behavior can be used to introduce the difference between empirical fact and opinion or common sense belief. After asking students how psychologists answer such questions, instructors might go over the meta-analysis of Rotton and Kelly (1985). Their review found that almost all of the 37 studies they reviewed showed no association between the phase of the moon and abnormal behavior, with only a few, usually poorly controlled, studies supporting it. Effect size over all studies was very small (.01). Instructors can use this to illustrate how psychologists draw a conclusion based on the quality and quantity of research studies as opposed to what many people commonly believe. For other interesting thinking errors and misconceptions related to psychology, see Bensley (1998; 2002; 2008), Halpern (2003), Ruscio (2006), Stanovich (2007), and Sternberg (2007).

Attitudes and dispositions can also affect motivation to think critically. If students lack certain CT dispositions such as open-mindedness, fair-mindedness, and skepticism, they will be less likely to think critically even if they have CT skills (Halpern, 1998). Instructors might point out that even great scientists noted for their powers of reasoning sometimes fail to think critically when they are not disposed to use their skills. For example, Alfred Russel Wallace, who used his considerable CT skills to help develop the concept of natural selection, also believed in spiritualistic contact with the dead. Despite considerable evidence that mediums claiming to contact the dead were really faking such contact, Wallace continued to believe in it (Bensley, 2006). Likewise, the great American psychologist William James, whose reasoning skills helped him develop the seeds of important contemporary theories, believed in spiritualism despite evidence to the contrary.

2. Clearly state the CT goals and objectives for your class

Once students are motivated, the instructor should focus them on what skills they will work on during the course. The APA task force on learning goals and objectives for psychology listed CT as one of 10 major goals for students (Halonen et al., 2002). Under critical thinking they have further specified outcomes such as evaluating the quality of information, identifying and evaluating the source and credibility of information, and recognizing and defending against thinking errors and fallacies. Instructors should publish goals like these in their CT course objectives in their syllabi and more specifically as assignment objectives in their assignments. Given the pragmatic penchant of students for studying what is needed to succeed in a course, this should help motivate and focus them.

To make instruction efficient, course objectives and lesson objectives should explicitly target CT skills to be improved. Objectives should specify the behavior that will change in a way that can be measured. A course objective might read, “After taking this course, you will be able to analyze arguments found in psychological and everyday discussions.” When the goal of a lesson is to practice and improve specific microskills that make up argument analysis, an assignment objective might read, “After successfully completing this assignment, you will be able to identify different kinds of evidence in a psychological discussion.” Or another might read, “After successfully completing this assignment, you will be able to distinguish arguments from nonarguments.” Students might demonstrate they have reached these objectives by showing the behavior of correctly labeling the kinds of evidence presented in a passage or by indicating whether an argument or merely a claim has been made. By stating objectives in the form of assessable behaviors, the instructor can test these as assessment hypotheses.

Sometimes when the goal is to teach students how to decide which CT skills are appropriate in a situation, the instructor may not want to identify specific skills. Instead, a lesson objective might read, “After successfully completing this assignment, you will be able to decide which skills and knowledge are appropriate for critically analyzing a discussion in psychology.”

3. Find opportunities to infuse CT that fit content and skill requirements of your course

To improve their CT skills, students must be given opportunities to practice them. Different courses present different opportunities for infusion and practice. Stand-alone CT courses usually provide the most opportunities to infuse CT. For example, the Frostburg State University Psychology Department has a senior seminar called “Thinking like a Psychologist” in which students complete lessons giving them practice in argument analysis, critical reading, critically evaluating information on the Internet, distinguishing science from pseudoscience, applying their knowledge and CT skills in simulations of psychological practice, and other activities.

In more typical subject-oriented courses, instructors must find specific content and types of tasks conducive to explicit CT skill instruction. For example, research methods courses present several opportunities to teach argument analysis skills. Instructors can have students critically evaluate the quality of evidence provided by studies using different research methods and designs they find in PsycINFO and Internet sources. This, in turn, could help students write better critical evaluations of research for research reports.

A cognitive psychology teacher might assign a critical evaluation of the evidence on an interesting question discussed in textbook literature reviews. For example, students might evaluate the evidence relevant to the question of whether people have flashbulb memories such as accurately remembering the 9-11 attack. This provides the opportunity to teach them that many of the studies, although informative, are quasi-experimental and cannot show causation. Or, students might analyze the arguments in a TV program such as the fascinating Nova program Kidnapped by Aliens on people who recall having been abducted by aliens.

4. Use guided practice, explicitly modeling and scaffolding CT.

Guided practice involves modeling and supporting the practice of target skills, and providing feedback on progress towards skill attainment. Research has shown that guided practice helps students acquire thinking skills more efficiently than unguided and discovery approaches do (Mayer, 2004).

Instructors can model the use of CT rules, criteria, and procedures for evaluating evidence and drawing conclusions in many ways. They could provide worked examples of problems, writing samples displaying good CT, or real-world examples of good and bad thinking found in the media. They might also think out loud as they evaluate arguments in class to model the process of thinking.

To help students learn to use complex rules in thinking, instructors should initially scaffold student thinking. Scaffolding involves providing product guidelines, rules, and other frameworks to support the process of thinking. Table 1 shows guidelines like those found in Bensley (1998) describing nonscientific kinds of evidence that can support student efforts to evaluate evidence in everyday psychological discussions. Likewise, Table 2 provides guidelines like those found in Bensley (1998) and Wade and Tavris (2005) describing various kinds of scientific research methods and designs that differ in the quality of evidence they provide for psychological arguments.

In the cognitive lesson on flashbulb memory described earlier, students use the framework in Table 2 to evaluate the kinds of evidence in the literature review. Table 1 can help them evaluate the kinds of evidence found in the Nova video Kidnapped by Aliens. Specifically, they could use it to contrast scientific authority with less credible authority. The video contrasts statements by scientific authorities such as Elizabeth Loftus, based on her extensive research, with the nonscientific authority of Budd Hopkins, an artist turned hypnotherapist and author of popular books on alien abduction. Loftus argues that the memories of alien abduction in the children interviewed by Hopkins were reconstructed around the suggestive interview questions he posed. Therefore, his conclusion that the children and other people in the video were recalling actual abduction experiences was based on anecdotes, unreliable self-reports, and other weak evidence.

Modeling, scaffolding, and guided practice are especially useful in helping students first acquire CT skills. After sufficient practice, however, instructors should fade these and have students do more challenging assignments without these supports to promote transfer.

5. Align assessment with practice of specific CT skills

Test questions and other assessments of performance should be similar to practice questions and problems in the skills targeted but differ in content. For example, we have developed a series of practice and quiz questions about the kinds of evidence found in Table 1 used in everyday situations but which differ in subject matter from practice to quiz. Likewise, other questions employ research evidence examples corresponding to Table 2. Questions ask students to identify kinds of evidence, evaluate the quality of the evidence, distinguish arguments from nonarguments, and find assumptions in the examples with practice examples differing in content from assessment items.
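The alignment rule in this guideline (same skill, different content) can be checked mechanically once items are tagged. The following is a minimal sketch, not the author's procedure: the `Item` fields, the skill labels, and the topics are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One question, tagged by hand (all field names are hypothetical)."""
    skill: str   # CT skill the item targets, e.g. "find assumptions"
    topic: str   # subject matter of the example used in the item
    phase: str   # "practice" or "quiz"


def alignment_gaps(items):
    """Return (skills quizzed but never practiced,
               (skill, topic) pairs reused from practice on a quiz)."""
    practiced = defaultdict(set)
    quizzed = defaultdict(set)
    for item in items:
        bucket = practiced if item.phase == "practice" else quizzed
        bucket[item.skill].add(item.topic)

    # A quiz skill with no practice items violates "similar in skill".
    unpracticed = sorted(s for s in quizzed if s not in practiced)
    # A quiz item that reuses a practice topic violates "differ in content".
    repeated = sorted(
        (s, t) for s in quizzed for t in quizzed[s] & practiced.get(s, set())
    )
    return unpracticed, repeated


# Hypothetical item bank for illustration only.
items = [
    Item("identify kinds of evidence", "advertising", "practice"),
    Item("identify kinds of evidence", "health claims", "quiz"),
    Item("find assumptions", "astrology", "quiz"),
    Item("distinguish arguments from nonarguments", "sports", "practice"),
    Item("distinguish arguments from nonarguments", "sports", "quiz"),
]
unpracticed, repeated = alignment_gaps(items)
```

With this illustrative bank, the check would flag "find assumptions" as quizzed without practice and the sports item as reusing practice content.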

6. Provide feedback and encourage students to reflect on it

Instructors should focus feedback on the degree of attainment of CT skill objectives in the lesson or assessment. The purpose of feedback is to help students learn how to correct faulty thinking so that in the future they monitor their thinking and avoid such problems. This should increase their metacognition or awareness and control of their thinking, an important goal of CT instruction (Halpern, 1998).

Students must use their feedback for it to improve their CT skills. In the CT exercises and critical reading assignments, students receive feedback in the form of corrected responses and written feedback on open-ended questions. They should be advised that paying attention to feedback on earlier work and assessments should improve their performance on later assessments.

7. Reflect on feedback and assessment results to improve CT instruction

Instructors should use the feedback they provide to students and the results of ongoing assessments to ‘close the loop,’ that is, use these outcomes to address deficiencies in performance and improve instruction. In actual practice, teaching and assessment strategies rarely work optimally the first time. Instructors must be willing to tinker with these to make needed improvements. Reflection on reliable and valid assessment results provides a scientific means to systematically improve instruction and assessment.

Instructors may find the direct infusion approach as summarized in the seven guidelines to be efficient, especially in helping students acquire basic CT skills, as research has shown. They may especially appreciate how it allows them to take a scientific approach to the improvement of instruction. Although the direct infusion approach seems to efficiently promote acquisition of CT skills, more research is needed to find out if students transfer their skills outside of the classroom or whether this approach needs adjustment to promote transfer.

Table 1. Strengths and Weaknesses of Nonscientific Sources and Kinds of Evidence

Table 2. Strengths and Weaknesses of Scientific Research Methods/Designs Used as Sources of Evidence

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., et al. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 78(4), 1102–1134.

Angelo, T. A. (1995). Classroom assessment for critical thinking. Teaching of Psychology , 22(1), 6–7.

Bensley, D.A. (1998). Critical thinking in psychology: A unified skills approach. Pacific Grove, CA: Brooks/Cole.

Bensley, D.A. (2002). Science and pseudoscience: A critical thinking primer. In M. Shermer (Ed.), The Skeptic encyclopedia of pseudoscience. (pp. 195–203). Santa Barbara, CA: ABC–CLIO.

Bensley, D.A. (2006). Why great thinkers sometimes fail to think critically. Skeptical Inquirer, 30, 47–52.

Bensley, D.A. (2008). Can you learn to think more like a psychologist? The Psychologist, 21, 128–129.

Bensley, D.A., Crowe, D., Bernhardt, P., Buckner, C., & Allman, A. (in press). Teaching and assessing critical thinking skills for argument analysis in psychology. Teaching of Psychology .

Bensley, D.A. & Haynes, C. (1995). The acquisition of general purpose strategic knowledge for argumentation. Teaching of Psychology, 22 , 41–45.

Beyer, B.K. (1997). Improving student thinking: A comprehensive approach . Boston: Allyn & Bacon.

Chance, P. (1986). Thinking in the classroom: A review of programs. New York: Teachers College Press.

Ennis, R.H. (1987). A taxonomy of critical thinking dispositions and abilities. In J. B. Baron & R. J. Sternberg (Eds.), Teaching thinking skills: Theory and practice (pp. 9–26). New York: Freeman.

Halonen, J.S. (1995). Demystifying critical thinking. Teaching of Psychology, 22 , 75–81.

Halonen, J.S., Appleby, D.C., Brewer, C.L., Buskist, W., Gillem, A. R., Halpern, D. F., et al. (APA Task Force on Undergraduate Major Competencies). (2002) Undergraduate psychology major learning goals and outcomes: A report. Washington, DC: American Psychological Association. Retrieved August 27, 2008, from http://www.apa.org/ed/pcue/reports.html .

Halpern, D.F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist , 53 , 449–455.

Halpern, D.F. (2003). Thought and knowledge: An introduction to critical thinking . (3rd ed.). Mahwah, NJ: Erlbaum.

Lilienfeld, S.O. (2007). Psychological treatments that cause harm. Perspectives on Psychological Science , 2 , 53–70.

Mayer, R.E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59, 14–19.

Nieto, A.M., & Saiz, C. (2008). Evaluation of Halpern’s “structural component” for improving critical thinking. The Spanish Journal of Psychology , 11 ( 1 ), 266–274.

Penningroth, S.L., Despain, L.H., & Gray, M.J. (2007). A course designed to improve psychological critical thinking. Teaching of Psychology , 34 , 153–157.

Rotton, J., & Kelly, I. (1985). Much ado about the full moon: A meta-analysis of lunar-lunacy research. Psychological Bulletin , 97 , 286–306.

Ruscio, J. (2006). Critical thinking in psychology: Separating sense from nonsense. Belmont, CA: Wadsworth.

Solon, T. (2007). Generic critical thinking infusion and course content learning in introductory psychology. Journal of Instructional Psychology , 34(2), 972–987.

Stanovich, K.E. (2007). How to think straight about psychology . (8th ed.). Boston: Pearson.

Sternberg, R.J. (2007). Critical thinking in psychology: It really is critical. In R. J. Sternberg, H. L. Roediger, & D. F. Halpern (Eds.), Critical thinking in psychology. (pp. 289–296) . Cambridge, UK: Cambridge University Press.

Wade, C., & Tavris, C. (2005) Invitation to psychology. (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Walberg, H.J. (2006). Improving educational productivity: A review of extant research. In R. F. Subotnik & H. J. Walberg (Eds.), The scientific basis of educational productivity (pp. 103–159). Greenwich, CT: Information Age.

Williams, R.L. (1999). Operational definitions and assessment of higher-order cognitive constructs. Educational Psychology Review , 11 , 411–427.


About the Author

D. Alan Bensley is Professor of Psychology at Frostburg State University. He received his Master’s and PhD degrees in cognitive psychology from Rutgers University. His main teaching and research interests concern the improvement of critical thinking and other cognitive skills. He coordinates assessment for his department and is developing a battery of instruments to assess critical thinking in psychology. He can be reached by email at [email protected]

Association for Psychological Science, December 2010, Vol. 23, No. 10


Original Research Article

Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation


  • 1 Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, United States
  • 2 Graduate School of Education, Stanford University, Stanford, CA, United States
  • 3 Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

Enhancing students’ critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves dealing with dilemmas involving ambiguity or conflicts among principles and contradictory information. We argue that performance assessment provides the most realistic, and most credible, approach to measuring CT. From this conceptualization and construct definition, we describe one possible framework for building performance assessments of CT with attention to extended performance tasks within the assessment system. The framework is a product of an ongoing, collaborative effort, the International Performance Assessment of Learning (iPAL). The framework comprises four main aspects: (1) The storyline describes a carefully curated version of a complex, real-world situation. (2) The challenge frames the task to be accomplished. (3) A portfolio of documents in a range of formats is drawn from multiple sources chosen to have specific characteristics. (4) The scoring rubric comprises a set of scales, each linked to a facet of the construct. We discuss a number of use cases, as well as the challenges that arise with the use and valid interpretation of performance assessments. The final section presents elements of the iPAL research program that involve various refinements and extensions of the assessment framework, a number of empirical studies, along with linkages to current work in online reading and information processing.

Introduction

In their mission statements, most colleges declare that a principal goal is to develop students’ higher-order cognitive skills such as critical thinking (CT) and reasoning (e.g., Shavelson, 2010 ; Hyytinen et al., 2019 ). The importance of CT is echoed by business leaders ( Association of American Colleges and Universities [AACU], 2018 ), as well as by college faculty (for curricular analyses in Germany, see e.g., Zlatkin-Troitschanskaia et al., 2018 ). Indeed, in the 2019 administration of the Faculty Survey of Student Engagement (FSSE), 93% of faculty reported that they “very much” or “quite a bit” structure their courses to support student development with respect to thinking critically and analytically. In a listing of 21st century skills, CT was the most highly ranked among FSSE respondents ( Indiana University, 2019 ). Nevertheless, there is considerable evidence that many college students do not develop these skills to a satisfactory standard ( Arum and Roksa, 2011 ; Shavelson et al., 2019 ; Zlatkin-Troitschanskaia et al., 2019 ). This state of affairs represents a serious challenge to higher education – and to society at large.

In view of the importance of CT, as well as evidence of substantial variation in its development during college, its proper measurement is essential to tracking progress in skill development and to providing useful feedback to both teachers and learners. Feedback can help focus students’ attention on key skill areas in need of improvement, and provide insight to teachers on choices of pedagogical strategies and time allocation. Moreover, comparative studies at the program and institutional level can inform higher education leaders and policy makers.

The conceptualization and definition of CT presented here is closely related to models of information processing and online reasoning, the skills that are the focus of this special issue. These two skills are especially germane to the learning environments that college students experience today when much of their academic work is done online. Ideally, students should be capable of more than naïve Internet search, followed by copy-and-paste (e.g., McGrew et al., 2017 ); rather, for example, they should be able to critically evaluate both sources of evidence and the quality of the evidence itself in light of a given purpose ( Leu et al., 2020 ).

In this paper, we present a systematic approach to conceptualizing CT. From that conceptualization and construct definition, we present one possible framework for building performance assessments of CT with particular attention to extended performance tasks within the test environment. The penultimate section discusses some of the challenges that arise with the use and valid interpretation of performance assessment scores. We conclude the paper with a section on future perspectives in an emerging field of research – the iPAL program.

Conceptual Foundations, Definition and Measurement of Critical Thinking

In this section, we briefly review the concept of CT and its definition. In accordance with the principles of evidence-centered design (ECD; Mislevy et al., 2003), the conceptualization drives the measurement of the construct; that is, implementation of ECD directly links aspects of the assessment framework to specific facets of the construct. We then argue that performance assessments designed in accordance with such an assessment framework provide the most realistic, and most credible, approach to measuring CT. The section concludes with a sketch of an approach to CT measurement grounded in performance assessment.

Concept and Definition of Critical Thinking

Taxonomies of 21st century skills ( Pellegrino and Hilton, 2012 ) abound, and it is neither surprising that CT appears in most taxonomies of learning, nor that there are many different approaches to defining and operationalizing the construct of CT. There is, however, general agreement that CT is a multifaceted construct ( Liu et al., 2014 ). Liu et al. (2014) identified five key facets of CT: (i) evaluating evidence and the use of evidence; (ii) analyzing arguments; (iii) understanding implications and consequences; (iv) developing sound arguments; and (v) understanding causation and explanation.

There is empirical support for these facets from college faculty. A 2016–2017 survey conducted by the Higher Education Research Institute (HERI) at the University of California, Los Angeles found that a substantial majority of faculty respondents “frequently” encouraged students to: (i) evaluate the quality or reliability of the information they receive; (ii) recognize biases that affect their thinking; (iii) analyze multiple sources of information before coming to a conclusion; and (iv) support their opinions with a logical argument ( Stolzenberg et al., 2019 ).

There is general agreement that CT involves the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion (e.g., Erwin and Sebrell, 2003 ; Kosslyn and Nelson, 2017 ; Shavelson et al., 2018 ). We further suggest that CT includes dealing with dilemmas of ambiguity or conflict among principles and contradictory information ( Oser and Biedermann, 2020 ).

Importantly, Oser and Biedermann (2020) posit that CT can be manifested at three levels. The first level, Critical Analysis, is the most complex of the three levels. Critical Analysis requires both knowledge in a specific discipline (conceptual) and procedural analytical (deduction, inclusion, etc.) knowledge. The second level is Critical Reflection, which involves more generic skills “… necessary for every responsible member of a society” (p. 90). It is “a basic attitude that must be taken into consideration if (new) information is questioned to be true or false, reliable or not reliable, moral or immoral etc.” (p. 90). To engage in Critical Reflection, one needs not only to apply analytic reasoning, but also to adopt a reflective stance toward the political, social, and other consequences of choosing a course of action. It also involves analyzing the potential motives of various actors involved in the dilemma of interest. The third level, Critical Alertness, involves questioning one’s own or others’ thinking from a skeptical point of view.

Wheeler and Haertel (1993) distinguished two types of settings in which higher-order skills, such as CT, are exercised: (i) solving problems and making decisions in professional and everyday life, for instance, related to civic affairs and the environment; and (ii) situations where various mental processes (e.g., comparing, evaluating, and justifying) are developed through formal instruction, usually in a discipline. In both settings, individuals must confront situations that typically involve a problematic event, contradictory information, and possibly conflicting principles. Indeed, there is an ongoing debate concerning whether CT should be evaluated using generic or discipline-based assessments (Nagel et al., 2020). Whether CT skills are conceptualized as generic or discipline-specific has implications for how they are assessed and how they are incorporated into the classroom.

In the iPAL project, CT is characterized as a multifaceted construct that comprises conceptualizing, analyzing, drawing inferences or synthesizing information, evaluating claims, and applying the results of these reasoning processes to various purposes (e.g., solve a problem, decide on a course of action, find an answer to a given question or reach a conclusion) (Shavelson et al., 2019). In the course of carrying out a CT task, an individual typically engages in activities such as specifying or clarifying a problem; deciding what information is relevant to the problem; evaluating the trustworthiness of information; avoiding judgmental errors based on “fast thinking”; avoiding biases and stereotypes; recognizing different perspectives and how they can reframe a situation; considering the consequences of alternative courses of action; and communicating decisions and actions clearly and concisely. The order in which activities are carried out can vary among individuals and the processes can be non-linear and reciprocal.

In this article, we focus on generic CT skills. The importance of these skills derives not only from their utility in academic and professional settings, but also the many situations involving challenging moral and ethical issues – often framed in terms of conflicting principles and/or interests – to which individuals have to apply these skills ( Kegan, 1994 ; Tessier-Lavigne, 2020 ). Conflicts and dilemmas are ubiquitous in the contexts in which adults find themselves: work, family, civil society. Moreover, to remain viable in the global economic environment – one characterized by increased competition and advances in second generation artificial intelligence (AI) – today’s college students will need to continually develop and leverage their CT skills. Ideally, colleges offer a supportive environment in which students can develop and practice effective approaches to reasoning about and acting in learning, professional and everyday situations.

Measurement of Critical Thinking

Critical thinking is a multifaceted construct that poses many challenges to those who would develop relevant and valid assessments. For those interested in current approaches to the measurement of CT that are not the focus of this paper, consult Zlatkin-Troitschanskaia et al. (2018) .

In this paper, we have singled out performance assessment because it offers important advantages for measuring CT. Extant tests of CT typically employ response formats such as forced-choice or short-answer, and scenario-based tasks (for an overview, see Liu et al., 2014). They all suffer from moderate to severe construct underrepresentation; that is, they fail to capture important facets of the CT construct such as perspective taking and communication. High-fidelity performance tasks are viewed as more authentic in that they provide a problem context and require responses that are more similar to what individuals confront in the real world than what is offered by traditional multiple-choice items (Messick, 1994; Braun, 2019). This greater verisimilitude promises higher levels of construct representation and lower levels of construct-irrelevant variance. Such performance tasks have the capacity to measure facets of CT that are imperfectly assessed, if at all, using traditional assessments (Lane and Stone, 2006; Braun, 2019; Shavelson et al., 2019). However, these assertions must be empirically validated, and the measures should be subjected to psychometric analyses. Evidence on the reliability and validity of performance assessment (PA), and on the challenges of interpreting it, is extensively detailed in Davey et al. (2015).

We adopt the following definition of performance assessment:

A performance assessment (sometimes called a work sample when assessing job performance) … is an activity or set of activities that requires test takers, either individually or in groups, to generate products or performances in response to a complex, most often real-world task. These products and performances provide observable evidence bearing on test takers’ knowledge, skills, and abilities—their competencies—in completing the assessment ( Davey et al., 2015 , p. 10).

A performance assessment typically includes an extended performance task and short constructed-response and selected-response (i.e., multiple-choice) tasks (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). In this paper, we refer to both individual performance- and constructed-response tasks as performance tasks (PT) (For an example, see Table 1 in section “iPAL Assessment Framework”).


Table 1. The iPAL assessment framework.

An Approach to Performance Assessment of Critical Thinking: The iPAL Program

The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1 ). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and practice in measuring CT with performance tasks ( Shavelson et al., 2018 ). In this section, we present iPAL’s assessment framework as the basis of measuring CT, with examples along the way.

iPAL Background

The iPAL assessment framework builds on the Council for Aid to Education’s Collegiate Learning Assessment (CLA). The CLA was designed to measure cross-disciplinary, generic competencies, such as CT, analytic reasoning, problem solving, and written communication (Klein et al., 2007; Shavelson, 2010). Ideally, each PA contained an extended PT (e.g., examining a range of evidential materials related to the crash of an aircraft) and two short PTs: one in which students either critique an argument or provide a solution in response to a real-world societal issue.

Motivated by considerations of adequate reliability, the CLA was modified in 2012 to create the CLA+. The CLA+ includes two subtests: a PT and a 25-item Selected Response Question (SRQ) section. The PT presents a document or problem statement and an assignment based on that document which elicits an open-ended response. The CLA+ added the SRQ section (which is not linked substantively to the PT scenario) to increase the number of student responses and thereby obtain more reliable estimates of student-level performance than could be achieved with a single PT (Zahner, 2013; Davey et al., 2015).
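The link between the number of responses and reliability can be illustrated with the Spearman-Brown formula, a standard result from classical test theory. It is offered here as an illustration under stated assumptions, not as the analysis reported for the CLA+: it assumes the added items are parallel measures of the same construct, and the single-item reliability used below is a hypothetical value.

```latex
% Spearman-Brown: reliability of a test lengthened by a factor k,
% where \rho_1 is the reliability of a single (parallel) item.
\[
  \rho_k = \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}
\]
% Hypothetical example: \rho_1 = 0.20 for one item, k = 25 items.
\[
  \rho_{25} = \frac{25 \times 0.20}{1 + 24 \times 0.20}
            = \frac{5}{5.8} \approx 0.86
\]
```

Under these assumptions, many short responses can yield a far more reliable student-level score than any single response, which is the rationale the paragraph above describes.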

iPAL Assessment Framework

Methodological foundations.

The iPAL framework evolved from the Collegiate Learning Assessment developed by Klein et al. (2007). It was also informed by the results from the AHELO pilot study (Organisation for Economic Co-operation and Development [OECD], 2012, 2013), as well as the KoKoHs research program in Germany (for an overview, see Zlatkin-Troitschanskaia et al., 2017, 2020). The ongoing refinement of the iPAL framework has been guided in part by the principles of Evidence Centered Design (ECD) (Mislevy et al., 2003; Mislevy and Haertel, 2006; Haertel and Fujii, 2017).

In educational measurement, an assessment framework plays a critical intermediary role between the theoretical formulation of the construct and the development of the assessment instrument containing tasks (or items) intended to elicit evidence with respect to that construct (Mislevy et al., 2003). Builders of the assessment framework draw on the construct theory and operationalize it in a way that provides explicit guidance to PT developers. Thus, the framework should reflect the relevant facets of the construct, where relevance is determined by substantive theory or an appropriate alternative such as behavioral samples from real-world situations of interest (criterion-sampling; McClelland, 1973), as well as the intended use(s) (for an example, see Shavelson et al., 2019). By following the requirements and guidelines embodied in the framework, instrument developers strengthen the claim of construct validity for the instrument (Messick, 1994).

An assessment framework can be specified at different levels of granularity: an assessment battery (“omnibus” assessment, for an example see below), a single performance task, or a specific component of an assessment ( Shavelson, 2010 ; Davey et al., 2015 ). In the iPAL program, a performance assessment comprises one or more extended performance tasks and additional selected-response and short constructed-response items. The focus of the framework specified below is on a single PT intended to elicit evidence with respect to some facets of CT, such as the evaluation of the trustworthiness of the documents provided and the capacity to address conflicts of principles.

From the ECD perspective, an assessment is an instrument for generating information to support an evidentiary argument and, therefore, the intended inferences (claims) must guide each stage of the design process. The construct of interest is operationalized through the Student Model , which represents the target knowledge, skills, and abilities, as well as the relationships among them. The student model should also make explicit the assumptions regarding student competencies in foundational skills or content knowledge. The Task Model specifies the features of the problems or items posed to the respondent, with the goal of eliciting the evidence desired. The assessment framework also describes the collection of task models comprising the instrument, with considerations of construct validity, various psychometric characteristics (e.g., reliability) and practical constraints (e.g., testing time and cost). The student model provides grounds for evidence of validity, especially cognitive validity; namely, that the students are thinking critically in responding to the task(s).

In the present context, the target construct (CT) is the competence of individuals to think critically, which entails solving complex, real-world problems, and clearly communicating their conclusions or recommendations for action based on trustworthy, relevant and unbiased information. The situations, drawn from actual events, are challenging and may arise in many possible settings. In contrast to more reductionist approaches to assessment development, the iPAL approach and framework rest on the assumption that properly addressing these situational demands requires the application of a constellation of CT skills appropriate to the particular task presented (e.g., Shavelson, 2010, 2013). For a PT, the assessment framework must also specify the rubric by which the responses will be evaluated. The rubric must be properly linked to the target construct so that the resulting score profile constitutes evidence that is both relevant and interpretable in terms of the student model (for an example, see Zlatkin-Troitschanskaia et al., 2019).

iPAL Task Framework

The iPAL ‘omnibus’ framework comprises four main aspects: a storyline, a challenge, a document library, and a scoring rubric. Table 1 displays these aspects, brief descriptions of each, and the corresponding examples drawn from an iPAL performance assessment (version adapted from the original in Hyytinen and Toom, 2019). Storylines are drawn from various domains; for example, the worlds of business, public policy, civics, medicine, and family. They often involve moral and/or ethical considerations. Deriving an appropriate storyline from a real-world situation requires careful consideration of which features are to be kept in toto, which adapted for purposes of the assessment, and which to be discarded. Framing the challenge demands care in wording so that there is minimal ambiguity in what is required of the respondent. The difficulty of the challenge depends, in large part, on the nature and extent of the information provided in the document library, the amount of scaffolding included, as well as the scope of the required response. The amount of information and the scope of the challenge should be commensurate with the amount of time available. As is evident from the table, the characteristics of the documents in the library are intended to elicit responses related to facets of CT. For example, with regard to bias, the information provided is intended to play to judgmental errors due to fast thinking and/or motivated reasoning. Ideally, the situation should accommodate multiple solutions of varying degrees of merit.

The dimensions of the scoring rubric are derived from the Task Model and Student Model ( Mislevy et al., 2003 ) and signal which features are to be extracted from the response and indicate how they are to be evaluated. There should be a direct link between the evaluation of the evidence and the claims that are made with respect to the key features of the task model and student model . More specifically, the task model specifies the various manipulations embodied in the PA and so informs scoring, while the student model specifies the capacities students employ in more or less effectively responding to the tasks. The score scales for each of the five facets of CT (see section “Concept and Definition of Critical Thinking”) can be specified using appropriate behavioral anchors (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). Of particular importance is the evaluation of the response with respect to the last dimension of the scoring rubric; namely, the overall coherence and persuasiveness of the argument, building on the explicit or implicit characteristics related to the first five dimensions. The scoring process must be monitored carefully to ensure that (trained) raters are judging each response based on the same types of features and evaluation criteria ( Braun, 2019 ) as indicated by interrater agreement coefficients.
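As a concrete illustration of an interrater agreement coefficient, the following minimal sketch computes unweighted Cohen's kappa for two raters who scored the same responses. The text does not say which coefficient is used in iPAL studies, so the choice of kappa, the function name `cohen_kappa`, and the scores are assumptions made purely for illustration; for ordinal rating scales a weighted kappa or an intraclass correlation may be more appropriate.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of responses given the identical score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up scores for eight responses (illustration only, not study data).
scores_a = [1, 2, 3, 4, 5, 6, 3, 4]
scores_b = [1, 2, 3, 4, 5, 5, 3, 3]
print(round(cohen_kappa(scores_a, scores_b), 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is usually preferred to a simple count of matching scores when monitoring raters.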

The scoring rubric of the iPAL omnibus framework can be modified for specific tasks ( Lane and Stone, 2006 ). This generic rubric helps ensure consistency across rubrics for different storylines. For example, Zlatkin-Troitschanskaia et al. (2019 , p. 473) used the following scoring scheme:

Based on our construct definition of CT and its four dimensions: (D1-Info) recognizing and evaluating information, (D2-Decision) recognizing and evaluating arguments and making decisions, (D3-Conseq) recognizing and evaluating the consequences of decisions, and (D4-Writing), we developed a corresponding analytic dimensional scoring … The students’ performance is evaluated along the four dimensions, which in turn are subdivided into a total of 23 indicators as (sub)categories of CT … For each dimension, we sought detailed evidence in students’ responses for the indicators and scored them on a six-point Likert-type scale. In order to reduce judgment distortions, an elaborate procedure of ‘behaviorally anchored rating scales’ (Smith and Kendall, 1963) was applied by assigning concrete behavioral expectations to certain scale points (Bernardin et al., 1976). To this end, we defined the scale levels by short descriptions of typical behavior and anchored them with concrete examples. … We trained four raters in 1 day using a specially developed training course to evaluate students’ performance along the 23 indicators clustered into four dimensions (for a description of the rater training, see Klotzer, 2018).
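The analytic scheme quoted above can be sketched as a small data structure that turns indicator ratings into a dimension-level score profile. This is only an illustration: the quoted passage gives the four dimensions, the total of 23 indicators, and the six-point scale, but not how the indicators are split across dimensions, whether the scale runs from 1 to 6, or how indicators are aggregated. The split, the 1 to 6 range, the use of the mean, and all ratings below are therefore hypothetical.

```python
from statistics import mean

# Assumed endpoints of the six-point scale (not stated in the quoted text).
SCALE_MIN, SCALE_MAX = 1, 6

# Hypothetical allocation of the 23 indicators to the four dimensions.
INDICATORS_PER_DIMENSION = {
    "D1-Info": 6,
    "D2-Decision": 6,
    "D3-Conseq": 6,
    "D4-Writing": 5,
}


def dimension_profile(ratings):
    """Return the mean indicator rating per dimension for one response.

    `ratings` maps each dimension name to the list of its indicator ratings.
    """
    profile = {}
    for dimension, expected in INDICATORS_PER_DIMENSION.items():
        scores = ratings[dimension]
        if len(scores) != expected:
            raise ValueError(
                f"{dimension}: expected {expected} ratings, got {len(scores)}"
            )
        if any(not SCALE_MIN <= s <= SCALE_MAX for s in scores):
            raise ValueError(
                f"{dimension}: ratings must lie in {SCALE_MIN}..{SCALE_MAX}"
            )
        profile[dimension] = mean(scores)
    return profile


# Made-up ratings from one rater for one response (illustration only).
example = {
    "D1-Info": [4, 5, 3, 4, 4, 4],
    "D2-Decision": [3, 3, 4, 2, 3, 3],
    "D3-Conseq": [2, 2, 3, 2, 2, 1],
    "D4-Writing": [5, 4, 5, 4, 5],
}
print(dimension_profile(example))
```

Keeping the profile at the dimension level, rather than collapsing it to one total, preserves the information needed to describe a student's relative strengths and weaknesses.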

Shavelson et al. (2019) examined the interrater agreement of the scoring scheme developed by Zlatkin-Troitschanskaia et al. (2019) and “found that with 23 items and 2 raters the generalizability (“reliability”) coefficient for total scores to be 0.74 (with 4 raters, 0.84)” ( Shavelson et al., 2019 , p. 15). In the study by Zlatkin-Troitschanskaia et al. (2019 , p. 478) three score profiles were identified (low-, middle-, and high-performer) for students. Proper interpretation of such profiles requires care. For example, there may be multiple possible explanations for low scores such as poor CT skills, a lack of a disposition to engage with the challenge, or the two attributes jointly. These alternative explanations for student performance can potentially pose a threat to the evidentiary argument. In this case, auxiliary information may be available to aid in resolving the ambiguity. For example, student responses to selected- and short-constructed-response items in the PA can provide relevant information about the levels of the different skills possessed by the student. When sufficient data are available, the scores can be modeled statistically and/or qualitatively in such a way as to bring them to bear on the technical quality or interpretability of the claims of the assessment: reliability, validity, and utility evidence ( Davey et al., 2015 ; Zlatkin-Troitschanskaia et al., 2019 ). These kinds of concerns are less critical when PT’s are used in classroom settings. The instructor can draw on other sources of evidence, including direct discussion with the student.
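The reported gain from 0.74 with two raters to 0.84 with four is close to what the classical Spearman-Brown formula would project. The sketch below is not the generalizability analysis used in the cited study; it assumes that raters are interchangeable and are the only source of error, which is a simplification. The function name `spearman_brown` is hypothetical.

```python
def spearman_brown(reliability, factor):
    """Project reliability when the number of parallel raters is multiplied by `factor`."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * reliability / (1 + (factor - 1) * reliability)


two_rater_coefficient = 0.74  # value reported for 2 raters
projected_four_raters = spearman_brown(two_rater_coefficient, factor=2)
print(round(projected_four_raters, 2))
```

The projection is slightly higher than the reported four-rater coefficient, which is what one would expect if some error variance comes from sources that adding raters does not reduce.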

Use of iPAL Performance Assessments in Educational Practice: Evidence From Preliminary Validation Studies

The assessment framework described here supports the development of a PT in a general setting. Many modifications are possible and, indeed, desirable. If the PT is to be more deeply embedded in a certain discipline (e.g., economics, law, or medicine), for example, then the framework must specify characteristics of the narrative and the complementary documents as to the breadth and depth of disciplinary knowledge that is represented.

To date, preliminary field trials employing the omnibus framework (i.e., a full set of documents) have indicated that 60 min is generally an inadequate amount of time for students to engage with the full set of complementary documents and to craft a complete response to the challenge (for an example, see Shavelson et al., 2019). Accordingly, it would be helpful to develop modified frameworks for PT’s that require substantially less time. For an example, see a short performance assessment of civic online reasoning, requiring response times from 10 to 50 min (Wineburg et al., 2016). Such assessment frameworks could be derived from the omnibus framework by focusing on a reduced number of facets of CT, and specifying the characteristics of the complementary documents to be included, or, perhaps, choices among sets of documents. In principle, one could build a ‘family’ of PT’s, each using the same (or nearly the same) storyline and a subset of the full collection of complementary documents.

Paul and Elder (2007) argue that the goal of CT assessments should be to provide faculty with important information about how well their instruction supports the development of students’ CT. In that spirit, the full family of PT’s could represent all facets of the construct while affording instructors and students more specific insights on strengths and weaknesses with respect to particular facets of CT. Moreover, the framework should be expanded to include the design of a set of short answer and/or multiple choice items to accompany the PT. Ideally, these additional items would be based on the same narrative as the PT to collect more nuanced information on students’ precursor skills such as reading comprehension, while enhancing the overall reliability of the assessment. Areas where students are under-prepared could be addressed before, or even in parallel with the development of the focal CT skills. The parallel approach follows the co-requisite model of developmental education. In other settings (e.g., for summative assessment), these complementary items would be administered after the PT to augment the evidence in relation to the various claims. The full PT taking 90 min or more could serve as a capstone assessment.

As we transition from simply delivering paper-based assessments by computer to taking full advantage of the affordances of a digital platform, we should learn from the hard-won lessons of the past so that we can make swifter progress with fewer missteps. In that regard, we must take validity as the touchstone – assessment design, development and deployment must all be tightly linked to the operational definition of the CT construct. Considerations of reliability and practicality come into play with various use cases that highlight different purposes for the assessment (for future perspectives, see next section).

The iPAL assessment framework represents a feasible compromise between commercial, standardized assessments of CT (e.g., Liu et al., 2014), on the one hand, and, on the other, freedom for individual faculty to develop assessment tasks according to idiosyncratic models. It imposes a degree of standardization on both task development and scoring, while still allowing some flexibility for faculty to tailor the assessment to meet their unique needs. In so doing, it addresses a key weakness of the AAC&U’s VALUE initiative (footnote 2; retrieved 5/7/2020), which has achieved wide acceptance among United States colleges.

The VALUE initiative has produced generic scoring rubrics for 15 domains including CT, problem-solving and written communication. A rubric for a particular skill domain (e.g., critical thinking) has five to six dimensions with four ordered performance levels for each dimension (1 = lowest, 4 = highest). The performance levels are accompanied by language that is intended to clearly differentiate among levels (see footnote 3). Faculty are asked to submit student work products from a senior level course that is intended to yield evidence with respect to student learning outcomes in a particular domain and that, they believe, can elicit performances at the highest level. The collection of work products is then graded by faculty from other institutions who have been trained to apply the rubrics.

A principal difficulty is that there is neither a common framework to guide the design of the challenge, nor any control on task complexity and difficulty. Consequently, there is substantial heterogeneity in the quality and evidential value of the submitted responses. This also causes difficulties with task scoring and inter-rater reliability. Shavelson et al. (2009) discuss some of the problems arising with non-standardized collections of student work.

In this context, one advantage of the iPAL framework is that it can provide valuable guidance and an explicit structure for faculty in developing performance tasks for both instruction and formative assessment. When faculty design assessments, their focus is typically on content coverage rather than other potentially important characteristics, such as the degree of construct representation and the adequacy of their scoring procedures ( Braun, 2019 ).

Concluding Reflections

Challenges to interpretation and implementation.

Performance tasks such as those generated by iPAL are attractive instruments for assessing CT skills (e.g., Shavelson, 2010; Shavelson et al., 2019). The attraction mainly rests on the assumption that elaborated PT’s are more authentic (direct) and more completely capture facets of the target construct (i.e., possess greater construct representation) than the widely used selected-response tests. However, as Messick (1994) noted, authenticity is a “promissory note” that must be redeemed with empirical research. In practice, there are trade-offs among authenticity, construct validity, and psychometric quality such as reliability (Davey et al., 2015).

One reason for Messick’s (1994) caution is that authenticity does not guarantee construct validity. The latter must be established by drawing on multiple sources of evidence (American Educational Research Association et al., 2014). Following the ECD principles in designing and developing the PT, as well as the associated scoring rubrics, constitutes an important type of evidence. Further, as Leighton (2019) argues, response process data (“cognitive validity”) are needed to validate claims regarding the cognitive complexity of PT’s. Relevant data can be obtained through cognitive laboratory studies involving methods such as think-aloud protocols or eye-tracking. Although time-consuming and expensive, such studies can yield not only evidence of validity, but also valuable information to guide refinements of the PT.

Going forward, iPAL PT’s must be subjected to validation studies as recommended in the Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014). With a particular focus on the criterion “relationships to other variables,” a framework should include assumptions about the theoretically expected relationships among the indicators assessed by the PT, as well as the indicators’ relationships to external variables such as intelligence or prior (task-relevant) knowledge.

Complementing the necessity of evaluating construct validity, there is the need to consider potential sources of construct-irrelevant variance (CIV). One pertains to student motivation, which is typically greater when the stakes are higher. If students are not motivated, then their performance is likely to be impacted by factors unrelated to their (construct-relevant) ability ( Lane and Stone, 2006 ; Braun et al., 2011 ; Shavelson, 2013 ). Differential motivation across groups can also bias comparisons. Student motivation might be enhanced if the PT is administered in the context of a course with the promise of generating useful feedback on students’ skill profiles.

Construct-irrelevant variance can also occur when students are not equally prepared for the format of the PT or do not fully appreciate the response requirements. This source of CIV could be alleviated by providing students with practice PT’s. Finally, the use of novel forms of documentation, such as those from the Internet, can potentially introduce CIV due to differential familiarity with forms of representation or contents. Interestingly, this suggests that there may be a conflict between enhancing construct representation and reducing CIV.

Another potential source of CIV is related to response evaluation. Even with training, human raters can vary in accuracy and usage of the full score range. In addition, raters may attend to features of responses that are unrelated to the target construct, such as the length of the students’ responses or the frequency of grammatical errors ( Lane and Stone, 2006 ). Some of these sources of variance could be addressed in an online environment, where word processing software could alert students to potential grammatical and spelling errors before they submit their final work product.

Performance tasks generally take longer to administer and are more costly than traditional assessments, making it more difficult to reliably measure student performance (Messick, 1994; Davey et al., 2015). Indeed, it is well known that more than one performance task is needed to obtain high reliability (Shavelson, 2013). This is due to both student-task interactions and variability in scoring. Sources of student-task interactions include differential familiarity with the topic (Hyytinen and Toom, 2019) and differential motivation to engage with the task. The level of reliability required, however, depends on the context of use. For formative assessment as part of an instructional program, reliability can be lower than for summative use. In the former case, other types of evidence are generally available to support interpretation and guide pedagogical decisions. Further studies are needed to obtain estimates of reliability in typical instructional settings.
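To make the point about multiple tasks concrete, the Spearman-Brown formula can be solved for the number of tasks needed to reach a target reliability. This is a rough sketch under the strong assumption that tasks are parallel (equally reliable and interchangeable), which student-task interactions make unlikely in practice. The function name `tasks_needed` and the single-task reliabilities used below are hypothetical values, not estimates from iPAL data.

```python
import math


def tasks_needed(single_task_reliability, target_reliability):
    """Smallest number of parallel tasks whose combined score reaches the target.

    Solves the Spearman-Brown formula k*r / (1 + (k - 1)*r) >= target for k.
    """
    r, target = single_task_reliability, target_reliability
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("both reliabilities must lie strictly between 0 and 1")
    k = target * (1 - r) / (r * (1 - target))
    # Subtract a tiny tolerance so floating-point noise on an exact
    # integer solution does not round up to the next integer.
    return max(1, math.ceil(k - 1e-9))


# Hypothetical single-task reliabilities, with a target of 0.80.
for r in (0.3, 0.4, 0.5):
    print(r, tasks_needed(r, 0.80))
```

Under these assumptions, even a task with a single-task reliability of 0.5 would have to be combined with several others to reach 0.80, which illustrates why testing time and cost become binding constraints.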

With sufficient data, more sophisticated psychometric analyses become possible. One challenge is that the assumption of unidimensionality required for many psychometric models might be untenable for performance tasks ( Davey et al., 2015 ). Davey et al. (2015) provide the example of a mathematics assessment that requires students to demonstrate not only their mathematics skills but also their written communication skills. Although the iPAL framework does not explicitly address students’ reading comprehension and organization skills, students will likely need to call on these abilities to accomplish the task. Moreover, as the operational definition of CT makes evident, the student must not only deploy several skills in responding to the challenge of the PT, but also carry out component tasks in sequence. The former requirement strongly indicates the need for a multi-dimensional IRT model, while the latter suggests that the usual assumption of local item independence may well be problematic ( Lane and Stone, 2006 ). At the same time, the analytic scoring rubric should facilitate the use of latent class analysis to partition data from large groups into meaningful categories ( Zlatkin-Troitschanskaia et al., 2019 ).

Future Perspectives

Although the iPAL consortium has made substantial progress in the assessment of CT, much remains to be done. Further refinement of existing PT’s and their adaptation to different languages and cultures must continue. To this point, there are a number of examples: The refugee crisis PT (cited in Table 1 ) was translated and adapted from Finnish to US English and then to Colombian Spanish. A PT concerning kidney transplants was translated and adapted from German to US English. Finally, two PT’s based on ‘legacy admissions’ to US colleges were translated and adapted to Colombian Spanish.

With respect to data collection, there is a need for sufficient data to support psychometric analysis of student responses, especially the relationships among the different components of the scoring rubric, as this would inform both task development and response evaluation (Zlatkin-Troitschanskaia et al., 2019). In addition, more intensive study of response processes through cognitive laboratories and the like is needed to strengthen the evidential argument for construct validity (Leighton, 2019). We are currently conducting empirical studies, collecting data on both iPAL PT’s and other measures of CT. These studies will provide evidence of convergent and discriminant validity.

At the same time, efforts should be directed at further development to support different ways CT PT’s might be used—i.e., use cases—especially those that call for formative use of PT’s. Incorporating formative assessment into courses can plausibly be expected to improve students’ competency acquisition ( Zlatkin-Troitschanskaia et al., 2017 ). With suitable choices of storylines, appropriate combinations of (modified) PT’s, supplemented by short-answer and multiple-choice items, could be interwoven into ordinary classroom activities. The supplementary items may be completely separate from the PT’s (as is the case with the CLA+), loosely coupled with the PT’s (as in drawing on the same storyline), or tightly linked to the PT’s (as in requiring elaboration of certain components of the response to the PT).

As an alternative to such integration, stand-alone modules could be embedded in courses to yield evidence of students’ generic CT skills. Core curriculum courses or general education courses offer ideal settings for embedding performance assessments. If these assessments were administered to a representative sample of students in each cohort over their years in college, the results would yield important information on the development of CT skills at a population level. For another example, these PA’s could be used to assess the competence profiles of students entering Bachelor’s or graduate-level programs as a basis for more targeted instructional support.

Thus, in considering different use cases for the assessment of CT, it is evident that several modifications of the iPAL omnibus assessment framework are needed. As noted earlier, assessments built according to this framework are demanding with respect to the extensive preliminary work required by a task and the time required to properly complete it. Thus, it would be helpful to have modified versions of the framework, focusing on one or two facets of the CT construct and calling for a smaller number of supplementary documents. The challenge to the student should be suitably reduced.

Some members of the iPAL collaborative have developed PT’s that are embedded in disciplines such as engineering, law and education ( Crump et al., 2019 ; for teacher education examples, see Jeschke et al., 2019 ). These are proving to be of great interest to various stakeholders and further development is likely. Consequently, it is essential that an appropriate assessment framework be established and implemented. It is both a conceptual and an empirical question as to whether a single framework can guide development in different domains.

Performance Assessment in Online Learning Environment

Over the last 15 years, increasing amounts of time in both college and work have been spent using computers and other electronic devices. This has led to the formulation of models for the new literacies that attempt to capture some key characteristics of these activities. A prominent example is a model proposed by Leu et al. (2020). The model frames online reading as a process of problem-based inquiry that calls on five practices during online research and comprehension:

1. Reading to identify important questions,

2. Reading to locate information,

3. Reading to critically evaluate information,

4. Reading to synthesize online information, and

5. Reading and writing to communicate online information.

The parallels with the iPAL definition of CT are evident and suggest there may be benefits to closer links between these two lines of research. For example, a report by Leu et al. (2014) describes empirical studies comparing assessments of online reading using either open-ended or multiple-choice response formats.

The iPAL consortium has begun to take advantage of the affordances of the online environment (for examples, see Schmidt et al. and Nagel et al. in this special issue). Most obviously, supplementary materials can now include archival photographs, audio recordings, or videos. Additional tasks might include the online search for relevant documents, though this would add considerably to the time demands. This online search could occur within a simulated Internet environment, as is the case for the IEA’s ePIRLS assessment (Mullis et al., 2017).

The prospect of having access to a wealth of materials that can add to task authenticity is exciting. Yet it can also add ambiguity and information overload. Increased authenticity, then, should be weighed against validity concerns and the time required to absorb the content in these materials. Modifications of the design framework and extensive empirical testing will be required to decide on appropriate trade-offs. A related possibility is to employ some of these materials in short-answer (or even selected-response) items that supplement the main PT. Response formats could include highlighting text or using a drag-and-drop menu to construct a response. Students’ responses could be automatically scored, thereby containing costs. With automated scoring, feedback to students and faculty, including suggestions for next steps in strengthening CT skills, could also be provided without adding to faculty workload. Therefore, taking advantage of the online environment to incorporate new types of supplementary documents and, perhaps, to introduce new response formats should be a high priority. Finally, further investigation of the overlap between this formulation of CT and the characterization of online reading promulgated by Leu et al. (2020) is a promising direction to pursue.

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

HB wrote the article. RS, OZ-T, and KB were involved in the preparation and revision of the article and co-wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded in part by the Spencer Foundation (Grant No. 201700123).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all the researchers who have participated in the iPAL program.

Footnotes

1. https://www.ipal-rd.com/
2. https://www.aacu.org/value
3. When test results are reported by means of substantively defined categories, the scoring is termed “criterion-referenced.” This is in contrast to results reported as percentiles; such scoring is termed “norm-referenced.”

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

Arum, R., and Roksa, J. (2011). Academically Adrift: Limited Learning on College Campuses. Chicago, IL: University of Chicago Press.

Association of American Colleges and Universities (n.d.). VALUE: What is value? Available online at: https://www.aacu.org/value (accessed May 7, 2020).

Association of American Colleges and Universities [AACU] (2018). Fulfilling the American Dream: Liberal Education and the Future of Work. Available online at: https://www.aacu.org/research/2018-future-of-work (accessed May 1, 2020).

Braun, H. (2019). Performance assessment and standardization in higher education: a problematic conjunction? Br. J. Educ. Psychol. 89, 429–440. doi: 10.1111/bjep.12274

Braun, H. I., Kirsch, I., and Yamamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teach. Coll. Rec. 113, 2309–2344.

Crump, N., Sepulveda, C., Fajardo, A., and Aguilera, A. (2019). Systematization of performance tests in critical thinking: an interdisciplinary construction experience. Rev. Estud. Educ. 2, 17–47.

Davey, T., Ferrara, S., Shavelson, R., Holland, P., Webb, N., and Wise, L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Washington, DC: Center for K-12 Assessment & Performance Management, Educational Testing Service.

Erwin, T. D., and Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. J. Gen. Educ. 52, 50–70. doi: 10.1353/jge.2003.0019

Haertel, G. D., and Fujii, R. (2017). “Evidence-centered design and postsecondary assessment,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education, 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 313–339. doi: 10.4324/9781315709307-26

Hyytinen, H., and Toom, A. (2019). Developing a performance assessment task in the Finnish higher education context: conceptual and empirical insights. Br. J. Educ. Psychol. 89, 551–563. doi: 10.1111/bjep.12283

Hyytinen, H., Toom, A., and Shavelson, R. J. (2019). “Enhancing scientific thinking through the development of critical thinking in higher education,” in Redefining Scientific Thinking for Higher Education: Higher-Order Thinking, Evidence-Based Reasoning and Research Skills, eds M. Murtonen and K. Balloo (London: Palgrave Macmillan).

Indiana University (2019). FSSE 2019 Frequencies: FSSE 2019 Aggregate. Available online at: http://fsse.indiana.edu/pdf/FSSE_IR_2019/summary_tables/FSSE19_Frequencies_(FSSE_2019).pdf (accessed May 1, 2020).

Jeschke, C., Kuhn, C., Lindmeier, A., Zlatkin-Troitschanskaia, O., Saas, H., and Heinze, A. (2019). Performance assessment to investigate the domain specificity of instructional skills among pre-service and in-service teachers of mathematics and economics. Br. J. Educ. Psychol. 89, 538–550. doi: 10.1111/bjep.12277

Kegan, R. (1994). In Over Our Heads: The Mental Demands of Modern Life. Cambridge, MA: Harvard University Press.

Klein, S., Benjamin, R., Shavelson, R., and Bolus, R. (2007). The collegiate learning assessment: facts and fantasies. Eval. Rev. 31, 415–439. doi: 10.1177/0193841x07303318

Kosslyn, S. M., and Nelson, B. (2017). Building the Intentional University: Minerva and the Future of Higher Education. Cambridge, MA: The MIT Press.

Lane, S., and Stone, C. A. (2006). “Performance assessment,” in Educational Measurement, 4th Edn, ed. R. L. Brennan (Lanham, MD: Rowman & Littlefield Publishers), 387–432.

Leighton, J. P. (2019). The risk–return trade-off: performance assessments and cognitive validation of inferences. Br. J. Educ. Psychol. 89, 441–455. doi: 10.1111/bjep.12271

Leu, D. J., Kiili, C., Forzani, E., Zawilinski, L., McVerry, J. G., and O’Byrne, W. I. (2020). “The new literacies of online research and comprehension,” in The Concise Encyclopedia of Applied Linguistics, ed. C. A. Chapelle (Oxford: Wiley-Blackwell), 844–852.

Leu, D. J., Kulikowich, J. M., Kennedy, C., and Maykel, C. (2014). “The ORCA Project: designing technology-based assessments for online research,” in Paper Presented at the American Educational Research Annual Meeting, Philadelphia, PA.

Liu, O. L., Frankel, L., and Roohr, K. C. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessments. ETS Res. Rep. Ser. 1, 1–23. doi: 10.1002/ets2.12009

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.” Am. Psychol. 28, 1–14. doi: 10.1037/h0034092

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news: civic reasoning in a social media environment. Am. Educ. 4, 4-9, 39.

Mejía, A., Mariño, J. P., and Molina, A. (2019). Incorporating perspective analysis into critical thinking performance assessments. Br. J. Educ. Psychol. 89, 456–467. doi: 10.1111/bjep.12297

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educ. Res. 23, 13–23. doi: 10.3102/0013189x023002013

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003, i–29. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., and Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educ. Meas. Issues Pract. 25, 6–20. doi: 10.1111/j.1745-3992.2006.00075.x

Mullis, I. V. S., Martin, M. O., Foy, P., and Hooper, M. (2017). ePIRLS 2016 International Results in Online Informational Reading. Available online at: http://timssandpirls.bc.edu/pirls2016/international-results/ (accessed May 1, 2020).

Nagel, M.-T., Zlatkin-Troitschanskaia, O., Schmidt, S., and Beck, K. (2020). “Performance assessment of generic and domain-specific skills in higher education economics,” in Student Learning in German Higher Education , eds O. Zlatkin-Troitschanskaia, H. A. Pant, M. Toepper, and C. Lautenbach (Berlin: Springer), 281–299. doi: 10.1007/978-3-658-27886-1_14

Organisation for Economic Co-operation and Development [OECD] (2012). AHELO: Feasibility Study Report , Vol. 1. Paris: OECD. Design and implementation.

Organisation for Economic Co-operation and Development [OECD] (2013). AHELO: Feasibility Study Report , Vol. 2. Paris: OECD. Data analysis and national experiences.

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO) , ed. O. Zlatkin-Troitschanskaia (Cham: Springer), 89–106. doi: 10.1007/978-3-030-26578-6_7

Paul, R., and Elder, L. (2007). Consequential validity: using assessment to drive instruction. Found. Crit. Think. 29, 31–40.

Pellegrino, J. W., and Hilton, M. L. (eds) (2012). Education for life and work: Developing Transferable Knowledge and Skills in the 21st Century. Washington DC: National Academies Press.

Shavelson, R. (2010). Measuring College Learning Responsibly: Accountability in a New Era. Redwood City, CA: Stanford University Press.

Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educ. Psychol. 48, 73–86. doi: 10.1080/00461520.2013.779483

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Marino, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). “International performance assessment of learning in higher education (iPAL): research and development,” in Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives , eds O. Zlatkin-Troitschanskaia, M. Toepper, H. A. Pant, C. Lautenbach, and C. Kuhn (Berlin: Springer), 193–214. doi: 10.1007/978-3-319-74338-7_10

Shavelson, R. J., Klein, S., and Benjamin, R. (2009). The limitations of portfolios. Inside Higher Educ. Available online at: https://www.insidehighered.com/views/2009/10/16/limitations-portfolios

Stolzenberg, E. B., Eagan, M. K., Zimmerman, H. B., Berdan Lozano, J., Cesar-Davis, N. M., Aragon, M. C., et al. (2019). Undergraduate Teaching Faculty: The HERI Faculty Survey 2016–2017. Los Angeles, CA: UCLA.

Tessier-Lavigne, M. (2020). Putting Ethics at the Heart of Innovation. Stanford, CA: Stanford Magazine.

Wheeler, P., and Haertel, G. D. (1993). Resource Handbook on Performance Assessment and Measurement: A Tool for Students, Practitioners, and Policymakers. Palm Coast, FL: Owl Press.

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating Information: The Cornerstone of Civic Online Reasoning. Executive Summary. Stanford, CA: Stanford History Education Group.

Zahner, D. (2013). Reliability and Validity–CLA+. Council for Aid to Education. Available online at: https://pdfs.semanticscholar.org/91ae/8edfac44bce3bed37d8c9091da01d6db3776.pdf

Zlatkin-Troitschanskaia, O., and Shavelson, R. J. (2019). Performance assessment of student learning in higher education [Special issue]. Br. J. Educ. Psychol. 89, i–iv, 413–563.

Zlatkin-Troitschanskaia, O., Pant, H. A., Lautenbach, C., Molerov, D., Toepper, M., and Brückner, S. (2017). Modeling and Measuring Competencies in Higher Education: Approaches to Challenges in Higher Education Policy and Practice. Berlin: Springer VS.

Zlatkin-Troitschanskaia, O., Pant, H. A., Toepper, M., and Lautenbach, C. (eds) (2020). Student Learning in German Higher Education: Innovative Measurement Approaches and Research Results. Wiesbaden: Springer.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., and Pant, H. A. (2018). “Assessment of learning outcomes in higher education: international comparisons and perspectives,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 686–697.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., Schmidt, S., and Beck, K. (2019). On the complementarity of holistic and analytic approaches to performance assessment scoring. Br. J. Educ. Psychol. 89, 468–484. doi: 10.1111/bjep.12286

Keywords : critical thinking, performance assessment, assessment framework, scoring rubric, evidence-centered design, 21st century skills, higher education

Citation: Braun HI, Shavelson RJ, Zlatkin-Troitschanskaia O and Borowiec K (2020) Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156

Received: 30 May 2020; Accepted: 04 August 2020; Published: 08 September 2020.

Copyright © 2020 Braun, Shavelson, Zlatkin-Troitschanskaia and Borowiec. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Henry I. Braun, [email protected]

This article is part of the Research Topic

Assessing Information Processing and Online Reasoning as a Prerequisite for Learning in Higher Education

  • Open access
  • Published: 09 March 2020

Rubrics to assess critical thinking and information processing in undergraduate STEM courses

  • Gil Reynders 1 , 2 ,
  • Juliette Lantz 3 ,
  • Suzanne M. Ruder 2 ,
  • Courtney L. Stanford 4 &
  • Renée S. Cole   ORCID: orcid.org/0000-0002-2807-1500 1  

International Journal of STEM Education volume 7, Article number: 9 (2020)

Process skills such as critical thinking and information processing are commonly stated outcomes for STEM undergraduate degree programs, but instructors often do not explicitly assess these skills in their courses. Students are more likely to develop these crucial skills if there is constructive alignment between an instructor’s intended learning outcomes, the tasks that the instructor and students perform, and the assessment tools that the instructor uses. Rubrics for each process skill can enhance this alignment by creating a shared understanding of process skills between instructors and students. Rubrics can also enable instructors to reflect on their teaching practices with regard to developing their students’ process skills and can facilitate feedback to students to identify areas for improvement.

Here, we provide rubrics that can be used to assess critical thinking and information processing in STEM undergraduate classrooms and to provide students with formative feedback. As part of the Enhancing Learning by Improving Process Skills in STEM (ELIPSS) Project, rubrics were developed to assess these two skills in STEM undergraduate students’ written work. The rubrics were implemented in multiple STEM disciplines, class sizes, course levels, and institution types to ensure they were practical for everyday classroom use. Instructors reported via surveys that the rubrics supported assessment of students’ written work in multiple STEM learning environments. Graduate teaching assistants also indicated that they could effectively use the rubrics to assess student work and that the rubrics clarified the instructor’s expectations for how they should assess students. Students reported that they understood the content of the rubrics and could use the feedback provided by the rubric to change their future performance.

The ELIPSS rubrics allowed instructors to explicitly assess the critical thinking and information processing skills that they wanted their students to develop in their courses. The instructors were able to clarify their expectations for both their teaching assistants and students and provide consistent feedback to students about their performance. Supporting the adoption of active-learning pedagogies should also include changes to assessment strategies to measure the skills that are developed as students engage in more meaningful learning experiences. Tools such as the ELIPSS rubrics provide a resource for instructors to better align assessments with intended learning outcomes.

Introduction

Why assess process skills?

Process skills, also known as professional skills (ABET Engineering Accreditation Commission, 2012), transferable skills (Danczak et al., 2017), or cognitive competencies (National Research Council, 2012), are commonly cited as critical for students to develop during their undergraduate education (ABET Engineering Accreditation Commission, 2012; American Chemical Society Committee on Professional Training, 2015; National Research Council, 2012; Singer et al., 2012; The Royal Society, 2014). Process skills such as problem-solving, critical thinking, information processing, and communication are widely applicable to many academic disciplines and careers, and they are receiving increased attention in undergraduate curricula (ABET Engineering Accreditation Commission, 2012; American Chemical Society Committee on Professional Training, 2015) and workplace hiring decisions (Gray & Koncz, 2018; Pearl et al., 2019). Recent reports from multiple countries (Brewer & Smith, 2011; National Research Council, 2012; Singer et al., 2012; The Royal Society, 2014) indicate that these skills are emphasized in multiple undergraduate academic disciplines, and annual polls of about 200 hiring managers indicate that employers may place more importance on these skills than on applicants’ content knowledge when making hiring decisions (Deloitte Access Economics, 2014; Gray & Koncz, 2018). The assessment of process skills can provide a benchmark for achievement at the end of an undergraduate program and act as an indicator of student readiness to enter the workforce. Assessing these skills may also enable instructors and researchers to more fully understand the impact of active learning pedagogies on students.

A recent meta-analysis of 225 studies by Freeman et al. (2014) showed that students in active learning environments may achieve higher content learning gains than students in traditional lectures in multiple STEM fields when comparing scores on equivalent examinations. Active learning environments can have many different attributes, but they are commonly characterized by students “physically manipulating objects, producing new ideas, and discussing ideas with others” (Rau et al., 2017) in contrast to students sitting and listening to a lecture. Examples of active learning pedagogies include POGIL (Process Oriented Guided Inquiry Learning) (Moog & Spencer, 2008; Simonson, 2019) and PLTL (Peer-led Team Learning) (Gafney & Varma-Nelson, 2008; Gosser et al., 2001), in which students work in groups to complete activities with varying levels of guidance from an instructor. Despite the clear content learning gains that students can achieve from active learning environments (Freeman et al., 2014), the non-content gains (including improvements in process skills) in these learning environments have not been explored to a significant degree. Active learning pedagogies such as POGIL and PLTL place an emphasis on students developing non-content skills in addition to content learning gains, but typically only the content learning is assessed on quizzes and exams, and process skills are not often explicitly assessed (National Research Council, 2012). In order to fully understand the effects of active learning pedagogies on all aspects of an undergraduate course, evidence-based tools must be used to assess students’ process skill development. The goal of this work was to develop resources that could enable instructors to explicitly assess process skills in STEM undergraduate classrooms in order to provide feedback to themselves and their students about the students’ process skills development.

Theoretical frameworks

The incorporation of these rubrics and other currently available tools for use in STEM undergraduate classrooms can be viewed through the lenses of constructive alignment (Biggs, 1996 ) and self-regulated learning (Zimmerman, 2002 ). The theory of constructivism posits that students learn by constructing their own understanding of knowledge rather than acquiring the meaning from their instructor (Bodner, 1986 ), and constructive alignment extends the constructivist model to consider how the alignment between a course’s intended learning outcomes, tasks, and assessments affects the knowledge and skills that students develop (Biggs, 2003 ). Students are more likely to develop the intended knowledge and skills if there is alignment between the instructor’s intended learning outcomes that are stated at the beginning of a course, the tasks that the instructor and students perform, and the assessment strategies that the instructor uses (Biggs, 1996 , 2003 , 2014 ). The nature of the tasks and assessments indicates what the instructor values and where students should focus their effort when studying. According to Biggs ( 2003 ) and Ramsden ( 1997 ), students see assessments as defining what they should learn, and a misalignment between the outcomes, tasks, and assessments may hinder students from achieving the intended learning outcomes. In the case of this work, the intended outcomes are improved process skills. In addition to aligning the components of a course, it is also critical that students receive feedback on their performance in order to improve their skills. Zimmerman’s theory of self-regulated learning (Zimmerman, 2002 ) provides a rationale for tailoring assessments to provide feedback to both students and instructors.

Zimmerman’s theory of self-regulated learning defines three phases of learning: forethought/planning, performance, and self-reflection. According to Zimmerman, individuals ideally should progress through these three phases in a cycle: they plan a task, perform the task, and reflect on their performance, then they restart the cycle on a new task. If a student is unable to adequately progress through the phases of self-regulated learning on their own, then feedback provided by an instructor may enable the students to do so (Butler & Winne, 1995 ). Thus, one of our criteria when creating rubrics to assess process skills was to make the rubrics suitable for faculty members to use to provide feedback to their students. Additionally, instructors can use the results from assessments to give themselves feedback regarding their students’ learning in order to regulate their teaching. This theory is called self-regulated learning because the goal is for learners to ultimately reflect on their actions to find ways to improve. We assert that, ideally, both students and instructors should be “learners” and use assessment data to reflect on their actions, although with different aims. Students need consistent feedback from an instructor and/or self-assessment throughout a course to provide a benchmark for their current performance and identify what they can do to improve their process skills (Black & Wiliam, 1998 ; Butler & Winne, 1995 ; Hattie & Gan, 2011 ; Nicol & Macfarlane-Dick, 2006 ). Instructors need feedback on the extent to which their efforts are achieving their intended goals in order to improve their instruction and better facilitate the development of process skills through course experiences.

In accordance with the aforementioned theoretical frameworks, tools used to assess undergraduate STEM student process skills should be tailored to fit the outcomes that are expected for undergraduate students and be able to provide formative assessment and feedback to both students and faculty about the students’ skills. These tools should also be designed for everyday classroom use to enable students to regularly self-assess and faculty to provide consistent feedback throughout a semester. Additionally, it is desirable for assessment tools to be broadly generalizable to measure process skills in multiple STEM disciplines and institutions in order to increase the rubrics’ impact on student learning. Current tools exist to assess these process skills, but they each lack at least one of the desired characteristics for providing regular feedback to STEM students.

Current tools to assess process skills

Current tests available to assess critical thinking include the Critical Thinking Assessment Test (CAT) (Stein & Haynes, 2011), California Critical Thinking Skills Test (Facione, 1990a, 1990b), and Watson Glaser Critical Thinking Appraisal (Watson & Glaser, 1964). These commercially available, multiple-choice tests are not designed to provide regular, formative feedback throughout a course and have not been implemented for this purpose. Instead, they are designed to provide summative feedback with a focus on assessing this skill at a programmatic or university level rather than for use in the classroom to provide formative feedback to students. Rubrics could be used instead of tests to assess process skills. Rubrics are effective assessment tools because they can be quick and easy to use, they provide feedback to both students and instructors, and they can evaluate individual aspects of a skill to give more specific feedback (Brookhart & Chen, 2014; Smit & Birri, 2014). Rubrics for assessing critical thinking are available, but they have not been used to provide feedback to undergraduate STEM students, nor were they designed to do so (Association of American Colleges and Universities, 2019; Saxton et al., 2012). The Critical Thinking Analytic Rubric is designed specifically to assess K-12 students to enhance college readiness and has not been broadly tested in collegiate STEM courses (Saxton et al., 2012). The critical thinking rubric developed by the Association of American Colleges and Universities (AAC&U) as part of its Valid Assessment of Learning in Undergraduate Education (VALUE) Institute and Liberal Education and America’s Promise (LEAP) initiative (Association of American Colleges and Universities, 2019) is intended for programmatic assessment rather than specifically giving feedback to students throughout a course. 
As with tests for assessing critical thinking, current rubrics to assess critical thinking are not designed to act as formative assessments and give feedback to STEM faculty and undergraduates at the course or task level. Another issue with the assessment of critical thinking is the degree to which the construct is measurable. A National Research Council report (National Research Council, 2011 ) has suggested that there is little evidence of a consistent, measurable definition for critical thinking and that it may not be different from one’s general cognitive ability. Despite this issue, we have found that critical thinking is consistently listed as a programmatic outcome in STEM disciplines (American Chemical Society Committee on Professional Training, 2015 ; The Royal Society, 2014 ), so we argue that it is necessary to support instructors as they attempt to assess this skill.

Current methods for evaluating students’ information processing include discipline-specific tools such as a rubric to assess physics students’ use of graphs and equations to solve work-energy problems (Nguyen et al., 2010 ) and assessments of organic chemistry students’ ability to “[manipulate] and [translate] between various representational forms” including 2D and 3D representations of chemical structures (Kumi et al., 2013 ). Although these assessment tools can be effectively used for their intended context, they were not designed for use in a wide range of STEM disciplines or for a variety of tasks.

Despite the many tools that exist to measure process skills, none has been designed and tested to facilitate frequent, formative feedback to STEM undergraduate students and faculty throughout a semester. The rubrics described here have been designed by the Enhancing Learning by Improving Process Skills in STEM (ELIPSS) Project (Cole et al., 2016 ) to assess undergraduate STEM students’ process skills and to facilitate feedback at the classroom level with the potential to track growth throughout a semester or degree program. The rubrics described here are designed to assess critical thinking and information processing in student written work. Rubrics were chosen as the format for our process skill assessment tools because the highest level of each category in rubrics can serve as an explicit learning outcome that the student is expected to achieve (Panadero & Jonsson, 2013 ). Rubrics that are generalizable to multiple disciplines and institutions can enable the assessment of student learning outcomes and active learning pedagogies throughout a program of study and provide useful tools for a greater number of potential users.

Research questions

This work sought to answer the following research questions for each rubric:

Does the rubric adequately measure relevant aspects of the skill?

How well can the rubrics provide feedback to instructors and students?

Can multiple raters use the rubrics to give consistent scores?
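The third question, about consistency between raters, can be made concrete with a small calculation. The following is a minimal sketch only: this passage does not say which agreement statistic was used, so raw percent agreement and Cohen's kappa are assumed here as two common choices, and the function names, the integer score scale, and the sample scores are all hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same score
    # if each assigned scores independently at their own observed rates.
    p_chance = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used a single identical score for every item.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores for one rubric category on six pieces of student work.
scores_rater_a = [3, 4, 2, 5, 3, 4]
scores_rater_b = [3, 4, 3, 5, 3, 2]

agreement = percent_agreement(scores_rater_a, scores_rater_b)
kappa = cohens_kappa(scores_rater_a, scores_rater_b)
print(f"percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

For these made-up scores the raters agree on four of six items, and kappa is lower than the raw agreement because some matches would be expected by chance. Kappa treats scores as unordered labels; a weighted variant may suit ordered rubric levels better, which is a design choice this sketch does not address.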

This work received Institutional Review Board approval prior to any data collection involving human subjects. The sources of data used to construct the process skill rubrics and answer these research questions were (1) peer-reviewed literature on how each skill is defined, (2) feedback from content experts in multiple STEM disciplines via surveys and in-person, group discussions regarding the appropriateness of the rubrics for each discipline, (3) interviews with students whose work was scored with the rubrics and teaching assistants who scored the student work, and (4) results of applying the rubrics to samples of student work.

Defining the scope of the rubrics

The rubrics described here and the other rubrics in development by the ELIPSS Project are intended to measure process skills, which are desired learning outcomes identified by the STEM community in recent reports (National Research Council, 2012 ; Singer et al., 2012 ). In order to measure these skills in multiple STEM disciplines, operationalized definitions of each skill were needed. These definitions specify which aspects of student work (operations) would be considered evidence for the student using that skill and establish a shared understanding of each skill by members of each STEM discipline. The starting point for this work was the process skill definitions developed as part of the POGIL project (Cole et al., 2019a ). The POGIL community includes instructors from a variety of disciplines and institutions and represented the intended audience for the rubrics: faculty who value process skills and want to more explicitly assess them. The process skills discussed in this work were defined as follows:

Critical thinking is analyzing, evaluating, or synthesizing relevant information to form an argument or reach a conclusion supported with evidence.

Information processing is evaluating, interpreting, and manipulating or transforming information.

Examples of critical thinking include the tasks that students are asked to perform in a laboratory course. When students are asked to analyze the data they collected, combine data from different sources, and generate arguments or conclusions about their data, we see this as critical thinking. However, when students simply follow the so-called “cookbook” laboratory instructions that require them to confirm pre-determined conclusions, we do not think students are engaging in critical thinking. One example of information processing is when organic chemistry students are required to re-draw molecules in different formats. The students must evaluate and interpret various pieces of one representation, and then they recreate the molecule in another representation. However, if students are asked to simply memorize facts or algorithms to solve problems, we do not see this as information processing.

Iterative rubric development

The development process was the same for the information processing rubric and the critical thinking rubric. After defining the scope of the rubric, an initial version was drafted based upon the definition of the target process skill and how each aspect of the skill is defined in the literature. A more detailed discussion of the literature that informed each rubric category is included in the “Results and Discussion” section. This initial version then underwent iterative testing in which the rubric was reviewed by researchers, practitioners, and students. The rubric was first evaluated by the authors and a group of eight faculty from multiple STEM disciplines who made up the ELIPSS Project’s primary collaborative team (PCT). The PCT was a group of faculty members with experience in discipline-based education research who employ active-learning pedagogies in their classrooms. This initial round of evaluation was intended to ensure that the rubric measured relevant aspects of the skill and was appropriate for each PCT member’s discipline. This evaluation determined how well the rubrics were aligned with each instructor’s understanding of the process skill including both in-person and email discussions that continued until the group came to consensus that each rubric category could be applied to student work in courses within their disciplines. There has been an ongoing debate regarding the role of disciplinary knowledge in critical thinking and the extent to which critical thinking is subject-specific (Davies, 2013 ; Ennis, 1990 ). This work focuses on the creation of rubrics to measure process skills in different domains, but we have not performed cross-discipline comparisons. This initial round of review was also intended to ensure that the rubrics were ready for classroom testing by instructors in each discipline. Next, each rubric was tested over three semesters in multiple classroom environments, illustrated in Table 1 . 
The rubrics were applied to student work chosen by each PCT member. The PCT members chose the student work based on their views of how the assignments required students to engage in process skills and show evidence of those skills. The information processing and critical thinking rubrics shown in this work were each tested in at least three disciplines, course levels, and institutions.

After each semester, the feedback was collected from the faculty testing the rubric, and further changes to the rubric were made. Feedback was collected in the form of survey responses along with in-person group discussions at annual project meetings. After the first iteration of completing the survey, the PCT members met with the authors to discuss how they were interpreting each survey question. This meeting helped ensure that the surveys were gathering valid data regarding how well the rubrics were measuring the desired process skill. Questions in the survey such as “What aspects of the student work provided evidence for the indicated process skill?” and “Are there edits to the rubric/descriptors that would improve your ability to assess the process skill?” allowed the authors to determine how well the rubric scores were matching the student work and identify necessary changes to the rubric. Further questions asked about the nature and timing of the feedback given to students in order to address the question of how well the rubrics provide feedback to instructors and students. The survey questions are included in the Supporting Information . The survey responses were analyzed qualitatively to determine themes related to each research question.

In addition to the surveys given to faculty rubric testers, twelve students were interviewed in fall 2016 and fall 2017. In the United States of America, the fall semester typically runs from August to December and is the first semester of the academic year. Each student participated in one interview, which lasted about 30 minutes. These interviews were intended to gather further data to answer questions about how well the rubrics were measuring the identified process skills that students were using when they completed their assignments and to ensure that the information provided by the rubrics made sense to students. The protocol for these interviews is included in the Supporting Information. In fall 2016, the students interviewed were enrolled in an organic chemistry laboratory course for non-majors at a large, research-intensive university in the United States. Thirty students agreed to have their work analyzed by the research team, and nine students were interviewed. However, the rubrics were not a component of the laboratory course grading. Instead, the first author assessed the students’ reports for critical thinking and information processing, and then the students were provided electronic copies of their laboratory reports and scored rubrics in advance of the interview. The first author had recently been a graduate teaching assistant for the course and was familiar with the instructor’s expectations for the laboratory reports. During the interview, the students were given time to review their reports and the completed rubrics, and then they were asked about how well they understood the content of the rubrics and how accurately each category score represented their work.

In fall 2017, students enrolled in a physical chemistry thermodynamics course for majors were interviewed. The physical chemistry course took place at the same university as the organic laboratory course, but there was no overlap between participants. Three students and two graduate teaching assistants (GTAs) were interviewed. The course included daily group work, and process skill assessment was an explicit part of the instructor’s curriculum. At the end of each class period, students assessed their groups using portions of ELIPSS rubrics, including the two process skill rubrics included in this paper. About every 2 weeks, the GTAs assessed the student groups with a complete ELIPSS rubric for a particular skill, then gave the groups their scored rubrics with written comments. The students’ individual homework problem sets were assessed once with rubrics for three skills: critical thinking, information processing, and problem-solving. The students received the scored rubric with written comments when the graded problem set was returned to them. In the last third of the semester, the students and GTAs were interviewed about how rubrics were implemented in the course, how well the rubric scores reflected the students’ written work, and how the use of rubrics affected the teaching assistants’ ability to assess the student skills. The protocols for these interviews are included in the Supporting Information .

Gathering evidence for utility, validity, and reliability

The utility, validity, and reliability of the rubrics were measured throughout the development process. Utility is the degree to which the rubrics are perceived as practical by experts and practitioners in the field. Through multiple meetings, the PCT faculty determined that early drafts of the rubric seemed appropriate for use in their classrooms, which represented multiple STEM disciplines. Rubric utility was reexamined multiple times throughout the development process to ensure that the rubrics would remain practical for classroom use. Validity can be defined in multiple ways. For example, the Standards for Educational and Psychological Testing (Joint Committee on Standards for Educational Psychological Testing, 2014) defines validity as “the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed use.” For the purposes of this work, we drew on the ways in which two distinct types of validity were examined in the rubric literature: content validity and construct validity. Content validity is the degree to which the rubrics cover relevant aspects of each process skill (Moskal & Leydens, 2000). In this case, the process skill definition and a review of the literature determined which categories were included in each rubric. The literature review was considered complete once saturation was reached, that is, when no new aspects were found. Construct validity is the degree to which the levels of each rubric category accurately reflect the process that students performed (Moskal & Leydens, 2000). Evidence of construct validity was gathered via the faculty surveys, teaching assistant interviews, and student interviews. In the student interviews, students were given one of their completed assignments and asked to explain how they completed the task. Students were then asked to explain how well each category applied to their work and if any changes were needed to the rubric to more accurately reflect their process. Due to logistical challenges, we were not able to obtain evidence for convergent validity, and this is further discussed in the “Limitations” section.

Adjacent agreement, also known as “interrater agreement within one,” was chosen as the measure of interrater reliability due to its common use in rubric development projects (Jonsson & Svingby, 2007). Adjacent agreement is the percentage of cases in which two raters either give the same rating or give ratings that differ by one level (i.e., they give adjacent ratings to the same work). Jonsson and Svingby (2007) found that most of the rubrics they reviewed had adjacent agreement scores of 90% or greater. However, they noted that the agreement threshold varied based on the number of possible levels of performance for each category in the rubric, with three and four being the most common numbers of levels. Since the rubrics discussed in this report have six levels (scores of zero through five) and are intended for low-stakes assessment and feedback, a goal of 80% adjacent agreement was selected. To calculate agreement for the critical thinking and information processing rubrics, two researchers discussed the scoring criteria for each rubric and then independently assessed the organic chemistry laboratory reports.
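
The definition of adjacent agreement given above can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in the study: the function name `agreement_rates` and the two score lists are hypothetical, and the only values taken from the text are the zero-through-five scale and the 80% goal.

```python
from typing import Sequence, Tuple


def agreement_rates(rater_a: Sequence[int], rater_b: Sequence[int]) -> Tuple[float, float]:
    """Return (exact, adjacent) agreement as percentages for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(rater_a, rater_b))
    # Exact agreement: both raters gave the same level.
    exact = sum(a == b for a, b in pairs)
    # Adjacent agreement: same level, or levels that differ by one.
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs)
    n = len(pairs)
    return 100 * exact / n, 100 * adjacent / n


# Hypothetical scores (0-5) from two raters on ten reports, one rubric category.
rater_1 = [5, 4, 3, 5, 2, 4, 1, 3, 5, 0]
rater_2 = [5, 3, 3, 4, 4, 4, 1, 2, 5, 1]

exact_pct, adjacent_pct = agreement_rates(rater_1, rater_2)
meets_goal = adjacent_pct >= 80  # the 80% goal described in the text
```

With these made-up scores, exact agreement is 50% and adjacent agreement is 90%, so the 80% goal would be met even though the raters matched exactly on only half of the reports.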

Results and discussion

The process skill rubrics to assess critical thinking and information processing in student written work were completed after multiple rounds of revision based on feedback from various sources. These sources include feedback from instructors who tested the rubrics in their classrooms, TAs who scored student work with the rubrics, and students who were assessed with the rubrics. The categories for each rubric will be discussed in terms of the evidence that the rubrics measure the relevant aspects of the skill and how they can be used to assess STEM undergraduate student work. Each category discussion will begin with a general explanation of the category followed by more specific examples from the organic chemistry laboratory course and physical chemistry lecture course to demonstrate how the rubrics can be used to assess student work.

Information processing rubric

The definition of information processing and the focus of the rubric presented here (Fig. 1) are distinct from cognitive information processing as defined by the educational psychology literature (Driscoll, 2005). The rubric shown here is more aligned with the STEM education construct of representational competency (Daniel et al., 2018).

Fig. 1. Rubric for assessing information processing

Evaluating

When solving a problem or completing a task, students must evaluate the provided information for relevance or importance to the task (Hanson, 2008; Swanson et al., 1990). Not all of the information provided in a prompt (e.g., homework or exam questions) is necessarily relevant for addressing every part of the prompt. Students should ideally show evidence of their evaluation process by identifying what information is present in the prompt/model, indicating what information is relevant or not relevant, and indicating why information is relevant. Responses with these characteristics would earn high rubric scores for this category. Although students may not explicitly state what information is necessary to address a task, the information they do use can act as indirect evidence of the degree to which they have evaluated all of the available information in the prompt. Evidence that students have inaccurately evaluated information for relevance includes the inclusion of irrelevant information or the omission of relevant information in an analysis or in completing a task. When the organic chemistry laboratory reports were scored, the focus for the evaluating category was the information students presented when identifying the chemical structure of their products. For students who received a high score, this information included their measured value for the product’s melting point, the literature (expected) value for the melting point, and the peaks in a nuclear magnetic resonance (NMR) spectrum. NMR spectroscopy is a commonly used technique in chemistry to obtain structural information about a compound. Lower scores were given if students omitted any of the necessary information or if they included unnecessary information. For example, if a student discussed their reaction yield when discussing the identity of their product, they would receive a low evaluating score because the yield does not help them determine the identity of their product; the yield, in this case, would be unnecessary information. In the physical chemistry course, students often did not show evidence that they had determined which information was relevant to answering the homework questions and thus earned low evaluating scores. These omissions will be further addressed in the “Interpreting” section.

Interpreting

In addition to evaluating, students must often interpret information using their prior knowledge to explain the meaning of something, make inferences, match data to predictions, and extract patterns from data (Hanson, 2008; Nakhleh, 1992; Schmidt et al., 1989; Swanson et al., 1990). Students earn high scores for this category if they assign correct meaning to labeled information (e.g., text, tables, graphs, diagrams), extract specific details from information, explain information in their own words, and determine patterns in information. For the organic chemistry laboratory reports, students received high scores if they accurately interpreted their measured values and NMR peaks. Almost every student obtained melting point values that differed from the expected values because of measurement error or impurities in their products, so they needed to describe what types of impurities could cause such discrepancies. Also, each NMR spectrum contained one peak that corresponded to the solvent used to dissolve the students’ product, so the students needed to use their prior knowledge of NMR spectroscopy to recognize that this peak did not correspond to part of their product.

In physical chemistry, the graduate teaching assistant often gave students low scores for inaccurately explaining changes to chemical systems such as changes in pressure or entropy. The graduate teaching assistant who assessed the student work used the rubric to identify both the evaluating and interpreting categories as weaknesses in many of the students’ homework submissions. However, the students often earned high scores for the manipulating and transforming categories, so the GTA was able to give students specific feedback on their areas for improvement while also highlighting their strengths.

Manipulating and transforming (extent and accuracy)

In addition to evaluating and interpreting information, students may be asked to manipulate and transform information from one form to another. These transformations should be complete and accurate (Kumi et al., 2013 ; Nguyen et al., 2010 ). Students may be required to construct a figure based on written information, or conversely, they may transform information in a figure into words or mathematical expressions. Two categories for manipulating and transforming (i.e., extent and accuracy) were included to allow instructors to give more specific feedback. It was often found that students would either transform little information but do so accurately, or transform much information and do so inaccurately; the two categories allowed for differentiated feedback to be provided. As stated above, the organic chemistry students were expected to transform their NMR spectral data into a table and provide a labeled structure of their final product. Students were given high scores if they converted all of the relevant peaks from their spectrum into the table format and were able to correctly match the peaks to the hydrogen atoms in their products. Students received lower scores if they were only able to convert the information for a few peaks or if they incorrectly matched the peaks to the hydrogen atoms.

Critical thinking rubric

Critical thinking can be broadly defined in different contexts, but we found that the categories included in the rubric (Fig. 2) represented commonly accepted aspects of critical thinking (Danczak et al., 2017) and suited the needs of the faculty collaborators who tested the rubric in their classrooms.

Fig. 2. Rubric for assessing critical thinking

Evaluating

When completing a task, students must evaluate the relevance of information that they will ultimately use to support a claim or conclusion (Miri et al., 2007; Zohar et al., 1994). An evaluating category is included in both the critical thinking and information processing rubrics because evaluation is a key aspect of both skills. In our previous work developing a problem-solving rubric (manuscript in preparation) and in our review of the literature for this work (Danczak et al., 2017; Lewis & Smith, 1993), we saw overlap among information processing, critical thinking, and problem-solving. Additionally, while the evaluating category in the information processing rubric assesses a student’s ability to determine the importance of information to complete a task, the evaluating category in the critical thinking rubric places a heavier emphasis on using the information to support a conclusion or argument.

When scoring student work with the evaluating category, students receive high scores if they indicate what information is likely to be most relevant to the argument they need to make, determine the reliability of the source of their information, and determine the quality and accuracy of the information itself. The information used to assess this category can be indirect as with the Evaluating category in the information processing rubric. In the organic chemistry laboratory reports, students needed to make an argument about whether they successfully produced the desired product, so they needed to discuss which information was relevant to their claims about the product’s identity and purity. Students received high scores for the evaluating category when they accurately determined that the melting point and nearly all peaks except the solvent peak in the NMR spectrum indicated the identity of their product. Students received lower scores for evaluating when they left out relevant information because this was seen as evidence that the student inaccurately evaluated the information’s relevance in supporting their conclusion. They also received lower scores when they incorrectly stated that a high yield indicated a pure product. Students were given the opportunity to demonstrate their ability to evaluate the quality of information when discussing their melting point. Students sometimes struggled to obtain reliable melting point data due to their inexperience in the laboratory, so the rubric provided a way to assess the student’s ability to critique their own data.

Analyzing

In tandem with evaluating information, students also need to analyze that same information to extract meaningful evidence to support their conclusions (Bailin, 2002 ; Lai, 2011 ; Miri et al., 2007 ). The analyzing category provides an assessment of a student’s ability to discuss information and explore the possible meaning of that information, extract patterns from data/information that could be used as evidence for their claims, and summarize information that could be used as evidence. For example, in the organic chemistry laboratory reports, students needed to compare the information they obtained to the expected values for a product. Students received high scores for the analyzing category if they could extract meaningful structural information from the NMR spectrum and their two melting points (observed and expected) for each reaction step.

Synthesizing

Often, students are asked to synthesize or connect multiple pieces of information in order to draw a conclusion or make a claim (Huitt, 1998 ; Lai, 2011 ). Synthesizing involves identifying the relationships between different pieces of information or concepts, identifying ways that different pieces of information or concepts can be combined, and explaining how the newly synthesized information can be used to reach a conclusion and/or support an argument. While performing the organic chemistry laboratory experiments, students obtained multiple types of information such as the melting point and NMR spectrum in addition to other spectroscopic data such as an infrared (IR) spectrum. Students received high scores for this category when they accurately synthesized these multiple data types by showing how the NMR and IR spectra could each reveal different parts of a molecule in order to determine the molecule’s entire structure.

Forming arguments (structure and validity)

The final key aspect of critical thinking is forming a well-structured and valid argument (Facione, 1984; Glassner & Schwarz, 2007; Lai, 2011; Lewis & Smith, 1993). It was observed that students can earn high scores for evaluating, analyzing, and synthesizing, but still struggle to form arguments. This was particularly common in assessing problem sets in the physical chemistry course.

As with the manipulating and transforming categories in the information processing rubric, two forming arguments categories were included to allow instructors to give more specific feedback. Some students may be able to include all of the expected structural elements of their arguments but use faulty information or reasoning. Conversely, some students may be able to make scientifically valid claims but not necessarily support them with evidence. The two forming arguments categories are intended to accurately assess both of these scenarios. For the forming arguments (structure) category, students earn high scores if they explicitly state their claim or conclusion, list the evidence used to support the argument, and provide reasoning to link the evidence to their claim/conclusion. Students who do not make a claim or who provide little evidence or reasoning receive lower scores.

For the forming arguments (validity) category, students earn high scores if their claim is accurate and their reasoning is logical and clearly supports the claim with provided evidence. Organic chemistry students earned high scores for the forms and supports arguments categories if they made explicit claims about the identity and purity of their product and provided complete and accurate evidence for their claim(s) such as the melting point values and positions of NMR peaks that correspond to their product. Additionally, the students provided evidence for the purity of their products by pointing to the presence or absence of peaks in their NMR spectrum that would match other potential side products. They also needed to provide logical reasoning for why the peaks indicated the presence or absence of a compound. As previously mentioned, the physical chemistry students received lower scores for the forming arguments categories than for the other aspects of critical thinking. These students were asked to make claims about the relationships between entropy and heat and then provide relevant evidence to justify these claims. Often, the students would make clearly articulated claims but would provide little evidence to support them. As with the information processing rubric, the critical thinking rubric allowed the GTAs to assess aspects of these skills independently and identify specific areas for student improvement.

Validity and reliability

The goal of this work was to create rubrics that can accurately assess student work (validity) and be consistently implemented by instructors or researchers within multiple STEM fields (reliability). The evidence for validity includes the alignment of the rubrics with literature-based descriptions of each skill, review of the rubrics by content experts from multiple STEM disciplines, interviews with undergraduate students whose work was scored using the rubrics, and interviews of the GTAs who scored the student work.

The definitions for each skill, along with multiple iterations of the rubrics, underwent review by STEM content experts. As noted earlier, the instructors who were testing the rubrics were given a survey at the end of each semester and were invited to offer suggested changes to the rubric to better help them assess their students. After multiple rubric revisions, survey responses from the instructors indicated that the rubrics accurately represented the breadth of each process skill as seen in each expert’s content area and that each category could be used to measure multiple levels of student work. By the end of the rubrics’ development, instructors were writing responses such as “N/A” or “no suggestions” to indicate that the rubrics did not need further changes.

Feedback from the faculty also indicated that the rubrics were measuring the intended constructs by the ways they responded to the survey item “What aspects of the student work provided evidence for the indicated process skill?” For example, one instructor noted that for information processing, she saw evidence of the manipulating and transforming categories when “students had to transform their written/mathematical relationships into an energy diagram.” Another instructor elicited evidence of information processing during an in-class group quiz: “A question on the group quiz was written to illicit [sic] IP [information processing]. Students had to transform a structure into three new structures and then interpret/manipulate the structures to compare the pKa values [acidity] of the new structures.” For this instructor, the structures written by the students revealed evidence of their information processing by showing what information they omitted in the new structures or inaccurately transformed. For critical thinking, an instructor assessed short research reports with the critical thinking rubric and “looked for [the students’] ability to use evidence to support their conclusions, to evaluate the literature studies, and to develop their own judgements by synthesizing the information.” Another instructor used the critical thinking rubric to assess their students’ abilities to choose an instrument to perform a chemical analysis. According to the instructor, the students provided evidence of their critical thinking because “in their papers, they needed to justify their choice of instrument. This justification required them to evaluate information and synthesize a new understanding for this specific chemical analysis.”

Analysis of student work indicates multiple levels of achievement for each rubric category (illustrated in Fig. 3), although there may have been a ceiling effect for the evaluating and the manipulating and transforming (extent) categories in information processing for organic chemistry laboratory reports because many students earned the highest possible score (five) for those categories. However, other implementations of the ELIPSS rubrics (Reynders et al., 2019) have shown more variation in student scores for the two process skills.

Fig. 3. Student rubric scores from an organic chemistry laboratory course. The two rubrics were used to evaluate different laboratory reports. Thirty students were assessed for information processing and 28 were assessed for critical thinking.
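
One simple way to check for the kind of ceiling effect mentioned above is to compute the share of students who earned the top score in each category. The sketch below is illustrative only: the helper `share_at_ceiling` and both score lists are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

MAX_SCORE = 5  # rubric levels run from zero through five


def share_at_ceiling(scores):
    """Fraction of scores equal to the highest rubric level."""
    return Counter(scores)[MAX_SCORE] / len(scores)


# Hypothetical category scores for ten students (not the study's data).
evaluating = [5, 5, 4, 5, 5, 3, 5, 5, 4, 5]
interpreting = [3, 4, 2, 5, 1, 3, 4, 2, 3, 4]

ceiling_evaluating = share_at_ceiling(evaluating)
ceiling_interpreting = share_at_ceiling(interpreting)
```

In this made-up example, 70% of the evaluating scores sit at the maximum compared with 10% of the interpreting scores. A pattern like the first would suggest that the assignment leaves little room to distinguish among stronger students in that category.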

To provide further evidence that the rubrics were measuring the intended skills, students in the physical chemistry course were interviewed about their thought processes and how well the rubric scores reflected the work they performed. During these interviews, students described how they used various aspects of information processing and critical thinking skills. The students first described how they used information processing during a problem set where they had to answer questions about a diagram of systolic and diastolic blood pressure. Students described how they evaluated and interpreted the graph to make statements such as “diastolic [pressure] is our y-intercept” and “volume is the independent variable.” The students then demonstrated their ability to transform information from one form to another, from a graph to a mathematical equation, by recognizing “it’s a linear relationship so I used Y equals M X plus B” and “integrated it cause it’s the change, the change in V [volume].” For critical thinking, students described their process on a different problem set. In this problem set, the students had to explain why the change in Helmholtz energy and the change in Gibbs free energy were equivalent under a certain given condition. Students first demonstrated how they evaluated the relevant information and analyzed what would and would not change in their system. One student said, “So to calculate the final pressure, I think I just immediately went to the ideal gas law because we know the final volume and the number of moles won’t change and neither will the temperature in this case. Well, I assume that it wouldn’t.” Another student showed evidence of their evaluation by writing out all the necessary information in one place and stating, “Whenever I do these types of problems, I always write what I start with which is why I always have this line of information I’m given.” After evaluating and analyzing, students had to form an argument by claiming that the two energy values were equal and then defending that claim. Students explained that they were not always as clear as they could be when justifying their claim. For instance, one student said, “Usually I just write out equations and then hope people understand what I’m doing mathematically” but they “probably could have explained it a little more.”
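
The graph-to-equation transformation that the students describe for the blood pressure problem can be written out as a short worked integral. This is only a sketch: the problem set itself is not reproduced here, so the symbols $P$ (pressure), $V$ (volume), the slope $m$, the intercept $P_{\mathrm{dia}}$, and the limits $V_1$ and $V_2$ are assumptions based on the students’ quoted descriptions.

```latex
% Assumed linear model: pressure as a function of volume,
% with the diastolic pressure as the y-intercept.
\[
  P(V) = mV + P_{\mathrm{dia}}
\]

% Integrating over the change in volume from V_1 to V_2:
\[
  \int_{V_1}^{V_2} P(V)\,dV
    = \frac{m}{2}\left(V_2^{2} - V_1^{2}\right)
    + P_{\mathrm{dia}}\left(V_2 - V_1\right)
\]
```

Moving from the plotted line to the first expression, and then to the integral, is the kind of step that the manipulating and transforming categories are meant to capture.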

Student feedback throughout the organic chemistry course and near the end of the physical chemistry course indicated that the rubric scores were accurate representations of the students’ work with a few exceptions. For example, some students felt like they should have received either a lower or higher score for certain categories, but they did say that the categories themselves applied well to their work. Most notably, one student reported that the forms and supports arguments categories in the critical thinking rubric did not apply to her work because she “wasn’t making an argument” when she was demonstrating that the Helmholtz and Gibbs energy values were equal in her thermodynamics assignment. We see this as an instance where some students and instructors may define argument in different ways. The process skill definitions and the rubric categories are meant to articulate intended learning outcomes from faculty members to their students, so if a student defines the skills or categories differently than the faculty member, then the rubrics can serve to promote a shared understanding of the skill.

As previously mentioned, reliability was measured by two researchers assessing ten laboratory reports independently to ensure that multiple raters could use the rubrics consistently. The average adjacent agreement scores were 92% for critical thinking and 93% for information processing. The exact agreement scores were 86% for critical thinking and 88% for information processing. Additionally, two different raters assessed a statistics assignment that was given to sixteen first-year undergraduates. The average pairwise adjacent agreement scores were 89% for critical thinking and 92% for information processing for this assignment. However, the exact agreement scores were much lower: 34% for critical thinking and 36% for information processing. In this case, neither rater was an expert in the content area. While the exact agreement scores for the statistics assignment are much lower than desirable, the adjacent agreement scores do meet the threshold for reliability as seen in other rubrics (Jonsson & Svingby, 2007 ) despite the disparity in expertise. Based on these results, it may be difficult for multiple raters to give exactly the same scores to the same work if they have varying levels of content knowledge, but it is important to note that the rubrics are primarily intended for formative assessment that can facilitate discussions between instructors and students about the ways for students to improve. The high level of adjacent agreement scores indicates that multiple raters can identify the same areas to improve in examples of student work.
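
The gap between exact and adjacent agreement reported above is easy to reproduce with a small example in which one rater tends to score one level away from the other. The sketch below assumes hypothetical category scores for five pieces of work; the helper `percent_within` and all of the numbers are illustrative and are not the study's data.

```python
from statistics import mean

# Hypothetical 0-5 scores: category -> (rater A scores, rater B scores).
scores = {
    "evaluating": ([4, 3, 5, 2, 4], [3, 2, 4, 2, 3]),
    "analyzing": ([3, 3, 4, 1, 5], [4, 2, 3, 1, 3]),
    "synthesizing": ([2, 4, 4, 3, 5], [3, 3, 5, 2, 4]),
}


def percent_within(a, b, tolerance):
    """Percentage of paired scores that differ by at most `tolerance` levels."""
    return 100 * sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


# tolerance=0 gives exact agreement; tolerance=1 gives adjacent agreement.
exact_by_cat = {c: percent_within(a, b, 0) for c, (a, b) in scores.items()}
adjacent_by_cat = {c: percent_within(a, b, 1) for c, (a, b) in scores.items()}

# Average across rubric categories.
avg_exact = mean(exact_by_cat.values())
avg_adjacent = mean(adjacent_by_cat.values())
```

Here the raters rarely give identical scores, so average exact agreement is low (about 13%), yet almost every pair of scores is within one level, so average adjacent agreement is about 93%. This mirrors the pattern described for the statistics assignment, in which raters placed work at similar levels without matching exactly.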

Instructor and teaching assistant reflections

The survey responses from faculty members determined the utility of the rubrics. Faculty members reported that when they used the rubrics to define their expectations and be more specific about their assessment criteria, the students seemed to be better able to articulate the areas in which they needed improvement. As one instructor put it, “having the rubrics helped open conversations and discussions” that were not happening before the rubrics were implemented. We see this as evidence of the clear intended learning outcomes that are an integral aspect of achieving constructive alignment within a course. The instructors’ specific feedback to the students, and the students’ increased awareness of their areas for improvement, may enable the students to better regulate their learning throughout a course. Additionally, the survey responses indicated that the faculty members were changing their teaching practices and becoming more cognizant of how assignments did or did not elicit the process skill evidence that they desired. After using the rubrics, one instructor said, “I realize I need to revise many of my activities to more thoughtfully induce process skill development.” We see this as evidence that the faculty members were using the rubrics to regulate their teaching by reflecting on the outcomes of their practices and then planning for future teaching. These activities represent the reflection and forethought/planning aspects of self-regulated learning on the part of the instructors. Graduate teaching assistants in the physical chemistry course indicated that the rubrics gave them a way to clarify the instructor’s expectations when they were interacting with the students. As one GTA said, “It’s giving [the students] feedback on direct work that they have instead of just right or wrong. It helps them to understand like ‘Okay how can I improve? What areas am I lacking in?’” A more detailed account of how the instructors and teaching assistants implemented the rubrics has been reported elsewhere (Cole et al., 2019a).

Student reflections

Students in both the organic and physical chemistry courses reported that they could use the rubrics to engage in the three phases of self-regulated learning: forethought/planning, performing, and reflecting. In an organic chemistry interview, one student was discussing how they could improve their low score for the synthesizing category of critical thinking by saying “I could use the data together instead of trying to use them separately,” thus demonstrating forethought/planning for their later work. Another student described how they could use the rubric while performing a task: “I could go through [the rubric] as I’m writing a report…and self-grade.” Finally, one student demonstrated how they could use the rubrics to reflect on their areas for improvement by saying that “When you have the five column [earn a score of five], I can understand that I’m doing something right” but “I really need to work on revising my reports.” We see this as evidence that students can use the rubrics to regulate their own learning, although classroom facilitation can have an effect on the ways in which students use the rubric feedback (Cole et al., 2019b ).

Limitations

The process skill definitions presented here represent a consensus understanding among members of the POGIL community and the instructors who participated in this study, but these skills are often defined in multiple ways by various STEM instructors, employers, and students (Danczak et al., 2017 ). One issue with critical thinking, in particular, is the broadness of how the skill is defined in the literature. Through this work, we have evidence via expert review to indicate that our definitions represent common understandings among a set of STEM faculty. Nonetheless, we cannot claim that all STEM instructors or researchers will share the skill definitions presented here.

There is currently a debate in the STEM literature (National Research Council, 2011) about whether the critical thinking construct is domain-general or domain-specific, that is, whether or not one’s critical thinking ability in one discipline can be applied to another discipline. We cannot make claims about the generality of the construct based on the data presented here because the same students were not tested across multiple disciplines or courses. Additionally, we did not gather evidence for convergent validity, which is “the degree to which an operationalized construct is similar to other operationalized constructs that it theoretically should be similar to” (National Research Council, 2011). In other words, evidence for convergent validity would come from comparing multiple measures of information processing or critical thinking. However, none of the instructors who used the ELIPSS rubrics also used a secondary measure of the constructs. Although the rubrics were examined by a multidisciplinary group of collaborators, this group was primarily chemists and included eight faculty members from other disciplines, so the content validity of the rubrics may be somewhat limited.

Finally, the generalizability of the rubrics is limited by the relatively small number of students who were interviewed about their work. During their interviews, the students in the organic and physical chemistry courses each said that they could use the rubric scores as feedback to improve their skills. Additionally, as discussed in the “Validity and Reliability” section, the processes described by the students aligned with the content of the rubric and provided evidence of the rubric scores’ validity. However, the data gathered from the student interviews only represents the views of a subset of students in the courses, and further study is needed to determine the most appropriate contexts in which the rubrics can be implemented.

Conclusions and implications

Two rubrics were developed to assess and provide feedback on undergraduate STEM students’ critical thinking and information processing. Faculty survey responses indicated that the rubrics measured the relevant aspects of each process skill in the disciplines that were examined. Faculty survey responses, TA interviews, and student interviews over multiple semesters indicated that the rubric scores accurately reflected the evidence of process skills that the instructors wanted to see and the processes that the students performed when they were completing their assignments. The rubrics showed high inter-rater agreement scores, indicating that multiple raters could identify the same areas for improvement in student work.
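For readers who want to check inter-rater agreement on their own rubric scores, the sketch below computes simple percent agreement and Cohen's kappa for two raters. The statistic and the example scores are assumptions chosen for illustration; this section does not state which agreement statistic was used in the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of student artifacts given the same score by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each scored independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores (0-5) from two raters on one rubric category.
rater_1 = [3, 4, 2, 5, 3, 1, 4, 3]
rater_2 = [3, 4, 3, 5, 3, 1, 4, 2]

print(f"percent agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")
```

Percent agreement is easy to interpret but can look high by chance when most work falls in one or two score levels, which is why a chance-corrected statistic is usually reported alongside it.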

In terms of constructive alignment, courses should ideally have alignment between their intended learning outcomes, student and instructor activities, and assessments. By using the ELIPSS rubrics, instructors were able to explicitly articulate the intended learning outcomes of their courses to their students. The instructors were then able to assess and provide feedback to students on different aspects of their process skills. Future efforts will be focused on modifying student assignments to enable instructors to better elicit evidence of these skills. In terms of self-regulated learning, students indicated in the interviews that the rubric scores were accurate representations of their work (performances), could help them reflect on their previous work (self-reflection), and the feedback they received could be used to inform their future work (forethought). Not only did the students indicate that the rubrics could help them regulate their learning, but the faculty members indicated that the rubrics had helped them regulate their teaching. With the individual categories on each rubric, the faculty members were better able to observe their students’ strengths and areas for improvement and then tailor their instruction to meet those needs. Our results indicated that the rubrics helped instructors in multiple STEM disciplines and at multiple institutions reflect on their teaching and then make changes to better align their teaching with their desired outcomes.

Overall, the rubrics can be used in a number of different ways to modify courses or for programmatic assessment. As previously stated, instructors can use the rubrics to define expectations for their students and provide them with feedback on desired skills throughout a course. The rubric categories can be used to give feedback on individual aspects of student process skills to provide specific feedback to each student. If an instructor or department wants to change from didactic lecture-based courses to active learning ones, the rubrics can be used to measure non-content learning gains that stem from the adoption of such pedagogies. Although the examples provided here for each rubric were situated in chemistry contexts, the rubrics were tested in multiple disciplines and institution types. The rubrics have the potential for wide applicability to assess not only laboratory reports but also homework assignments, quizzes, and exams. Assessing these tasks provides a way for instructors to achieve constructive alignment between their intended outcomes and their assessments, and the rubrics are intended to enhance this alignment to improve student process skills that are valued in the classroom and beyond.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AAC&U: Association of American Colleges and Universities

CAT: Critical Thinking Assessment Test

CU: Comprehensive University

ELIPSS: Enhancing Learning by Improving Process Skills in STEM

LEAP: Liberal Education and America’s Promise

NMR: Nuclear Magnetic Resonance

PCT: Primary Collaborative Team

PLTL: Peer-led Team Learning

POGIL: Process Oriented Guided Inquiry Learning

PUI: Primarily Undergraduate Institution

RU: Research University

STEM: Science, Technology, Engineering, and Mathematics

VALUE: Valid Assessment of Learning in Undergraduate Education

ABET Engineering Accreditation Commission. (2012). Criteria for Accrediting Engineering Programs . Retrieved from http://www.abet.org/accreditation/accreditation-criteria/criteria-for-accrediting-engineering-programs-2016-2017/ .

American Chemical Society Committee on Professional Training. (2015). Undergraduate Professional Education in Chemistry: ACS Guidelines and Evaluation Procedures for Bachelor's Degree Programs . Retrieved from https://www.acs.org/content/dam/acsorg/about/governance/committees/training/2015-acs-guidelines-for-bachelors-degree-programs.pdf

Association of American Colleges and Universities. (2019). VALUE Rubric Development Project. Retrieved from https://www.aacu.org/value/rubrics .

Bailin, S. (2002). Critical Thinking and Science Education. Science and Education, 11 , 361–375.

Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32 (3), 347–364.

Biggs, J. (2003). Aligning teaching and assessing to course objectives. Teaching and learning in higher education: New trends and innovations, 2 , 13–17.

Biggs, J. (2014). Constructive alignment in university teaching. HERDSA Review of higher education, 1 (1), 5–22.

Black, P., & Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy & Practice, 5 (1), 7–74.

Bodner, G. M. (1986). Constructivism: A theory of knowledge. Journal of Chemical Education, 63 (10), 873–878.

Brewer, C. A., & Smith, D. (2011). Vision and change in undergraduate biology education: a call to action. Washington, DC: American Association for the Advancement of Science.

Brookhart, S. M., & Chen, F. (2014). The quality and effectiveness of descriptive rubrics. Educational Review , 1–26.

Butler, D. L., & Winne, P. H. (1995). Feedback and Self-Regulated Learning: A Theoretical Synthesis. Review of Educational Research, 65 (3), 245–281.

Cole, R., Lantz, J., & Ruder, S. (2016). Enhancing Learning by Improving Process Skills in STEM. Retrieved from http://www.elipss.com .

Cole, R., Lantz, J., & Ruder, S. (2019a). PO: The Process. In S. R. Simonson (Ed.), POGIL: An Introduction to Process Oriented Guided Inquiry Learning for Those Who Wish to Empower Learners (pp. 42–68). Sterling, VA: Stylus Publishing.

Cole, R., Reynders, G., Ruder, S., Stanford, C., & Lantz, J. (2019b). Constructive Alignment Beyond Content: Assessing Professional Skills in Student Group Interactions and Written Work. In M. Schultz, S. Schmid, & G. A. Lawrie (Eds.), Research and Practice in Chemistry Education: Advances from the 25th IUPAC International Conference on Chemistry Education 2018 (pp. 203–222). Singapore: Springer.

Danczak, S., Thompson, C., & Overton, T. (2017). ‘What does the term Critical Thinking mean to you?’A qualitative analysis of chemistry undergraduate, teaching staff and employers' views of critical thinking. Chemistry Education Research and Practice, 18 , 420–434.

Daniel, K. L., Bucklin, C. J., Leone, E. A., & Idema, J. (2018). Towards a Definition of Representational Competence. In Towards a Framework for Representational Competence in Science Education (pp. 3–11). Switzerland: Springer.

Davies, M. (2013). Critical thinking and the disciplines reconsidered. Higher Education Research & Development, 32 (4), 529–544.

Deloitte Access Economics. (2014). Australia's STEM Workforce: a survey of employers. Retrieved from https://www2.deloitte.com/au/en/pages/economics/articles/australias-stem-workforce-survey.html .

Driscoll, M. P. (2005). Psychology of learning for instruction . Boston, MA: Pearson Education.

Ennis, R. H. (1990). The extent to which critical thinking is subject-specific: Further clarification. Educational researcher, 19 (4), 13–16.

Facione, P. A. (1984). Toward a theory of critical thinking. Liberal Education, 70 (3), 253–261.

Facione, P. A. (1990a). The California Critical Thinking Skills Test: College Level. Technical Report #1: Experimental Validation and Content Validity.

Facione, P. A. (1990b). The California Critical Thinking Skills Test: College Level. Technical Report #2: Factors Predictive of CT Skills.

Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111 (23), 8410–8415.

Gafney, L., & Varma-Nelson, P. (2008). Peer-led team learning: evaluation, dissemination, and institutionalization of a college level initiative (Vol. 16). Netherlands: Springer Science & Business Media.

Glassner, A., & Schwarz, B. B. (2007). What stands and develops between creative and critical thinking? Argumentation? Thinking Skills and Creativity, 2 (1), 10–18.

Gosser, D. K., Cracolice, M. S., Kampmeier, J. A., Roth, V., Strozak, V. S., & Varma-Nelson, P. (2001). Peer-led team learning: A guidebook. Upper Saddle River, NJ: Prentice Hall.

Gray, K., & Koncz, A. (2018). The key attributes employers seek on students' resumes. Retrieved from http://www.naceweb.org/about-us/press/2017/the-key-attributes-employers-seek-on-students-resumes/ .

Hanson, D. M. (2008). A cognitive model for learning chemistry and solving problems: implications for curriculum design and classroom instruction. In R. S. Moog & J. N. Spencer (Eds.), Process-Oriented Guided Inquiry Learning (pp. 15–19). Washington, DC: American Chemical Society.

Hattie, J., & Gan, M. (2011). Instruction based on feedback. Handbook of research on learning and instruction , 249–271.

Huitt, W. (1998). Critical thinking: an overview. In Educational psychology interactive. Retrieved from http://www.edpsycinteractive.org/topics/cogsys/critthnk.html .

Joint Committee on Standards for Educational Psychological Testing. (2014). Standards for Educational and Psychological Testing. American Educational Research Association.

Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2 (2), 130–144.

Kumi, B. C., Olimpo, J. T., Bartlett, F., & Dixon, B. L. (2013). Evaluating the effectiveness of organic chemistry textbooks in promoting representational fluency and understanding of 2D-3D diagrammatic relationships. Chemistry Education Research and Practice, 14 , 177–187.

Lai, E. R. (2011). Critical thinking: a literature review. Pearson's Research Reports, 6 , 40–41.

Lewis, A., & Smith, D. (1993). Defining higher order thinking. Theory into Practice, 32 , 131–137.

Miri, B., David, B., & Uri, Z. (2007). Purposely teaching for the promotion of higher-order thinking skills: a case of critical thinking. Research in Science Education, 37 , 353–369.

Moog, R. S., & Spencer, J. N. (Eds.). (2008). Process oriented guided inquiry learning (POGIL) . Washington, DC: American Chemical Society.

Moskal, B. M., & Leydens, J. A. (2000). Scoring rubric development: validity and reliability. Practical Assessment, Research and Evaluation, 7 , 1–11.

Nakhleh, M. B. (1992). Why some students don't learn chemistry: Chemical misconceptions. Journal of Chemical Education, 69 (3), 191.

National Research Council. (2011). Assessing 21st Century Skills: Summary of a Workshop . Washington, DC: The National Academies Press.

National Research Council. (2012). Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st Century . Washington, DC: The National Academies Press.

Nguyen, D. H., Gire, E., & Rebello, N. S. (2010). Facilitating Strategies for Solving Work-Energy Problems in Graphical and Equational Representations. 2010 Physics Education Research Conference, 1289 , 241–244.

Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31 (2), 199–218.

Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: a review. Educational Research Review, 9 , 129–144.

Pearl, A. O., Rayner, G., Larson, I., & Orlando, L. (2019). Thinking about critical thinking: An industry perspective. Industry & Higher Education, 33 (2), 116–126.

Ramsden, P. (1997). The context of learning in academic departments. The experience of learning, 2 , 198–216.

Rau, M. A., Kennedy, K., Oxtoby, L., Bollom, M., & Moore, J. W. (2017). Unpacking “Active Learning”: A Combination of Flipped Classroom and Collaboration Support Is More Effective but Collaboration Support Alone Is Not. Journal of Chemical Education, 94 (10), 1406–1414.

Reynders, G., Suh, E., Cole, R. S., & Sansom, R. L. (2019). Developing student process skills in a general chemistry laboratory. Journal of Chemical Education , 96 (10), 2109–2119.

Saxton, E., Belanger, S., & Becker, W. (2012). The Critical Thinking Analytic Rubric (CTAR): Investigating intra-rater and inter-rater reliability of a scoring mechanism for critical thinking performance assessments. Assessing Writing, 17 , 251–270.

Schmidt, H. G., De Volder, M. L., De Grave, W. S., Moust, J. H. C., & Patel, V. L. (1989). Explanatory Models in the Processing of Science Text: The Role of Prior Knowledge Activation Through Small-Group Discussion. J. Educ. Psychol., 81 , 610–619.

Simonson, S. R. (Ed.). (2019). POGIL: An Introduction to Process Oriented Guided Inquiry Learning for Those Who Wish to Empower Learners . Sterling, VA: Stylus Publishing, LLC.

Singer, S. R., Nielsen, N. R., & Schweingruber, H. A. (Eds.). (2012). Discipline-Based education research: understanding and improving learning in undergraduate science and engineering . Washington D.C.: The National Academies Press.

Smit, R., & Birri, T. (2014). Assuring the quality of standards-oriented classroom assessment with rubrics for complex competencies. Studies in Educational Evaluation, 43 , 5–13.

Stein, B., & Haynes, A. (2011). Engaging Faculty in the Assessment and Improvement of Students' Critical Thinking Using the Critical Thinking Assessment Test. Change: The Magazine of Higher Learning, 43 , 44–49.

Swanson, H. L., O'Connor, J. E., & Cooney, J. B. (1990). An Information-Processing Analysis of Expert and Novice Teachers' Problem-Solving. American Educational Research Journal, 27 (3), 533–556.

The Royal Society. (2014). Vision for science and mathematics education. London, England: The Royal Society Science Policy Centre.

Watson, G., & Glaser, E. M. (1964). Watson-Glaser Critical Thinking Appraisal Manual . New York, NY: Harcourt, Brace, and World.

Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory into Practice, 41 (2), 64–70.

Zohar, A., Weinberger, Y., & Tamir, P. (1994). The Effect of the Biology Critical Thinking Project on the Development of Critical Thinking. Journal of Research in Science Teaching, 31 , 183–196.

Acknowledgements

We thank members of our Primary Collaboration Team and Implementation Cohorts for collecting and sharing data. We also thank all the students who have allowed us to examine their work and provided feedback.

Supporting information

• Product rubric survey

• Initial implementation survey

• Continuing implementation survey

Funding

This work was supported in part by the National Science Foundation under collaborative grants #1524399, #1524936, and #1524965. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and affiliations.

Department of Chemistry, University of Iowa, W331 Chemistry Building, Iowa City, Iowa, 52242, USA

Gil Reynders & Renée S. Cole

Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia, 23284, USA

Gil Reynders & Suzanne M. Ruder

Department of Chemistry, Drew University, Madison, New Jersey, 07940, USA

Juliette Lantz

Department of Chemistry, Ball State University, Muncie, Indiana, 47306, USA

Courtney L. Stanford

Contributions

RC, JL, and SR performed an initial literature review that was expanded by GR. All authors designed the survey instruments. GR collected and analyzed the survey and interview data with guidance from RC. GR revised the rubrics with extensive input from all other authors. All authors contributed to reliability measurements. GR drafted all manuscript sections. RC provided extensive comments during manuscript revisions; JL, SR, and CS also offered comments. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Renée S. Cole .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Supporting Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Reynders, G., Lantz, J., Ruder, S.M. et al. Rubrics to assess critical thinking and information processing in undergraduate STEM courses. IJ STEM Ed 7 , 9 (2020). https://doi.org/10.1186/s40594-020-00208-5

Received : 01 October 2019

Accepted : 20 February 2020

Published : 09 March 2020

DOI : https://doi.org/10.1186/s40594-020-00208-5

Keywords

  • Constructive alignment
  • Self-regulated learning
  • Process skills
  • Professional skills
  • Critical thinking
  • Information processing

Critical Thinking: Facilitating and Assessing the 21st Century Skills in Education

So many times we hear our students say, “Why am I learning this?”

I believe that Critical Thinking is the spark that begins the process of authentic learning. Before going further, we must first develop an idea of what learning is… and what learning is not. So many times we hear our students say, “Why am I learning this?” They ask because they have not really experienced the full spectrum of learning, and so they are not learning to a fully rewarding extent! We might say they are being exposed to surface learning and not authentic (real) learning. Authentic learning is an exciting and engaging concept. It allows students to see real meaning and begin to construct their own knowledge. Critical Thinking is core to learning. It is rewarding, engaging, and lifelong. Without critical thinking, students are left to a universe of concepts and memorization. Yes… over twelve years of mediocrity! When educators employ critical thinking in their classrooms, a whole new world of understanding is opened up. What are some reasons to facilitate critical thinking with our students? Let me begin:

Ten Reasons for Student Critical Thinking in the Classroom

  • Allows for necessary inquiry that makes learning exciting.
  • Provides a method to go beyond memorization to promote understanding.
  • Allows students to visualize thoughts, concepts, theories, models & possibilities.
  • Promotes curriculum standards, trans-disciplinary ideas & real world connections.
  • Encourages a classroom culture of collaboration that promotes deeper thinking.
  • Builds skills of problem solving, making implications, & determining consequences.
  • Facilitates goal setting, promotion of process, and perseverance to achieve.
  • Teaches self reflection and critique, and the ability to listen to others’ thoughts.
  • Encourages point of view  while developing persuasive skills.
  • Guides interpretation while developing a skill to infer and draw conclusions.

I am excited by the spark that critical thinking ignites to support real and authentic learning in the classroom. I often wonder how much time students spend in the process of critical thinking in the classroom. I ask you to reflect on your typical school day. Are your students spending time in the area of surface learning, or are they plunging into the engaging culture of deeper (real) learning? At the same time… how are you assessing your students? So many times as educators, we are bound by the standards, and we forget the importance of promoting that critical thinking process that makes our standards come alive with understanding. A culture of critical thinking is not automatic, though with intentional planning it can become a reality. Like the other 21st century skills, it must be built and continuously facilitated. Let’s take a look at how we, as educators, can do this.

Ten Ways to Facilitate Student Critical Thinking in the Classroom and School

  • Design Critical Thinking Activities. (This might include mind mapping, making thinking visible, Socratic discussions, and meta-cognitive mind stretches. Build an inquiry wall with students and talk about the process of thinking.)
  • Provide time for students to collaborate. (Collaboration can be the button that starts critical thinking. It provides group thinking that builds on the standards. Have students work together while solving multi-step and higher order thinking problems. Sometimes this might mean slowing down to increase the learning.)
  • Provide students with a Critical Thinking rubric. (Have them look at the rubric before a critical thinking activity, and once again when they are finished.)
  • Make assessment of Critical Thinking an ongoing effort. (While the teacher can assess, have students assess themselves. Self assessment can be powerful.)
  • Concentrate on specific indicators in a rubric. (There are various indicators, such as: provides inquiry, answers questions, builds an argument, etc. Concentrate on just one indicator while doing a lesson. There can even be an exit ticket reflection.)
  • Integrate the idea of Critical Thinking in any lesson. (Do not teach this skill in isolation. How does it work with a lesson, STEM activity, project build, etc.? What does Critical Thinking look like in the online or blended environment? Think of online discussions.)
  • Post a Critical Thinking Poster in the room. (This poster could be a copy of a rubric or even a list of “I Can Statements”. Point it out before a critical thinking activity.)
  • Make Critical Thinking part of your formative and summative assessment. (Move around the room, talk to groups and students, stop the whole group to make adjustments.)
  • Point out Critical Thinking found in the content standards. (Be aware that content standards often have words like: infer, debate, conclude, solve, prioritize, compare and contrast, hypothesize, and research. Critical Thinking has always been part of the standards. Show your students Bloom’s Taxonomy and post it in the room. Where are they in their learning?)
  • Plan for a school wide emphasis. (A culture that builds Critical Thinking is usually bigger than one classroom. Develop school-wide vocabulary, posters, and initiatives.)

I keep talking about the idea of surface learning and deeper learning. This can best be seen in Bloom’s Taxonomy. Often we start with Remembering. This might be essential in providing students the map to the further areas of Bloom’s. Of course, we then find the idea of Understanding. This is where I believe critical thinking begins. Sometimes we need to critically think in order to understand. In fact, you might be doing this right now. I believe that too much time might be spent in Remembering, which is why students get a false idea of what learning really is. As we look at the rest of Bloom’s (Apply, Analyze, Evaluate, and Create), we can see the deeper learning take place, and even steps toward the transfer and internalization of the learning. Some educators even tip Bloom’s upside down, stating that the Creating at the top will build an understanding. This must be done with careful facilitation and intentional scaffolding to make sure there is some surface learning. After all, Critical Thinking will need this to build on.

I have been mentioning rubrics and assessment tools throughout this post. To me, these are essential in building that culture of critical thinking in the classroom. I want to provide you with some great resources that will give you some powerful tools to assess the skill of Critical Thinking. Keep in mind that students can also self assess and journal using prompts from a Critical Thinking Rubric.

Seven Resources to Help with Assessment and Facilitation of Critical Thinking

  • Habits of Mind – I think this is an awesome place to help teachers facilitate and assess critical thinking and more. Check out the free resources page, which even has some wonderful posters. One of my favorites is the set of rubrics found on this research page. Plan on spending some time here because there are a lot of great resources.
  • PBLWorks – The number one place for PBL in the world is at PBLWorks. You may know it as the Buck Institute or BIE. I am fortunate to be part of their National Faculty, which is probably why I rank it as number one. I encourage you to visit their site for everything PBL. This link brings you to the resource area where you will discover some amazing rubrics to facilitate Critical Thinking. You will find rubrics for grade bands K-2, 3-5, and 6-12. This really is a great place to start. You will need to sign up to be a member of PBLWorks. This is a wonderful idea; after all, it is free!
  • Microsoft Innovative Learning – This website contains some powerful rubrics for assessing the 21st Century skills. The link will bring you to a PDF file with Critical Thinking rubrics you can use tomorrow for any grade level. Check out this two page document defining the 4 C’s and a movie giving you even more of an explanation.
  • New Tech School – This amazing PBL group of schools provides some wonderful Learning Rubrics in their free area. Here you will find an interesting collection of rubrics that assess student learning in multiple areas. These are sure to get you off and started.
  • Foundation for Critical Thinking  –  Check out this  amazing page  to help give you descriptors.
  • Project Zero  – While it is not necessarily assessment based, you will find some powerful  routines for making thinking visible . As you conduct these types of activities you will find yourself doing some wonderful formative assessment of critical thinking.
  • Education Week  – Take a look at this resource that provides some great reasoning and some interesting links that provide a glimpse of critical thinking in the classroom.

Critical Thinking “I Can Statements”

As you can see, I believe that Critical Thinking is key to PBL, STEM, and Deeper Learning. It improves Communication and Collaboration, while promoting Creativity. I believe every student should have the following “I Can Statements” as part of their learning experience. Feel free to copy and use them in your classroom. Perhaps this is a great starting place as you promote a collaborative and powerful learning culture!

  • I can not only answer questions, but can also think of new questions to ask.
  • I can take time to see what I am thinking to promote even better understanding.
  • I can attempt to see other people’s thinking while explaining my own.
  • I can look at a problem and determine needed steps to find a solution.
  • I can use proper collaboration skills to work with others productively to build solutions.
  • I can set a goal, design a plan, and persevere to accomplish the goal.
  • I can map out strategies and processes that show the action involved in a task.
  • I can define and show my understanding of a concept, model, theory, or process.
  • I can take time to reflect and productively critique my work and the work of others.
  • I can understand, observe, draw inferences, hypothesize and see implications.

cross-posted at  21centuryedtech.wordpress.com

Michael Gorman oversees one-to-one laptop programs and digital professional development for Southwest Allen County Schools near Fort Wayne, Indiana. He is a consultant for Discovery Education, ISTE, My Big Campus, and November Learning and is on the National Faculty for The Buck Institute for Education. His awards include district Teacher of the Year, Indiana STEM Educator of the Year and Microsoft’s 365 Global Education Hero. Read more at  21centuryedtech.wordpress.com .

Supplement to Critical Thinking

How can one assess, for purposes of instruction or research, the degree to which a person possesses the dispositions, skills and knowledge of a critical thinker?

In psychometrics, assessment instruments are judged according to their validity and reliability.

Roughly speaking, an instrument is valid if it measures accurately what it purports to measure, given standard conditions. More precisely, the degree of validity is “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (American Educational Research Association 2014: 11). In other words, a test is not valid or invalid in itself. Rather, validity is a property of an interpretation of a given score on a given test for a specified use. Determining the degree of validity of such an interpretation requires collection and integration of the relevant evidence, which may be based on test content, test takers’ response processes, a test’s internal structure, relationship of test scores to other variables, and consequences of the interpretation (American Educational Research Association 2014: 13–21). Criterion-related evidence consists of correlations between scores on the test and performance on another test of the same construct; its weight depends on how well supported is the assumption that the other test can be used as a criterion. Content-related evidence is evidence that the test covers the full range of abilities that it claims to test. Construct-related evidence is evidence that a correct answer reflects good performance of the kind being measured and an incorrect answer reflects poor performance.

An instrument is reliable if it consistently produces the same result, whether across different forms of the same test (parallel-forms reliability), across different items (internal consistency), across different administrations to the same person (test-retest reliability), or across ratings of the same answer by different people (inter-rater reliability). Internal consistency should be expected only if the instrument purports to measure a single undifferentiated construct, and thus should not be expected of a test that measures a suite of critical thinking dispositions or critical thinking abilities, assuming that some people are better in some of the respects measured than in others (for example, very willing to inquire but rather closed-minded). Otherwise, reliability is a necessary but not a sufficient condition of validity; a standard example of a reliable instrument that is not valid is a bathroom scale that consistently under-reports a person’s weight.
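Two of these ideas can be sketched numerically. The block below assumes Cronbach's alpha as the index of internal consistency (one common choice, not one named in the text above) and uses hypothetical item scores; it then restates the bathroom-scale example with made-up readings.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of k item scores per test taker."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two test takers")
    columns = list(zip(*item_scores))            # one tuple of scores per item
    sum_item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Hypothetical scores of five test takers on three items.
scores = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 4, 4]]
alpha = cronbach_alpha(scores)

# Bathroom-scale example: consistent readings (reliable) that are all too low (not valid).
true_weight = 70.0                # hypothetical true weight in kg
readings = [68.0, 68.0, 68.0]     # hypothetical repeated readings
spread = max(readings) - min(readings)              # 0.0: perfectly consistent
bias = sum(readings) / len(readings) - true_weight  # -2.0: systematically under-reports
```

A low alpha would not by itself count against a test of several distinct dispositions or abilities, for the reason given above.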

Assessing dispositions is difficult if one uses a multiple-choice format with known adverse consequences of a low score. It is pretty easy to tell what answer to the question “How open-minded are you?” will get the highest score and to give that answer, even if one knows that the answer is incorrect. If an item probes less directly for a critical thinking disposition, for example by asking how often the test taker pays close attention to views with which the test taker disagrees, the answer may differ from reality because of self-deception or simple lack of awareness of one’s personal thinking style, and its interpretation is problematic, even if factor analysis enables one to identify a distinct factor measured by a group of questions that includes this one (Ennis 1996). Nevertheless, Facione, Sánchez, and Facione (1994) used this approach to develop the California Critical Thinking Dispositions Inventory (CCTDI). They began with 225 statements expressive of a disposition towards or away from critical thinking (using the long list of dispositions in Facione 1990a), validated the statements with talk-aloud and conversational strategies in focus groups to determine whether people in the target population understood the items in the way intended, administered a pilot version of the test with 150 items, and eliminated items that failed to discriminate among test takers or were inversely correlated with overall results or added little refinement to overall scores (Facione 2000). They used item analysis and factor analysis to group the measured dispositions into seven broad constructs: open-mindedness, analyticity, cognitive maturity, truth-seeking, systematicity, inquisitiveness, and self-confidence (Facione, Sánchez, and Facione 1994). The resulting test consists of 75 agree-disagree statements and takes 20 minutes to administer. 
A repeated disturbing finding is that North American students taking the test tend to score low on the truth-seeking sub-scale (on which a low score results from agreeing to such statements as the following: “To get people to agree with me I would give any reason that worked”. “Everyone always argues from their own self-interest, including me”. “If there are four reasons in favor and one against, I’ll go with the four”.) Development of the CCTDI made it possible to test whether good critical thinking abilities and good critical thinking dispositions go together, in which case it might be enough to teach one without the other. Facione (2000) reports that administration of the CCTDI and the California Critical Thinking Skills Test (CCTST) to almost 8,000 post-secondary students in the United States revealed a statistically significant but weak correlation between total scores on the two tests, and also between paired sub-scores from the two tests. The implication is that both abilities and dispositions need to be taught, that one cannot expect improvement in one to bring with it improvement in the other.
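The item-elimination step described above can be illustrated with a corrected item-total correlation: each item is correlated with the total of the remaining items, and items with a low or negative correlation are flagged. This is a sketch under assumptions, not the procedure Facione and colleagues actually used; the threshold of 0.2, the function names, and the agree/disagree data (1 = agree, 0 = disagree) are all hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either list is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # an item everyone answers the same way cannot discriminate
    return cov / sqrt(var_x * var_y)

def flag_weak_items(responses, threshold=0.2):
    """Return indices of items whose corrected item-total correlation is below threshold."""
    flagged = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total score without item i
        if pearson_r(item, rest) < threshold:
            flagged.append(i)
    return flagged

# Hypothetical responses: six test takers, five items.
responses = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
weak = flag_weak_items(responses)  # item 3 never varies; item 4 runs against the rest
```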

A more direct way of assessing critical thinking dispositions would be to see what people do when put in a situation where the dispositions would reveal themselves. Ennis (1996) reports promising initial work with guided open-ended opportunities to give evidence of dispositions, but no standardized test seems to have emerged from this work. There are however standardized aspect-specific tests of critical thinking dispositions. The Critical Problem Solving Scale (Berman et al. 2001: 518) takes as a measure of the disposition to suspend judgment the number of distinct good aspects attributed to an option judged to be the worst among those generated by the test taker. Stanovich, West and Toplak (2011: 800–810) list tests developed by cognitive psychologists of the following dispositions: resistance to miserly information processing, resistance to myside thinking, absence of irrelevant context effects in decision-making, actively open-minded thinking, valuing reason and truth, tendency to seek information, objective reasoning style, tendency to seek consistency, sense of self-efficacy, prudent discounting of the future, self-control skills, and emotional regulation.

It is easier to measure critical thinking skills or abilities than to measure dispositions. The following eight currently available standardized tests purport to measure them: the Watson-Glaser Critical Thinking Appraisal (Watson & Glaser 1980a, 1980b, 1994), the Cornell Critical Thinking Tests Level X and Level Z (Ennis & Millman 1971; Ennis, Millman, & Tomko 1985, 2005), the Ennis-Weir Critical Thinking Essay Test (Ennis & Weir 1985), the California Critical Thinking Skills Test (Facione 1990b, 1992), the Halpern Critical Thinking Assessment (Halpern 2016), the Critical Thinking Assessment Test (Center for Assessment & Improvement of Learning 2017), the Collegiate Learning Assessment (Council for Aid to Education 2017), the HEIghten Critical Thinking Assessment (https://territorium.com/heighten/), and a suite of critical thinking assessments for different groups and purposes offered by Insight Assessment (https://www.insightassessment.com/products). The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students’ critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level certificates in critical thinking on the basis of an examination (OCR 2011). Many of these standardized tests have received scholarly evaluations at the hands of, among others, Ennis (1958), McPeck (1981), Norris and Ennis (1989), Fisher and Scriven (1997), Possin (2008, 2013a, 2013b, 2013c, 2014, 2020) and Hatcher and Possin (2021). 
Their evaluations provide a useful set of criteria that such tests ideally should meet, as does the description by Ennis (1984) of problems in testing for competence in critical thinking: the soundness of multiple-choice items, the clarity and soundness of instructions to test takers, the information and mental processing used in selecting an answer to a multiple-choice item, the role of background beliefs and ideological commitments in selecting an answer to a multiple-choice item, the tenability of a test’s underlying conception of critical thinking and its component abilities, the set of abilities that the test manual claims are covered by the test, the extent to which the test actually covers these abilities, the appropriateness of the weighting given to various abilities in the scoring system, the accuracy and intellectual honesty of the test manual, the interest of the test to the target population of test takers, the scope for guessing, the scope for choosing a keyed answer by being test-wise, precautions against cheating in the administration of the test, clarity and soundness of materials for training essay graders, inter-rater reliability in grading essays, and clarity and soundness of advance guidance to test takers on what is required in an essay. Rear (2019) has challenged the use of standardized tests of critical thinking as a way to measure educational outcomes, on the grounds that  they (1) fail to take into account disputes about conceptions of critical thinking, (2) are not completely valid or reliable, and (3) fail to evaluate skills used in real academic tasks. He proposes instead assessments based on discipline-specific content.

There are also aspect-specific standardized tests of critical thinking abilities. Stanovich, West and Toplak (2011: 800–810) list tests of probabilistic reasoning, insights into qualitative decision theory, knowledge of scientific reasoning, knowledge of rules of logical consistency and validity, and economic thinking. They also list instruments that probe for irrational thinking, such as superstitious thinking, belief in the superiority of intuition, over-reliance on folk wisdom and folk psychology, belief in “special” expertise, financial misconceptions, overestimation of one’s introspective powers, dysfunctional beliefs, and a notion of self that encourages egocentric processing. They regard these tests along with the previously mentioned tests of critical thinking dispositions as the building blocks for a comprehensive test of rationality, whose development (they write) may be logistically difficult and would require millions of dollars.

A superb example of assessment of an aspect of critical thinking ability is the Test on Appraising Observations (Norris & King 1983, 1985, 1990a, 1990b), which was designed for classroom administration to senior high school students. The test focuses entirely on the ability to appraise observation statements and in particular on the ability to determine in a specified context which of two statements there is more reason to believe. According to the test manual (Norris & King 1985, 1990b), a person’s score on the multiple-choice version of the test, which is the number of items that are answered correctly, can justifiably be given either a criterion-referenced or a norm-referenced interpretation.

On a criterion-referenced interpretation, those who do well on the test have a firm grasp of the principles for appraising observation statements, and those who do poorly have a weak grasp of them. This interpretation can be justified by the content of the test and the way it was developed, which incorporated a method of controlling for background beliefs articulated and defended by Norris (1985). Norris and King synthesized from judicial practice, psychological research and common-sense psychology 31 principles for appraising observation statements, in the form of empirical generalizations about tendencies, such as the principle that observation statements tend to be more believable than inferences based on them (Norris & King 1984). They constructed items in which exactly one of the 31 principles determined which of two statements was more believable. Using a carefully constructed protocol, they interviewed about 100 students who responded to these items in order to determine the thinking that led them to choose the answers they did (Norris & King 1984). In several iterations of the test, they adjusted items so that selection of the correct answer generally reflected good thinking and selection of an incorrect answer reflected poor thinking. Thus they have good evidence that good performance on the test is due to good thinking about observation statements and that poor performance is due to poor thinking about observation statements. Collectively, the 50 items on the final version of the test require application of 29 of the 31 principles for appraising observation statements, with 13 principles tested by one item, 12 by two items, three by three items, and one by four items. Thus there is comprehensive coverage of the principles for appraising observation statements. Fisher and Scriven (1997: 135–136) judge the items to be well worked and sound, with one exception. 
The test is clearly written at a grade 6 reading level, meaning that poor performance cannot be attributed to difficulties in reading comprehension by the intended adolescent test takers. The stories that frame the items are realistic, and are engaging enough to stimulate test takers’ interest. Thus the most plausible explanation of a given score on the test is that it reflects roughly the degree to which the test taker can apply principles for appraising observations in real situations. In other words, there is good justification of the proposed interpretation that those who do well on the test have a firm grasp of the principles for appraising observation statements and those who do poorly have a weak grasp of them.
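The coverage figures reported above can be checked with a few lines of arithmetic. The sketch relies on the stated design in which each item turns on exactly one principle; the variable names are illustrative.

```python
# Number of principles tested by exactly n items, as reported for the final version.
principles_by_item_count = {1: 13, 2: 12, 3: 3, 4: 1}

principles_covered = sum(principles_by_item_count.values())
total_items = sum(n * count for n, count in principles_by_item_count.items())
coverage = principles_covered / 31  # share of the 31 principles that are tested

print(principles_covered, total_items, round(coverage, 3))
```

The counts come to 29 principles and 50 items, matching the figures in the text.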

To get norms for performance on the test, Norris and King arranged for seven groups of high school students in different types of communities and with different levels of academic ability to take the test. The test manual includes percentiles, means, and standard deviations for each of these seven groups. These norms allow teachers to compare the performance of their class on the test to that of a similar group of students.
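A norm-referenced reading of a score can be sketched as a percentile rank within a comparable norm group. The scores below are hypothetical and are not taken from the test manual; the tie-handling convention (counting half of the tied scores) is one common choice among several.

```python
from statistics import mean

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below `score`, plus half of those tied."""
    below = sum(s < score for s in norm_scores)
    tied = sum(s == score for s in norm_scores)
    return 100.0 * (below + 0.5 * tied) / len(norm_scores)

# Hypothetical norm-group scores (number of items correct out of 50).
norm_group = [22, 25, 27, 28, 30, 31, 33, 35, 38, 41]

student_rank = percentile_rank(31, norm_group)  # 55.0

# Comparing a whole class to the norm group via its mean score.
class_scores = [29, 34, 31]                     # hypothetical class
class_rank = percentile_rank(mean(class_scores), norm_group)
```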

Copyright © 2022 by David Hitchcock <hitchckd@mcmaster.ca>


The Stanford Encyclopedia of Philosophy is copyright © 2024 by The Metaphysics Research Lab, Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054


Instructing & Assessing 21st Century Skills: A Focus on Critical Thinking

Carla Evans

Research and Best Practices: One in a Series on 21st Century Skills

For the full collection of related blog posts and literature reviews, see the Center for Assessment’s toolkit, Assessing 21st Century Skills.

Educational philosophers from Plato and Socrates to John Dewey highlighted the importance of critical thinking and the intrinsic value of instruction that reaches beyond simple factual recall. However, there is considerable dispute about how to define critical thinking, let alone how to teach it and assess students’ critical thinking over time. This post briefly defines critical thinking, explains what we know from the research about how critical thinking develops and is best taught, and provides an overview of some major assessment issues. Our full literature review on critical thinking can be accessed here.

Overall, findings from the literature suggest that critical thinking involves both cognitive skills and dispositions. These two aspects are captured in a consensus definition reached by a panel of leading critical thinking scholars and researchers and reported in the Delphi Report:

“purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based” (Facione, 1990, p. 3).

Debate continues about the extent to which critical thinking is generic or discipline-specific. If critical thinking is generic, then it arguably could be taught in separate courses, with the sole focus being on the development of critical thinking skills. However, if critical thinking is particular to a discipline, the instruction to develop it must be embedded within disciplinary content. Though debate exists, we argue that what constitutes critical thinking in science likely differs somewhat from what constitutes critical thinking in history or art. Therefore, critical thinking is best understood as discipline-specific with some transferable, generic commonalities.

Critical thinking is also intertwined with other cognitive, interpersonal, and intrapersonal competencies. For example, many researchers have connected creativity and critical thinking. Furthermore, one’s ability to demonstrate critical thinking relies on effective communication, metacognition, self-direction, motivation, and other related competencies.

Development

Adults do not always employ critical thinking when it’s called for. Many find personal experience more compelling than logical thought or empirical evidence. That said, research suggests that even young children can demonstrate aspects of critical thinking.

However, little is known about how critical thinking skills and dispositions develop; there are no empirically-validated learning progressions of critical thinking skills and dispositions. Indeed, the Delphi Report cautioned that its framework for critical thinking should not be interpreted as implying a developmental progression or hierarchical taxonomy.

Instruction

Empirical research shows that critical thinking can be taught and that some specific instructional approaches and strategies promote more critical thinking. These instructional approaches include explicit teaching of disciplinary content within a course that also teaches critical thinking skills.

Instructional strategies that promote critical thinking include providing…

  • Opportunities for students to solve problems with multiple solutions,
  • Structure that allows students to respond to open-ended questions and formulate solutions to problems, and
  • A variety of learning activities that allow students to choose and engage in solving authentic problems.

Implications of Research for Classroom Assessment Design

Critical thinking is typically assessed within content areas. For example, students analyze evidence, construct arguments, and evaluate the veracity of information and arguments in relation to disciplinary core ideas and content. Assessing students’ level of sophistication with critical thinking skills and dispositions requires close attention to the nature of the task used to elicit students’ critical thinking. Assessments must be thoughtfully designed and structured to (a) prompt complex judgments; (b) include open-ended tasks that allow for multiple, defensible solutions; and (c) make student reasoning visible to teachers. Each is discussed in detail below.

  • Assessment tasks should prompt complex judgments. While some students may exhibit critical thinking without being prompted, most student responses will rise or sink to what the task requires. Therefore, the materials (visuals, texts, etc.) used to elicit students’ critical thinking are crucial and have a sizable impact on the extent to which critical thinking is elicited in any given assessment experience. If the task doesn’t ask students to think critically, they likely will not demonstrate evidence of critical thinking. The task, embedded in projects or other curriculum activities, must be designed and structured thoughtfully to elicit students’ critical thinking.
  • Assessment tasks should include open-ended tasks. Open-ended tasks are the opposite of traditional standardized assessments, which rely heavily on selected-response item types that assess limited aspects of critical thinking and other 21st century skills (Ku, 2009; Lai & Viering, 2012). Open-ended tasks allow students to decide what information is relevant, how to use the information, and how to demonstrate their understanding of the information; open-ended tasks also allow multiple solution pathways. In contrast, closed tasks typically have one correct solution, and the teacher indicates what information is relevant and how the information is to be presented.
  • Assessment tasks should make student thinking visible to teachers. To provide formative feedback regarding the quality of students’ critical thinking, teachers must administer assessment tasks that render student thinking visible. This can be accomplished in multiple ways, but all of them will likely require students to provide written or verbal evidence that supports their claims, judgments, assertions, and so on.

For a more complete discussion of the topics covered in this post, the full literature review on critical thinking is available here.

How to Assess Critical Thinking

Assessing Critical Thinking

October 11, 2008, by The Critical Thinking Co. Staff

Developing appropriate testing and evaluation of students is an important part of building critical thinking practice into your teaching. If students know that you expect them to think critically on tests, and the necessary guidelines and preparation are given beforehand, they are more likely to take a critical thinking approach to learning all course material. Design test items that require higher-order thinking skills such as analysis, synthesis, and evaluation, rather than simple recall of facts; ask students to explain and justify all claims made; instruct them to make inferences or draw conclusions that go beyond given data. Essays and problems are the most obvious form of item to use for testing these skills, but well-constructed multiple-choice items can also work well. Consider carefully how you will evaluate and grade tests that require critical thinking and develop clear criteria that can be shared with the students.

In order to make informed decisions about student critical thinking and learning, you need to assess student performance and behavior in class as well as on tests and assignments. Paying careful attention to signs of inattention or frustration, and asking students to explain them, can provide much valuable information about what may need to change in your teaching approach; similarly, signs of strong engagement or interest can tell you a great deal about what you are doing well to get students to think. Brief classroom assessment instruments, such as asking students to write down the clearest and most confusing points for them in a class session, can be very helpful for collecting a lot of information quickly about student thinking and understanding.


Two Rivers Learning Institute

Assessing Critical Thinking and Problem-Solving


How do you assess critical thinking and problem solving skills?

In considering how we assess critical thinking and problem solving skills, we wanted to answer the question of how we know whether students are learning the cognitive processes we are teaching and are able to transfer them to novel situations. In answer to this challenge, we have designed short performance tasks that target each of our constructs of critical thinking and problem solving.

What are performance tasks?

Performance tasks are specific activities that require students to demonstrate mastery of knowledge or skills through application within the task. The performance tasks that we utilize to assess critical thinking and problem solving are each aligned with a specific thinking type. In each task, students are required to make their thinking visible either through demonstration of their work, through oral description of their thinking, or through writing.

How do you design performance tasks aligned with constructs of critical thinking and problem solving?

In designing performance tasks, we always begin with the cognitive skill that we want to assess. Every decision about how to design performance tasks then grows from that clear understanding of the target.

Because the focus is on a specific cognitive skill, we want to remove barriers posed by either unfamiliar content or basic math and reading skills. Thus we choose tasks that are situated in contexts with which most students are already familiar. In addition, we ensure that the literacy and math demands of the task are low enough that most students are not hindered by the reading or computational components.

However, we strive to design tasks that are problematic for students. In other words, students shouldn’t have a quick solution to the tasks. We make tasks problematic in a couple of ways. First, we make tasks problematic by giving open-ended assignments where there are multiple possible solutions. Second, we make tasks problematic through the complexity of the problem that students need to think through.

How do you evaluate students’ critical thinking and problem solving skills through a performance task?

When students complete performance tasks, they generate evidence of their thinking that we can use to evaluate their critical thinking and problem solving skills. Using our rubrics, we evaluate student responses across the task against each dimension of the rubric. We don’t generate a single score for each construct. Instead, students are scored on each component of the rubric. This allows us to give refined feedback to students.
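Scoring each rubric component separately, rather than producing one score per construct, can be sketched as a small data structure. The dimension names, the 1 to 4 scale, the target level, and the feedback wording below are hypothetical illustrations and are not Two Rivers’ actual rubric.

```python
from dataclasses import dataclass

# Hypothetical dimension names, for illustration only.
DIMENSIONS = ("claim", "support", "linking", "counterclaims")

@dataclass
class RubricScore:
    student: str
    levels: dict  # maps each dimension name to a level on a hypothetical 1-4 scale

    def __post_init__(self):
        missing = set(DIMENSIONS) - set(self.levels)
        if missing:
            raise ValueError(f"unscored dimensions: {sorted(missing)}")

def feedback(score, target=3):
    """Return per-dimension feedback instead of collapsing levels into one number."""
    return {
        dim: "at or above target" if score.levels[dim] >= target else "needs support"
        for dim in DIMENSIONS
    }

sample = RubricScore(
    "Student A", {"claim": 4, "support": 3, "linking": 2, "counterclaims": 1}
)
report = feedback(sample)
for dim, note in report.items():
    print(f"{dim}: level {sample.levels[dim]} ({note})")
```

Keeping the levels separate preserves the information a single averaged score would hide: here the same student is strong at stating a claim but weak at anticipating counterclaims.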

Yes, We Can Define, Teach, and Assess Critical Thinking Skills


By Jeff Heyck-Williams, Director of Curriculum and Instruction at Two Rivers Public Charter School

While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the advent of the term “21st century skills” and discussions of deeper learning. There is increasing agreement among education reformers that critical thinking is an essential ingredient for long-term success for all of our students.

However, there are still those in the education establishment and in the media who argue that critical thinking isn’t really a thing, or that these skills aren’t well defined and, even if they could be defined, they can’t be taught or assessed.

To those naysayers, I have to disagree. Critical thinking is a thing. We can define it; we can teach it; and we can assess it. In fact, as part of a multi-year Assessment for Learning Project, Two Rivers Public Charter School in Washington, DC, has done just that.

Before I dive into what we have done, I want to acknowledge that some of the criticism has merit.

First, there are those who argue that critical thinking can only exist when students have a vast fund of knowledge, meaning that a student cannot think critically if they don’t have something substantive about which to think. I agree. Students do need a robust foundation of core content knowledge to effectively think critically. Schools still have a responsibility for building students’ content knowledge.

However, I would argue that students don’t need to wait to think critically until after they have mastered some arbitrary amount of knowledge. They can start building critical thinking skills when they walk in the door. All students come to school with experience and knowledge which they can immediately think critically about. In fact, some of the thinking that they learn to do helps augment and solidify the discipline-specific academic knowledge that they are learning.

The second criticism is that critical thinking skills are always highly contextual. In this argument, the critics make the point that the types of thinking that students do in history are categorically different from the types of thinking students do in science or math. Thus, the idea of teaching broadly defined, content-neutral critical thinking skills is impossible. I agree that there are domain-specific thinking skills that students should learn in each discipline. However, I also believe that there are several generalizable skills that elementary school students can learn that have broad applicability to their academic and social lives. That is what we have done at Two Rivers.

Defining Critical Thinking Skills

We began this work by first defining what we mean by critical thinking. After a review of the literature and looking at the practice at other schools, we identified five constructs that encompass a set of broadly applicable skills: schema development and activation; effective reasoning; creativity and innovation; problem solving; and decision making.


We then created rubrics to provide a concrete vision of what each of these constructs looks like in practice. Working with the Stanford Center for Assessment, Learning and Equity (SCALE), we refined these rubrics to capture clear and discrete skills.

For example, we defined effective reasoning as the skill of creating an evidence-based claim: students need to construct a claim, identify relevant support, link their support to their claim, and identify possible questions or counter claims. Rubrics provide an explicit vision of the skill of effective reasoning for students and teachers. By breaking the rubrics down for different grade bands, we have been able not only to describe what reasoning is but also to delineate how the skills develop in students from preschool through 8th grade.


Before moving on, I want to freely acknowledge that in narrowly defining reasoning as the construction of evidence-based claims we have disregarded some elements of reasoning that students can and should learn. For example, the difference between constructing claims through deductive versus inductive means is not highlighted in our definition. However, by privileging a definition that has broad applicability across disciplines, we are able to gain traction in developing the roots of critical thinking: in this case, formulating well-supported claims or arguments.

Teaching Critical Thinking Skills

The definitions of critical thinking constructs were only useful to us insofar as they translated into practical skills that teachers could teach and students could learn and use. Consequently, we found that to teach a set of cognitive skills, we needed thinking routines that defined the regular application of these critical thinking and problem-solving skills across domains. Building on Harvard’s Project Zero Visible Thinking work, we have named routines aligned with each of our constructs.

For example, with the construct of effective reasoning, we aligned the Claim-Support-Question thinking routine to our rubric. Teachers then were able to teach students that whenever they were making an argument, the norm in the class was to use the routine in constructing their claim and support. The flexibility of the routine has allowed us to apply it from preschool through 8th grade and across disciplines from science to economics and from math to literacy.


Kathryn Mancino, a 5th grade teacher at Two Rivers, has deliberately taught three of our thinking routines to students using the anchor charts above. Her charts name the components of each routine and have a place for students to record when they’ve used it and what they have figured out about the routine. By using this structure with a chart that can be added to throughout the year, students see the routines as broadly applicable across disciplines and are able to refine their application over time.

Assessing Critical Thinking Skills

By defining specific constructs of critical thinking and building thinking routines that support their implementation in classrooms, we have operated under the assumption that students are developing skills that they will be able to transfer to other settings. However, we recognized both the importance and the challenge of gathering reliable data to confirm this.

With this in mind, we have developed a series of short performance tasks around novel discipline-neutral contexts in which students can apply the constructs of thinking. Through these tasks, we have been able to provide an opportunity for students to demonstrate their ability to transfer the types of thinking beyond the original classroom setting. Once again, we have worked with SCALE to define tasks where students easily access the content but where the cognitive lift requires them to demonstrate their thinking abilities.

These assessments demonstrate that it is possible to capture meaningful data on students’ critical thinking abilities. They are not intended to be high-stakes accountability measures. Instead, they are designed to give students, teachers, and school leaders discrete formative data on hard-to-measure skills.

While it is clearly difficult, and we have not solved all of the challenges to scaling assessments of critical thinking, we can define, teach, and assess these skills. In fact, knowing how important they are for the economy of the future and our democracy, it is essential that we do.

The opinions expressed in Next Gen Learning in Action are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.


Critical Thinking: A Simple Guide and Why It’s Important


Critical Thinking: A Simple Guide and Why It’s Important was originally published on Ivy Exec.

Strong critical thinking skills are crucial for career success, regardless of educational background. They underpin astute and effective decision-making and add invaluable dimensions to professional growth.

At its essence, critical thinking is the ability to analyze, evaluate, and synthesize information in a logical and reasoned manner. It’s not merely about accumulating knowledge but harnessing it effectively to make informed decisions and solve complex problems. In the dynamic landscape of modern careers, honing this skill is paramount.

The Impact of Critical Thinking on Your Career

☑ Problem-Solving Mastery

Visualize critical thinking as the Sherlock Holmes of your career journey. It facilitates swift problem resolution akin to a detective unraveling a mystery. By methodically analyzing situations and deconstructing complexities, critical thinkers emerge as adept problem solvers, rendering them invaluable assets in the workplace.

☑ Refined Decision-Making

Navigating dilemmas in your career path resembles traversing uncertain terrain. Critical thinking acts as a dependable GPS, steering you toward informed decisions. It involves weighing options, evaluating potential outcomes, and confidently choosing the most favorable path forward.

☑ Enhanced Teamwork Dynamics

Within collaborative settings, critical thinkers stand out as proactive contributors. They engage in scrutinizing ideas, proposing enhancements, and fostering meaningful contributions. Consequently, the team evolves into a dynamic hub of ideas, with the critical thinker recognized as the architect behind its success.

☑ Communication Prowess

Effective communication is the cornerstone of professional interactions. Critical thinking enriches communication skills, enabling the clear and logical articulation of ideas. Whether in emails, presentations, or casual conversations, individuals adept in critical thinking exude clarity, earning appreciation for their ability to convey thoughts seamlessly.

☑ Adaptability and Resilience

Perceptive individuals adept in critical thinking display resilience in the face of unforeseen challenges. Instead of succumbing to panic, they assess situations, recalibrate their approaches, and persist in moving forward despite adversity.

☑ Fostering Innovation

Innovation is the lifeblood of progressive organizations, and critical thinking serves as its catalyst. Proficient critical thinkers possess the ability to identify overlooked opportunities, propose inventive solutions, and streamline processes, thereby positioning their organizations at the forefront of innovation.

☑ Confidence Amplification

Critical thinkers exude confidence derived from honing their analytical skills. This self-assurance radiates during job interviews, presentations, and daily interactions, catching the attention of superiors and propelling career advancement.

So, how can one cultivate and harness this invaluable skill?

✅ Developing Curiosity and Inquisitiveness:

Embrace a curious mindset by questioning the status quo and exploring topics beyond your immediate scope. Cultivate an inquisitive approach to everyday situations. Encourage a habit of asking “why” and “how” to deepen understanding. Curiosity fuels the desire to seek information and alternative perspectives.

✅ Practice Reflection and Self-Awareness:

Engage in reflective thinking by assessing your thoughts, actions, and decisions. Regularly introspect to understand your biases, assumptions, and cognitive processes. Cultivate self-awareness to recognize personal prejudices or cognitive biases that might influence your thinking. This allows for a more objective analysis of situations.

✅ Strengthening Analytical Skills:

Practice breaking down complex problems into manageable components. Analyze each part systematically to understand the whole picture. Develop skills in data analysis, statistics, and logical reasoning. This includes understanding correlation versus causation, interpreting graphs, and evaluating statistical significance.

✅ Engaging in Active Listening and Observation:

Actively listen to diverse viewpoints without immediately forming judgments. Allow others to express their ideas fully before responding. Observe situations attentively, noticing details that others might overlook. This habit enhances your ability to analyze problems more comprehensively.

✅ Encouraging Intellectual Humility and Open-Mindedness:

Foster intellectual humility by acknowledging that you don’t know everything. Be open to learning from others, regardless of their position or expertise. Cultivate open-mindedness by actively seeking out perspectives different from your own. Engage in discussions with people holding diverse opinions to broaden your understanding.

✅ Practicing Problem-Solving and Decision-Making:

Engage in regular problem-solving exercises that challenge you to think creatively and analytically. This can include puzzles, riddles, or real-world scenarios. When making decisions, consciously evaluate available information, consider various alternatives, and anticipate potential outcomes before reaching a conclusion.

✅ Continuous Learning and Exposure to Varied Content:

Read extensively across diverse subjects and formats, exposing yourself to different viewpoints, cultures, and ways of thinking. Engage in courses, workshops, or seminars that stimulate critical thinking skills. Seek out opportunities for learning that challenge your existing beliefs.

✅ Engage in Constructive Disagreement and Debate:

Encourage healthy debates and discussions where differing opinions are respectfully debated. This practice fosters the ability to defend your viewpoints logically while also being open to changing your perspective based on valid arguments. Embrace disagreement as an opportunity to learn rather than a conflict to win. Engaging in constructive debate sharpens your ability to evaluate and counter arguments effectively.

✅ Utilize Problem-Based Learning and Real-World Applications:

Engage in problem-based learning activities that simulate real-world challenges. Work on projects or scenarios that require critical thinking skills to develop practical problem-solving approaches. Apply critical thinking in real-life situations whenever possible. This could involve analyzing news articles, evaluating product reviews, or dissecting marketing strategies to understand their underlying rationale.

In conclusion, critical thinking is the linchpin of a successful career journey. It empowers individuals to navigate complexities, make informed decisions, and innovate in their respective domains. Embracing and honing this skill isn’t just an advantage; it’s a necessity in a world where adaptability and sound judgment reign supreme.

So, as you traverse your career path, remember that the ability to think critically is not just an asset but the differentiator that propels you toward excellence.

An improved metacognitive competency framework to inculcate analytical thinking among university students

  • Published: 13 May 2024


  • Lilian Anthonysamy (ORCID: orcid.org/0000-0003-1241-326X)
  • Poovilashini Sugendran
  • Lim Ooi Wei
  • Teoh Sian Hoon

Enhancing students' analytical thinking skills holds great promise for bolstering a nation's economic growth, fostering a dynamic learning culture, and nurturing human capital development. It is especially critical in today's rapidly changing landscape, where the demand for skilled, adaptable graduates is high. Achieving and sustaining these skills hinges on individuals' awareness of their own thinking processes. Thus, this study aims to investigate the relationship between metacognitive knowledge and analytical thinking among university students in Malaysia. Besides, it assesses the impact of metacognitive regulation and control on analytical thinking. Data was gathered by administering web-based questionnaires to students enrolled in two public universities and two private universities situated in Malaysia's central region. Employing convenience sampling, a total of 184 respondents participated in the survey, responding to 5-point Likert scale questionnaires designed to gauge metacognition (both knowledge and regulation) and analytical thinking. The data analysis was conducted using Partial Least Squares Structural Equation Modelling (PLS-SEM), a technique chosen for its suitability to handle complex relationships within the study variables. PLS-SEM, coupled with the bootstrapping method, was employed to ensure a robust examination of the interactions among metacognitive factors and analytical thinking. The use of the bootstrapping method enhances the reliability of the results by generating multiple resamples and assessing the stability and significance of the model parameters. The results revealed a significant relationship (p < 0.01) between metacognitive knowledge and analytical thinking, with knowledge of persons and knowledge of strategies proving to be influential in enhancing analytical thinking abilities with the correlation coefficients (R-values) of 3.528 and 3.815, respectively.
In terms of metacognitive regulation, this study identified a noteworthy positive impact, highlighting the role of metacognitive regulation and control in bolstering analytical thinking with an R-value of 2.985. These findings have far-reaching implications for educators, offering valuable guidance on empowering university students to become more self-reliant and efficient learners by imparting skills in planning, monitoring, and self-assessment of their learning processes for improved academic performance within the university environment. This, in turn, has the potential to yield improved academic performance, particularly the development of analytic thinking and offers educators the opportunity to craft more effective instructional materials and activities tailored to harness the power of metacognition in education.
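The bootstrapping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' PLS-SEM analysis: it bootstraps a single ordinary least-squares slope on synthetic data, and the function names (`slope`, `bootstrap_slope`), the variable names, and the generated numbers are all assumptions made for illustration. It shows the general mechanism only: resample respondents with replacement, re-estimate the coefficient each time, and use the spread of the re-estimates as a standard error, from which a t-like ratio (estimate divided by bootstrap standard error) can be formed.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope(xs, ys, n_resamples=2000, seed=0):
    """Return (estimate, bootstrap standard error, estimate / standard error)."""
    rng = random.Random(seed)
    n = len(xs)
    estimate = slope(xs, ys)
    resampled = []
    for _ in range(n_resamples):
        # Draw n respondents with replacement, keeping each (x, y) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    se = statistics.stdev(resampled)
    return estimate, se, estimate / se


# Synthetic stand-in data (hypothetical, NOT the study's data):
# 184 simulated respondents with a metacognition score and an analytical thinking score.
data_rng = random.Random(42)
metacog = [data_rng.gauss(3.5, 0.6) for _ in range(184)]
analytic = [0.4 * m + data_rng.gauss(2.0, 0.5) for m in metacog]

est, se, t_ratio = bootstrap_slope(metacog, analytic)
print(f"slope={est:.3f}  bootstrap SE={se:.3f}  t-like ratio={t_ratio:.2f}")
```

A large ratio relative to the bootstrap spread suggests the coefficient is stable across resamples. A full PLS-SEM analysis applies the same resampling logic to every path in the model at once, typically through dedicated software.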


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research is funded by the Ministry of Higher Education, Malaysia Fundamental Research Grant Scheme (FRGS/1/2023/SS02/MMU/02/3). This work was supported by Multimedia University, Malaysia under the Internal Research Funding (MMUI/220027).

Author information

Authors and affiliations

Faculty of Management, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia

Lilian Anthonysamy & Poovilashini Sugendran

Global College, Heriot-Watt University Malaysia, 62200, Putrajaya, Selangor, Malaysia

Lim Ooi Wei

Faculty of Education, Universiti Teknologi MARA, 42300, Bandar Puncak Alam, Selangor, Malaysia

Teoh Sian Hoon


Corresponding author

Correspondence to Lilian Anthonysamy.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Anthonysamy, L., Sugendran, P., Wei, L.O. et al. An improved metacognitive competency framework to inculcate analytical thinking among university students. Educ Inf Technol (2024). https://doi.org/10.1007/s10639-024-12678-z


Received: 12 October 2023

Accepted: 26 March 2024

Published: 13 May 2024

DOI: https://doi.org/10.1007/s10639-024-12678-z


  • Metacognition
  • Metacognitive strategies
  • Analytical thinking
  • University students

Evaluating the use of HEIghten critical thinking assessment to monitor critical thinking in dental students

Affiliations

  • 1 Oral Biology & Diagnostic Sciences, Dental College of Georgia, Augusta, Georgia, USA.
  • 2 Restorative Sciences, Dental College of Georgia, Augusta, Georgia, USA.
  • 3 General Dentistry, Dental College of Georgia, Augusta, Georgia, USA.
  • PMID: 38676393
  • DOI: 10.1002/jdd.13550

Purpose/objectives: Critical thinking and evidence-based dentistry are skills that dental students are required to demonstrate, but monitoring and quantifying progress can be challenging. This study investigated whether the HEIghten critical thinking assessment (HCTA) could be used as a tool both prior to admitting students and to monitor whether students' skills improve over their time at dental school.

Methods: Freshman dental students (n = 92) were given the HCTA during their first semester of dental school. Statistical analyses were then performed to examine the association of Dental Admission Test (DAT) scores (overall, perceptual ability, and total science) and Grade Point Average (GPA) (overall and science) with critical thinking scores (total, analytic, and synthetic).

Results: There was a significant positive association between GPA, DAT scores and critical thinking scores.

Conclusions: Our results indicate that the HCTA may be a useful tool to enable monitoring of students' analytical and synthetic skills throughout their time at dental school.

Keywords: critical thinking assessment; dental school; dental students.

© 2024 The Authors. Journal of Dental Education published by Wiley Periodicals LLC on behalf of American Dental Education Association.


COMMENTS

  1. Teaching, Measuring & Assessing Critical Thinking Skills

    Yes, We Can Define, Teach, and Assess Critical Thinking Skills. Critical thinking is a thing. We can define it; we can teach it; and we can assess it. While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the ...

  2. Fostering and assessing student critical thinking: From theory to

    Note: The class-friendly assessment rubric for critical thinking is supposed to assess a task targeting the acquisition of some learning outcome in a discipline or more. It is not meant to assess a critical thinking exercise, but any exercise in which students have space to develop their critical thinking skills.

  3. Critical Thinking Testing and Assessment

    The purpose of assessing instruction for critical thinking is improving the teaching of discipline-based thinking (historical, biological, sociological, mathematical, etc.) It is to improve students' abilities to think their way through content using disciplined skill in reasoning. The more particular we can be about what we want students to ...

  4. Assessing Critical Thinking in Higher Education: Current State and

    Critical thinking is one of the most frequently discussed higher order skills, believed to play a central role in logical thinking, decision making, and problem solving (Butler, 2012; Halpern, 2003). It is also a highly contentious skill in that researchers debate about its definition; its amenability to assessment; its degree of generality or specificity; and the evidence of its practical ...

  5. Promoting and Assessing Critical Thinking

    Critical thinking can be defined as being able to examine an issue by breaking it down, and evaluating it in a conscious manner, while providing arguments/evidence to support the evaluation. Below are some suggestions for promoting and assessing critical thinking in our students. Asking questions and using the answers to understand the world ...

  6. PDF Two Rubrics for Critical Thinking Assessment: A Mini-Training Session

    Using a Rubric to Assess Critical Thinking RUBRIC: Set of scoring guidelines for assessing student performance Ideally, an Assessment Method Should: ... • Identify the "next steps" in building student critical thinking skills. • Provide students with more appropriate feedback for student learning. • Improve interrater reliability.

  7. Eight Instructional Strategies for Promoting Critical Thinking

    Students grappled with ideas and their beliefs and employed deep critical-thinking skills to develop arguments for their claims. Embedding critical-thinking skills in curriculum that students care ...

  8. PDF Designing Rubrics to Assess Critical Thinking

    Traditional assessment measures such as multiple choice questions are a form of selected response measures designed for knowledge recall and sometimes for decision-making from a selection of options. In such measures, students are asked to think critically in the ...

  9. Instruments to assess students' critical thinking—A qualitative

    Critical thinking (CT) skills are essential to academic and professional success. Instruments to assess CT often rely on multiple-choice formats with inherent problems. This research presents two instruments for assessing CT, an essay and open-ended group-discussion format, which were implemented in an undergraduate business course at a large ...

  10. A Brief Guide for Teaching and Assessing Critical Thinking in

    Seven Guidelines for Teaching and Assessing Critical Thinking . 1. Motivate your students to think critically . ... To improve their CT skills, students must be given opportunities to practice them. Different courses present different opportunities for infusion and practice. Stand-alone CT courses usually provide the most opportunities to ...

  11. Frontiers

    Enhancing students' critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves ...

  12. Rubrics to assess critical thinking and information processing in

    Process skills such as critical thinking and information processing are commonly stated outcomes for STEM undergraduate degree programs, but instructors often do not explicitly assess these skills in their courses. Students are more likely to develop these crucial skills if there is constructive alignment between an instructor's intended learning outcomes, the tasks that the instructor and ...

  13. Critical Thinking: Facilitating and Assessing the 21st Century Skills

    Ten Ways to Facilitate Student Critical Thinking in the Classroom and School. Design Critical Thinking Activities. (This might include mind mapping, making thinking visible, Socratic discussions, meta-cognitive mind stretches, or building an inquiry wall with students and talking about the process of thinking.) Provide time for students to collaborate.

  14. Critical Thinking > Assessment (Stanford Encyclopedia of Philosophy)

    The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students' critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level ...

  15. 21st Century Skills Critical Thinking

    Assessing students' level of sophistication with critical thinking skills and dispositions requires close attention to the nature of the task used to elicit students' critical thinking. Assessments must be thoughtfully designed and structured to (a) prompt complex judgments; (b) include open-ended tasks that allow for multiple, defensible ...

  16. How to Assess Critical Thinking

    Assessing Critical Thinking. October 11, 2008, by The Critical Thinking Co. Staff. Developing appropriate testing and evaluation of students is an important part of building critical thinking practice into your teaching. If students know that you expect them to think critically on tests, and the necessary guidelines and preparation are given ...

  17. Assessing Critical Thinking and Problem-Solving

    Performance tasks are specific activities that require students to demonstrate mastery of knowledge or skills through application within the task. The performance tasks that we utilize to assess critical thinking and problem solving are each aligned with a specific thinking type. In each task, students are required to make their thinking ...

  18. Exploring higher education students' critical thinking skills through

    Critical thinking assessment tool. Critical thinking skills were assessed using a pilot critical thinking assessment tool. This assessment tool consists of both selected- and constructed-response items and was designed to reflect the five critical thinking dimensions proposed by Liu et al. (2014). The scenarios that were presented to test ...

  19. Fostering and assessing student critical thinking: From theory to

    recent research about critical thinking, then highlight sub-skills that supported the development of rubrics for teaching practices. After contrasting critical thinking and creativity, I present how the rubrics can help teachers review and improve their lesson plans as well as assess the critical thinking of students. The conclusion links crit -

  20. Fostering and assessing creativity and critical thinking in education

    Creativity and critical thinking are key skills for the complex and globalized economies and societies of the 21st century. There is a growing consensus that higher education systems and institutions should cultivate these skills with their students. However, too little is known about what this means for everyday teaching and assessment ...

  21. What Are Critical Thinking Skills and Why Are They Important?

    According to the University of the People in California, having critical thinking skills is important because they are [ 1 ]: Universal. Crucial for the economy. Essential for improving language and presentation skills. Very helpful in promoting creativity. Important for self-reflection.

  22. Yes, We Can Define, Teach, and Assess Critical Thinking Skills

    These assessments demonstrate that it is possible to capture meaningful data on students' critical thinking abilities. They are not intended to be high stakes accountability measures. Instead ...

  23. Questioning Training and Critical Thinking of Undergraduate Students of

    Promoting university students' critical thinking skills through peer feedback activity in an online discussion forum. Alberta Journal of Educational Research, 59 (2) (2013) ... A protocol for the development of a critical thinking assessment tool for nurses using a Delphi technique. Journal of Advanced Nursing, 73 (8) (2017), pp. 1982-1988 ...

  24. Critical Thinking Skills for University Success

    After completing this course, you will be able to: 1. Use critical thinking and argumentation in university contexts to improve academic results 2. Understand the importance and function of critical thinking in academic culture 3. Use a variety of thinking tools to improve critical thinking 4. Identify types of argument, and bias within ...

  25. Effective Ways to Assess Student Critical Thinking

    4. Peer Review. 5. Reflective Writing. 6. Performance Tasks.

  26. Critical Thinking: A Simple Guide and Why It's Important

    Critical thinking enriches communication skills, enabling the clear and logical articulation of ideas. Whether in emails, presentations, or casual conversations, individuals adept in critical thinking exude clarity, earning appreciation for their ability to convey thoughts seamlessly. ☑ Adaptability and Resilience

  27. An improved metacognitive competency framework to inculcate ...

    Enhancing students' analytical thinking skills holds great promise for bolstering a nation's economic growth, fostering a dynamic learning culture, and nurturing human capital development. It is especially critical in today's rapidly changing landscape, where the demand for skilled, adaptable graduates is high. Achieving and sustaining these skills hinges on individuals' awareness of their own ...

  28. Evaluating the use of HEIghten critical thinking assessment to ...

    Purpose/objectives: Critical thinking and evidence-based dentistry are skills that dental students are required to demonstrate, but monitoring and quantifying progress can be challenging. This study is investigating whether the HEIghten critical thinking assessment (HCTA) could be used as a potential tool, both for use prior to admitting students, and to monitor whether the students' skills ...

  29. Digital Competencies in Verifying Fake News: Assessing the ...

    Beyond technical skills, fostering creativity, critical thinking, emotional thinking, and collaborative work can help combat disinformation . Consequently, a comprehensive approach to the enhancement of verification skills can improve the employability of journalism graduates, bridging the gap between academic training and the professional ...