Evaluation Research: Definition, Methods and Examples


Content Index

  • What is evaluation research?
  • Why do evaluation research?
  • Methods of evaluation research
  • Quantitative methods
  • Qualitative methods
  • Examples of evaluation research
  • Process evaluation research question examples
  • Outcome evaluation research question examples

What is evaluation research?

Evaluation research, also known as program evaluation, refers to a research purpose rather than a specific method. Evaluation research is the systematic assessment of the worth or merit of the time, money, effort, and resources spent in order to achieve a goal.

Evaluation research is closely related to, but slightly different from, conventional social research. It uses many of the same methods as traditional social research, but because it takes place within an organizational context, it also calls for team skills, interpersonal skills, management skills, and political awareness that social research rarely demands. Evaluation research also requires keeping the interests of the stakeholders in mind.

Evaluation research is a type of applied research, so it is intended to have a real-world effect. Many methods, such as surveys and experiments, can be used for evaluation research. It is a rigorous, systematic process that involves collecting data about organizations, processes, projects, services, and/or resources, analyzing that data, and reporting the results. Evaluation research enhances knowledge and decision-making and leads to practical applications.

LEARN ABOUT: Action Research

Why do evaluation research?

The common goal of most evaluations is to extract meaningful information from the audience and provide valuable insights to sponsors, donors, client groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as useful if it helps in decision-making. However, evaluation research does not always create an impact that can be applied elsewhere; sometimes it fails to influence short-term decisions. It is equally true that a study which initially seems to have no influence can have a delayed impact when the situation becomes more favorable. In spite of this, there is general agreement that the major goal of evaluation research should be to improve decision-making through the systematic use of measurable feedback.

Below are some of the benefits of evaluation research:

  • Gain insights about a project or program and its operations

Evaluation research lets you understand what works and what doesn’t: where you were, where you are, and where you are headed. You can identify areas for improvement as well as strengths, which helps you figure out what to focus on and whether there are any threats to your business. You can also find out whether there are hidden sectors in the market that remain untapped.

  • Improve practice

It is essential to gauge your past performance and understand what went wrong in order to deliver better services to your customers. Unless there is two-way communication, there is no way to improve what you have to offer. Evaluation research gives your employees and customers an opportunity to express how they feel and whether there is anything they would like to change. It also lets you modify or adopt a practice in a way that increases the chances of success.

  • Assess the effects

After evaluating your efforts, you can see how well you are meeting objectives and targets. Evaluations let you measure whether the intended benefits are really reaching the targeted audience and, if so, how effectively.

  • Build capacity

Evaluations help you analyze demand patterns and predict whether you will need more funds, upgraded skills, or more efficient operations. They also let you find gaps in the production-to-delivery chain and possible ways to fill them.

Methods of evaluation research

All market research methods involve collecting and analyzing data, assessing the validity of the information, and deriving relevant inferences from it. Evaluation research comprises planning a study, conducting it, and analyzing the results, which includes applying data collection techniques and statistical methods.

Some popular evaluation methods are input measurement, output or performance measurement, impact or outcomes assessment, quality assessment, process evaluation, benchmarking, standards, cost analysis, organizational effectiveness, program evaluation methods, and LIS-centered methods. A few types of evaluation, such as descriptive studies, formative evaluations, and implementation analysis, do not always result in a meaningful assessment. Evaluation research is concerned chiefly with the information-processing and feedback functions of evaluation.

These methods can be broadly classified as quantitative and qualitative methods.

Quantitative research methods are used to measure anything tangible and answer questions such as the following:

  • Who was involved?
  • What were the outcomes?
  • What was the price?

The best way to collect quantitative data is through surveys, questionnaires, and polls. You can also create pre-tests and post-tests, review existing documents and databases, or gather clinical data.

Surveys are used to gather the opinions, feedback, or ideas of your employees or customers and consist of various question types. They can be conducted face-to-face, by telephone, by mail, or online. Online surveys do not require human intervention and are far more efficient and practical. You can see survey results on the dashboard of a research tool and dig deeper using filter criteria based on factors such as age, gender, and location. You can also apply survey logic such as branching, quotas, chained surveys, and looping to the questions, reducing the time it takes both to create and to respond to the survey. In addition, you can generate reports that apply statistical formulae and present data in a form that can be readily absorbed in meetings. To learn more about how a research tool works and whether it is suitable for you, sign up for a free account.
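To make the filtering and reporting described above concrete, here is a minimal, hypothetical sketch in Python (using pandas) of how survey responses might be sliced by demographic criteria such as age, gender, and location. The column names and data are invented for illustration and are not tied to any particular survey tool.

```python
import pandas as pd

# Hypothetical survey responses; in practice these would be exported from a survey tool.
responses = pd.DataFrame({
    "age_group": ["18-24", "25-34", "25-34", "35-44", "18-24"],
    "gender": ["F", "M", "F", "F", "M"],
    "location": ["NY", "CA", "CA", "TX", "NY"],
    "satisfaction": [4, 5, 3, 4, 2],  # 1-5 rating from a single survey question
})

# Filter the results by one demographic criterion, then summarize that slice.
ca_only = responses[responses["location"] == "CA"]
print(ca_only["satisfaction"].mean())

# Break the same metric down by several criteria at once, dashboard-style.
print(responses.groupby(["age_group", "gender"])["satisfaction"].mean())
```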


Quantitative data measure the depth and breadth of an initiative: for instance, the number of people who participated in a non-profit event or the number of people who enrolled in a new course at a university. Quantitative data collected before and after a program can show its results and impact.
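As an illustration of the before-and-after comparison mentioned above, the following sketch applies a paired t-test to hypothetical pre- and post-program scores. The numbers are invented, and a real evaluation would choose a statistical test to match its design.

```python
from scipy import stats

# Hypothetical matched scores for the same six participants.
pre_scores = [52, 60, 45, 70, 58, 63]
post_scores = [58, 66, 50, 72, 65, 70]

# Average change from before to after the program.
mean_change = sum(post_scores) / len(post_scores) - sum(pre_scores) / len(pre_scores)

# Paired t-test: did scores shift more than chance alone would explain?
t_stat, p_value = stats.ttest_rel(post_scores, pre_scores)

print(f"mean change: {mean_change:.1f}")
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```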

The usefulness of quantitative data for evaluation research depends on how well the sample represents the population, the ease of analysis, and the consistency of the measures. Quantitative methods can fail if the questions are not framed correctly or not distributed to the right audience. Quantitative data also do not provide an understanding of context and may not be apt for complex issues.

Learn more: Quantitative Market Research: The Complete Guide

Qualitative research methods are used where quantitative methods cannot solve the research problem, i.e., where intangible values must be measured. They answer questions such as:

  • What is the value added?
  • How satisfied are you with our service?
  • How likely are you to recommend us to your friends?
  • What will improve your experience?

LEARN ABOUT: Qualitative Interview

Qualitative data are collected through observation, interviews, case studies, and focus groups. The steps in a qualitative study involve examining the data, comparing and contrasting it, and understanding patterns. Analysts draw conclusions by identifying themes, clustering similar data, and finally reducing the data to points that make sense.
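A very simplified, hypothetical illustration of those steps (coding responses against themes, clustering similar data, and reducing it to counts) is sketched below in Python. The codebook and responses are invented, and real qualitative analysis is considerably more interpretive than keyword matching.

```python
from collections import Counter

# Hypothetical open-ended responses gathered from interviews or focus groups.
responses = [
    "The staff were helpful but the wait was long",
    "Long waiting times made the visit stressful",
    "Helpful and friendly staff",
    "The online booking system kept crashing",
]

# Rough codebook: theme name -> keywords that signal the theme.
codebook = {
    "staff_quality": ["staff", "friendly", "helpful"],
    "waiting_time": ["wait", "waiting", "long"],
    "technology": ["online", "system", "crashing"],
}

# Code each response, clustering similar comments under shared themes.
theme_counts = Counter()
for text in responses:
    lowered = text.lower()
    for theme, keywords in codebook.items():
        if any(word in lowered for word in keywords):
            theme_counts[theme] += 1

# Reduce the data to a ranked list of themes.
print(theme_counts.most_common())
```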

Observations may help explain behaviors as well as the social context, which quantitative methods generally do not reveal. Behavior and body language can be observed by watching a participant or by recording audio or video. Structured interviews can be conducted with people individually or in a group under controlled conditions, or respondents may be asked open-ended qualitative research questions. Qualitative research methods are also used to understand a person’s perceptions and motivations.

LEARN ABOUT: Social Communication Questionnaire

The strength of this method is that group discussion can generate ideas and stimulate memories, with topics cascading as the discussion unfolds. The value of qualitative data depends on how well the contextual information explains complex issues and complements the quantitative data. It helps answer “why” and “how” after “what” has been answered. The limitations of qualitative data for evaluation research are that they are subjective, time-consuming, costly, and difficult to analyze and interpret.

Learn more: Qualitative Market Research: The Complete Guide

Survey software can be used for both evaluation research methods. You can use the sample questions above and send a survey in minutes using research software. Using a research tool simplifies the entire process, from creating a survey and importing contacts to distributing the survey and generating reports that aid the research.

Examples of evaluation research

Evaluation research questions lay the foundation of a successful evaluation. They define the topics that will be evaluated. Keeping evaluation questions ready not only saves time and money, but also makes it easier to decide what data to collect, how to analyze it, and how to report it.

Evaluation research questions must be developed and agreed on in the planning stage; however, ready-made research templates can also be used. A sketch of how such question sets might be organized for a survey tool appears after the examples below.

Process evaluation research question examples:

  • How often do you use our product in a day?
  • Were approvals taken from all stakeholders?
  • Can you report the issue from the system?
  • Can you submit the feedback from the system?
  • Was each task done as per the standard operating procedure?
  • What were the barriers to the implementation of each task?
  • Were any improvement areas discovered?

Outcome evaluation research question examples:

  • How satisfied are you with our product?
  • Did the program produce intended outcomes?
  • What were the unintended outcomes?
  • Has the program increased the knowledge of participants?
  • Were the participants of the program employable before the course started?
  • Do participants of the program have the skills to find a job after the course ended?
  • Is the knowledge of participants better compared to those who did not participate in the program?
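The sketch below shows one hypothetical way the process and outcome questions above could be organized for import into survey software. The structure, field names, and question types are assumptions made for illustration, not any specific product's format.

```python
# Hypothetical grouping of evaluation questions by type; field names are illustrative only.
evaluation_survey = {
    "process_evaluation": [
        {"text": "How often do you use our product in a day?", "type": "multiple_choice"},
        {"text": "Were approvals taken from all stakeholders?", "type": "yes_no"},
        {"text": "What were the barriers to the implementation of each task?", "type": "open_text"},
    ],
    "outcome_evaluation": [
        {"text": "How satisfied are you with our product?", "type": "rating_1_to_5"},
        {"text": "Did the program produce intended outcomes?", "type": "yes_no"},
        {"text": "What were the unintended outcomes?", "type": "open_text"},
    ],
}

# Quick check of how many questions each section of the evaluation contains.
for section, questions in evaluation_survey.items():
    print(section, "->", len(questions), "questions")
```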


The Oxford Handbook of Qualitative Research


23 Program Evaluation

Paul R. Brandon, University of Hawai‘i at Mānoa

Anna L. Ah Sam, University of Hawai‘i at Mānoa

  • Published: 01 July 2014

The profession of educational and social program evaluation has expanded exponentially around the globe since the mid-1960s and continues to receive the considerable attention of theorists, methodologists, and practitioners. The literature on it is wide and deep, reflecting an array of definitions and conceptions of purpose and social role. The chapter discusses these topics and several others, including opinions about the choice of methods, some of which are used primarily within evaluation approaches to conducting evaluation; the aspects of programs that evaluators typically address; the concept of value; the differences between evaluation and social science research; research on evaluation topics; and the major evaluation issues and concerns that have dominated discussion in the literature over the years.

Evaluation is a ubiquitous activity conducted by people of all ages. Humans have practiced evaluative activities in everyday life since the beginning of recorded history and could not have evolved socially without practicing evaluative activities long before. As a discipline and formal organizational activity, it has been conducted with vigor and considerable professional attention in Western nations for about half a century ( Madaus & Stufflebeam, 2000 ). Our purpose in this chapter is to provide an overview of what has emerged from the work of evaluation theorists, methodologists, and practitioners about the evaluation of social, educational, health, and other programs during this period. We begin by defining programs and evaluation, discussing program evaluation purposes and the social roles that evaluation addresses in its various guises, presenting a brief overview of how evaluations are conducted, and identifying the major evaluation approaches and models. We continue with a discussion of the major features of qualitative methods in program evaluation, a description of the aspects of programs that are addressed in evaluations, an overview of the concept of value in evaluation, and a discussion of the differences between evaluation and social science research. We conclude by discussing research on evaluation, discussing recurring fundamental issues in the evaluation literature, and providing final remarks about the state of the profession. Our intent is to provide a snapshot of the breadth and complexities of the profession and discipline of evaluation. Necessarily, our treatment is wide and thin; the reference list provides ample suggestions for deeper explorations of the topic. We do not discuss the breadth of evaluation methods because most are familiar and drawn from the compendium of social science research methods. We pay somewhat more attention to qualitative evaluation than to quantitative evaluation but refer readers to other chapters in this volume for explanations about how qualitative methods should be conducted. Table 23.1 lists common evaluation terms and their definitions.

The Evaluand: The Object of Evaluation

Before defining evaluation, it is helpful to define evaluand —the object of evaluation, or that which is being evaluated. Professional evaluators commonly discuss six categories of evaluands: programs, policies, performances, products, personnel, and proposals. These broad categories address the foci of most formal evaluative activity. Performances, for example, occur in the arts, sport, and education (as in assessments of writing skills), and products can include all that is evaluated in publications such as Consumer Reports , software applications such as statistical packages or mobile applications, instructional materials, and so forth. Performances and products can also address the foci of everyday informal evaluation: people regularly examine movie reviews before choosing an evening’s entertainment and decide which brands and versions of products to buy based on friends’ recommendations. Our focus in this chapter, however, is on formal evaluations of programs , defined as planned interventions of some duration designed to address social, educational, or commercial problems or needs. (Sometimes, program evaluation is called evaluation research , but this term has largely disappeared from the literature.) The logic of evaluation and the various evaluation topics that we discuss also apply in varying degrees to other evaluands, but our elaboration and examples are about programs.

Definition, Purposes, and Social Roles of Evaluation

The simplest definition of evaluation is the judgment of merit or worth . The French origin of the meaning has to do with assigning value to an evaluand. As Fournier (2005 , pp. 139–140) succinctly stated,

Conclusions made in evaluations encompass both an empirical aspect (that something is the case) and a normative aspect (judgment about the value of something). It is the value feature that distinguishes evaluation from other types of inquiry, such as basic science research, clinical epidemiology, investigative journalism, or public polling.

Merit has to do with the intrinsic value of an evaluand—that is, whether it performs its intended function, independent of context or costs. Worth has to do with the degree to which an evaluand has extrinsic value—that is, the extent to which it meets needs in light of its context and costs. It also can be about the value of the evaluand to society. A program might function well and achieve its outcomes, thus exhibiting merit, but if it does not meet a need or provide a service of value to its beneficiaries (e.g., program clients or students served in a program) or the broader society, it is not said to have worth.

Elaborations on the definition of evaluation in light of its use to examine programs are numerous. As Mark, Greene, and Shaw (2006, p. 6) stated, “If you ask 10 evaluators to define evaluation, you’ll probably end up with 23 different definitions.” The primary foci of these definitions have to do with the purposes and uses of program evaluation, such as decision-making about program continuation, program improvement, and increasing understanding about programs.

The purposes of evaluation minimally include both summative evaluation purposes and formative evaluation purposes. These are reflected in Patton’s (2008 , p. 39) partial definition of evaluation as “the systematic collection of information about the activities, characteristics, and results of programs to make judgments about the program, improve or further develop program effectiveness, inform decisions about future programming, and/or increase understanding.” The summative purpose of an evaluation is to arrive at decisions about overall merit and worth and to use the results for deciding about future program operations or funding. A summative evaluation provides a summation; it is likely to address the outcomes of a program, the extent to which it achieved its intended objectives, or the degree to which the benefits of a program are worth its costs. Summative evaluation questions might address how well the evaluand performs, whether it is better than an alternative, or whether it is worth its costs, among other topics. Summative evaluation findings are used for oversight and compliance—for example, the extent to which programs address statutes and regulations or meet performance standards ( Mark, Henry, & Julnes, 2000 ). Sometimes these are simply monitoring efforts. Summative evaluations can have high stakes, and many can be used to make “go/no-go decisions” ( Cronbach et al., 1980 ). For example, legislative bodies mandate programs and want to know their effects when deciding about future program funding, so they require summative evaluations. Often, summative evaluations are required for the purposes of grant-making or contracting organizations; program personnel might not request or anticipate a need for these evaluations, but funding agencies might need them for program accountability. Because of their consequences, the methods of summative evaluations require a good degree of rigor. The findings of evaluative studies that use unsound methods are not sufficiently warranted for supporting decisions about program continuation or funding.

A formative evaluation collects and reports information for improving evaluands. As Stake (2004) famously stated, the chef tasting the soup does a formative evaluation, and the customer tasting the soup does a summative evaluation. A formative evaluation helps in the formation of a program, when the results have leverage for making immediate, useful, and often minor program modifications. Just as consumer products are constantly revised when manufacturers identify flaws in their products, find better manufacturing materials, or learn ways to make products last longer, program personnel find ways to improve the delivery of their programs and thereby improve outcomes. Program personnel might want to know what additional resources should be devoted to a program, whether the activities and materials they have developed to implement a program need improvement, what program beneficiaries think might be changed, and so forth. Even though formative evaluation information is most useful in the earlier stages of a program, when changes are likely to be needed and summative evaluations are premature, evaluations might have formative purposes throughout the life of a study ( Scriven, 1991 a ). Furthermore, the findings at any stage might be used for improving a program at a later date. Even at the end of an evaluation, it is likely that some of the summative results will point toward needed future modifications.

Note the emphasis of these purposes on the use of evaluation findings. Evaluations are conducted to provide useful information. Both summative and formative purposes are focused on the timing and manner of the use of evaluation results. Summative evaluation findings are more formal and comprehensive than formative evaluation findings, which often are presented quickly and without a great degree of formality, but both summative and formative evaluations focus on program stakeholders’ use (i.e., examination and application in decision making) of the findings, albeit differently from one type to the other. The degree, form, and timing of the use of evaluation findings has been an issue discussed and debated among American evaluators since the 1970s, when large-scale studies tended to show minimal positive effects and evaluators became concerned about the usefulness of their endeavors. Studies conducted and published relatively early in the program evaluation literature (e.g., Alkin, Daillak, & White, 1979 ; Cousins & Leithwood, 1986 ; King & Thompson, 1983 ; Patton et al., 1977 ) focused on issues of use; since then, it has probably been the most studied topic in the evaluation literature (e.g., Brandon & Singh, 2009 ; Cousins & Shulha, 2006 ; Fleischer & Christie, 2009 ; Hofstetter & Alkin, 2003 ; Johnson et al., 2009 ; Shulha & Cousins, 1997 ). Indeed, the use of evaluation findings is the major purpose of entire evaluation approaches such as utilization-focused evaluation ( Patton, 2008 , 2012 ) and participatory evaluation ( Cousins & Chouinard, 2012 ).

Another primary purpose of evaluation described by some contributors to the literature is to develop and test new general knowledge . This purpose is more amorphous than others but also focuses on use. It reflects the reality that evaluations sometimes report information that cannot be or is not used for immediate program decision making but is useful at a later date for understanding a program’s theory, methods, or effects in light of other similar programs. Sometimes, evaluation findings are out of date by the time they are produced, perhaps because the program context changed, program funding was eliminated for reasons having nothing to do with the evaluation, program personnel with new agendas and different evaluation questions took over the program, or newer program methods were developed. At other times, evaluation findings are ignored because no program personnel are held accountable, personnel and program funders are too busy to attend to the results, the evaluator has not taken steps to help enhance use, or evaluations are conducted strictly for political purposes. The findings of these studies might not be useful immediately, but the knowledge that they generate might be helpful in other settings. Furthermore, evaluators often publish the results of individual studies. For example, the journals Evaluation Review and Evaluation and Program Planning primarily publish the results of individual evaluations, thereby adding to the store of general knowledge about organized efforts to address social and educational needs and problems.

Finally, some evaluations are conducted for purposes that Weiss (1998 , p. 22) has called “evaluation as subterfuge.” Studies of this sort are sometimes conducted because evaluation agencies need funding to survive or because evaluators are unaware of the hidden purposes of evaluations. This might occur when commissioning an evaluation to delay decision making, using evaluation findings to avoid organizational decisions that might provoke criticism, using evaluations as simple window dressing for changes that had already been made but were not made public for internal reasons, or simply commissioning studies for public relations when programs are already known to be successful. Stufflebeam and Shinkfield (2007) similarly list pseudo-evaluations conducted for public relations purposes, studies controlled by evaluation funders for political reasons, studies in which the evaluator panders to the client, and studies in which evaluation clients mislead evaluators about the intended uses of the findings. Evaluations for these purposes are appropriately criticized as a waste of resources, poor organizational leadership, and co-optation of professional ethics.

The Social Roles of Evaluation

In addition to the emphasis on the intended uses of evaluation findings in evaluators’ definitions of evaluation, an emphasis on social role is also sometimes found. Smith (1999 , p. 44) stated,

Although the technical purpose of evaluation is to assess merit or worth, the social ends to which this activity is put vary dramatically. The societal purpose of some forms of evaluation is to produce knowledge while, for other forms, its purpose is to promote social reform. The modern/post-modern debate in evaluation, for example, is as much about the proper societal role of evaluation as it is about a proper epistemology.

Social roles address the intended social ends of evaluations. Here, we are distinguishing between roles that have to do with the effects of evaluations on society and the roles of evaluators, such as serving as a “critical friend” of other evaluators, or as teacher, facilitator, collaborator, management consultant, organizational development consultant, program planner, scientific expert, and others (see Fitzpatrick, Sanders, & Worthen, 2004 ; Rallis & Rossman, 2003;   Scriven, 1967 ). Social roles are apparent in the formative and summative distinction, with the former emphasizing helping the decision making of program personnel and the latter emphasizing the decision making of agencies that seek to hold programs accountable. The strong emphasis on use in the evaluation literature also reflects a social role of evaluations, as does an emphasis by some on conducting evaluations primarily to describe and explain programs (e.g., Stake, 2004 ).

Many contributors to the literature on evaluation theory and practice go beyond these functional aspects of evaluations and emphasize the values that evaluations should manifest or address. Most frequently, these values have to do with the place of evaluation in a democratic society. MacDonald’s (1976) democratic evaluation approach, with its emphasis on allowing all participants in an evaluation to control the information that they provide and on publishing reports that are accessible by the public, represents one of the earliest widely known manifestations of this emphasis. The approach sought to ensure that powerful interests did not control evaluative activities. Cronbach et al. (1980 , p. 4) stated that, “[i]nsofar as information is a source of power, evaluations carried out to inform a policy maker have a disenfranchising effect.” Simons (1987) furthered the discussion, and House and Howe’s (1999) deliberative democratic evaluation approach elaborated on methods for ensuring that stakeholders from all affected groups, as well as those with varying levels of organizational influence, participate in evaluation activities, particularly the discussion of results. Guba and Lincoln (1989) argued for progressing past previous “generations” of evaluation approaches by promoting consensus and negotiation among stakeholder groups that are participating in an evaluation. McTaggart (1991) pointed out some of the intraorganizational difficulties inherent in supporting democratic principles in evaluations, and others (e.g., Brandon, Lindberg, & Wang, 1993 ; Brandon, Newton, & Harman, 1993 ) pointed out the lack of program beneficiaries’ participation in evaluations, which might detrimentally affect the validity of evaluation conclusions by ignoring important aspects of programs.

Greene (1996) reasoned that evaluation was a means to “democratize” the dialogue about critical social and educational issues. She stated that evaluators who employed democratic evaluation—in which the use of qualitative methodologies was said to be both necessary and appropriate—were manifesting their rights and responsibilities as “scientist citizens.” The centering of qualitative evaluation around sociopolitical value dimensions is reflected in many examples in the literature over the last fifteen years involving evaluations of programs serving vulnerable populations throughout the world, such as diabetics in Spain ( Santos-Guerra & Fernandez-Sierra, 1996 ), unemployed adults in France ( Baslé, 2000 ), battered women in the United States ( Goldman & Du Mont, 2001 ), youth with HIV/AIDS in Madagascar ( Rakotonanahary, Rafransoa, & Bensaid, 2002 ), low-income children in California ( Sobo, Simmes, Landsverk, & Kurtin, 2003 ), and incarcerated substance users in Taiwan ( Chang, Huang, & Chen, 2010 ).

Since the early 1990s, approaches emphasizing the social role of broadening participation in evaluations have burgeoned. These are grouped loosely under the label of collaborative approaches to evaluation. Cousins and Chouinard (2012) described three justifications for collaboration in evaluation and social science research, including social justice and democracy, having local participation define key features of a program and its evaluation (a constructivist rationale), and an emphasis on the use of evaluation results. Fetterman and his colleagues (e.g., Fetterman & Wandersman, 2005 ) have gone further by explicitly making the social and political self-determination, or empowerment , of program participants a key goal of evaluation. Wandersman and Snell-Johns (2005 , p. 422) stated that “empowerment evaluation is not defined by its methods but by the collaborative manner in which methods are applied according to the empowerment evaluation principles.” Smith (2007 , p. 175) stated that empowerment evaluation has “an overt political agenda of changing power differentials within the setting of interest, for if one thinks of social power as a relative commodity, then it can be increased for one group only at the expense of another.”

Issues of the degree and form of culturally responsive approaches to evaluation are another manifestation of a social role for evaluation:

At a basic level, cultural competence is appreciation and recognition of other cultural groups and acceptance of the inherent differences that exist among them. At its highest level, cultural competence involves designing appropriate programs, standards, interventions, and measures so that they are specific, relevant, and valid for each unique group. ( Thompson-Robinson, Hopson, & SenGupta, 2004 , p. 1)

In 2011, a task force of the American Evaluation Association (AEA) prepared a statement on cultural competence in evaluation (AEA, 2011 b ) , asserting that (a) culture has implications for all evaluation phases, (b) all evaluations reflect cultural norms, (c) competence is particular to the cultural setting, and (d) evaluators need to cultivate awareness of the effects of their backgrounds on their understanding of culture. The statement urged evaluators to use culturally appropriate evaluation methods that reflect the complexity of cultural identity, the recognition of the effects of power dynamics, and the propensity for bias in language.

Indigenous approaches are a narrower form of culturally responsive approaches to evaluation, focusing squarely on serving the social and political needs of indigenous peoples. Drawing on Smith (1999) , evaluators working with indigenous peoples in New Zealand, Canada, the United States, and other former colonial societies have developed and espouse using native epistemologies as the foundation for research and evaluation. LaFrance (2004 , pp. 39, 42) stated that

the goal of a competent evaluator, especially in Indian Country, should be to actively seek cultural grounding through the ongoing processes of appreciating the role of tribal sovereignty, seeking knowledge of a particular community, building relationships, and reflecting on methodological practices.... Indigenous knowledge values holistic thinking..., which contrasts with the linear and hierarchical thinking that characterizes much of Western evaluation practice.

It is asserted that nonindigenous evaluators might be blind to evaluation standards, such as those presented in the AEA’s (2004)   Guiding Principles for Evaluators , unaware of the values and worldviews of indigenous communities, and unlikely to conduct evaluations that result in both “academic and cultural validity” ( Kawakami, Aton, Cram, Lai, & Porima, 2008 , p. 239). We expect that indigenous approaches to evaluation will be an expanding focus of the profession.

These evaluation approaches emphasize social roles that serve program stakeholders and beneficiaries as well as evaluators. In the eyes of some commentators, they are reactions to the proliferation of neoliberalism, the political theory that “promotes individual entrepreneurial freedom, frees capital to move across time and space by eliminating regulations, and assigns the state the role of facilitating competitiveness and privatization” ( Mathison, 2009 , p. 526). We elaborate more on evaluations that promote social roles favoring democratic values in the section on evaluation approaches and models.

How Evaluations Are Conducted

Many definitions of evaluation include something about the methods of evaluation, often referring simply to the systematic nature of evaluation studies (e.g., Cronbach et al., 1980 ; Patton, 2008 ; Stufflebeam & Shinkfield, 2007 ; Weiss, 1998 ; Yarbrough, Shulha, Hopson, & Caruthers, 2011 ). The breadth of evaluation methods implied in this terminology is appropriate, because evaluators use most social science research methods, contingent on considerations such as the availability of time and funding and the breadth and depth of the evaluation.

During the first large-scale wave of American evaluations in the 1960s, evaluation methods by and large reflected traditional quantitative methods. Suchman (1967) produced one of the earliest textbooks on evaluation, in which it was clear that he “believed that the ideal study would adhere to the classic experimental model” ( Stufflebeam & Shinkfield, 2007 , p. 277). Milcarek and Struening’s (1975) bibliography of evaluation methods, published in an early handbook on evaluation, provided sections on conceptualization, measurement, design, and interpretation, with a total of seventy-five entries. Apart from a handful of general texts that conceivably covered a variety of methods and multiple research paradigms, nearly all the entries explicitly addressed quantitative research issues, and only one had the word qualitative in the title. An overwhelming emphasis on quantitative methods also was shown in The International Encyclopedia of Educational Evaluation ( Walberg & Haertel, 1980 ).

In the 1970s and 1980s, the dominant use of quantitative methods in most social science disciplines came under attack in what some called the paradigm wars ( Gage, 1989 ). Eisner (1979) demonstrated how educational criticism—the process of enabling others to see the qualities of something—expanded evaluators’ understanding of how they come to know, thereby creating new avenues for educational evaluation and research. Patton (1980) took this further with his tome, Qualitative Evaluation Methods , which provided evaluators and applied social scientists with a reference for expanding their methodological repertoire to include qualitative methods. Experimental designs, which long had been used successfully in small-scale research studies, began to be considered difficult or unworkable in the contexts of large-scale social and educational programs, resulting, in part, in findings showing program failures. House (1980 , pp. 250–251) stated that approaches relying on “objectivist epistemology” (e.g., those using the methods of systems analysis or behavioral objective studies) failed because they relied on “the truth aspect of validity to the exclusion of the credibility and normative aspects.” Evaluators taking an “interpretivist” stance considered evaluation findings to be about “contextualized meaning,” with reality viewed as socially constructed and truth an issue of agreement ( Greene, 1994 ). Greene, Doughty, Marquart, Ray, and Roberts (1988) and Whitmore and Ray (1989) introduced the use of audits in qualitative evaluations to enhance “internal quality, external defensibility, and thus the stature and utilization of naturalistic evaluation” ( Greene et al., 1988 , p. 352).

Slowly, evaluators incorporated the interpretivist stance into their collection of epistemological perspectives ( Greene & Henry, 2005 ). Some scholars trained in quantitative methods began to reject their training and quantitative perspective “as epistemologically inadequate and expressed a qualitative preference” ( Cook, 1997 , p. 33); for example, Stake, trained as a psychometrician, began advocating that evaluators provide both descriptive results and judgmental results in evaluation reports. He is now widely known as a case study expert. Furthermore, it was clear that, like qualitative evaluators, quantitative evaluators understood full well that knowledge is continually refined and that truth is not absolute ( Reichardt & Rallis, 1994 ). Except perhaps among some epistemological diehards, the qualitative-quantitative debates eventually quieted, with quantitative evaluators accepting the value of qualitative methods. Evaluation theorists, methodologists, and practitioners for the most part came to agree that the paradigms were not incompatible, that a partnership between the two was possible ( Hedrick, 1994 ; Smith, 1994 ), and that evaluation content was more important than evaluation methodologies ( House, 1994 ). Yin (2011 , p. 287) concluded that

The harshness of the debate obscured the fact that contrasting methods had always coexisted in social science, with no method consistently prevailing over any other. Methodological differences had long been recognized and tolerated in such fields as sociology, well predating the disagreements in program evaluation.

This conclusion is exemplified well by the evaluator and evaluation theorist Michael Patton (1990 , p. 39), who stated

Rather than believing that one must choose to align with one paradigm or the other, I advocate a paradigm of choices. A paradigm of choices rejects methodological orthodoxy in favor of methodological appropriateness as the primary criterion for judging methodological quality. The issue then becomes... whether one has made sensible methods decisions given the purpose of the inquiry, the questions being investigated, and the resources available. The paradigm of choices recognizes that different methods are appropriate for different situations.

In the mid-1980s and early 1990s, the literature began to reflect the value of integrating quantitative and qualitative methods for triangulation purposes ( Kidder & Fine, 1987 ; Smith, 1986 ; Smith & Kleine, 1986 ) and for improving the rigor and credibility of evaluations ( Silverman, Ricci, & Gunter, 1990 ). These developments reflect the beginning of the profusion of mixed-methods evaluations ( Greene, 2007 ; Greene & Caracelli, 1997 ; Rallis & Rossman, 2003 ), an approach reflecting the pragmatic considerations that evaluators have had about the use of both qualitative and quantitative methods, as evaluation needs require. The literature has demonstrated a gradual convergence of quantitative and qualitative evaluators toward the recognition of the value, use, and advancement of each other’s methods. Qualitative evaluation methods have become widely used in program evaluation, usually in conjunction with quantitative methods. Their widespread use can also be attributed to what they lend to an evaluation: added depth of understanding of program processes and participant outcomes. In short, most evaluators now agree that multiple methods and multiple ways of knowing are essential to a program evaluation’s overarching purpose and social role.

Methods Unique to Evaluation

No overview of program evaluation methods is complete without a discussion of those that are unique to (or at least largely used by) evaluators. The features of these methods reflect aspects of evaluation that are not shared with social science research. Some of the methods are not well known outside of evaluation circles, and some are not widely used by evaluators. We highlight three here to give a taste of how evaluation generates methods appropriate for its purposes.

Michael Scriven has been a major contributor to the development of evaluation methods outside the realm of the social sciences, beginning with the Goal-Free Evaluation approach ( Scriven, 1974 ). The premise underlying this approach is that evaluations focusing solely on program goals and objectives might ignore the unintended effects and unaddressed needs of a program’s beneficiaries. Scriven and those following his approach maintain that evaluators can identify these in needs assessments: “At the very least, the evaluation team should make some effort to lay out the evidence of the need that led to the development of the evaluand in the first place” ( Davidson, 2005 , p. 39). Logic models, which are graphic displays of major components of programs, including resources, activities, outputs (i.e., the products or services provided), and outcomes at various stages, also are useful for ensuring that evaluations address not only long-term but also short- and intermediate-term goals ( Davidson, 2005 ). Scriven (1976) also proposed the modus operandi method, a procedure for

identifying the cause of a certain effect by detailed analysis of the configuration of the chain of events preceding it and of the ambient conditions.... The term refers to the characteristic pattern of links in the causal chain, which the detective refers to as the modus operandi of the criminal. These can be quantified and often configurally scored; the problem of identifying the cause can thus be converted into a pattern-recognition task for a computer. ( Scriven, 1991 b , p. 234).

Scriven expanded on this approach in describing the General Elimination Methodology, in which the evaluator (a) develops a list of all possible causes, (b) considers the possible modus operandi for each cause, and (c) identifies which of the latter are present for each of the former. The modus operandi/General Elimination Methodology approaches rely on commonly used methods for examining effects in everyday life—an approach widely applied by Scriven, as in his discussion of probative logic (i.e., the logic of legal reasoning as applied to evaluation; Scriven, 1987 , 2005 ). A similar approach, developed by Mayne (2001) and labeled contribution analysis , “aims to compare an intervention’s postulated theory of change against the evidence in order to come to robust conclusions about the contribution that it has made to observed outcomes” ( White & Phillips, 2012 ).
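
Scriven’s quoted description suggests that the elimination step can be read as a simple pattern-matching exercise. The following Python fragment is a minimal sketch under that reading; the candidate causes, event labels, and the all-links-present rule are illustrative assumptions rather than part of Scriven’s own formulation.

```python
# Minimal, hypothetical sketch of the elimination step of Scriven's
# General Elimination Methodology: a candidate cause survives only if
# every link of its modus operandi appears among the observed events.

def eliminate_causes(candidate_causes, observed_events):
    """Return the candidate causes whose modus operandi is fully present."""
    surviving = {}
    for cause, modus_operandi in candidate_causes.items():
        if modus_operandi <= observed_events:  # subset test: all links observed
            surviving[cause] = modus_operandi
    return surviving


if __name__ == "__main__":
    # Step (a): list plausible causes of an observed program outcome.
    # Step (b): specify each cause's characteristic chain of events.
    candidates = {
        "new curriculum": {"teachers trained", "materials delivered", "lessons taught"},
        "extra tutoring": {"tutors hired", "sessions attended"},
        "test familiarity": {"practice tests administered"},
    }
    # Step (c): check which chains are present in the observed configuration.
    observed = {"teachers trained", "materials delivered", "lessons taught",
                "practice tests administered"}
    print(sorted(eliminate_causes(candidates, observed)))
    # -> ['new curriculum', 'test familiarity']; 'extra tutoring' is eliminated.
```

In practice, the “configural scoring” Scriven mentions would be far more nuanced than a strict subset test, but the sketch conveys how elimination narrows the field of candidate causes.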

A final example in our short list of novel evaluation approaches is the Success Case Method, “a carefully balanced blend of the ancient art of storytelling with more modern methods and principles of rigorous evaluative inquiry and research” that uses “sound principles of inquiry to seek out the right stories to tell,” backed up with “solid evidence” ( Brinkerhoff, 2003 , p. 4). The method is used to gather evidence about illustrations of the best-case scenarios of successful interventions, adapting the approach of analyzing extreme groups, as is done in some manufacturing. After identifying cases of success, evaluators use focus groups, interviews and survey questionnaires, key informants, journalistic inquiry methods, and other methods, largely qualitative, to identify the operations and context of the successful cases.
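
The screening step of the Success Case Method, selecting extreme cases from outcome data before the qualitative follow-up begins, can be illustrated with a short sketch. The Python fragment below covers only that first step; the scores, identifiers, and the top-proportion cutoff are invented for the example and are not part of Brinkerhoff’s specification.

```python
# Illustrative sketch of the case-screening step of the Success Case Method:
# rank participants by an outcome measure and keep the extreme (most
# successful) cases for follow-up interviews or focus groups.

def select_success_cases(outcomes, proportion=0.10):
    """Return participant IDs in the top `proportion` of outcome scores."""
    ranked = sorted(outcomes.items(), key=lambda item: item[1], reverse=True)
    n_cases = max(1, int(len(ranked) * proportion))
    return [participant_id for participant_id, _ in ranked[:n_cases]]


scores = {"p01": 92, "p02": 55, "p03": 78, "p04": 88, "p05": 61,
          "p06": 95, "p07": 49, "p08": 83, "p09": 70, "p10": 90}
print(select_success_cases(scores, proportion=0.2))  # -> ['p06', 'p01']
```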

Major Evaluation Models and Approaches

Evaluators have a number of approaches for addressing the issues that have been debated among evaluators for years, such as the purposes, methods, social roles, and uses of evaluation. Some have called these approaches models of evaluation in the sense that “each one characterizes its author’s view of the main concepts in evaluation work and provides guidelines for using these concepts to arrive at defensible descriptions, judgments, and recommendations” ( Madaus, Scriven, & Stufflebeam, 1983 , p. xii–xiii). Others eschew the formality implied by the term model or believe it is too restrictive; for example, Scriven (2003) suggested the term conceptions of evaluation . Still others refer to the approaches as evaluation theories, but the word “theory” is better if restricted to “underlying fundamental issues such as the nature of evaluation, purpose, valuing, evidence, use, and so on” ( Smith, 2010 , p. 384). For our definition of model, we borrow from Smith (p. 384), who has defined models as “prescriptions for how to conduct an evaluation” that “incorporate positions on various underlying theories about fundamental issues.”

We provide a list, with definitions, of many of the models in Table 23.2 . These have been developed and promulgated over the years. Schemas for categorizing them began appearing early. Roughly ten years into the period in which evaluation in the United States began to flourish, Stake (1973) listed student-gain-by-testing, institutional self-study by staff, blue-ribbon panel, transaction-observation, management analysis, instructional research, social-policy analysis, goal-free evaluation, and adversary evaluation. Except for the last two in the list, the models reflected analytic approaches that existed previously. Five years later, House (1980) included systems analysis and art criticism in a largely similar list, again reflecting existing analytic approaches. The 1980s saw the development of more models stemming less from existing approaches and more from the exigencies of evaluative work, such as Stake’s responsive evaluation model ( Stake, 2004 ), Stufflebeam’s Context-Input-Process-Product model ( Stufflebeam & Shinkfield, 2007 ), Patton’s utilization-focused evaluation ( Patton, 2008 ), and Cronbach’s ninety-five evaluation theses ( Cronbach et al., 1980 ), among others. The 1990s and later saw the emergence and refinement of approaches emphasizing collaboration among evaluators and program stakeholders, such as stakeholder-based evaluation ( Mark & Shotland, 1985 ), practical and transformative participatory evaluation ( Cousins & Chouinard, 2012 ; Mertens, 2009 ), empowerment evaluation ( Fetterman & Wandersman, 2005 ), and democratic deliberative evaluation ( House & Howe, 1999 ), as well as attention to cultural issues ( AEA, 2011 a ; Thompson-Robinson et al., 2004 ) and evaluations involving indigenous peoples ( Kawakami et al., 2008 ).

Stufflebeam has provided probably the most exhaustive listing and description of models ( Stufflebeam, 2001 ; Stufflebeam & Shinkfield, 2007 ). Not all will agree with his list and categorization; for example, he ignores models that focus on indigenous and cultural issues, perhaps in part because they have received attention mostly since his work was published, and he classifies empowerment evaluation as a pseudo-evaluation, despite vociferous disagreement by Fetterman and his colleagues ( Fetterman, 1995 ). The schema has five groups categorizing twenty-six models: pseudo-evaluations, models focusing on evaluation questions or methods, models focusing on improvement or accountability, models on social agenda or advocacy, and eclectic approaches (a category including only utilization-focused evaluation). As in most categorizing, the group labels do not reflect the central features of each of the members of the group; some models seemed to be forced into the groups. It is indeed the case, however, that the five groups reflect basic aspects of evaluation, such as the formative and summative purposes of evaluation, as suggested by the improvement and accountability group; adherence to inclusive and deliberative principles, as suggested by the social agenda and advocacy group; and the prominence of social scientists within evaluation, as suggested by the questions-and-methods group.

Two Current, Widely Used Approaches

Of the major evaluation approaches, two that have received considerable attention in recent years focus both on methods and on intended uses: studies that employ experiments as their primary design feature, and collaborative evaluations.

Randomized experiments were frequently the design of choice in US federal evaluations of the large-scale social and educational programs funded in the 1960s as part of President Lyndon Johnson’s Great Society endeavor. Legislative bodies and government decision makers wanted convincing answers about program effects and saw the advantages of experiments in providing such answers. When the experiments found programs to be ineffective, evaluators (particularly of education programs) began to question the extent to which the studies were subject to validity threats. Theorists debated the relative importance of internal validity versus external validity, with Donald Campbell supporting the former while extolling randomized field studies and Lee Cronbach supporting the latter in the interest of providing results useful for program decision making.

Funding for evaluations of program effects diminished considerably at the US federal level during the years of the Reagan presidential administration, resulting in fewer expensive experimental designs in evaluations, at least of education programs. In the 1970s and 1980s, most large-scale interventions were “demonstration projects,” with the majority of evaluations only examining service delivery inputs into projects while ignoring their effects ( Boruch, 1991 ). In the ensuing years, however, an increasing number of experiments were conducted on health, criminology, manpower training, and other social welfare programs, as well as on some education programs addressing substance abuse, and a cadre of evaluators continued to support their use and implementation. Meanwhile, a movement toward evidence-based decision making, emanating in large part from medicine, began to take hold. Organizations began collecting studies and subjecting them to meta-analyses and other review procedures that were touted as rigorous examinations of methods and effects, with the Cochrane Collaboration focusing on healthcare studies beginning in 1994, and the Campbell Collaboration broadening the focus to social programs in 1999. Experiments in education evaluation began again in earnest with the start of the George W. Bush presidential administration in 2001, pushed in large part by the arrival of a psychologist as the head of the Institute of Education Sciences, the reconstituted research arm of the US Department of Education. This resurgence was perhaps most strongly exemplified in the US Department of Education’s push for “scientifically based research,” in effect shorthand for randomized controlled trials (also variously called randomized clinical trials, randomized control trials, randomized comparative trials, or randomly controlled trials).

The increase in funding provided for education research and evaluation has resulted in many additional experimental and quasi-experimental studies. It has even provided federal seed money to develop a new professional association, the Society for Research on Educational Effectiveness, which meets regularly to present the results of experimental studies of education. In 2010, the president of the association commented at an association meeting that about 1,000 new evaluators proficient in conducting experiments had been trained over the previous decade. Despite strong negative reactions from some in the program evaluation community (e.g., AEA, 2003 ), it is apparent that, even amid the new push for experiments in education evaluation, multiple approaches to evaluation serving varying social roles remain alive and well.

Collaborative models also have received considerable attention in recent years. Strains of the approach have long been seen in action research. Beginning in about the mid-1990s (e.g., Cousins & Earl, 1995 ), evaluators began discussing participatory evaluations, including those conducted primarily for practical reasons, such as practical participatory evaluation, and those conducted to enhance the social, political, and organizational power of program personnel and program beneficiaries. Depending on the model, the intended benefits and effects of collaborative evaluations include the acceptance, promotion, and application of evaluation results; enhanced learning about the organization’s functioning, purposes, procedures, culture, context, and so forth; increased evaluation validity by gleaning stakeholders’ knowledge and skills about the program; building the evaluation knowledge and skills of participating stakeholders; and providing program personnel and other stakeholders with the means and results to enhance their organizational, political, and social influence:

What sets participatory evaluation apart from traditional and mainstream approaches to evaluation (e.g., evaluations framed by experimental or quasi-experimental designs) is the focus on the collaborative partnership between evaluators and program community members (i.e., program developers, managers, implementers, funders, intended beneficiaries, or other relevant stakeholder groups), all of whom bring a specific complementary focus and value to the inquiry. On the one hand, evaluators bring knowledge of evaluative logic and inquiry methods, standards of professional practice, and some knowledge of content and context. On the other hand, program community members bring a detailed and rich understanding of the community and program context, understanding of program logic, and often some understanding of research methods and evaluation, depending upon their prior level of knowledge and experience. It is the relationship that emerges, along with the dialogue and conversations that ensue, that effectively define the parameters of participatory practice and the knowledge that are ultimately co-constructed as a result of these practices. ( Cousins & Chouinard, 2012 , p. 5)

Like studies using experimental designs, studies adopting participatory evaluations have also had their detractors. Some have criticized the emphasis on the use of evaluation findings that is central to all forms of collaborative evaluations ( Donaldson, Patton, Fetterman, & Scriven, 2010 ); others have criticized transformative models (particularly empowerment evaluation) as being a form of pseudo-evaluation ( Stufflebeam & Shinkfield, 2007 ) or as having a very thin empirical base establishing its effectiveness as a model ( Miller & Campbell, 2006 ). Nevertheless, as reflected in numerous published reports of collaborative evaluations, as well as by a steady stream of books ( Cousins & Chouinard, 2012 ; Fetterman & Wandersman, 2005 ; O’Sullivan, 2004 ; Rodriguez-Campos, 2005 ), the models are finding a widening audience among evaluators (particularly in international development studies, studies with diverse populations, and in small local studies) and have established themselves within the mainstream of evaluation practice.

The Major Features of Qualitative Methods in Evaluation

It is beyond the scope of this chapter to provide an overview of the methods of evaluation, but it is consistent with the focus of this handbook to present some insights about using qualitative data collection methods in evaluations. Mixed-methods designs, used to balance types of methods to varying degrees, are ubiquitous in evaluations, and the results of interviews, focus groups, observations, and document reviews are found in many evaluative studies. Using qualitative methods allows evaluators to learn about programs in enough depth to provide rich descriptions ( Stake, 2004 ) of program purposes, contexts, resources, activities, products, and outcomes at various stages of development and implementation. Reports of studies that have used qualitative methods can help evaluation audiences learn about programs in all the complexities of their daily functioning. They are particularly advantageous when evaluation issues are not clear in advance, as can often be the case when evaluations are mandated, and when funded program proposals are insufficient for preparing evaluation designs and plans, as can often be the case for evaluations of small programs or interventions. Evaluators use qualitative methods to learn program personnel’s perspectives—and, if they dig deep enough, program beneficiaries’ perspectives—thereby enhancing evaluation validity in a manner roughly similar to establishing content validity in test development. Furthermore, evaluators using qualitative methods are better placed to know the history of a program and to learn about unplanned events. For studies focusing on outcomes, qualitative methods help evaluators learn and understand mediating variables between implementation and outcomes. In Greene and Henry’s (2005 , p. 348) words, “understanding outcomes cannot happen outside of understanding the nature and form of program participation.” Furthermore, studies using qualitative methods can obtain “one-of-a-kind insights.... The question is not ‘How representative is this?’ but ‘Does it happen even once?’” ( Stake, 2004 , p. 88).

Weiss (1998 , p. 265) stated, “Qualitative evaluation has a special advantage for finding out when program operations depart from expectations and new theories are needed.” Even when program theory is well explicated at the beginning of an evaluation, in many studies it remains fluid, adapting to contingencies as they arise. Qualitative methods allow evaluators to improve evaluation designs and plans as their studies proceed, thereby helping to improve evaluation validity to a greater degree than in evaluations that do not use these methods. Qualitative methods are particularly appropriate for formative evaluations occurring as programs are developed. They allow for rapid feedback of results and for making recommendations for improvement following unanticipated variations in program design, development, and implementation.

Qualitative evaluations and, in some instances, mixed-methods evaluations with significant qualitative components, are best achieved under circumstances in which (a) there is sufficient time to study programs in their natural cycles, (b) intensive inquiry can be conducted without concern for ethical violations, (c) the evaluation is conducted as unobtrusively as possible, (d) data sources are diverse, and (e) program stakeholders and evaluators agree about the methodological approach ( Shaw, 1999 ). Evaluators doing qualitative evaluations should address potential pitfalls such as inadvertently promoting discord while attempting to represent diverse perspectives, particularly between program personnel and program beneficiaries ( Bamberger, Rugh, & Mabry, 2006 ), as well as finding it difficult to summarize findings in a sufficiently short report, particularly when reporting the perspectives of multiple stakeholder groups.

Narratives and complexity may increase the length of reports unhelpfully, and representation of diverse viewpoints may encourage conflict as to how evaluation results should be used. To offset length, qualitative practitioners try to select for reporting those narratives that are most informative and interesting and that best convey complex implications that numbers may only suggest. (Bamberger et al., p. 274)

Information that reflects badly on individuals needs to be reported without attribution or identification. Furthermore, if evaluators’ relationships with program personnel become deep and personal, bias might result ( Weiss, 1998 ). The likelihood of biased results is counterbalanced by the payoff of deeper understanding, by the probability that stakeholder groups will recognize and challenge biases in evaluation reports, and by the triangulation of results among methods.

Aspects of Evaluands That Are Evaluated

Many of the definitions of evaluation that evaluators have proposed over the years address the aspects of programs that should be examined. For example, Weiss (1998 , p. 4) mentioned the “operation and/or the outcomes” of the program; Mark, Henry, and Julnes (2000 , p. 3) listed program “operations, effects, justifications, and social implications;” Rossi, Lipsey, and Freeman (2004 , p. 2) mentioned “the workings and effectiveness of social programs;” Stufflebeam and Shinkfield (2007 , p. 16) discussed “probity, feasibility, safety, significance, and/or equity” in addition to merit and worth; Patton (2008 , p. 39) discussed the “activities, characteristics, and results of programs;” and Yarbrough et al. (2011 , p. 287) simply specified “defined dimensions.” In the terminology of Scriven’s general logic of evaluation , these are known as the criteria on which programs and other evaluands are evaluated ( Scriven, 1991 b ). Scriven gives the establishment of evaluation criteria as the first step of his general logic.

Clearly, if programs are to be evaluated, the criteria that the evaluation will address must be established first. Understanding evaluation criteria requires understanding programs, of course. Shadish, Cook, and Leviton (1991) discussed three elements of a theory of social programs, including their internal structure and functioning, the external constraints on programs, and aspects of social change and program change. Internal structure includes program personnel, program beneficiaries, the resources available to the program, its administration and budgeting, its facilities, its internal organization, its intended outcomes, and the social norms within which it operates. It is important that evaluators understand the background characteristics of communities served by programs, how program beneficiaries are recruited and selected, the nature and purposes of program activities, the necessary skills of personnel, the materials that are needed to successfully deliver programs, and intended program outcomes. The external constraints have to do with political constituencies, external program stakeholders, the availability of community resources, and the political and economic values of the community and society. These are issues of context that are increasingly seen as essential to understand when conducting evaluations ( Rog, Fitzpatrick, & Conner, 2012 ). Understanding change has to do with knowing that programs can change incrementally by adopting demonstration projects or by large shifts in values and priorities. The success of an evaluation depends in part not only on how well programs are structured and organized and on their social and community contexts but also on how fully theories of change are articulated; indeed, a full understanding of program theory is thought by a number of evaluators (e.g., Chen, 1990 ; Donaldson, 2007 ; Funnell & Rogers, 2011 ) to be necessary for evaluations to be successful. Increasingly, program theories serve as the foundation for logic models.
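
For readers who find a concrete representation helpful, the components of a logic model named earlier (resources, activities, outputs, and outcomes at various stages) can be recorded in a simple structure such as the Python sketch below. The field names and example entries are hypothetical and are meant only to illustrate the kind of information a logic model organizes.

```python
# Illustrative sketch only: recording the major components of a program
# logic model. Field names and example entries are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicModel:
    resources: List[str] = field(default_factory=list)        # inputs available to the program
    activities: List[str] = field(default_factory=list)       # what the program does
    outputs: List[str] = field(default_factory=list)          # products or services delivered
    short_term_outcomes: List[str] = field(default_factory=list)
    intermediate_outcomes: List[str] = field(default_factory=list)
    long_term_outcomes: List[str] = field(default_factory=list)


tutoring_model = LogicModel(
    resources=["grant funding", "volunteer tutors"],
    activities=["weekly tutoring sessions"],
    outputs=["sessions delivered", "students served"],
    short_term_outcomes=["improved homework completion"],
    intermediate_outcomes=["higher course grades"],
    long_term_outcomes=["increased graduation rates"],
)
```

Laying the components out this way also makes explicit the short- and intermediate-term outcomes that, as noted above, evaluations can overlook when they attend only to long-term goals.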

Addressing Value in Evaluation

No overview of evaluation is complete without discussing the definition and role of the concept of value in evaluation theory and practice. Values are the underpinning for claims about what constitutes merit or worth. Consistent with the definitions of merit and worth, something can hold intrinsic value or extrinsic value. Schwandt (2005 , p. 443) further categorized values as aesthetic, moral, or effective:

Consider, for example, an evaluation of a recycling center in a local community. Whether it is good (has value) would most certainly involve judging instrumental value [i.e., whether it was a means to an end]; it might also involve a judgment of moral value in considering whether workers involved with the technology are operating in a safe environment; and it could well involve aesthetic judgment in determining whether the center is in some sense an intrusion on the surrounding landscape.

Disputes stemming from long-standing philosophical disagreements, such as those about positivism, have focused on the distinction between facts and values. Scriven (2005 , p. 236) firmly stated,

We can in fact infer validly from factual premises to evaluative conclusions by using definitions that bridge the gap: Because they are definitions, they do not count as value premises. The simplest cases are those in which propositions unpacking the meaning of “a good (or bad) X”—for example, “a good watch,” combined with a number of facts about the performance of a particular watch—fully justify the conclusion that this is a good or bad watch.

Thus, a watch has intrinsic characteristics that specify the extent to which value can be ascribed.

Scriven continued by stating that various premises , such as that a program addresses an important need or societal problem, can serve as the basis for ascribing value if they are validated by the appropriate research. Premises, he stated, “can be directly validated in commonsense ways”, an approach that assumes that “we consider the assumptions of practical life to be sensible ones” that are not “highly contested assumptions of an ethical or political kind” (p. 236). House and Howe (1999 , pp. 7–8) posited,

Whether a statement is true and objective is determined by the procedures established by professional evaluators according to the rules and concepts of their profession. Some human judgments are involved, constructed according to criteria of the institution, but their human origin does not necessarily make them objectively untrue.... Evaluative statements can be objective in the sense that we can present evidence for their truth or falsity, just as for other statements.

Similarly, Davidson (2005 , p. 95) suggested,

The values on which a solid evaluation is based are defensible insofar as there is sufficiently widespread agreement within the relevant context about those values that they can reasonably be treated as givens.... We must remember [that] we are not usually (if ever) looking for 100% certainty in our conclusions; rather, we are seeking just enough to meet the requirements for certainty in the relevant decision-making context.

Disagreements about the sources of values in evaluations have long existed in the evaluation literature. The most common source of value since the earliest years of systematic evaluation in the 1930s ( Tyler, 1991 ) has been program goals and objectives. Scriven, Davidson, and others have touted needs, identified in rigorous needs assessments, as the source of values. Others, such as Stake, believe that there is no one source of value and that value lies in the eyes of the reader. House and Howe (1999) based values on stakeholder claims that evaluators have confirmed through careful examination. Evaluators can examine and validate the stakeholder values they identify while preparing or conducting an evaluation.

Others have critiqued each of these sources. Mark, Henry, and Julnes (2000) pointed out issues about using program goals, including that they (a) are often incomplete; (b) might not address the actual causes of problems; (c) might differ from the actual, privately held intentions of program personnel; or (d) might have been set too ambitiously to help obtain funding or set too low to ensure success in evaluations. Not infrequently, objectives are not sufficiently well developed because they are written hastily into proposals or are promised without sufficient consideration when state or local government officials receive mandated federal funds to conduct programs, thus adding to their already over-busy schedules. Mark, Henry, and Julnes also argued that needs are a troublesome source of values because they are difficult to define and select, tend to emerge as programs develop and are implemented, and are difficult to rank by priority. Furthermore, they stated that stakeholder input can be problematic because wants are not often differentiated from needs and because stakeholder group representation is not sufficiently wide.

How Evaluation Differs from Research

Discussions have long occurred about the differences between evaluation and research. The discussions tend to revolve around several themes about the nature of research versus evaluation. One theme has to do with the breadth of evaluands considered in the discussion. Some, such as Scriven, define evaluation broadly and, assuming the discussion is about rigorous inquiry (as opposed to, say, movie reviews), distinguish between research activities and skilled judgment that, say, occurs in some events in the Olympic Games. Others (indeed, most commentators in the evaluation literature) think of evaluation strictly as program evaluation and find few differences between research and evaluation, except in light of differing purposes.

A second theme is about the use and effects of evaluation findings, expressed in terms of the widely touted notion that evaluations are about making decisions and research is about simply arriving at conclusions. This notion ties evaluations into the effects of producing findings and limits research to simply producing findings. Evaluation findings might be ignored for a variety of reasons, however (e.g., if an evaluation is conducted for symbolic uses or if an evaluation’s audience disagrees with its findings). Furthermore, the findings of action research are intended to have immediate effects, such as addressing social problems.

A third theme is that research findings are intended to generalize but evaluation findings are not. This perception has some basis in practice, yet evaluation findings are said to generalize to similar programs and settings ( Cronbach et al., 1980 ; Stake & Turnbull, 1982 ), and evaluation findings can contribute to theory development, whereas some research findings, such as ecological studies of the Galapagos Islands, do not generalize ( Mathison, 2008 ). Thus, the decision-conclusion dichotomy has a stronger basis on paper than in reality. The caricature of researchers asking “What’s so?” or “Why so?” and evaluators asking, “So what?” often does not hold.

A fourth theme is that evaluation addresses value and research does not. Research need not address the merit or worth of an object of study, whereas evaluation, by definition, does. However, much research involves comparisons between choices, with conclusions about which choice is better, and some evaluations, such as the descriptive studies espoused by Stake (2004) , do not conclude with statements about winners and losers.

A fifth theme is that research and evaluation are not different methodologically because evaluation borrows from the panoply of research methods. This is indeed the case, but evaluation includes methods developed outside of the realm of social science research, such as needs assessments, the success-case method, and the modus operandi method.

Because evaluation necessarily examines issues like needs, costs, ethicality, feasibility, and justifiability, evaluators employ a much broader spectrum of evidentiary strategies than do the social sciences. In addition to all the means for accessing or creating knowledge used by the social sciences, evaluators are likely to borrow from other disciplines such as jurisprudence, journalism, arts, philosophy, accounting, and ethics. ( Mathison, 2008 , p. 192)

Ways in which evaluation differs most starkly from social science research have less to do with academic or scientific differences and more to do with the real-world contexts in which the majority of evaluations are conducted. Many evaluators often need to be methodological generalists; many researchers are more likely to know the nuances of a set of methods than are evaluators. Research funding usually comes in the form of grants, whereas evaluation funding comes in the form of contracts. This often gives researchers much more autonomy than evaluators; for example, researchers set hypotheses, but evaluators alone do not establish evaluation questions, which come from many sources. This is a result of the attention that evaluation must give to a range of stakeholders, a feature absent from most research. Evaluators have clients and usually have intended users; researchers might have no sense of who will use their findings except to hope that their colleagues in their field will attend to them.

Essentially, evaluations occur within the political arena. This aspect manifests itself in several ways. First, most program evaluations are conducted to provide findings to those in positions of power, typically in government or foundations. Program personnel might lead evaluators to ignore evaluation questions about goals or program strategies that would result in negative findings. Evaluators who are intent on focusing on serving the intended users of evaluations might particularly be subject to the undue effects of serving the client ( Alkin, 1990 ). Evaluation studies can be used to further the power of those in charge; as pointed out in our discussion about the social roles of evaluation, commentators such as House (1980) urge that evaluations enhance democratic processes by ensuring that the powerless are well-represented.

Fundamental Issues in Evaluation

The multifaceted nature of evaluative activity, complete with the influence of politics, effects of context, existence and promotion of multiple models and approaches, and use of many methods, results in a number of fundamental issues about the theory, methods, practice, and profession of evaluation that have been addressed over the years. Fundamental issues in evaluation are

those underlying concerns, problems, or choices that continually resurface in different guises throughout our evaluation work.... They are periodically encountered, struggled with, and resolved reflecting contemporary values, technical considerations, political forces, professional concerns, emerging technologies, and available resources. There can never be a final, “once and for all” resolution to a fundamental issue. The resolution of such issues is often a point of contention, disagreement, and debate as the profession struggles to shed old ways of dealing with the issue and adopt a newer, more effective position. ( Smith, 2008 , pp. 2, 4).

Unlike social science research practices that have historically largely occurred independently of political, organizational, and social forces in Western nations, fundamental issues about evaluation have emerged and reemerged repeatedly in professional discussions and in the program evaluation literature since the 1960s. Some examples of recurring topics that Smith (2008) listed have to do with the purpose of evaluation, its social role, the type of evidence that is considered acceptable for evaluation claims and the methods for arriving at an understanding of quality, the best way to involve stakeholders, and the most effective approaches to ensuring high-quality evaluation practice. The disagreements among evaluators about topics such as these are apparent in the preceding sections of this chapter. We discuss two in greater depth here.

Perhaps the most vivid examples of disagreements about fundamental aspects of evaluation have to do with the choice of method to employ in evaluations, with the qualitative-quantitative debate exemplifying this most strongly over the years. As discussed earlier, the dispute was grounded in conflicting, deeply held assumptions about inquiry, with quantitative evaluators resting their work on deduction and independence, “implying that expectations about program effects are based on theory set up before an evaluation begins” ( Greene & Henry, 2005 , p. 346). For these evaluators, theories explain observations, with evaluators having “a sense of distance from the program so that evaluators can make judgments on the basis of evidence without contamination” (Greene & Henry, p. 346). In contrast, for qualitative evaluators, “[u]nderstanding behavior doesn’t come from testing hypotheses but by capturing the meanings constructed by various participants” (Greene & Henry, p. 347). The détente between the two camps suggests that this issue has been resolved but “tensions persist, and skirmishes occur from time to time” (Greene & Henry, p. 350). Indeed, the current methodological disagreements on the role of experiments and quasi-experiments in evaluation, which Scriven (2008) called the “causal wars,” are sometimes considered an extension of the quantitative-qualitative debate. The push for “scientifically based evidence” by federal education officials and the depiction by some of experiments as the “gold standard” for research and evaluation have resulted in strong rebuttals by many evaluators. The AEA (2003) issued a statement that randomized control trials are not the only types of studies efficacious for understanding causality, sometimes are not the best for examining causality, sometimes are unethical, and, on occasion, are inappropriate because data sources are too limited. The disagreement has been discussed widely in articles (e.g., Cook, Scriven, Coryn, & Evergreen, 2010 ), books (e.g., Chen, Donaldson, & Mark, 2001 ; Donaldson, Christie, & Mark, 2009 ), and formal debates ( Donaldson & Christie, 2005 ). Often, however, evaluators overlook that disputes about methods actually reflect disagreements about the purpose of evaluation. Part of the controversy over touting experiments as the gold standard is due to disagreements over the purposes of evaluation; many evaluators argue forcefully that evaluation funding should not be targeted largely at examining the effects of programs. As Greene (1994) aptly pointed out, what distinguishes one methodology from another is not the methods but “rather whose questions are addressed and which values are promoted” (p. 533).

Another vivid example of disagreement about fundamental issues has to do with the purpose and social role of evaluations. In the early 1990s, Fetterman introduced the empowerment evaluation approach, which promised to conduct highly participatory evaluations (essentially self-evaluations of programs) in a manner that empowered program personnel’s self-determination in their communities and enhanced their ability to conduct future evaluations ( Fetterman, 1994 ; Fetterman, Kaftarian, & Wandersman, 1996 ; Fetterman & Wandersman, 2005 ). The approach gained wide traction, and numerous conference presentations and articles were written describing empowerment evaluations; a Wikipedia entry on the topic claims that it “is a global phenomenon.” Others quickly responded to the expanded definition and its emancipatory focus. Stufflebeam (1994) stated that evaluation should focus on merit and worth; Patton (1997) stated that it should be limited to uses for social or political liberation; Scriven considered it false, amateur evaluation; and Smith (2007) labeled it an ideology, stating, “We need to examine not only its effectiveness in actual studies but its worthiness as a political agenda” (p. 177). This debate has quieted in public fora but with little resolution. The difference between empowerment evaluations espousing self-determination as an intended effect and more traditional approaches that focus on merit, worth, significance, and even the use of evaluation findings is stark and perhaps irreconcilable.

Research on Evaluation

It might be expected that the foundation for the methods, practice, and theory of evaluation—a discipline and profession that arrives at conclusions empirically—is based firmly on empirical findings. However, as Shadish et al. (1991 , p. 478) stated,

Evaluators take for granted that the data they generate about social programs provide theoretical insights and help illuminate theoretical debates about social programs. Too often, they forget to apply this principle reflexively to their theories of evaluation.

Consequently, for decades, many in the evaluation literature have called for conducting empirical research on evaluation. Worthen (1990) stated that these calls began in the early 1970s with Stufflebeam, Worthen, Sanders, and others. Smith (1993) , Cousins and Earl (1995) , Alkin (2003) , and Henry and Mark (2003) reiterated the calls for research on evaluation. Smith (1993) advocated conducting research on evaluation as a means of knowing how evaluators operationalize evaluation models, thereby helping to develop descriptive theories of practice that give guidance about which practices are viable in which organizational and evaluation contexts. Mark (2008) stated that the results of such research might help improve the terms of debate by adding evidence to rhetoric, documenting our understanding of evaluation’s contributions, facilitating appropriate claims about what evaluation can do, stimulating efforts to improve evaluation practice, increasing professionalism among evaluators by making it clear that evaluation is worthy of study, and moving evaluators past standards and guiding principles to a more empirically supported understanding. He developed a typology of four categories of research on evaluation, including evaluation context, activities, consequences, and professional issues, and suggested that four inquiry modes that address description, classification, causal analysis, and values inquiry ( Mark et al., 2000 ) could be used to classify the kinds of research that is done. Stufflebeam and Shinkfield (2007) suggested a number of topics that might be examined empirically, including (a) the degree to which evaluation standards have enhanced the quality of evaluations, (b) the effects of stakeholder involvement on evaluation use, (c) the financial and temporal costs of stakeholder involvement, (d) the effects of collaborative approaches to evaluation on program effectiveness and the use of evaluation findings, (e) the deleterious effects of politics on evaluations, (f) the degree to which and manner in which program theory and logic models improve the programs and their evaluations, (g) the effects of needs assessment on conclusions about programs, and (h) the effects of evaluating program implementation on program functioning, as well as a number of hypotheses about methodological issues. Some other topics include the degree to which evaluations should go beyond examining merit and worth and include the identification of causes and mediators, how strong the evidence for evaluation conclusions should be, and the degree to which issues of feasibility affect the breadth and depth of evaluation studies.

With a few exceptions, substantial efforts have not been made to fund research on evaluation. With federal funding, several years of work on the topic were conducted at the Northwest Regional Educational Laboratory and at the University of California at Los Angeles in the late 1970s and early 1980s. The National Science Foundation funded the Evaluative Research and Evaluation Capacity Building program in the early years of the first decade of the century and currently funds the Promoting Research on Innovative Methodologies for Evaluation program. Other work has largely consisted of ad hoc studies, with the widespread perception that these studies have been few. In addition to a lack of funding support, King and Stevahn (2013) suggested that little research has been conducted because of the relative newness of the discipline, the lack of agreement on what evaluation means and how it should be conducted, and the practical focus of evaluation that inhibits time and funds for theory building.

An alternative case about the breadth and depth of research on evaluation can be made, however. Evaluators often report how they conducted an evaluation and what they accomplished, and some conduct small add-on studies. For example, they might report trying out a novel way of conducting, say, a theory-driven evaluation or a new way of approaching culturally relevant evaluation, including findings on how well the method or approach worked. They publish instruments that they have developed and validated within studies. Furthermore, reviews of the research on evaluation are a growing source of empirical findings about evaluation. From 2004 to the present, ten have been published in the American Journal of Evaluation ( AJE ) alone ( Brandon & Singh, 2009 ; Chouinard & Cousins, 2009 ; Christie & Fleischer, 2010 ; Coryn, Noakes, Westine, & Schröter, 2011 ; Johnson et al., 2009 ; Labin, Duffy, Meyers, Wandersman, & Lesesne, 2012 ; Miller & Campbell, 2006 ; Peck, Kim, & Lucio, 2012 ; Ross, Barkauoi, & Scott, 2007 ; Trevisan, 2007 ). Summarizing the reviews of the research on evaluation conducted during a period of about thirty years, King and Stevahn (2013 , p. 60) concluded, “Even as we acknowledge the relative limitations of the existing research base, taken together these syntheses provide direction for program evaluators who want their use-oriented practice to be evidence-based.” Furthermore, researchers have continued to examine experimental methods, albeit usually in publications outside the mainstream of evaluation journals (e.g., Cook, Shadish, & Wong, 2008 ).

Consistent with the all-inclusive range of evaluation methods, research on evaluation can be defined to cover a comparable range of methods. A review of 586 articles (excluding book reviews and editorial notes) published in AJE from 1998 through 2011 showed that 219 (37 percent) could be considered research on evaluation ( Brandon, Vallin, & Philippoff, 2012 ). Of these, 73 percent were reports of single case studies, of evaluation instruments or methods, or of the reflections of evaluators about their work. An additional 16 percent were the reports of literature reviews, multiple-case studies, or oral histories. Only a smattering of other types of designs were found. These results suggest that the literature is replete with largely descriptive studies of evaluation practices and methods that do not manipulate variables, follow cases over time, conduct simulations of evaluation scenarios, and so forth.

Elsewhere, the first author has argued that reflective narratives lack sufficient information to evaluate the methodological warrants for empirical evidence ( Brandon & Singh, 2009 ; Brandon & Fukunaga, 2014 ). However, a reasonable argument can be made that narrative accounts of the art of evaluation practice have considerable value because (a) they come from actual practice, (b) they might be more accessible to many readers because of their communicative narrative style, (c) they reflect widely accepted constructivist principles that we learn contextually and from material that is meaningful to the reader at various stages and levels of understanding, (d) they are acceptable to non-Westerners, (e) participant observation and rich description help contribute to the credibility of reports, and (f) the practical wisdom of evaluators is key to building a good theory of evaluation ( Cousins & Chouinard, 2012 ).

Opportunities abound for drawing from other disciplines in the pursuit of answering questions of importance to evaluators engaging in research on evaluation. A thriving social psychology literature underlies the development and implementation of programs and treatments, but,

in addition to serving as a wellspring for program theory, social psychology contains a wealth of research and theory that are relevant to the challenges that arise in the practice of evaluation. Examples of practice challenges of evaluation include questions about how to guide interactions among stakeholders who vary in power and in their views and interests; gather information about stakeholders’ views about such matters as a relative importance of various possible program outcomes; seek consensus across stakeholders with different interests; develop and maintain trust with stakeholders, while also eliciting continued participation; alleviate anxiety about the evaluation; maintain compliance with data collection protocols; make sure that evaluation procedures address cross-cultural issues and meet cultural competency standards; measure behaviors that take place repeatedly over time; make sense of the mixed patterns of results, whereby a program that does well on one outcome does poorly on another; and facilitate the use of evaluation findings. ( Mark, Donaldson, & Campbell, 2011 , p. 381)

The Expanding Evaluation Profession

For the past fifty years, evaluation has grown from an activity largely addressing courses and curricula (e.g., Cronbach, 1963 ), with experiments being the dominant design ( Suchman, 1967 ) for evaluating programs, to a multifaceted discipline and profession encompassing multiple evaluation models and built on an expanding and diverse theoretical base ( Shadish et al., 1991 ). It is generating an increasing number of methods unique to evaluation (e.g., White & Phillips, 2012 ) and is becoming ubiquitous in public and private organizations ( AEA, 2011 a ; W. K. Kellogg Foundation, 2004 ). Evaluation models were developed and evaluation journals were launched in the early 1970s, a few years after the federal government began funding large-scale evaluations ( Madaus & Stufflebeam, 2000 ). An informal network of leading evaluators formed the May 12th group in the 1970s, followed by two formal organizations—the Evaluation Network and the Evaluation Research Society—that merged into AEA in 1986. As of 2013, AEA had about 8,000 members, mostly North American, with a growing international contingent. The number of evaluation professional associations is growing; there are now national and regional associations worldwide, numbering about fifty by 2000 ( International Organisation for Cooperation in Evaluation, 2012 ). In the first decade of the century, these associations formed the International Organisation for Cooperation in Evaluation for the purpose of supporting and expanding program evaluation activities worldwide. By the early 1980s, standards for educational program evaluation had been developed and published (Joint Committee on Educational Evaluation, 1981). The Evaluation Research Society developed its own standards, as well ( Evaluation Research Society Standards Committee, 1982 ), and AEA published its Guiding Principles in 1994, followed by later revisions ( AEA, 2004 ). Others have followed with statements of independently developed competencies ( King, Stevahn, Ghere, & Minnema, 2001 ; Russ-Eft, Bober, de la Teja, Foxon, & Koszalka, 2008 ). The Canadian Evaluation Society approved its voluntary Professional Designations Program in 2009, in which evaluators who meet the appropriate criteria are designated “Credentialed Evaluators.” In 2009, it was reported on the Cable News Network website that program evaluation was one of “10 little-known fields with great job opportunities” ( Zupek, 2009 ).

Evaluation is becoming increasingly institutionalized through the vehicle of performance management and measurement (Nielsen & Hunter, 2013), as manifested in the United States by the federal Government Performance and Results Act of 1993, updated in 2010, and by the federal Program Assessment Rating Tool, which was used from 2002 until the end of the George W. Bush administration. The number of formal graduate programs for evaluators has diminished over the years in the United States, but such programs have grown worldwide: on its website, AEA lists fifty “graduate programs or certificate programs either directly in evaluation or with available concentrations in evaluation” (AEA, 2012). This expansion and maturing of evaluation as a discipline and profession can be expected to continue, and it is to be hoped that the influence of evaluators across a wide range of fields will grow as well.

Alkin, M. C. ( 1990 ). Debates on evaluation . Newbury Park, CA: Sage.

Alkin, M. C. ( 2003 ). Introduction to Section 3: Evaluation utilization. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 189–194). Dordrecht, The Netherlands: Kluwer.

Alkin, M. C. , Daillak, R. , & White, B. ( 1979 ). Using evaluations . Beverly Hills, CA: Sage.

American Evaluation Association. (2003). Response to US Department of Education notice of proposed priority on scientifically-based evaluation methods . Retrieved from http://www.eval.org/doestatement.htm

American Evaluation Association. (2004). American Evaluation Association guiding principles for evaluators . Retrieved from http://www.eval.org/Publications/GuidingPrinciplesPrintable.asp

American Evaluation Association (2008, April). American Evaluation Association internal scan report to the membership . Retrieved from http://www.archive.eval.org/Scan/aea08.scan.report.pdf

American Evaluation Association. ( 2011 a ). EPTF evaluation briefing book . Fairhaven, MA: Author.

American Evaluation Association. ( 2011 b ). Public statement on cultural competence in evaluation . Fairhaven, MA: Author.

American Evaluation Association. (2012). University programs . Retrieved from http://www.eval.org/Training/university_programs.asp .

Bamberger, M. , Rugh, J. , & Mabry, L. ( 2006 ). RealWorld evaluation: Working under budget, time, data, and political constraints . Thousand Oaks, CA: Sage.

Baslé, M. ( 2000 ). Comparative analysis of quantitative and qualitative methods in French non-experimental evaluation of regional and local policies: Three cases of training programmes for unemployed adults.   Evaluation , 6 , 323–334.

Boruch, R. F. ( 1991 ). The President’s mandate: Discovering what works and what works better. In M. W. McLaughlin & D. C. Phillips (Eds.), Evaluation and education: At quarter century (pp. 147–167). Chicago: The National Society for the Study of Education.

Brandon, P. R. , & Fukunaga, L. L. ( 2014 ). The state of the empirical research literature on stakeholder involvement in program evaluation.   American Journal of Evaluation , 35 , 26–44.

Brandon, P. R. , Lindberg, M. A. , & Wang, Z. ( 1993 ). Involving program beneficiaries in the early stages of evaluation: Issues of consequential validity and influence.   Educational Evaluation and Policy Analysis , 15 , 420–428.

Brandon, P. R. , Newton, B. J. , & Harman, J. W. ( 1993 ). Enhancing validity through beneficiaries’ equitable involvement in identifying and prioritizing homeless children’s educational problems. Evaluation and Program Planning , 16 , 287–293.

Brandon, P. R. , & Singh, J. M. ( 2009 ). The strength of the methodological warrants for the findings of research on program evaluation use.   American Journal of Evaluation , 30 , 123–157.

Brandon, P. R. , Vallin, L. M. , & Phillippoff, J. (2012, October). A quantitative summary of recent research on evaluation in the American Journal of Evaluation . Paper presented at the meeting of the American Evaluation Association, Minneapolis, MN.

Brinkerhoff, R. O. ( 2003 ). The success case method: Find out quickly what’s working and what’s not . San Francisco: Berrett-Koehler.

Chang, C. , Huang, C. , & Chen, C. ( 2010 ). The impact of implementing smoking bans among incarcerated substance abusers: A qualitative study.   Evaluation and the Health Professions , 33 , 473–479.

Chen, H. T. ( 1990 ). Theory-driven evaluations . Newbury Park, CA: Sage.

Chen, H. T. , Donaldson, S. I. , & Mark, M. M. (Eds.). ( 2011 ). Advancing validity in outcome evaluation: Theory and practice.   New Directions for Evaluation , 130 . San Francisco: Jossey-Bass.

Chouinard J. , & Cousins, J. B. ( 2009 ). A review and synthesis of current research on cross-cultural evaluation.   American Journal of Evaluation , 30 , 457–494.

Christie, C. A. , & Fleischer, D. N. ( 2010 ). Insight into evaluation practice: A content analysis of designs and methods used in evaluation studies published in North American evaluation-focused journals.   American Journal of Evaluation , 31 , 326–346.

Cook, T. D. ( 1997 ). Lessons learned in evaluation over the last 25 years. In E. Chelimsky & W. R. Shadish, Jr. (Eds.), Evaluation for the 21st Century: A resource book (pp. 30–52). Thousand Oaks, CA: Sage.

Cook, T. D. , Scriven, M. , Coryn, C. L. S. , & Evergreen, S. D. H. ( 2010 ). Contemporary thinking about causation in evaluation: A dialogue with Tom Cook and Michael Scriven.   American Journal of Evaluation , 31 , 105–117.

Cook, T. D. , Shadish, W. R. , & Wong, V. C. ( 2008 ). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons.   Journal of Policy Analysis and Management , 27 , 724–750.

Coryn, C. L. S. , Noakes, L. A. , Westine, C. D. , & Schröter, D. C. ( 2011 ). A systematic review of theory-driven evaluation practice from 1990 to 2009.   American Journal of Evaluation , 32 , 199–226.

Cousins, J. B. , & Chouinard, J. A. ( 2012 ). Participatory evaluation up close: An integration of research-based knowledge . Charlotte, NC: Information Age.

Cousins, J. B. , & Earl, L. M. ( 1995 ). Participatory evaluation in education: Studies in evaluation use and organizational learning . London: Falmer.

Cousins, J. B. , & Leithwood, K. A. ( 1986 ). Current empirical research in evaluation utilization.   Review of Educational Research , 56 , 331–364.

Cousins, J. B. , & Shulha, L. M. ( 2006 ). A comparative analysis of evaluation utilization and its cognate fields of inquiry: Current trends and issues. In M. M. Mark , J. C. Greene , & I. F. Shaw (Eds.), The Sage handbook of evaluation (pp. 266–291). Thousand Oaks, CA: Sage.

Cronbach, L. J. ( 1963 ). Course improvement through evaluation,   Teachers College Record , 64 , 672–683.

Cronbach, L. J. , Ambron, S. R. , Dornbusch, S. M. , Hess, R. D. , Hornik, R. C. , Phillips, D. C. , Walker, D. F. , & Weiner, S. S. ( 1980 ). Toward reform of program evaluation: Aims, methods and institutional arrangements . San Francisco, CA: Jossey Bass.

Davidson, E. J. ( 2005 ). Evaluation methodology basics: The nuts and bolts of sound evaluation . Thousand Oaks, CA: Sage.

Donaldson, S. I. ( 2007 ). Program theory-driven evaluation science: Strategies and applications . New York: Erlbaum.

Donaldson, S. I. , & Christie, C. A. ( 2005 ). The 2004 Claremont debate: Lipsey versus Scriven. Determining causality in program evaluation and applied research: Should experimental evidence be the gold standard?   Journal of Multidisciplinary Evaluation , 3 , 60–77.

Donaldson, S. I. , Christie, C. A. , & Mark, M. M. (Eds.). ( 2009 ). What counts as credible evidence in applied research and evaluation practice? Thousand Oaks, CA: Sage.

Donaldson, S. I. , Patton, M. Q. , Fetterman, D. , & Scriven, M. ( 2010 ). The 2009 Claremont debates: The promise and pitfalls of utilization-focused and empowerment evaluation.   Journal of MultiDisciplinary Evaluation , 6 (13), 15–57.

Eisner, E. ( 1979 ). The use of qualitative forms of evaluation for improving educational practice.   Educational Evaluation and Policy Analysis , 1 , 11–19.

Evaluation Research Society Standards Committee. ( 1982 ). Evaluation Research Society standards for program evaluation.   New Directions for Program Evaluation , 15 , 7–19.

Fetterman, D. M. ( 1994 ). Empowerment evaluation.   Evaluation Practice , 15 (1), 1–15.

Fetterman, D. M. ( 1995 ). In response to Dr. Daniel Stufflebeam’s “Empowerment evaluation, objectivist evaluation, and evaluation standards: Where the future of evaluation should not go and where it needs to go.” Evaluation Practice , 16 , 179–199.

Fetterman, D. M. , Kaftarian, S. , & Wandersman, A. ( 1996 ). Empowerment evaluation: Knowledge and tools for self-assessment and accountability . Thousand Oaks, CA: Sage.

Fetterman, D. M. , & Wandersman, A. ( 2005 ). Foundations of empowerment evaluation . New York: Guilford.

Fitzpatrick, J. L. , Sanders, J. R. , & Worthen, B. R. ( 2004 ). Program evaluation: Alternative approaches and practical guidelines . New York: Pearson.

Fleischer, D. N. , & Christie, C. A. ( 2009 ). Evaluation use: Results from a survey of U. S. American Evaluation Association members.   American Journal of Evaluation , 30 , 158–175.

Fournier, D. ( 2005 ). Evaluation. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 139–140). Thousand Oaks, CA: Sage.

Funnell, S. C. , & Rogers, P. J. ( 2011 ). Purposeful program theory: Effective use of theories of change and logic models . San Francisco: Jossey-Bass.

Gage, N. ( 1989 ). The paradigm wars and their aftermath: A “historical” sketch of research and teaching since 1989.   Educational Researcher , 17 (8), 10–16.

Goldman, J. , & Du Mont, J. ( 2001 ). Moving forward in batterer program evaluation: Posing a qualitative, woman-centered approach.   Evaluation and Program Planning , 24 , 297–305.

Greene, J. C. ( 1994 ). Qualitative program evaluation. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 530–544). Thousand Oaks, CA: Sage.

Greene, J. C. ( 1996 ). Qualitative evaluation and scientific citizenship: Reflections and refractions. Evaluation , 2 , 277–289.

Greene, J. C. ( 2005 ). Stakeholders. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 397–398). Thousand Oaks, CA: Sage.

Greene, J. C. ( 2007 ). Mixed methods in social inquiry . San Francisco: Wiley.

Greene, J. C. , & Caracelli, V. J. (Eds.). ( 1997 ). Advances in mixed-method evaluation: The challenges and benefits of integrating diverse paradigms.   New Directions for Evaluation , 74 . San Francisco: Jossey-Bass.

Greene, J. C. , Doughty, J. , Marquart, J. , Ray, M. , & Roberts, L. ( 1988 ). Qualitative evaluation audits in practice.   Evaluation Review , 12 , 352–375.

Greene J. C. , & Henry, G. T. ( 2005 ). Qualitative-quantitative debate in evaluation. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 345–350). Thousand Oaks, CA: Sage.

Guba, E. G. , & Lincoln, Y. S. ( 1989 ). Fourth generation evaluation . Newbury Park, CA: Sage.

Hedrick, T. E. ( 1994 ). The quantitative-qualitative debate: Possibilities for integration.   New Directions for Program Evaluation , 61 , 45–52.

Henry, G. T. ( 2005 ). Realist evaluation. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 359–362). Thousand Oaks, CA: Sage.

Henry, G. T. , & Mark, M. M. ( 2003 ). Beyond use: Understanding evaluation’s influence on attitudes and actions.   American Journal of Evaluation , 24 , 293–314.

Hofstetter, C. H. , & Alkin, M. C. ( 2003 ). Evaluation use revisited. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 197–222). Dordrecht, The Netherlands: Kluwer.

House, E. R. ( 1980 ). Evaluating with validity . Beverly Hills: Sage.

House, E. R. ( 1994 ). Integrating the quantitative and qualitative.   New Directions for Program Evaluation , 61 , 13–22.

House, E. R. , & Howe, K. R. ( 1999 ). Values in evaluation and social research . Thousand Oaks, CA: Sage.

International Organisation for Cooperation in Evaluation (2012). Overview . Retrieved from http://www.ioce.net/en/index.php

Johnson, K. , Greenseid, L. O. , Toal, S. A. , King, J. A. , Lawrenz, F. , & Volkov, B. ( 2009 ). Research on evaluation use: A review of the empirical literature from 1986 to 2005.   American Journal of Evaluation , 30 , 377–410.

Joint Committee on Standards for Educational Evaluation. ( 1981 ). Standards for evaluations of educational programs, projects, and materials . New York: McGraw-Hill.

Kawakami, A. J. , Aton, K. , Cram, F. , Lai, M. , & Porima, M. ( 2008 ). Improving the practice of evaluation through indigenous values and methods: Decolonizing evaluation practice: Returning the gaze from Hawai’i and Aotearoa. In N. L. Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation (pp. 219–242). New York: Guilford.

Kidder, L. , & Fine, M. ( 1987 ). Qualitative and quantitative methods: When stories converge.   New Directions for Program Evaluation , 35 , 57–75.

King, J. A , & Stevahn, L. ( 2013 ). Interactive evaluation practice: Mastering the interpersonal dynamics of program evaluation . Thousand Oaks, CA: Sage.

King, J. A. , Stevahn, L. , Ghere, G. , & Minnema, J. ( 2001 ). Toward a taxonomy of essential evaluator competencies.   American Journal of Evaluation , 22 , 229–247.

King, J. A. , & Thompson, B. ( 1983 ). Research on school use of program evaluation: A literature review and research agenda.   Studies in Educational Evaluation , 9 , 5–21.

Labin, S. N. , Duffy, J. L. , Meyers, D. C. , Wandersman, A. , & Lesesne, C. A. ( 2012 ). A research synthesis of the evaluation capacity building literature.   American Journal of Evaluation , 33 , 307–338.

LaFrance, J. ( 2004 ). Culturally competent evaluation in Indian Country. In M. Thompson-Robinson , R. Hopson , & S. SenGupta (Eds.), In search of cultural competence in evaluation: Toward principles and practices. New Directions for Evaluation , 102 (pp. 39–50). San Francisco: Jossey-Bass.

MacDonald, B. ( 1976 ). Evaluation and the control of education. In D. Tawney (Ed.), Curriculum evaluation today: Trends and implications (pp. 125–136). London: Macmillan.

Madaus, G. F. , Scriven, M. , & Stufflebeam, D. L. ( 1983 ). Evaluation models: Viewpoints on educational and human services evaluation . Boston: Kluwer-Nijhoff.

Madaus, G. F. , & Stufflebeam, D. L. ( 2000 ). Program evaluation: A historical overview. In D. L. Stufflebeam , G. F. Madaus , & T. Kellaghan , Evaluation models: Viewpoints on educational and human services evaluation (2nd ed.) (pp. 3–18). Boston: Kluwer.

Mark, M. M. ( 2008 ). Building a better evidence base for evaluation theory: Beyond general calls to a framework of types of research on evaluation. In N. L. Smith & P. R. Brandon (Eds.), Fundamental issues on evaluation (pp. 111–134). New York: Guilford.

Mark, M. M. , Donaldson, S. , & Campbell, B. ( 2011 ). Social psychology and evaluation . New York: Guilford.

Mark, M. M. , Greene, J. C. , & Shaw, I. F. ( 2006 ). The evaluation of policies, programs, and practices. In M. M. Mark , J. C. Greene , & I. F. Shaw (Eds.), The Sage handbook of evaluation (pp. 1–30). Thousand Oaks, CA: Sage.

Mark, M. M. , Henry, G. T. , & Julnes, G. ( 2000 ). Evaluation: An integrated framework for understanding, guiding, and improving policies and programs . San Francisco: Jossey-Bass.

Mark, M. M. , & Shotland, R. L. ( 1985 ). Stakeholder-based evaluation and value judgements.   Evaluation Review , 9 , 605–626.

Mathison, S. ( 2008 ). What is the difference between evaluation and research—and why do we care? In N. L. Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation (pp. 183–196). New York: Guilford.

Mathison, S. ( 2009 ). Serving the public interest through educational evaluation: Salvaging democracy by rejecting neoliberalism. In K. E. Ryan & J. B. Cousins (Eds.), The Sage international handbook of educational evaluation (pp. 525– 538). Thousand Oaks, CA: Sage.

Mayne, J. ( 2001 ). Addressing attribution through contribution analysis: Using performance measures sensibly.   Canadian Journal of Program Evaluation , 16 , 1–24.

McTaggart, R. ( 1991 ). When democratic evaluation doesn’t seem democratic.   Evaluation Practice , 12 , 9–21.

Mertens, D. M. ( 2003 ). Mixed methods and the politics of human research: The transformative-emancipatory perspective. In A.Tashakkori & C.Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 135–164). Thousand Oaks, CA: Sage.

Mertens, D. ( 2009 ). Transformative research and evaluation . New York: Guilford.

Milcarek, B. I. , & Struening, E. L. ( 1975 ). Evaluation methodology: A selective bibliography. In E. L. Struening & M. Guttentag (Eds.), Handbook of evaluation research (pp. 667– 676). Beverly Hills: Sage.

Miller, R. L. , & Campbell, R. ( 2006 ). Taking stock of empowerment evaluation: An empirical review.   American Journal of Evaluation , 27 , 296–319.

Nielsen, S. B. , & Hunter, D. E. K. (Eds.). ( 2013 ). Performance management and evaluation.   New Directions for Evaluation , 137 .

O’Sullivan, R. G. ( 2004 ). Practicing evaluation: A collaborative approach . Thousand Oaks, CA: Sage.

Parlett, M. , & Hamilton, D. ( 1976 ). Evaluation as illumination: A new approach to the study of innovatory programmes. In G. Glass (Ed.), Evaluation studies review annual (Vol. 1) (pp. 140–157). Beverly Hills, CA: Sage.

Patton, M. Q. ( 1980 ). Qualitative evaluation methods . Beverly Hills: Sage.

Patton, M. Q. ( 1990 ). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage.

Patton, M. Q. ( 1997 ). Toward distinguishing empowerment evaluation and placing it in a larger context.   Evaluation Practice , 18 , 147–163.

Patton, M. Q. ( 2005 ). Intended users. In S. Mathison (Ed.), Encyclopedia of evaluation (p. 206). Thousand Oaks, CA: Sage.

Patton, M. Q. ( 2008 ). Utilization-focused evaluation . (4th ed.) Thousand Oaks, CA: Sage.

Patton, M. Q. ( 2012 ). Essentials of utilization-focused evaluation . Thousand Oaks, CA: Sage.

Patton, M. Q. , Grimes, P. S. , Guthrie, K. M. , Brennan, N. J. , French, B. D. , & Blyth, D. A. ( 1977 ). In search of impact: An analysis of the utilization of federal health evaluation research. In C. Weiss (Ed.), Using social research in public policy making (pp. 141–164). Lexington, MA: Heath.

Peck, L. R. , Kim, Y. , & Lucio, J. ( 2012 ). An empirical examination of validity in evaluation.   American Journal of Evaluation , 33 , 350–365.

Rakotonanahary, A. , Rafransoa, Z. , & Bensaid, K. ( 2002 ). Qualitative evaluation of HIV/AIDS IEC activities in Madagascar.   Evaluation and Program Planning , 25 , 341–345.

Rallis, S. F. , & Rossman, G. B. ( 2003 ). Mixed methods in evaluation contexts: A pragmatic framework. In A. Tashakkori & C. Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 491–512). Thousand Oaks, CA: Sage.

Reichardt, C. S. , & Rallis, S. F. ( 1994 ). The qualitative-quantitative debate: New perspectives.   New Directions for Program Evaluation , 61 . San Francisco: Jossey-Bass.

Rodriguez-Campos, L. ( 2005 ). Collaborative evaluations: A step-by-step model for the evaluator . Tamarac, FL: Lumina.

Rog, D. J. , Fitzpatrick, J. L. , & Conner, R. (Eds.). ( 2012 ). Context and evaluation.   New Directions for Evaluation , 135 . San Francisco: Jossey-Bass.

Ross, J. A. , Barkauoi, K. , & Scott, G. ( 2007 ). Evaluations that consider the cost of educational programs: The contribution of high-quality studies.   American Journal of Evaluation , 28 , 477–492.

Rossi, P. H. , Lipsey, M. , & Freeman, H. ( 2004 ). Evaluation: A systematic approach . Thousand Oaks, CA: Sage.

Russ-Eft, D. , Bober, M. J. , de la Teja, I. , Foxon, M. J. , & Koszalka, T. A. ( 2008 ). Evaluator competencies: Standards for the practice of evaluation in organizations . San Francisco: Jossey-Bass.

Santos-Guerra, M. A. , & Fernandez-Sierra, J. ( 1996 ). Qualitative evaluation of a program on self-care and health education for diabetics.   Evaluation , 2 , 339–347.

Schwandt, T. A. ( 2005 ). Values. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 443–444). Thousand Oaks, CA: Sage.

Scriven, M. ( 1967 ). The methodology of evaluation. In R. Tyler , R. Gagne , & M. Scriven (Eds.), Perspectives on curriculum evaluation (pp. 39–83). AERA Monograph Series on Curriculum Evaluation, No. 1. Skokie, IL: Rand McNally.

Scriven, M. ( 1974 ). Prose and cons about goal-free evaluation.   Evaluation Comment , 3 , 1–4.

Scriven, M. ( 1976 ). Maximizing the power of causal investigations: The modus operandi method. In G. Glass (Ed.), Evaluation studies review annual (Vol. 1) (pp. 119–139). Beverly Hills, CA: Sage.

Scriven, M. ( 1987 ). Probative logic. In F. H. van Eemeren , R. Grootendorst , J. A. Blair , & C. A. Willard (Eds.), Argumentation: Across the lines of discipline (pp. 7–32). Dordrecht, Holland: Foris.

Scriven, M. ( 1991 a ). Beyond formative and summative evaluation. In M. W. McLaughlin & D. C. Phillips (Eds.), Evaluation and education: At quarter century . (pp. 19–64). Chicago: The National Society for the Study of Education.

Scriven, M. ( 1991 b ). Evaluation thesaurus . Newbury Park, CA: Sage.

Scriven, M. ( 2003 ). Evaluation theory and metatheory. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 15–29). Dordrecht, The Netherlands: Kluwer.

Scriven, M. ( 2005 ). Probative logic. In S. Mathison (Ed.), Encyclopedia of evaluation (p. 327). Thousand Oaks, CA: Sage.

Scriven, M. ( 2008 ). A summative evaluation of RCT methodology & an alternative approach to causal research.   Journal of Multidisciplinary Evaluation , 5 (9), 11–24.

Shadish, W. R. , Cook, T. D. , & Leviton, L. C. ( 1991 ). Foundations of program evaluation . Newbury Park, CA: Sage.

Shaw, I. F. ( 1999 ). Qualitative evaluation . Thousand Oaks, CA: Sage.

Shulha, L. , & Cousins, J. B. ( 1997 ). Evaluation utilization: Theory, research and practice since 1986.   Evaluation Practice , 18 , 195–208.

Silverman, M. , Ricci, E. M. , & Gunter, M. J. ( 1990 ). Strategies for increasing the rigor of qualitative methods in evaluation of health care programs.   Evaluation Review , 14 , 57–74.

Simons, H. ( 1987 ). Getting to know schools in a democracy: The politics and process of evaluation . Lewes, England: Falmer.

Smith, M. L. ( 1986 ). The whole is greater: Combining qualitative and quantitative approaches in evaluation studies.   New Directions for Program Evaluation , 30 , 37–54.

Smith, M. L. ( 1994 ). Qualitative plus/versus quantitative: The last word.   New Directions for Program Evaluation , 61 , 37–44.

Smith, N. L. ( 1993 ). Improving evaluation theory through the empirical study of evaluation practice.   Evaluation Practice , 14 , 237–242.

Smith, N. L. ( 1999 ). A framework for characterizing the practice of evaluation, with application to empowerment evaluation.   The Canadian Journal of Program Evaluation, Special Issue , 39–68.

Smith, N. L. ( 2007 ). Empowerment evaluation as evaluation ideology.   American Journal of Evaluation , 28 , 169–178.

Smith, N. L. ( 2008 ). Fundamental issues in evaluation. In N. L. Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation (pp. 1–23). New York: Guilford.

Smith, N. L. ( 2010 ). Characterizing the evaluand in evaluation theory.   American Journal of Evaluation , 31 , 383–389.

Smith, N. L. , & Kleine, P. L. ( 1986 ). Qualitative research and evaluation: Triangulation and multimethods reconsidered.   New Directions for Program Evaluation , 30 , 55–71.

Sobo, E. , Simmes, D. , Landsverk, J. , & Kurtin, P. S. ( 2003 ). Rapid assessment with qualitative telephone interviews: Lessons from an evaluation of California’s Healthy Families program & Medi-Cal for children.   American Journal of Evaluation , 24 , 399–408.

Stake, R. E. (1973, October). Program evaluation, particularly responsive evaluation . Paper presented at the New Trends in Evaluation conference, Göteborg, Sweden.

Stake, R. E. ( 2004 ). Standards-based & responsive evaluation . Thousand Oaks, CA: Sage.

Stake, R. E. , & Turnbull, D. J. ( 1982 ). Naturalistic generalizations.   Review Journal of Philosophy and Social Science , 7 (1), 1–12.

Stufflebeam, D. L. ( 1994 ). Empowerment evaluation, objectivist evaluation, and evaluation standards: Where the future of evaluation should not go and where it needs to go.   American Journal of Evaluation , 15 , 324–338.

Stufflebeam, D. L. ( 2001 ). Evaluation models.   New Directions for Evaluation , 89 . San Francisco: Jossey-Bass.

Stufflebeam, D. L. , & Shinkfield, A. J. ( 2007 ). Evaluation theory, models, and applications . San Francisco: Jossey-Bass.

Suchman, E. A. ( 1967 ). Evaluative research: Principles and practice in public service and social action programs . New York: Russell Sage Foundation.

Thompson-Robinson, M. , Hopson, R. , & SenGupta, S. (Eds.). ( 2004 ). In search of cultural competence in evaluation: Toward principles and practices.   New Directions for Evaluation , 102 . San Francisco: Jossey-Bass.

Trevisan, M. S. ( 2007 ). Evaluability assessment from 1986 to 2006.   American Journal of Evaluation , 28 , 290–303.

Tyler, R. W. ( 1991 ). General statement of program evaluation. In M. W. McLaughlin & D. C. Phillips (Eds.), Evaluation and education: At quarter century (pp. 3–17). Chicago: University of Chicago.

W. K. Kellogg Foundation. ( 2004 ). W. K. Kellogg Foundation handbook . Battle Creek, MI: Author.

Walberg, H. J. , & Haertel, G. D. ( 1980 ). The international encyclopedia of educational evaluation . New York: Pergamon.

Wandersman, A. , & Snell-Johns, J. ( 2005 ). Empowerment evaluation: Clarity, dialogue, and growth.   American Journal of Evaluation , 26 , 421–428.

Weiss, C. H. ( 1998 ). Evaluation: Methods for studying programs and practices (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

White, H. , & Phillips, D. ( 2012 ). Addressing attribution of cause and effect in small n impact evaluations: Towards an integrated framework (Working Paper 15). New Delhi, India: Global Development Network, International Initiative for Impact Evaluation. Retrieved from http://www.3ieimpact.org/media/filer/2012/06/29/working_paper_15.pdf .

Whitmore, E. , & Ray, M. L. ( 1989 ). Qualitative evaluation audits: Continuation of the discussion.   Evaluation Review , 13 , 78–90.

Worthen, B. R. ( 1990 ). Program evaluation. In H. J. Walberg & G. D. Haertel (Eds.), The international encyclopedia of educational evaluation (pp. 42–47). Oxford: Pergamon.

Yarbrough, D. , Shulha, L. , Hopson, R. , & Caruthers, F. ( 2011 ). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.

Yin, R. K. ( 2011 ). Qualitative research from start to finish . New York: Guilford.

Zupek, R. (2009). 10 little-known fields with great job opportunities (Worldwide Web report). Retrieved from http://www.cnn.com/2009/LIVING/worklife/09/16/cb.littleknown.jobs.opportunities/index.html

Program Evaluation Research

  • Jerald Jay Thomas  

According to the Joint Committee on Standards for Educational Evaluation (2011), program evaluation as a method of research is a means of systematically evaluating an object or educational program. As straightforward and succinct as that definition is, you will find that evaluation research borrows heavily from other methods of research. Evaluation research has at its root the assumption that the value, quality, and effectiveness of an educational program can be appraised through a variety of data sources. As educators, we find ourselves making evaluations daily and in a variety of contexts. The evaluations we make, according to Fitzpatrick, Sanders, and Worthen (2011), fall along a continuum between formal evaluation and informal evaluation.

American Educational Research Association. (2011). Code of ethics of the American Educational Research Association . Retrieved from http://www.aera.net/uploadedFiles/About_AERA/Ethical_Standards/CodeOfEthics%281%29.pdf.

Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2011). Program evaluation: Alternative approaches and practical guidelines . Upper Saddle River, NJ: Pearson Education.

Greene, J. C. (2005). Mixed methods. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 397–398). Thousand Oaks, CA: Sage.

Joint Committee on Standards for Educational Evaluation. (2011). The program evaluation standards . Newbury Park, CA: Sage.

McNeil, K., Newman, I., & Steinhauser, J. (2005). How to be involved in program evaluation . Lanham, MD: Scarecrow Education.

Sanders, J. R. (2000). Evaluating school programs: An educator’s guide. Thousand Oaks, CA: Corwin Press.

Scriven, M. (1991). Beyond formative and summative evaluation. In M. W. McLaughlin & D. C. Phillips (Eds.), Evaluation and education: At quarter century (pp. 19–64). Chicago, IL: University of Chicago Press.

About this chapter

Thomas, J.J. (2012). Program Evaluation Research. In: Klein, S.R. (eds) Action Research Methods. Palgrave Macmillan, New York. https://doi.org/10.1057/9781137046635_9

23. Program evaluation

Chapter outline.

  • What is program evaluation? (5 minute read time)
  • Planning your program evaluation (20 minute read time, including video)
  • Process evaluations and implementation science (7 minute read time)
  • Outcome and impact evaluations (5 minute read time)
  • Ethics and culture in program evaluation (10 minute read time)

Content warning: discussions of BMI/weight/obesity, genocide, and residential schools for indigenous children.

Imagine you are working for a nonprofit focused on children’s health and wellness in school. One of the grants you received this year funds a full-time position at a local elementary school for a teacher who will be integrating kinesthetic learning into their lesson plans for math classes for third graders. Kinesthetic learning is learning that occurs when the students do something physical to help learn and reinforce information, instead of listening to a lecture or other verbal teaching activity. You have read research suggesting that students retain information better using kinesthetic teaching methods and that it can reduce student behavior issues. You want to know if it might benefit your community.

When you applied for the grant, you had to come up with some outcome measures that would tell the foundation if your program was worth continuing to fund – if it’s having an effect on your target population (the kids at the school). You told the foundation you would look at three outcomes:

  • How did using kinesthetic learning affect student behavior in classes?
  • How did using kinesthetic learning affect student scores on end-of-year standardized tests?
  • How did the students feel about kinesthetic teaching methods?

But, you say, this sounds like research! However, we have to take a look at the purpose, origins, effect, and execution of the project to understand the difference, which we do in section 23.1 of this chapter. Those domains are where we can find the similarities and differences between program evaluation and research.

Realistically, as a practitioner, you’re far more likely to engage in program evaluation than you are in research. So, you might ask why you are learning research methods and not program evaluation methods, and the answer is that you will use research methods in evaluating programs. Program evaluation tends to focus less on generalizability, experimental design, and replicability, and instead focuses on the practical application of research methods to a specific context in practice.

23.1 What is program evaluation?

Learning objectives.

Learners will be able to…

  • Define program evaluation
  • Discuss similarities and differences between program evaluation and research
  • Determine situations in which program evaluation is more appropriate than research

Program evaluation can be defined as the systematic process by which we determine if social programs are meeting their goals, how well the program runs, whether the program had the desired effect, and whether the program has merit according to stakeholders (including in terms of the monetary costs and benefits). It’s important to know what we mean when we say “evaluation.” Pruett (2000) [1]  provides a useful definition: “Evaluation is the systematic application of scientific methods to assess the design, implementation, improvement or outcomes of a program” (para. 1). That nod to scientific methods is what ties program evaluation back to research, as we discussed above. Program evaluation is action-oriented, which makes it fit well into social work research (as we discussed in Chapter 1 ).

Often, program evaluation will consist of mixed methods because its focus is so heavily on the effect of the program in your specific context. Not that research doesn’t care about the effects of programs – of course it does! But with program evaluation, we seek to ensure that the way we are applying our program works in our agency, with our communities and clients. Thinking back to the example at the beginning of the chapter, consider the following: Does kinesthetic learning make sense for your school? What if your classroom spaces are too small? Are the activities appropriate for children with differing physical abilities who attend your school? What if school administrators are on board, but some parents are skeptical?

The project we talked about in the introduction – a real project, by the way – was funded by a grant from a foundation. The reality of the grant funding environment is that funders want to see that their money is not only being used wisely, but is having a material effect on the target population. This is a good thing, because we want to know our programs have a positive effect on clients and communities. We don’t want to just keep running a program because it’s what we’ve always done. (Consider the ethical implications of continuing to run an ineffective program.) It also forces us as practitioners to plan grant-funded programs with an eye toward evaluation. It’s much easier to evaluate your program when you can gather data at the beginning of the program than when you have to work backwards at the middle or end of the program.

How do program evaluation and research relate to each other?

As we talked about above, program evaluation and research are similar, particularly in that they both rely on scientific methods. Both use quantitative and qualitative methods, like data analysis and interviews. Effective program evaluation necessarily involves the research methods we’ve talked about in this book. Without understanding research methods, your program evaluation won’t be very rigorous and probably won’t give you much useful information.

However, there are some key differences between the two that render them distinct activities that are appropriate in different circumstances. Research is often exploratory and not evaluative at all, and instead looks for relationships between variables to build knowledge on a subject. It’s important to note at the outset that what we’re discussing below is not universally true of all projects. Instead, the framework we’re providing is a broad way to think about the differences between program evaluation and research. Scholars and practitioners disagree on whether program evaluation is a subset of research or something else entirely (and everything in between). The important thing to know about that debate is that it’s not settled, and what we’re presenting below is just one way to think about the relationship between the two.

According to Mathison (2008) [2] , the differences between program evaluation and research have to do with the domains of purpose, origins, effect and execution. 

Let’s think back to our example from the start of the chapter – kinesthetic teaching methods for 3rd grade math – to talk more about these four domains.

Purpose

To understand this domain, we have to ask a few questions: why do we want to research or evaluate this program? What do we hope to gain? This is the why of our project (Mathison, 2008). Another way to think about it is as the aim of your research, which is a concept you hopefully remember from Chapter 2.

Through the lens of program evaluation, we’re evaluating this program because we want to know its effects, but also because our funder probably only wants to give money to programs that do what they’re supposed to do. We want to gather information to determine if it’s worth it for our funder – or for  us  – to invest resources in the program.

If this were a research project instead, our purpose would be congruent, but different. We would be seeking to add to the body of knowledge and evidence about kinesthetic learning, most likely hoping to provide information that can be generalized beyond 3rd grade math students. We’re trying to inform further development of the body of knowledge around kinesthetic learning and children. We’d also like to know if and how we can apply this program in contexts other than one specific school’s 3rd grade math classes. These are not the only research considerations, but just a few examples.

Origins

Purpose and origins can feel very similar and be a little hard to distinguish. The main difference is that origins are about the who, whereas purpose is about the why (Mathison, 2008). So, to understand this domain, we have to ask about the source of our project – who wanted to get the project started? What do they hope this project will contribute?

For a program evaluation, the project usually arises from the priorities of funders, agencies, practitioners and (hopefully) consumers of our services. They are the ones who define the purpose we discussed above and the questions we will ask.

In research, the project arises from a researcher’s intellectual curiosity and desire to add to a body of knowledge around something they think is important and interesting. Researchers define the purpose and the questions asked in the project.

Effect

The effect of program evaluation and research is essentially what we’re going to use our results for. For program evaluation, we will use them to make a decision about whether a program is worth continuing, what changes we might make to the program in the future, or how we might change the resources we devote to it going forward. The results are often also used by our funders to make decisions about whether they want to keep funding our program or not. (Outcome evaluations aren’t the only thing that funders will look at – they also sometimes want to know whether our processes in the program were faithful to what we described when we requested funding. We’ll discuss process evaluations in section 23.3 and outcome evaluations in section 23.4.)

The effect of research – again, what we’re going to use our results for – is typically to add to the knowledge and evidence base surrounding our topic. Research can certainly be used for decision-making about programs, especially to decide which program to implement in the first place. But that’s not what results are primarily used for, especially by other researchers.

Execution

Execution is fundamentally the how of our project. What are the circumstances under which we’re running the project?

The program evaluation projects most of us will work on are frequently based in a nonprofit or government agency. Context is extremely important in program evaluation (and program implementation). As most of us know, these are environments with lots of moving parts. As a result, running controlled experiments is usually not possible, and we sometimes have to be more flexible with our evaluations to work with the resources we actually have and the unique challenges and needs of our agencies. This doesn’t mean that program evaluations can’t be rigorous or use strong research methods. We just have to be realistic about our environments and plan for that when we’re planning our evaluation.

Research is typically a lot more controlled. We do everything we can to minimize outside influences on our variables of interest, which is expected of rigorous research. Of course, some research is extremely controlled, especially experimental research and randomized controlled trials. This all ties back to the purpose, origins, and effects of research versus those of program evaluation – in research, we’re primarily building knowledge and evidence.

In the end, it’s important to remember that these are guidelines, and you will no doubt encounter program evaluation projects that cross the lines of research, and vice versa. Understanding how the two differ will help you decide how to move forward when you encounter the need to assess the effect of a program in practice.

Key Takeaways

  • Program evaluation is a systematic process that uses the scientific research method to determine the effects of social programs.
  • Program evaluation and research are similar, but they differ in purpose, origins, effect and execution.
  • The purpose of program evaluation is to judge the merit or worth of a program, whereas the purpose of research is primarily to contribute to the body of knowledge around a topic.
  • The origins of program evaluation are usually funders and people working in agencies, whereas research originates primarily with scholars and their scientific interests.
  • Program evaluations are typically used to make decisions about programs, whereas research is used to add to the knowledge and evidence base around a topic.
  • Executing a program evaluation project requires a strong understanding of your setting and context in order to adapt your evaluation to meet your goals in a realistic way. The execution of research is much more controlled and seeks to minimize the influence of context.
Exercises

  • If you were conducting a research project on the kinesthetic teaching methods that we talked about in this chapter, what is one research question you could study that aligns with the purpose, origins, and effects of research?
  • Consider the research project you’ve been building throughout this book. What is one program evaluation question you could study that aligns with the purpose, origins, and effects of program evaluation? How might its execution look different than what you’ve envisioned so far?

23.2 Planning your program evaluation

Learning objectives.

Learners will be able to…

  • Discuss how planning a program evaluation is similar to and different from planning a research project
  • Identify program stakeholders
  • Identify the basics of logic models and how they inform evaluation
  • Produce evaluation questions based on a logic model

Planning a program evaluation project requires just as much care and thought as planning a research project. But as we discussed in section 23.1, there are some significant differences between program evaluation and research that mean your planning process is also going to look a little different. You have to involve the program stakeholders at a greater level than that found with most types of research, which will sometimes focus your program evaluation project on areas you wouldn’t have necessarily chosen (for better or worse). Your program evaluation questions are far less likely to be exploratory; they are typically evaluative and sometimes explanatory.

For instance, I worked on a project designed to increase physical activity for elementary school students at recess. The school had noticed a lot of kids would just sit around at recess instead of playing. As an intervention, the organization I was working with hired recess coaches to engage the kids with new games and activities to get them moving. Our plan to measure the effect of recess coaching was to give the kids pedometers at a couple of different points during the year and see if there was any change in their activity level, as measured by the number of steps they took during recess. However, the school was also concerned with the rate of obesity among students, and asked us to also measure the height and weight of the students to calculate BMI at the beginning and end of the year. I balked at this: kids are still growing, BMI isn’t a great measure to use for kids, and some kids were uncomfortable with us weighing them (with parental consent), even though no other kids would be in the room. However, the school was insistent that we take those measurements, and so we did so for all kids whose parents consented and who themselves assented to have their weight measured. We didn’t think BMI was an important measure, but the school did, so this changed an element of our evaluation.

In an ideal world, your program evaluation is going to be part of your overall program plan. This very often doesn’t happen in practice, but for the purposes of this section, we’re going to assume you’re starting from scratch with a program and have really internalized the first sentence of this paragraph. (It’s important to note that no one intentionally leaves evaluation out of their program planning; instead, it’s just not something many people running programs think about. They’re too busy… well, running programs. That’s why this chapter is so important!)

In this section, we’re going to learn about how to plan your program evaluation, including the importance of logic models. You may have heard people groan about logic models (or you may have groaned when you read those words), and the truth is, they’re a lot of work and a little complicated. Teaching you how to make one from start to finish is a little bit outside the scope of this section, but what I am going to try to do is teach you how to interpret them and build some evaluation questions from them. (Pro-tip: logic models are a heck of a lot easier to make in Excel than Word.)

Planning your evaluation has three primary steps: engaging stakeholders, describing the program, and focusing the evaluation.

Step 1: Engaging stakeholders

Stakeholders are the people and organizations that have some interest in or will be impacted by our program. Including as many stakeholders as possible when you plan your evaluation will help to make it as useful as possible for as many people as possible. The key to this step is to listen. However, a note of caution: sometimes stakeholders have competing priorities, and as the program evaluator, you’re going to have to help navigate that. For example, in our kinesthetic learning program, the teachers at your school might be interested in decreasing classroom disruptions or enhancing subject matter learning, while the administration is solely focused on test scores. Here is where it’s a great idea to use your social work ethics and research knowledge to guide conversations and planning. Improved test scores are great, but how much does that actually benefit the students?

Step 2: Describe the program

Once you’ve got stakeholder input on evaluation priorities, it’s time to describe what’s going into the program and what you hope your participants and stakeholders will get out of it. Here is where a logic model becomes an essential piece of program evaluation. A logic model “is a graphic depiction (road map) that presents the shared relationships among the resources, activities, outputs, outcomes, and impact for your program” (Centers for Disease Control, 2018, para. 1). Basically, it’s a way to show how what you’re doing is going to lead to an intended outcome and/or impact. (We’ll discuss the difference between outcomes and impacts in section 23.4.)

Logic models have several key components, which I describe in the list below (CDC, 2018); a brief illustrative sketch follows the list. The components are numbered because of where they come in the “logic” of your program – basically, where they come in time order.

  1. Inputs: resources (e.g., people and material resources) that you have to execute your program.
  2. Activities: what you’re actually doing with your program resources.
  3. Outputs: the direct products and results of your program.
  4. Outcomes: the changes that happen because of your program inputs and activities.
  5. Impacts: the long-term effects of your program.
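
To make these components concrete, here is a minimal sketch (in Python, which this chapter does not otherwise use) of how the kinesthetic learning program’s logic model might be written down as a simple data structure. Every entry is an illustrative assumption for the example, not a detail taken from the actual grant-funded program.

```python
# A minimal logic model sketch for the kinesthetic learning program.
# All entries are illustrative assumptions, not details of the real grant.

logic_model = {
    "inputs": [
        "grant funding for one full-time teacher",
        "classroom space and the existing 3rd grade math curriculum",
    ],
    "activities": [
        "integrate kinesthetic activities into 3rd grade math lessons",
    ],
    "outputs": [
        "number of kinesthetic math lessons delivered",
        "number of behavioral referrals",
        "end-of-year standardized math test scores",
    ],
    "outcomes": [
        "fewer classroom behavior issues",
        "improved end-of-year math scores",
    ],
    "impacts": [
        "better long-term retention of math skills",
    ],
}

# Reading the model in time order mirrors the "logic" of the program.
for component in ["inputs", "activities", "outputs", "outcomes", "impacts"]:
    print(component.upper())
    for item in logic_model[component]:
        print("  -", item)
```

Laying the model out this way makes it easier to see which outputs could become indicators for your evaluation questions when you focus the evaluation in Step 3.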

The CDC also talks about moderators – what they call “contextual factors” – that affect the execution of your program evaluation. This is an important component of the execution of your project, which we talked about in 23.1. Context will also become important when we talk about implementation science in section 23.3.

Let’s think about our kinesthetic learning project. While you obviously don’t have full information about what the project looks like, you’ve got a good enough idea for a little exercise below.

Step 3: Focus the evaluation

So now you know what your stakeholder priorities are and you have described your program. It’s time to figure out what questions you want to ask that will reflect stakeholder priorities and are actually possible given your program inputs, activities and outputs.

Why do inputs, activities and outputs matter for your question?

  • Inputs are your resources for the evaluation – do you have to do it with existing staff, or can you hire an expert consultant? Realistically, what you ask is going to be affected by the resources you can dedicate to your evaluation project, just like in a research project.
  • Activities are what you can actually evaluate – for instance, what effect does using hopscotch to teach multiplication have?
  • And finally, outputs are most likely your indicators of change – for example, student referrals to administrators for behavioral issues, or end-of-grade math test scores.
Key Takeaways

  • Program evaluation planning should be rigorous like research planning, but will most likely focus more on stakeholder input and evaluative questions.
  • The three primary steps in planning a program evaluation project are engaging stakeholders, describing your program, and focusing your evaluation.
  • Logic models are a key piece of information in planning program evaluation because they describe how a program is designed to work and what you are investing in it, which are important factors in formulating evaluation questions.
Exercises

  • Who would the key stakeholders be? What is each stakeholder’s interest in the project?
  • What are the activities (the action(s) you’re evaluating) and outputs (data/indicators) for your program? Can you turn them into an evaluation question?

23.3 Process evaluation and implementation science

Learning objectives.

Learners will be able to…

  • Define process evaluation
  • Explain why process evaluation is important for programs
  • Distinguish between process and outcome measures
  • Explain the purpose of implementation science and how it relates to program evaluation

Something we often don’t have time for in practice is evaluating how things are going internally with our programs. How’s it going with all the documentation our agency asks us to complete? Is the space we’re using for our group sessions facilitating client engagement? Is the way we communicate with volunteers effective? All of these things can be evaluated using a process evaluation , which is an analysis of how well your program ended up running, and sometimes how well it’s going in real time.  If you have the resources and ability to complete one of these analyses, I highly recommend it – even if it stretches your staff, it will often result in a greater degree of efficiency in the long run. (Evaluation should, at least in part, be about the long game.)

From a research perspective, process evaluations can also help you find irregularities in how you collect data that might be affecting your outcome or impact evaluations. Like other evaluations, ideally, you’re going to plan your process evaluation before you start the project. Take an iterative approach, though, because sometimes you’re going to run into problems you need to analyze in real time.

The RAND corporation is an excellent resource for guidance on program evaluation, and they describe process evaluations this way: “Process evaluations typically track attendance of participants, program adherence, and how well you followed your work plan. They may also involve asking about satisfaction of program participants or about staff’s perception of how well the program was delivered. A process evaluation should be planned before the program begins and should continue while the program is running” (RAND Corporation, 2019, para. 1) [3] .

There are several key data sources for process evaluations (RAND Corporation, 2019) [4] , some of which are listed below.

  • Participant data: can help you determine if you are actually reaching the people you intend to.
  • Focus groups: how did people experience the program? How could you improve it from the participant perspective?
  • Satisfaction surveys: did participants get what they wanted from the program?
  • Staff perception data: How did the program go for staff? Were expectations realistic? What did they see in terms of qualitative changes for participants?
  • Program adherence monitoring: how well did you follow your program plans?

Using these data sources, you can learn lessons about your program and make any necessary adjustments if you run the program again. It can also give you insights about your staff’s needs (like training, for instance) and enable you to identify gaps in your programs or services.
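
To show how a few of these data sources might be pulled together, here is a small sketch in Python that tallies attendance, plan adherence, and participant satisfaction from a handful of hypothetical session records. The field names, values, and rating scale are assumptions made up for the example, not part of the RAND guidance.

```python
# Hypothetical process-evaluation records: one dict per program session.
# Field names and values are invented for illustration.
sessions = [
    {"planned_attendees": 20, "actual_attendees": 17, "followed_plan": True,  "satisfaction": [4, 5, 3, 4]},
    {"planned_attendees": 20, "actual_attendees": 12, "followed_plan": False, "satisfaction": [3, 3, 4]},
    {"planned_attendees": 20, "actual_attendees": 19, "followed_plan": True,  "satisfaction": [5, 4, 4, 5]},
]

# Attendance: actual participants as a share of planned participants.
attendance_rate = sum(s["actual_attendees"] for s in sessions) / sum(
    s["planned_attendees"] for s in sessions
)

# Adherence: share of sessions that followed the work plan.
adherence_rate = sum(s["followed_plan"] for s in sessions) / len(sessions)

# Satisfaction: mean rating across all participant responses (1-5 scale assumed).
all_ratings = [rating for s in sessions for rating in s["satisfaction"]]
mean_satisfaction = sum(all_ratings) / len(all_ratings)

print(f"Attendance rate:   {attendance_rate:.0%}")
print(f"Plan adherence:    {adherence_rate:.0%}")
print(f"Mean satisfaction: {mean_satisfaction:.1f} / 5")
```

In a real process evaluation, summaries like these would sit alongside the qualitative sources above (focus groups and staff perceptions), not replace them.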

Implementation science: The basics

A further development of process evaluations, implementation science is “the scientific study of methods to promote the systematic uptake of research findings and other evidence-based practices into routine practice, and, hence, to improve the quality and effectiveness of health services.” (Bauer, Damschroder, Hagerdorn, Smith & Kilbourne, 2015) [5]

Put more plainly, implementation science studies how we put evidence-based interventions (EBIs) into practice. It’s essentially a form of process evaluation, just at a more macro level. Implementation science is a relatively new field of study that focuses on how to best put interventions into practice, and it’s important because it helps us analyze on a macro level those factors that might affect our ability to implement a program. Implementation science focuses on the context of program implementation, which has significant implications for program evaluation.

A useful framework for implementation science is the EPIS (Exploration, Preparation, Implementation and Sustainment) framework. It’s not the only one out there, but I like it because to me, it sort of mirrors the linear nature of a logic model.

The EPIS framework was developed by Aarons, Hurlburt and Horwitz (first published 2011). (The linked article is behind a paywall, but the abstract is still pretty useful, and if you’re affiliated with a college or university, you can probably get access through your library.) This framework emphasizes the importance of the context in which your program is being implemented: the inner (organizational) context and the outer (political, public policy, and social) context. What’s happening in your organization and in the larger political and social sphere that might affect how your program gets implemented?

There are a few key questions in each phase, according to Aarons, Hurlburt and Horwitz (2011) [6] :

  • Exploration phase: what is the problem or issue we want to address? What are our options for programs and interventions? What is the best way to put them into practice? What is the organizational and societal context that we need to consider when choosing our option?
  • Preparation: which option do we want to adopt? What resources will we need to put that option into practice? What are our organizational or sociopolitical assets and challenges in putting this option into practice?
  • Implementation: what is actually happening now that we’re putting our option into practice? How is the course of things being affected by contexts?
  • Sustainment: what can we do to ensure our option remains viable, given competing priorities with funding and public attention?

Implementation science is a new and rapidly advancing field, and realistically, it’s beyond what a lot of us are going to be able to evaluate in our agencies at this point. But even taking pieces of it – especially the pieces about the importance of context for our programs and evaluations – is useful. Even if you don’t use it as an evaluative framework, the questions outlined above are good ones to ask when you’re planning your program in the first place.

  • A process evaluation is an analysis of how your program actually ran, and sometimes how it’s running in real time.
  • Process evaluations are useful because they can help programs run more efficiently and effectively and reveal agency and program needs.
  • The EPIS model is a way to analyze the implementation of a program that emphasizes distinct phases of implementation and the context in which the phases happen.
  • The EPIS model is also useful in program planning, as it mirrors the linear process of a logic model.
  • Consider your research project or, if you have been able to adapt it, your program evaluation project. What are some inner/organizational context factors that might affect how the program gets implemented and what you can evaluate?
  • What are some things you would want to evaluate about your program’s process? What would you gain from that information?

23.4 Outcome and impact evaluations

  • Define outcome
  • Explain the principles of conducting an outcome evaluation
  • Define impact
  • Explain the principles of conducting an impact evaluation
  • Explain the difference between outcomes and impacts

A lot of us will use “outcome” and “impact” interchangeably, but the truth is, they are different. An outcome is the final condition that occurs at the end of an intervention or program. It is the short-term effect – for our kinesthetic learning example, perhaps an improvement over last year’s end-of-grade math test scores. An impact is the long-term condition that occurs at the end of a defined time period after an intervention. It is the longer-term effect – for our kinesthetic learning example, perhaps better retention of math skills as students advance through school. Because of this distinction, outcome and impact evaluations are going to look a little different.

But first, let’s talk about how these types of evaluations are the same. Outcome and impact evaluations are all about change. As a result, we have to know what circumstance, characteristic or condition we are hoping will change because of our program.  We also need to figure out what we think the causal link between our intervention or program and the change is, especially if we are using a new type of intervention that doesn’t yet have a strong evidence base.

For both of these types of evaluations, you have to consider what type of research design you can actually use in your circumstances – are you coming in when a program is already in progress, so you have no baseline data? Or can you collect baseline data to compare to a post-test? For impact evaluations, how are you going to track participants over time?

The main difference between outcome and impact evaluation is the timing and, consequently, the difficulty and level of investment. You can pretty easily collect outcome data from program participants at the end of the program. But tracking people over time, especially for populations social workers serve, can be extremely difficult. It can also be difficult or impossible to control for whatever happened in your participant’s life between the end of the program and the end of your long-term measurement period.

Impact evaluations require careful planning to determine how your follow-up is going to happen. It’s a good practice to try to keep intermittent contact with participants, even if you aren’t taking a measurement at that time, so that you’re less likely to lose track of them.
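To make the timing difference concrete, here is a minimal sketch in Python. The participant scores, column names, and 12-month follow-up window are hypothetical, not drawn from any real program, and a real impact evaluation would also need a plan for handling the attrition shown here.

```python
# Minimal sketch: the same participants measured at baseline, at program end
# (the outcome), and at a later follow-up (the impact). All data are hypothetical.
import pandas as pd

scores = pd.DataFrame({
    "participant_id": [1, 2, 3, 4],
    "baseline":       [52, 47, 60, 55],
    "program_end":    [61, 55, 66, 58],    # outcome measurement
    "followup_12mo":  [58, 54, None, 57],  # impact measurement; one participant lost
})

outcome_change = (scores["program_end"] - scores["baseline"]).mean()
impact_change = (scores["followup_12mo"] - scores["baseline"]).mean()  # skips the lost participant

print(f"Mean short-term (outcome) change: {outcome_change:.1f}")
print(f"Mean long-term (impact) change:   {impact_change:.1f}")
print(f"Followed up at 12 months: {scores['followup_12mo'].notna().sum()} of {len(scores)} participants")
```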

  • Outcomes are short-term effects that can be measured at the end of a program.
  • Outcome evaluations apply research methods to the analysis of change during a program and try to establish a logical link between program participation and the short-term change.
  • Impacts are long-term effects that are measured after a period of time has passed since the end of a program.
  • Impact evaluations apply research methods to the analysis of change after a defined period of time has passed after the end of a program and try to establish a logical link between program participation and long-term change.

23.5 Ethics and culture in program evaluation

  • Discuss cultural and ethical issues to consider when planning and conducting program evaluation
  • Explain the importance of stakeholder and participant involvement to address these issues

In a now decades-old paper, Stake and Mabry (1998) [7] point out, “The theory and practice of evaluation are of little value unless we can count on vigorous ethical behavior by evaluators” (p. 99). I know we always say to use the most recent scholarship available, but this point is as relevant now as it was over 20 years ago. One thing they point out that rings particularly true for me as an experienced program evaluator is the idea that we evaluators are also supposed to be “program advocates” (p. 99). We have to work through political and ideological differences with our stakeholders, especially funders; these tensions, while sometimes present in research, are especially salient for program evaluation given its origins.

There’s not a rote answer for these ethical questions, just as there are none for the practice-based ethical dilemmas your instructors hammer home with you in classes. You need to use your research and social work ethics to solve these problems. Ultimately, do your best to focus on rigor while meeting stakeholder needs.

One of the most important ethical issues in program evaluation is the implication of not evaluating your program. Providing an ineffective intervention to people can be extremely harmful. And what happens if our intervention actually causes harm? It’s our duty as social workers to explore these issues and not just keep doing what we’ve always done because it’s expedient or guarantees continued funding. I’ve evaluated programs before that turned out to be ineffective, but were required by state law to be delivered to a certain population. It’s not just potentially harmful to clients; it’s also a waste of precious resources that could be devoted to other, more effective programs.

We’ve talked throughout this book about ethical issues and research. All of that is applicable to program evaluation too. Federal law governing IRB practice does not require that program evaluation go through IRB if it is not seeking to gather generalizable knowledge, so IRB approval isn’t a given for these projects. As a result, you’re even more responsible for ensuring that your project is ethical.

Cultural considerations

Ultimately, social workers should start from a place of humility in the face of cultures or groups of which we are not a part. Cultural considerations in program evaluation look similar to those in research. Something to consider about program evaluation, though: is it your duty to point out potential cultural humility issues as part of your evaluation, even if you’re not asked to? I’d argue that it is.

It is also important we make sure that our definition of success is not oppressive. For example, in Australia, the government undertook a program to remove Aboriginal children from their families and assimilate them into white culture. The program was viewed as successful, but the measures of success were based on oppressive beliefs and stereotypes. This is why stakeholder input is essential – especially if you’re not a member of the group you’re evaluating, stakeholders are going to be the ones to tell you that you may need to reconsider what “success” means.


Unrau, Gabor, and Grinnell (2007) [8] identified several important factors to consider when designing and executing a culturally sensitive program evaluation. First, evaluators need “a clear understanding of the impact of culture on human and social processes generally and on evaluation processes specifically and… skills in cross-cultural communications to ensure that they can effectively interact with people from diverse backgrounds” (p. 419). These are also essential skills in social work practice that you are hopefully learning in your other classes! We should strive to learn as much as possible about the cultures of our clients when they differ from ours.

The authors also point out that evaluators need to be culturally aware and make sure the way they plan and execute their evaluations isn’t centered on their own ethnic experience and that they aren’t basing their plans on stereotypes about other cultures. In addition, when executing our evaluations, we have to be mindful of how our cultural background affects our communication and behavior, because we may need to adjust these to communicate (both verbally and non-verbally) with our participants in a culturally sensitive and appropriate way.

Consider also that the type of information on which you place the most value may not match that of people from other cultures. Unrau, Gabor, and Grinnell (2007) [9] point out that mainstream North American cultures place a lot of value on hard data and rigorous processes like clinical trials. (You might notice that we spend a lot of time on this type of information in this textbook.) According to the authors, though, cultures from other parts of the world value relationships and storytelling as evidence and important information. This kind of information is as important and valid as what we are teaching you to collect and analyze in most of this book.

Being the squeaky wheel about evaluating programs can be uncomfortable. But as you go into practice (or grow in your current practice), I strongly believe it’s your ethical obligation to push for evaluation. It honors the dignity and worth of our clients. My hope is that this chapter has given you the tools to talk about it and, ultimately, execute it in practice.

  • Ethical considerations in program evaluation are very similar to those in research.
  • Culturally sensitive program evaluation requires evaluators to learn as much as they can about cultures different from their own and develop as much cultural awareness as possible.
  • Stakeholder input is always important, but it’s essential when planning evaluations for programs serving people from diverse backgrounds.
  • Consider the research project you’ve been working on throughout this book. Are there cultural considerations in your planning that you need to think about?
  • If you adapted your research project into a program evaluation, what might some ethical considerations be? What ethical dilemmas could you encounter?
  • Pruett, R. (2000). Program evaluation 101. Retrieved from https://mainweb-v.musc.edu/vawprevention/research/programeval.shtml
  • Mathison, S. (2007). What is the difference between research and evaluation—and why do we care? In N. L. Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation (pp. 183-196). New York: Guilford.
  • RAND Corporation. (2020). Step 07: Process evaluation. Retrieved from https://www.rand.org/pubs/tools/TL259/step-07.html
  • Bauer, M., Damschroder, L., Hagedorn, H., Smith, J. & Kilbourne, A. (2015). An introduction to implementation science for the non-specialist. BMC Psychology, 3(32).
  • Aarons, G., Hurlburt, M. & Horwitz, S. (2011). Advancing a conceptual model of evidence-based practice implementation in public service sectors. Administration and Policy in Mental Health and Mental Health Services Research, 38(1), 4-23.
  • Stake, R. & Mabry, L. (1998). Ethics in program evaluation. Scandinavian Journal of Social Welfare, 7(2).
  • Unrau, Y., Gabor, P. & Grinnell, R. (2007). Evaluation in social work: The art and science of practice. New York: Oxford University Press.

The systematic process by which we determine if social programs are meeting their goals, how well the program runs, whether the program had the desired effect, and whether the program has merit according to stakeholders (including in terms of the monetary costs and benefits)

individuals or groups who have an interest in the outcome of the study you conduct

the people or organizations who control access to the population you want to study

The people and organizations that have some interest in or will be affected by our program.

A graphic depiction (road map) that presents the shared relationships among the resources, activities, outputs, outcomes, and impact for your program

An analysis of how well your program ended up running, and sometimes how well it's going in real time.

The scientific study of methods to promote the systematic uptake of research findings and other evidence-based practices into routine practice, and, hence, to improve the quality and effectiveness of health services.

The final condition that occurs at the end of an intervention or program.

The long-term condition that occurs at the end of a defined time period after an intervention.

Graduate research methods in social work Copyright © 2020 by Matthew DeCarlo, Cory Cummings, Kate Agnelli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Performance and Evaluation Office (PEO) - Program Evaluation

Centers for Disease Control and Prevention. Framework for program evaluation in public health. MMWR 1999;48 (No. RR-11)

This Public Health Reports article highlights the path CDC has taken to foster the use of evaluation. Access this valuable resource to learn more about using evaluation to inform program improvements.

What is program evaluation?

Evaluation: A systematic method for collecting, analyzing, and using data to examine the effectiveness and efficiency of programs and, as importantly, to contribute to continuous program improvement.

Program: Any set of related activities undertaken to achieve an intended outcome; any organized public health action. At CDC, program is defined broadly to include policies; interventions; environmental, systems, and media initiatives; and other efforts. It also encompasses preparedness efforts as well as research, capacity, and infrastructure efforts.

At CDC, effective program evaluation is a systematic way to improve and account for public health actions.

Why evaluate?

  • CDC has a deep and long-standing commitment to the use of data for decision making, as well as the responsibility to describe the outcomes achieved with its public health dollars.
  • Strong program evaluation can help us identify our best investments as well as determine how to establish and sustain them as optimal practice.
  • The goal is to increase the use of evaluation data for continuous program improvement Agency-wide.
“We have to have a healthy obsession with impact. To always be asking ourselves what is the real impact of our work on improving health?” (Dr. Frieden, January 21, 2014)

What's the difference between evaluation, research, and monitoring?

  • Evaluation: Purpose is to determine effectiveness of a specific program or model and understand why a program may or may not be working. Goal is to improve programs.
  • Research: Purpose is theory testing and to produce generalizable knowledge. Goal is to contribute to knowledge base.
  • Monitoring: Purpose is to track implementation progress through periodic data collection. Goal is to provide early indications of progress (or lack thereof).
  • Data collection methods and analyses are often similar between research and evaluation.
  • Monitoring and evaluation (M&E) measure and assess performance to help improve performance and achieve results.
“Research seeks to prove, evaluation seeks to improve.” (Michael Quinn Patton, Founder and Director of Utilization-Focused Evaluation)


National Research Council (US) Panel on the Evaluation of AIDS Interventions; Coyle SL, Boruch RF, Turner CF, editors. Evaluating AIDS Prevention Programs: Expanded Edition. Washington (DC): National Academies Press (US); 1991.


1 Design and Implementation of Evaluation Research

Evaluation has its roots in the social, behavioral, and statistical sciences, and it relies on their principles and methodologies of research, including experimental design, measurement, statistical tests, and direct observation. What distinguishes evaluation research from other social science is that its subjects are ongoing social action programs that are intended to produce individual or collective change. This setting usually engenders a great need for cooperation between those who conduct the program and those who evaluate it. This need for cooperation can be particularly acute in the case of AIDS prevention programs because those programs have been developed rapidly to meet the urgent demands of a changing and deadly epidemic.

Although the characteristics of AIDS intervention programs place some unique demands on evaluation, the techniques for conducting good program evaluation do not need to be invented. Two decades of evaluation research have provided a basic conceptual framework for undertaking such efforts (see, e.g., Campbell and Stanley [1966] and Cook and Campbell [1979] for discussions of outcome evaluation; see Weiss [1972] and Rossi and Freeman [1982] for process and outcome evaluations); in addition, similar programs, such as the antismoking campaigns, have been subject to evaluation, and they offer examples of the problems that have been encountered.

In this chapter the panel provides an overview of the terminology, types, designs, and management of evaluation research. The following chapter provides an overview of program objectives and the selection and measurement of appropriate outcome variables for judging the effectiveness of AIDS intervention programs. These issues are discussed in detail in the subsequent, program-specific Chapters 3-5.

Types of Evaluation

The term evaluation implies a variety of different things to different people. The recent report of the Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences defines the area through a series of questions (Turner, Miller, and Moses, 1989:317-318):

Evaluation is a systematic process that produces a trustworthy account of what was attempted and why; through the examination of results—the outcomes of intervention programs—it answers the questions, "What was done?" "To whom, and how?" and "What outcomes were observed?" Well-designed evaluation permits us to draw inferences from the data and addresses the difficult question: "What do the outcomes mean?"

These questions differ in the degree of difficulty of answering them. An evaluation that tries to determine the outcomes of an intervention and what those outcomes mean is a more complicated endeavor than an evaluation that assesses the process by which the intervention was delivered. Both kinds of evaluation are necessary because they are intimately connected: to establish a project's success, an evaluator must first ask whether the project was implemented as planned and then whether its objective was achieved. Questions about a project's implementation usually fall under the rubric of process evaluation . If the investigation involves rapid feedback to the project staff or sponsors, particularly at the earliest stages of program implementation, the work is called formative evaluation . Questions about effects or effectiveness are often variously called summative evaluation, impact assessment, or outcome evaluation, the term the panel uses.

Formative evaluation is a special type of early evaluation that occurs during and after a program has been designed but before it is broadly implemented. Formative evaluation is used to understand the need for the intervention and to make tentative decisions about how to implement or improve it. During formative evaluation, information is collected and then fed back to program designers and administrators to enhance program development and maximize the success of the intervention. For example, formative evaluation may be carried out through a pilot project before a program is implemented at several sites. A pilot study of a community-based organization (CBO), for example, might be used to gather data on problems involving access to and recruitment of targeted populations and the utilization and implementation of services; the findings of such a study would then be used to modify (if needed) the planned program.

Another example of formative evaluation is the use of a "story board" design of a TV message that has yet to be produced. A story board is a series of text and sketches of camera shots that are to be produced in a commercial. To evaluate the effectiveness of the message and forecast some of the consequences of actually broadcasting it to the general public, an advertising agency convenes small groups of people to react to and comment on the proposed design.

Once an intervention has been implemented, the next stage of evaluation is process evaluation, which addresses two broad questions: "What was done?" and "To whom, and how?" Ordinarily, process evaluation is carried out at some point in the life of a project to determine how and how well the delivery goals of the program are being met. When intervention programs continue over a long period of time (as is the case for some of the major AIDS prevention programs), measurements at several times are warranted to ensure that the components of the intervention continue to be delivered by the right people, to the right people, in the right manner, and at the right time. Process evaluation can also play a role in improving interventions by providing the information necessary to change delivery strategies or program objectives in a changing epidemic.

Research designs for process evaluation include direct observation of projects, surveys of service providers and clients, and the monitoring of administrative records. The panel notes that the Centers for Disease Control (CDC) is already collecting some administrative records on its counseling and testing program and community-based projects. The panel believes that this type of evaluation should be a continuing and expanded component of intervention projects to guarantee the maintenance of the projects' integrity and responsiveness to their constituencies.

The purpose of outcome evaluation is to identify consequences and to establish that consequences are, indeed, attributable to a project. This type of evaluation answers the questions, "What outcomes were observed?" and, perhaps more importantly, "What do the outcomes mean?" Like process evaluation, outcome evaluation can also be conducted at intervals during an ongoing program, and the panel believes that such periodic evaluation should be done to monitor goal achievement.

The panel believes that these stages of evaluation (i.e., formative, process, and outcome) are essential to learning how AIDS prevention programs contribute to containing the epidemic. After a body of findings has been accumulated from such evaluations, it may be fruitful to launch another stage of evaluation: cost-effectiveness analysis (see Weinstein et al., 1989). Like outcome evaluation, cost-effectiveness analysis also measures program effectiveness, but it extends the analysis by adding a measure of program cost. The panel believes that consideration of cost-effectiveness analysis should be postponed until more experience is gained with formative, process, and outcome evaluation of the CDC AIDS prevention programs.

Evaluation Research Design

Process and outcome evaluations require different types of research designs, as discussed below. Formative evaluations, which are intended to both assess implementation and forecast effects, use a mix of these designs.

Process Evaluation Designs

To conduct process evaluations on how well services are delivered, data need to be gathered on the content of interventions and on their delivery systems. Suggested methodologies include direct observation, surveys, and record keeping.

Direct observation designs include case studies, in which participant-observers unobtrusively and systematically record encounters within a program setting, and nonparticipant observation, in which long, open-ended (or "focused") interviews are conducted with program participants. 1 For example, "professional customers" at counseling and testing sites can act as project clients to monitor activities unobtrusively; 2 alternatively, nonparticipant observers can interview both staff and clients. Surveys —either censuses (of the whole population of interest) or samples—elicit information through interviews or questionnaires completed by project participants or potential users of a project. For example, surveys within community-based projects can collect basic statistical information on project objectives, what services are provided, to whom, when, how often, for how long, and in what context.

Record keeping consists of administrative or other reporting systems that monitor use of services. Standardized reporting ensures consistency in the scope and depth of data collected. To use the media campaign as an example, the panel suggests using standardized data on the use of the AIDS hotline to monitor public attentiveness to the advertisements broadcast by the media campaign.

These designs are simple to understand, but they require expertise to implement. For example, observational studies must be conducted by people who are well trained in how to carry out on-site tasks sensitively and to record their findings uniformly. Observers can either complete narrative accounts of what occurred in a service setting or they can complete some sort of data inventory to ensure that multiple aspects of service delivery are covered. These types of studies are time consuming and benefit from corroboration among several observers. The use of surveys in research is well-understood, although they, too, require expertise to be well implemented. As the program chapters reflect, survey data collection must be carefully designed to reduce problems of validity and reliability and, if samples are used, to design an appropriate sampling scheme. Record keeping or service inventories are probably the easiest research designs to implement, although preparing standardized internal forms requires attention to detail about salient aspects of service delivery.

Outcome Evaluation Designs

Research designs for outcome evaluations are meant to assess principal and relative effects. Ideally, to assess the effect of an intervention on program participants, one would like to know what would have happened to the same participants in the absence of the program. Because it is not possible to make this comparison directly, inference strategies that rely on proxies have to be used. Scientists use three general approaches to construct proxies for use in the comparisons required to evaluate the effects of interventions: (1) nonexperimental methods, (2) quasi-experiments, and (3) randomized experiments. The first two are discussed below, and randomized experiments are discussed in the subsequent section.

Nonexperimental and Quasi-Experimental Designs 3

The most common form of nonexperimental design is a before-and-after study. In this design, pre-intervention measurements are compared with equivalent measurements made after the intervention to detect change in the outcome variables that the intervention was designed to influence.

Although the panel finds that before-and-after studies frequently provide helpful insights, the panel believes that these studies do not provide sufficiently reliable information to be the cornerstone for evaluation research on the effectiveness of AIDS prevention programs. The panel's conclusion follows from the fact that the postintervention changes cannot usually be attributed unambiguously to the intervention. 4 Plausible competing explanations for differences between pre-and postintervention measurements will often be numerous, including not only the possible effects of other AIDS intervention programs, news stories, and local events, but also the effects that may result from the maturation of the participants and the educational or sensitizing effects of repeated measurements, among others.
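For illustration only, the before-and-after comparison just described might be computed as in the minimal sketch below. The measurements are invented, and, as the panel notes, a statistically significant pre-post change by itself says nothing about whether the intervention caused it.

```python
# Minimal sketch of a before-and-after comparison on hypothetical data.
# A paired test can show that scores changed, but it cannot by itself
# attribute that change to the intervention.
from scipy import stats

pre  = [3.1, 2.8, 4.0, 3.5, 2.9, 3.7]   # hypothetical pre-intervention measurements
post = [3.6, 3.0, 4.2, 3.9, 3.4, 3.8]   # the same people after the intervention

t_stat, p_value = stats.ttest_rel(post, pre)
print(f"Mean change: {sum(post)/len(post) - sum(pre)/len(pre):.2f}")
print(f"Paired t = {t_stat:.2f}, p = {p_value:.3f}")
```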

Quasi-experimental and matched control designs provide a separate comparison group. In these designs, the control group may be selected by matching nonparticipants to participants in the treatment group on the basis of selected characteristics. It is difficult to ensure the comparability of the two groups even when they are matched on many characteristics because other relevant factors may have been overlooked or mismatched or they may be difficult to measure (e.g., the motivation to change behavior). In some situations, it may simply be impossible to measure all of the characteristics of the units (e.g., communities) that may affect outcomes, much less demonstrate their comparability.

Matched control designs require extraordinarily comprehensive scientific knowledge about the phenomenon under investigation in order for evaluators to be confident that all of the relevant determinants of outcomes have been properly accounted for in the matching. Three types of information or knowledge are required: (1) knowledge of intervening variables that also affect the outcome of the intervention and, consequently, need adjustment to make the groups comparable; (2) measurements on all intervening variables for all subjects; and (3) knowledge of how to make the adjustments properly, which in turn requires an understanding of the functional relationship between the intervening variables and the outcome variables. Satisfying each of these information requirements is likely to be more difficult than answering the primary evaluation question, "Does this intervention produce beneficial effects?"

Given the size and the national importance of AIDS intervention programs and given the state of current knowledge about behavior change in general and AIDS prevention, in particular, the panel believes that it would be unwise to rely on matching and adjustment strategies as the primary design for evaluating AIDS intervention programs. With differently constituted groups, inferences about results are hostage to uncertainty about the extent to which the observed outcome actually results from the intervention and is not an artifact of intergroup differences that may not have been removed by matching or adjustment.

Randomized Experiments

A remedy to the inferential uncertainties that afflict nonexperimental designs is provided by randomized experiments . In such experiments, one singly constituted group is established for study. A subset of the group is then randomly chosen to receive the intervention, with the other subset becoming the control. The two groups are not identical, but they are comparable. Because they are two random samples drawn from the same population, they are not systematically different in any respect, which is important for all variables—both known and unknown—that can influence the outcome. Dividing a singly constituted group into two random and therefore comparable subgroups cuts through the tangle of causation and establishes a basis for the valid comparison of respondents who do and do not receive the intervention. Randomized experiments provide for clear causal inference by solving the problem of group comparability, and may be used to answer the evaluation questions "Does the intervention work?" and "What works better?"

Which question is answered depends on whether the controls receive an intervention or not. When the object is to estimate whether a given intervention has any effects, individuals are randomly assigned to the project or to a zero-treatment control group. The control group may be put on a waiting list or simply not get the treatment. This design addresses the question, "Does it work?"

When the object is to compare variations on a project—e.g., individual counseling sessions versus group counseling—then individuals are randomly assigned to these two regimens, and there is no zero-treatment control group. This design addresses the question, "What works better?" In either case, the control groups must be followed up as rigorously as the experimental groups.
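As a rough sketch of this logic (not a procedure prescribed by the panel), the following snippet randomly splits one singly constituted group into treatment and control arms and compares outcomes between them. The participant pool, the outcome measure, and the treatment effect are all simulated purely for illustration.

```python
# Minimal sketch: randomly assign one pool of participants to treatment or
# control, then compare outcomes. The data and the effect are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n = 200  # one singly constituted group

# Random assignment: half the pool to each arm, in random order
assignment = rng.permutation(np.array(["treatment"] * (n // 2) + ["control"] * (n // 2)))

# Simulated outcome (e.g., a risk-behavior score), with an assumed modest benefit
outcome = rng.normal(loc=50, scale=10, size=n)
outcome[assignment == "treatment"] -= 3  # hypothetical treatment effect

treat = outcome[assignment == "treatment"]
ctrl = outcome[assignment == "control"]

t_stat, p_value = stats.ttest_ind(treat, ctrl)
print(f"Treatment mean: {treat.mean():.1f}  Control mean: {ctrl.mean():.1f}")
print(f"Two-sample t = {t_stat:.2f}, p = {p_value:.3f}")
```

Because the split is random, any systematic difference between the two arms at the end can be attributed to the treatment rather than to pre-existing group differences.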

A randomized experiment requires that individuals, organizations, or other treatment units be randomly assigned to one of two or more treatments or program variations. Random assignment ensures that the estimated differences between the groups so constituted are statistically unbiased; that is, that any differences in effects measured between them are a result of treatment. The absence of statistical bias in groups constituted in this fashion stems from the fact that random assignment ensures that there are no systematic differences between them, differences that can and usually do affect groups composed in ways that are not random. 5 The panel believes this approach is far superior for outcome evaluations of AIDS interventions than the nonrandom and quasi-experimental approaches. Therefore,

To improve interventions that are already broadly implemented, the panel recommends the use of randomized field experiments of alternative or enhanced interventions.

Under certain conditions, the panel also endorses randomized field experiments with a nontreatment control group to evaluate new interventions. In the context of a deadly epidemic, ethics dictate that treatment not be withheld simply for the purpose of conducting an experiment. Nevertheless, there may be times when a randomized field test of a new treatment with a no-treatment control group is worthwhile. One such time is during the design phase of a major or national intervention.

Before a new intervention is broadly implemented, the panel recommends that it be pilot tested in a randomized field experiment.

The panel considered the use of experiments with delayed rather than no treatment. A delayed-treatment control group strategy might be pursued when resources are too scarce for an intervention to be widely distributed at one time. For example, a project site that is waiting to receive funding for an intervention would be designated as the control group. If it is possible to randomize which projects in the queue receive the intervention, an evaluator could measure and compare outcomes after the experimental group had received the new treatment but before the control group received it. The panel believes that such a design can be applied only in limited circumstances, such as when groups would have access to related services in their communities and when conducting the study is likely to lead to greater access or better services. For example, a study cited in Chapter 4 used a randomized delayed-treatment experiment to measure the effects of a community-based risk reduction program. However, such a strategy may be impractical for several reasons, including:

  • sites waiting for funding for an intervention might seek resources from another source;
  • it might be difficult to enlist the nonfunded site and its clients to participate in the study;
  • there could be an appearance of favoritism toward projects whose funding was not delayed.

Although randomized experiments have many benefits, the approach is not without pitfalls. In the planning stages of evaluation, it is necessary to contemplate certain hazards, such as the Hawthorne effect 6 and differential project dropout rates. Precautions must be taken either to prevent these problems or to measure their effects. Fortunately, there is some evidence suggesting that the Hawthorne effect is usually not very large (Rossi and Freeman, 1982:175-176).

Attrition is potentially more damaging to an evaluation, and it must be limited if the experimental design is to be preserved. If sample attrition is not limited in an experimental design, it becomes necessary to account for the potentially biasing impact of the loss of subjects in the treatment and control conditions of the experiment. The statistical adjustments required to make inferences about treatment effectiveness in such circumstances can introduce uncertainties that are as worrisome as those afflicting nonexperimental and quasi-experimental designs. Thus, the panel's recommendation of the selective use of randomized design carries an implicit caveat: To realize the theoretical advantages offered by randomized experimental designs, substantial efforts will be required to ensure that the designs are not compromised by flawed execution.

Another pitfall to randomization is its appearance of unfairness or unattractiveness to participants and the controversial legal and ethical issues it sometimes raises. Often, what is being criticized is the control of project assignment of participants rather than the use of randomization itself. In deciding whether random assignment is appropriate, it is important to consider the specific context of the evaluation and how participants would be assigned to projects in the absence of randomization. The Federal Judicial Center (1981) offers five threshold conditions for the use of random assignment.

  • Does present practice or policy need improvement?
  • Is there significant uncertainty about the value of the proposed regimen?
  • Are there acceptable alternatives to randomized experiments?
  • Will the results of the experiment be used to improve practice or policy?
  • Is there a reasonable protection against risk for vulnerable groups (i.e., individuals within the justice system)?

The parent committee has argued that these threshold conditions apply in the case of AIDS prevention programs (see Turner, Miller, and Moses, 1989:331-333).

Although randomization may be desirable from an evaluation and ethical standpoint, and acceptable from a legal standpoint, it may be difficult to implement from a practical or political standpoint. Again, the panel emphasizes that questions about the practical or political feasibility of the use of randomization may in fact refer to the control of program allocation rather than to the issues of randomization itself. In fact, when resources are scarce, it is often more ethical and politically palatable to randomize allocation rather than to allocate on grounds that may appear biased.

It is usually easier to defend the use of randomization when the choice has to do with assignment to groups receiving alternative services than when the choice involves assignment to groups receiving no treatment. For example, in comparing a testing and counseling intervention that offered a special "skills training" session in addition to its regular services with a counseling and testing intervention that offered no additional component, random assignment of participants to one group rather than another may be acceptable to program staff and participants because the relative values of the alternative interventions are unknown.

The more difficult issue is the introduction of new interventions that are perceived to be needed and effective in a situation in which there are no services. An argument that is sometimes offered against the use of randomization in this instance is that interventions should be assigned on the basis of need (perhaps as measured by rates of HIV incidence or of high-risk behaviors). But this argument presumes that the intervention will have a positive effect—which is unknown before evaluation—and that relative need can be established, which is a difficult task in itself.

The panel recognizes that community and political opposition to randomization to zero treatments may be strong and that enlisting participation in such experiments may be difficult. This opposition and reluctance could seriously jeopardize the production of reliable results if it is translated into noncompliance with a research design. The feasibility of randomized experiments for AIDS prevention programs has already been demonstrated, however (see the review of selected experiments in Turner, Miller, and Moses, 1989:327-329). The substantial effort involved in mounting randomized field experiments is repaid by the fact that they can provide unbiased evidence of the effects of a program.

Unit of Assignment.

The unit of assignment of an experiment may be an individual person, a clinic (i.e., the clientele of the clinic), or another organizational unit (e.g., the community or city). The treatment unit is selected at the earliest stage of design. Variations of units are illustrated in the following four examples of intervention programs.

(1) Two different pamphlets (A and B) on the same subject (e.g., testing) are distributed in an alternating sequence to individuals calling an AIDS hotline. The outcome to be measured is whether the recipient returns a card asking for more information.

(2) Two instruction curricula (A and B) about AIDS and HIV infections are prepared for use in high school driver education classes. The outcome to be measured is a score on a knowledge test.

(3) Of all clinics for sexually transmitted diseases (STDs) in a large metropolitan area, some are randomly chosen to introduce a change in the fee schedule. The outcome to be measured is the change in patient load.

(4) A coordinated set of community-wide interventions—involving community leaders, social service agencies, the media, community associations and other groups—is implemented in one area of a city. Outcomes are knowledge as assessed by testing at drug treatment centers and STD clinics and condom sales in the community's retail outlets.

In example (1), the treatment unit is an individual person who receives pamphlet A or pamphlet B. If either "treatment" is applied again, it would be applied to a person. In example (2), the high school class is the treatment unit; everyone in a given class experiences either curriculum A or curriculum B. If either treatment is applied again, it would be applied to a class. The treatment unit is the clinic in example (3), and in example (4), the treatment unit is a community .
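As a hypothetical sketch of example (3), randomization at the clinic level might look like the snippet below; the clinic names are invented, and the point is only that assignment happens to clinics, not to individual patients.

```python
# Minimal sketch: when the treatment unit is a clinic (as in example 3),
# randomization happens at the clinic level, not the individual level.
# Clinic names are hypothetical.
import random

random.seed(1)
clinics = ["Clinic A", "Clinic B", "Clinic C", "Clinic D", "Clinic E", "Clinic F"]

random.shuffle(clinics)
new_fee_schedule = clinics[: len(clinics) // 2]       # clinics that adopt the change
current_fee_schedule = clinics[len(clinics) // 2 :]   # comparison clinics

print("New fee schedule:", sorted(new_fee_schedule))
print("Current fee schedule:", sorted(current_fee_schedule))

# Every patient at a given clinic experiences the same condition, so the
# number of repetitions is the number of clinics, not the number of patients.
```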

The consistency of the effects of a particular intervention across repetitions justly carries a heavy weight in appraising the intervention. It is important to remember that repetitions of a treatment or intervention are the number of treatment units to which the intervention is applied. This is a salient principle in the design and execution of intervention programs as well as in the assessment of their results.

The adequacy of the proposed sample size (number of treatment units) has to be considered in advance. Adequacy depends mainly on two factors:

  • How much variation occurs from unit to unit among units receiving a common treatment? If that variation is large, then the number of units needs to be large.
  • What is the minimum size of a possible treatment difference that, if present, would be practically important? That is, how small a treatment difference is it essential to detect if it is present? The smaller this quantity, the larger the number of units that are necessary.

Many formal methods for considering and choosing sample size exist (see, e.g., Cohen, 1988). Practical circumstances occasionally allow choosing between designs that involve units at different levels; thus, a classroom might be the unit if the treatment is applied in one way, but an entire school might be the unit if the treatment is applied in another. When both approaches are feasible, the use of a power analysis for each approach may lead to a reasoned choice.
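As an illustration of how these two factors drive sample size, the sketch below applies the standard normal-approximation formula for comparing two group means. The variance and minimum detectable difference are hypothetical, and formal methods such as those in Cohen (1988) should guide real designs.

```python
# Minimal sketch: approximate number of units per group for comparing two means,
# using the standard normal-approximation formula. Inputs are hypothetical.
from scipy.stats import norm

alpha = 0.05          # two-sided significance level
power = 0.80          # desired probability of detecting a real difference
sigma = 10.0          # assumed unit-to-unit standard deviation within a treatment
min_difference = 3.0  # smallest treatment difference it is essential to detect

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

n_per_group = 2 * ((z_alpha + z_beta) * sigma / min_difference) ** 2
print(f"Approximate units needed per group: {n_per_group:.0f}")

# Larger unit-to-unit variation (sigma) or a smaller minimum detectable
# difference both drive the required number of units up, as noted above.
```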

Choice of Methods

There is some controversy about the advantages of randomized experiments in comparison with other evaluative approaches. It is the panel's belief that when a (well executed) randomized study is feasible, it is superior to alternative kinds of studies in the strength and clarity of whatever conclusions emerge, primarily because the experimental approach avoids selection biases. 7 Other evaluation approaches are sometimes unavoidable, but ordinarily the accumulation of valid information will go more slowly and less securely than in randomized approaches.

Experiments in medical research shed light on the advantages of carefully conducted randomized experiments. The Salk vaccine trials are a successful example of a large, randomized study. In a double-blind test of the polio vaccine, children in various communities were randomly assigned to two treatments, either the vaccine or a placebo. By this method, the effectiveness of the Salk vaccine was demonstrated in one summer of research (Meier, 1957).

A sufficient accumulation of relevant, observational information, especially when collected in studies using different procedures and sample populations, may also clearly demonstrate the effectiveness of a treatment or intervention. The process of accumulating such information can be a long one, however. When a (well-executed) randomized study is feasible, it can provide evidence that is subject to less uncertainty in its interpretation, and it can often do so in a more timely fashion. In the midst of an epidemic, the panel believes it proper that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention efforts. In making this recommendation, however, the panel also wishes to emphasize that the advantages of the randomized experimental design can be squandered by poor execution (e.g., by compromised assignment of subjects, significant subject attrition rates, etc.). To achieve the advantages of the experimental design, care must be taken to ensure that the integrity of the design is not compromised by poor execution.

In proposing that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention programs, the panel also recognizes that there are situations in which randomization will be impossible or, for other reasons, cannot be used. In its next report the panel will describe at length appropriate nonexperimental strategies to be considered in situations in which an experiment is not a practical or desirable alternative.

The Management of Evaluation

Conscientious evaluation requires a considerable investment of funds, time, and personnel. Because the panel recognizes that resources are not unlimited, it suggests that they be concentrated on the evaluation of a subset of projects to maximize the return on investment and to enhance the likelihood of high-quality results.

Project Selection

Deciding which programs or sites to evaluate is by no means a trivial matter. Selection should be carefully weighed so that projects that are not replicable or that have little chance for success are not subjected to rigorous evaluations.

The panel recommends that any intensive evaluation of an intervention be conducted on a subset of projects selected according to explicit criteria. These criteria should include the replicability of the project, the feasibility of evaluation, and the project's potential effectiveness for prevention of HIV transmission.

If a project is replicable, it means that the particular circumstances of service delivery in that project can be duplicated. In other words, for CBOs and counseling and testing projects, the content and setting of an intervention can be duplicated across sites. Feasibility of evaluation means that, as a practical matter, the research can be done: that is, the research design is adequate to control for rival hypotheses, it is not excessively costly, and the project is acceptable to the community and the sponsor. Potential effectiveness for HIV prevention means that the intervention is at least based on a reasonable theory (or mix of theories) about behavioral change (e.g., social learning theory [Bandura, 1977], the health belief model [Janz and Becker, 1984], etc.), if it has not already been found to be effective in related circumstances.

In addition, since it is important to ensure that the results of evaluations will be broadly applicable,

The panel recommends that evaluation be conducted and replicated across major types of subgroups, programs, and settings. Attention should be paid to geographic areas with low and high AIDS prevalence, as well as to subpopulations at low and high risk for AIDS.

Research Administration

The sponsoring agency interested in evaluating an AIDS intervention should consider the mechanisms through which the research will be carried out as well as the desirability of both independent oversight and agency in-house conduct and monitoring of the research. The appropriate entities and mechanisms for conducting evaluations depend to some extent on the kinds of data being gathered and the evaluation questions being asked.

Oversight and monitoring are important to keep projects fully informed about the other evaluations relevant to their own and to render assistance when needed. Oversight and monitoring are also important because evaluation is often a sensitive issue for project and evaluation staff alike. The panel is aware that evaluation may appear threatening to practitioners and researchers because of the possibility that evaluation research will show that their projects are not as effective as they believe them to be. These needs and vulnerabilities should be taken into account as evaluation research management is developed.

Conducting the Research

To conduct some aspects of a project's evaluation, it may be appropriate to involve project administrators, especially when the data will be used to evaluate delivery systems (e.g., to determine when and which services are being delivered). To evaluate outcomes, the services of an outside evaluator 9 or evaluation team are almost always required because few practitioners have the necessary professional experience or the time and resources necessary to do evaluation. The outside evaluator must have relevant expertise in evaluation research methodology and must also be sensitive to the fears, hopes, and constraints of project administrators.

Several evaluation management schemes are possible. For example, a prospective AIDS prevention project group (the contractor) can bid on a contract for project funding that includes an intensive evaluation component. The actual evaluation can be conducted either by the contractor alone or by the contractor working in concert with an outside independent collaborator. This mechanism has the advantage of involving project practitioners in the work of evaluation as well as building separate but mutually informing communities of experts around the country. Alternatively, a contract can be let with a single evaluator or evaluation team that will collaborate with the subset of sites that is chosen for evaluation. This variation would be managerially less burdensome than awarding separate contracts, but it would require greater dependence on the expertise of a single investigator or investigative team. ( Appendix A discusses contracting options in greater depth.) Both of these approaches accord with the parent committee's recommendation that collaboration between practitioners and evaluation researchers be ensured. Finally, in the more traditional evaluation approach, independent principal investigators or investigative teams may respond to a request for proposal (RFP) issued to evaluate individual projects. Such investigators are frequently university-based or are members of a professional research organization, and they bring to the task a variety of research experiences and perspectives.

Independent Oversight

The panel believes that coordination and oversight of multisite evaluations is critical because of the variability in investigators' expertise and in the results of the projects being evaluated. Oversight can provide quality control for individual investigators and can be used to review and integrate findings across sites for developing policy. The independence of an oversight body is crucial to ensure that project evaluations do not succumb to the pressures for positive findings of effectiveness.

When evaluation is to be conducted by a number of different evaluation teams, the panel recommends establishing an independent scientific committee to oversee project selection and research efforts, corroborate the impartiality and validity of results, conduct cross-site analyses, and prepare reports on the progress of the evaluations.

The composition of such an independent oversight committee will depend on the research design of a given program. For example, the committee ought to include statisticians and other specialists in randomized field tests when that approach is being taken. Specialists in survey research and case studies should be recruited if either of those approaches is to be used. Appendix B offers a model for an independent oversight group that has been successfully implemented in other settings—a project review team, or advisory board.

Agency In-House Team

As the parent committee noted in its report, evaluations of AIDS interventions require skills that may be in short supply for agencies invested in delivering services (Turner, Miller, and Moses, 1989:349). Although this situation can be partly alleviated by recruiting professional outside evaluators and retaining an independent oversight group, the panel believes that an in-house team of professionals within the sponsoring agency is also critical. The in-house experts will interact with the outside evaluators and provide input into the selection of projects, outcome objectives, and appropriate research designs; they will also monitor the progress and costs of evaluation. These functions require not just bureaucratic oversight but appropriate scientific expertise.

This is not intended to preclude the direct involvement of CDC staff in conducting evaluations. However, given the great amount of work to be done, it is likely that a considerable portion of the work will have to be contracted out. The quality and usefulness of the evaluations done under contract can be greatly enhanced by ensuring that there are an adequate number of CDC staff trained in evaluation research methods to monitor these contracts.

The panel recommends that CDC recruit and retain behavioral, social, and statistical scientists trained in evaluation methodology to facilitate the implementation of the evaluation research recommended in this report.

Interagency Collaboration

The panel believes that the federal agencies that sponsor the design of basic research, intervention programs, and evaluation strategies would profit from greater interagency collaboration. The evaluation of AIDS intervention programs would benefit from a coherent program of studies that should provide models of efficacious and effective interventions to prevent further HIV transmission, the spread of other STDs, and unwanted pregnancies (especially among adolescents). A marriage could then be made of basic and applied science, from which the best evaluation is born. Exploring the possibility of interagency collaboration and CDC's role in such collaboration is beyond the scope of this panel's task, but it is an important issue that we suggest be addressed in the future.

Costs of Evaluation

In view of the dearth of current evaluation efforts, the panel believes that vigorous evaluation research must be undertaken over the next few years to build up a body of knowledge about what interventions can and cannot do. Dedicating no resources to evaluation will virtually guarantee that high-quality evaluations will be infrequent and the data needed for policy decisions will be sparse or absent. Yet, evaluating every project is not feasible simply because there are not enough resources and, in many cases, evaluating every project is not necessary for good science or good policy.

The panel believes that evaluating only some of a program's sites or projects, selected under the criteria noted in Chapter 4 , is a sensible strategy. Although we recommend that intensive evaluation be conducted on only a subset of carefully chosen projects, we believe that high-quality evaluation will require a significant investment of time, planning, personnel, and financial support. The panel's aim is to be realistic—not discouraging—when it notes that the costs of program evaluation should not be underestimated. Many of the research strategies proposed in this report require investments that are perhaps greater than has been previously contemplated. This is particularly the case for outcome evaluations, which are ordinarily more difficult and expensive to conduct than formative or process evaluations. And those costs will be additive with each type of evaluation that is conducted.
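
To make the sampling idea concrete, the sketch below draws a small stratified random sample of project sites for intensive evaluation. The site roster, the project types, and the 20 percent sampling fraction are hypothetical illustrations and are not drawn from the report; the substantive selection criteria remain those discussed in Chapter 4.

```python
import random

# Hypothetical roster of ten project sites and their (invented) project types.
sites = [
    ("site-01", "counseling"), ("site-02", "counseling"), ("site-03", "media"),
    ("site-04", "outreach"), ("site-05", "media"), ("site-06", "outreach"),
    ("site-07", "counseling"), ("site-08", "media"), ("site-09", "outreach"),
    ("site-10", "counseling"),
]

def sample_sites_for_evaluation(roster, fraction=0.2, seed=42):
    """Draw a simple stratified random sample: at least one site per project type."""
    rng = random.Random(seed)
    by_type = {}
    for site_id, project_type in roster:
        by_type.setdefault(project_type, []).append(site_id)
    chosen = []
    for project_type, ids in sorted(by_type.items()):
        k = max(1, round(fraction * len(ids)))  # sample a fraction of each stratum
        chosen.extend(rng.sample(ids, k))
    return sorted(chosen)

print(sample_sites_for_evaluation(sites))
```

In practice the strata would reflect whatever criteria the program adopts (project type, population served, delivery setting), not the invented labels used here.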

Panel members have found that the cost of an outcome evaluation sometimes equals or even exceeds the cost of actual program delivery. For example, it was reported to the panel that randomized studies used to evaluate recent manpower training projects cost as much as the projects themselves (see Cottingham and Rodriguez, 1987). In another case, the principal investigator of an ongoing AIDS prevention project told the panel that the cost of randomized experimentation was approximately three times higher than the cost of delivering the intervention (albeit the study was quite small, involving only 104 participants) (Kelly et al., 1989). Fortunately, only a fraction of a program's projects or sites need to be intensively evaluated to produce high-quality information, and not all will require randomized studies.

Because of the variability in kinds of evaluation that will be done as well as in the costs involved, there is no set standard or rule for judging what fraction of a total program budget should be invested in evaluation. Based upon very limited data 10 and assuming that only a small sample of projects would be evaluated, the panel suspects that program managers might reasonably anticipate spending 8 to 12 percent of their intervention budgets to conduct high-quality evaluations (i.e., formative, process, and outcome evaluations). 11 Larger investments seem politically infeasible and unwise in view of the need to put resources into program delivery. Smaller investments in evaluation risk studying an inadequate sample of program types and may also invite compromises in research quality.
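
As a rough illustration of this guideline, the short calculation below converts a hypothetical intervention budget into the corresponding 8 to 12 percent range of evaluation spending; the $5 million figure is invented for the example.

```python
def evaluation_budget_range(intervention_budget, low=0.08, high=0.12):
    """Return the dollar range implied by spending 8 to 12 percent of an
    intervention budget on formative, process, and outcome evaluation."""
    return intervention_budget * low, intervention_budget * high

# Hypothetical program spending $5 million on intervention delivery.
low, high = evaluation_budget_range(5_000_000)
print(f"Evaluation budget: ${low:,.0f} to ${high:,.0f}")  # $400,000 to $600,000
```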

The nature of the HIV/AIDS epidemic mandates an unwavering commitment to prevention programs, and the prevention activities require a similar commitment to the evaluation of those programs. The magnitude of what can be learned from doing good evaluations will more than balance the magnitude of the costs required to perform them. Moreover, it should be realized that the costs of shoddy research can be substantial, both in their direct expense and in the lost opportunities to identify effective strategies for AIDS prevention. Once the investment has been made, however, and a reservoir of findings and practical experience has accumulated, subsequent evaluations should be easier and less costly to conduct.

  • Bandura, A. (1977) Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84:191-215.
  • Campbell, D. T., and Stanley, J. C. (1966) Experimental and Quasi-Experimental Design and Analysis. Boston: Houghton-Mifflin.
  • Centers for Disease Control (CDC) (1988) Sourcebook presented at the National Conference on the Prevention of HIV Infection and AIDS Among Racial and Ethnic Minorities in the United States (August).
  • Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, N.J.: L. Erlbaum Associates.
  • Cook, T., and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis for Field Settings. Boston: Houghton-Mifflin.
  • Federal Judicial Center (1981) Experimentation in the Law. Washington, D.C.: Federal Judicial Center.
  • Janz, N. K., and Becker, M. H. (1984) The health belief model: A decade later. Health Education Quarterly 11(1):1-47.
  • Kelly, J. A., St. Lawrence, J. S., Hood, H. V., and Brasfield, T. L. (1989) Behavioral intervention to reduce AIDS risk activities. Journal of Consulting and Clinical Psychology 57:60-67.
  • Meier, P. (1957) Safety testing of poliomyelitis vaccine. Science 125(3257):1067-1071.
  • Roethlisberger, F. J., and Dickson, W. J. (1939) Management and the Worker. Cambridge, Mass.: Harvard University Press.
  • Rossi, P. H., and Freeman, H. E. (1982) Evaluation: A Systematic Approach. 2nd ed. Beverly Hills, Cal.: Sage Publications.
  • Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press.
  • Weinstein, M. C., Graham, J. D., Siegel, J. E., and Fineberg, H. V. (1989) Cost-effectiveness analysis of AIDS prevention programs: Concepts, complications, and illustrations. In C. F. Turner, H. G. Miller, and L. E. Moses, eds., AIDS, Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press.
  • Weiss, C. H. (1972) Evaluation Research. Englewood Cliffs, N.J.: Prentice-Hall.

On occasion, nonparticipants observe behavior during or after an intervention. Chapter 3 introduces this option in the context of formative evaluation.

The use of professional customers can raise serious concerns in the eyes of project administrators at counseling and testing sites. The panel believes that site administrators should receive advance notification that professional customers may visit their sites for testing and counseling services, and should give their consent, before this method of data collection is used.

Parts of this section are adapted from Turner, Miller, and Moses (1989:324-326).

This weakness has been noted by CDC in a sourcebook provided to its HIV intervention project grantees (CDC, 1988:F-14).

The significance tests applied to experimental outcomes calculate the probability that any observed differences between the sample estimates might result from random variations between the groups.
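
As a minimal sketch of such a test, the snippet below runs a two-proportion comparison (normal approximation) on invented counts for an intervention group and a control group; the outcome measure, counts, and group sizes are assumptions made only to show the mechanics, not data from any study cited here.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Normal-approximation test of whether two group proportions differ.
    Returns the z statistic and the two-sided p-value."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) under the null
    return z, p_value

# Invented counts: 30 of 52 intervention participants versus 18 of 52 controls
# reporting the protective behavior at follow-up.
z, p = two_proportion_z_test(30, 52, 18, 52)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```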

Research participants' knowledge that they were being observed had a positive effect on their responses in a series of famous studies conducted at Western Electric's Hawthorne Works near Chicago (Roethlisberger and Dickson, 1939); the phenomenon is referred to as the Hawthorne effect.

Participants who self-select into a program are likely to differ from nonrandom comparison groups in terms of interests, motivations, values, abilities, and other attributes that can bias the outcomes.

A double-blind test is one in which neither the person receiving the treatment nor the person administering it knows which treatment (or whether any treatment) is being given.

As discussed under "Agency In-House Team," the outside evaluator might be one of CDC's personnel. However, given the large amount of research to be done, it is likely that non-CDC evaluators will also need to be used.

See, for example, Chapter 3, which presents cost estimates for evaluations of media campaigns. Similar estimates are not readily available for other program types.

For example, the U.K. Health Education Authority (that country's primary agency for AIDS education and prevention programs) allocates 10 percent of its AIDS budget for research and evaluation of its AIDS programs (D. McVey, Health Education Authority, personal communication, June 1990). This allocation covers both process and outcome evaluation.

Source: National Research Council (US) Panel on the Evaluation of AIDS Interventions; Coyle SL, Boruch RF, Turner CF, eds. Evaluating AIDS Prevention Programs: Expanded Edition. Washington, DC: National Academies Press; 1991. Chapter 1, Design and Implementation of Evaluation Research.


Program Evaluation


Program evaluation can be defined as “the systematic collection of information about the activities, characteristics, and outcomes of programs, for use by people to reduce uncertainties, improve effectiveness, and make decisions” (Patton, 2008, p. 39). This utilization-focused definition guides us toward including the goals, concerns, and perspectives of program stakeholders. The results of evaluation are often used by stakeholders to improve or increase capacity of the program or activity. Furthermore, stakeholders can identify program priorities, what constitutes “success,” and the data sources that could serve to answer questions about the acceptability, possible participation levels, and short- and long-term impact of proposed programs.

The community as a whole and individual community groups are both key stakeholders for the evaluation of a community engagement program. This type of evaluation needs to identify the relevant community and establish its perspectives so that the views of engagement leaders and all the important components of the community are used to identify areas for improvement. This approach includes determining whether the appropriate persons or organizations are involved; the activities they are involved in; whether participants feel they have significant input; and how engagement develops, matures, and is sustained.

Program evaluation uses the methods and design strategies of traditional research, but in contrast to the more inclusive, utility-focused approach of evaluation, research is a systematic investigation designed to develop or contribute to generalizable knowledge (MacDonald et al., 2001). Research is hypothesis driven, often initiated and controlled by an investigator, concerned with research standards of internal and external validity, and designed to generate facts, remain value-free, and focus on specific variables. Research establishes a time sequence and control for potential confounding variables. Often, the research is widely disseminated. Evaluation, in contrast, may or may not contribute to generalizable knowledge. The primary purposes of an evaluation are to assess the processes and outcomes of a specific initiative and to facilitate ongoing program management. Evaluation of a program usually includes multiple measures that are informed by the contributions and perspectives of diverse stakeholders.

Formative evaluation provides information to guide program improvement, whereas process evaluation determines whether a program is delivered as intended to the targeted recipients (Rossi et al., 2004). Formative and process evaluations are appropriate to conduct during the implementation of a program. Summative evaluation informs judgments about whether the program worked (i.e., whether the goals and objectives were met) and requires making explicit the criteria and evidence being used to make “summary” judgments. Outcome evaluation focuses on the observable conditions of a specific population, organizational attribute, or social condition that a program is expected to have changed. Whereas outcome evaluation tends to focus on conditions or behaviors that the program was expected to affect most directly and immediately (i.e., “proximal” outcomes), impact evaluation examines the program’s long-term goals. Summative, outcome, and impact evaluation are appropriate to conduct when the program either has been completed or has been ongoing for a substantial period of time (Rossi et al., 2004).

For example, assessing the strategies used to implement a smoking cessation program and determining the degree to which it reached the target population are process evaluations. In contrast, an outcome evaluation of a smoking cessation program might examine how many of the program’s participants stopped smoking as compared with persons who did not participate. Reduction in morbidity and mortality associated with cardiovascular disease may represent an impact goal for a smoking cessation program (Rossi et al., 2004).
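
A minimal sketch of the arithmetic behind such an outcome evaluation is given below; all counts are invented, and a real evaluation would also need to address the selection and comparability issues discussed earlier.

```python
def quit_rate_difference(quit_participants, n_participants, quit_comparison, n_comparison):
    """Compare the proportion who stopped smoking in the program group with the
    proportion in a comparison group of non-participants."""
    rate_p = quit_participants / n_participants
    rate_c = quit_comparison / n_comparison
    return rate_p, rate_c, rate_p - rate_c

# Invented follow-up data: 45 of 200 participants quit versus 22 of 210 non-participants.
rate_p, rate_c, diff = quit_rate_difference(45, 200, 22, 210)
print(f"Participants: {rate_p:.1%}  Non-participants: {rate_c:.1%}  Difference: {diff:.1%}")
```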

Several institutions have identified guidelines for an effective evaluation. For example, in 1999, CDC published a framework to guide public health professionals in developing and implementing a program evaluation (CDC, 1999). The impetus for the framework was to facilitate the integration of evaluation into public health programs, but the framework focuses on six components that are critical for any evaluation. Although the components are interdependent and might be implemented in a nonlinear order, the earlier domains provide a foundation for subsequent areas. They include the following (a short checklist sketch follows the list):

  • Engage stakeholders to ensure that all partners invested in what will be learned from the evaluation become engaged early in the evaluation process.
  • Describe the program to clearly identify its goals and objectives. This description should include the program’s needs, expected outcomes, activities, resources, stage of development, context, and logic model.
  • Focus the evaluation design so that it is useful, feasible, ethical, and accurate.
  • Gather credible evidence that strengthens the results of the evaluation and its recommendations. Sources of evidence could include people, documents, and observations.
  • Justify conclusions that are linked to the results and judged against standards or values of the stakeholders.
  • Deliberately ensure use of the evaluation and share lessons learned from it.
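
One way to keep these six components in view during planning is to treat them as an ordered checklist. The sketch below is only an illustration of that bookkeeping; it is not part of the CDC framework, and the class and function names are invented.

```python
from dataclasses import dataclass, field

# The six framework components, paraphrased from the list above.
FRAMEWORK_STEPS = [
    "Engage stakeholders",
    "Describe the program",
    "Focus the evaluation design",
    "Gather credible evidence",
    "Justify conclusions",
    "Ensure use and share lessons learned",
]

@dataclass
class EvaluationPlan:
    """Tracks which framework components have been addressed so far."""
    completed: set = field(default_factory=set)

    def mark_done(self, step: str) -> None:
        if step not in FRAMEWORK_STEPS:
            raise ValueError(f"Unknown step: {step}")
        self.completed.add(step)

    def remaining(self) -> list:
        return [s for s in FRAMEWORK_STEPS if s not in self.completed]

plan = EvaluationPlan()
plan.mark_done("Engage stakeholders")
plan.mark_done("Describe the program")
print(plan.remaining())  # the four components still to be addressed
```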

Five years before CDC issued its framework, the Joint Committee on Standards for Educational Evaluation (1994) created an important and practical resource for improving program evaluation. The Joint Committee, a nonprofit coalition of major professional organizations concerned with the quality of program evaluations, identified four major categories of standards — propriety, utility, feasibility, and accuracy — to consider when conducting a program evaluation.

Propriety standards focus on ensuring that an evaluation will be conducted legally, ethically, and with regard for promoting the welfare of those involved in or affected by the program evaluation. In addition to the rights of human subjects that are the concern of institutional review boards, propriety standards promote a service orientation (i.e., designing evaluations to address and serve the needs of the program’s targeted participants), fairness in identifying program strengths and weaknesses, formal agreements, avoidance or disclosure of conflict of interest, and fiscal responsibility.

Utility standards are intended to ensure that the evaluation will meet the information needs of intended users. Involving stakeholders, using credible evaluation methods, asking pertinent questions, including stakeholder perspectives, and providing clear and timely evaluation reports represent attention to utility standards.

Feasibility standards are intended to make sure that the evaluation’s scope and methods are realistic. The scope of the information collected should ensure that the data provide stakeholders with sufficient information to make decisions regarding the program.

Accuracy standards are intended to ensure that evaluation reports use valid methods for evaluation and are transparent in the description of those methods. Meeting accuracy standards might, for example, include using mixed methods (e.g., quantitative and qualitative), selecting justifiable informants, and drawing conclusions that are consistent with the data.

Together, the CDC framework and the Joint Committee standards provide a general perspective on the characteristics of an effective evaluation. Both identify the need to be pragmatic and serve intended users with the goal of determining the effectiveness of a program.


