Measurement and Assessment Standards and Guidelines

| | What Works Clearinghouse | APA Division 12 Task Force on Psychological Interventions | APA Division 16 Task Force on Evidence-Based Interventions in School Psychology | National Reading Panel | The Single-Case Experimental Design Scale ( ) | Ecological Momentary Assessment ( ) |
|---|---|---|---|---|---|---|
| 1. Dependent variable (DV) | | | | | | |
| Selection of DV | N/A | ≥ 3 clinically important behaviors that are relatively independent | Outcome measures that produce reliable scores (validity of measure reported) | Standardized or investigator-constructed outcome measures (report reliability) | Measure behaviors that are the target of the intervention | Determined by research question(s) |
| Assessor(s)/reporter(s) | More than one (self-report not acceptable) | N/A | Multisource (not always applicable) | N/A | Independent (implied minimum of 2) | Determined by research question(s) |
| Interrater reliability | On at least 20% of the data in each phase and in each condition; must meet minimal established thresholds | N/A | N/A | N/A | Interrater reliability is reported | N/A |
| Method(s) of measurement/assessment | N/A | N/A | Multimethod (e.g., at least 2 assessment methods to evaluate primary outcomes; not always applicable) | Quantitative or qualitative measure | N/A | Description of prompting, recording, participant-initiated entries, data acquisition interface (e.g., diary) |
| Interval of assessment | Must be measured repeatedly over time (no minimum specified) within and across different conditions and levels of the IV | N/A | N/A | List time points when dependent measures were assessed | Sampling of the targeted behavior (i.e., DV) occurs during the treatment period | Density and schedule are reported and consistent with addressing research question(s); define “immediate and timely response” |
| Other guidelines | Raw data record provided (represent the variability of the target behavior) | | | | | |
| 2. Baseline measurement (see also Research Design Standards in ) | Minimum of 3 data points across multiple phases of a reversal or multiple baseline design; 5 data points in each phase for highest rating; 1 or 2 data points can be sufficient in alternating treatment designs | Minimum of 3 data points (to establish a linear trend) | Minimum of 3 data points (more preferred) | No minimum specified | No minimum (“sufficient sampling of behavior [i.e., DV] occurred pretreatment”) | N/A |
| 3. Compliance and missing data guidelines | N/A | N/A | N/A | N/A | N/A | Rationale for compliance decisions, rates reported, missing data criteria and actions |
Analysis Standards and Guidelines

| | What Works Clearinghouse | APA Division 12 Task Force on Psychological Interventions | APA Division 16 Task Force on Evidence-Based Interventions in School Psychology | National Reading Panel | The Single-Case Experimental Design Scale ( ) | Ecological Momentary Assessment ( ) |
|---|---|---|---|---|---|---|
| 1. Visual analysis | 4-step, 6-variable procedure (based on ) | Acceptable (no specific guidelines or procedures offered) | | N/A | Not acceptable (“use statistical analyses or describe effect sizes” p. 389) | N/A |
| 2. Statistical analysis procedures | Estimating effect sizes: nonparametric and parametric approaches, multilevel modeling, and regression (recommended) | Preferred when the number of data points warrants statistical procedures (no specific guidelines or procedures offered) | Rely on the guidelines presented by Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) | Type not specified – report value of the effect size, type of summary statistic, and number of people providing the effect size information | Specific statistical methods are not specified; only their presence or absence is of interest in completing the scale | N/A |
| 3. Demonstrating an effect | ABAB – stable baseline established during first A period, data must show improvement during the first B period, reversal or leveling of improvement during the second A period, and resumed improvement in the second B period (no other guidelines offered) | N/A | N/A | N/A | | |
| 4. Replication | N/A | Replication occurs across subjects, therapists, or settings | N/A | | | |
The Stone and Shiffman (2002) standards for EMA are concerned almost entirely with the reporting of measurement characteristics and less so with research design. One way in which these standards differ from those of other sources is in the active manipulation of the IV. Many research questions in EMA, daily diary, and time-series designs are concerned with naturally occurring phenomena, and a researcher manipulation would run counter to this aim. The EMA standards become important when selecting an appropriate measurement strategy within the SCED. In EMA applications, as is also true in some other time-series and daily diary designs, researcher manipulation occurs as a function of the sampling interval in which DVs of interest are measured according to fixed time schedules (e.g., reporting occurs at the end of each day), random time schedules (e.g., the data collection device prompts the participant to respond at random intervals throughout the day), or on an event-based schedule (e.g., reporting occurs after a specified event takes place).
The basic measurement requirement of the SCED is a repeated assessment of the DV across each phase of the design in order to draw valid inferences regarding the effect of the IV on the DV. In other applications, such as those used by personality and social psychology researchers to study various human phenomena ( Bolger et al., 2003 ; Reis & Gable, 2000 ), sampling strategies vary widely depending on the topic area under investigation. Regardless of the research area, SCEDs are most typically concerned with within-person change and processes and involve a time-based strategy, most commonly to assess global daily averages or peak daily levels of the DV. Many sampling strategies, such as time-series, in which reporting occurs at uniform intervals or on event-based, fixed, or variable schedules, are also appropriate measurement methods and are common in psychological research (see Bolger et al., 2003 ).
Repeated-measurement methods permit the natural, even spontaneous, reporting of information ( Reis, 1994 ), which reduces the biases of retrospection by minimizing the time elapsed between an experience and the account of that experience ( Bolger et al., 2003 ). Shiffman et al. (2008) aptly noted that the majority of research in the field of psychology relies heavily on retrospective assessment measures, even though retrospective reports have been found to be susceptible to state-congruent recall (e.g., Bower, 1981 ) and to a tendency to report peak levels of the experience rather than its temporal fluctuations ( Redelmeier & Kahneman, 1996 ; Stone, Broderick, Kaell, Delespaul, & Porter, 2000 ). Furthermore, Shiffman et al. (1997) demonstrated that subjective aggregate accounts were a poor fit to daily reported experiences; this discrepancy can be attributed to the reduced measurement error, and thus greater validity and reliability, of the daily reports.
The necessity of measuring at least one DV repeatedly means that the selected assessment method, instrument, and/or construct must be sensitive to change over time and capable of reliably and validly capturing that change. Horner et al. (2005) discuss the important features of outcome measures selected for use in these types of designs. Kazdin (2010) suggests that measures be dimensional, because dimensional measures can detect effects more readily than categorical and binary measures can. Although using an established measure or scale, such as the Outcome Questionnaire System ( M. J. Lambert, Hansen, & Harmon, 2010 ), provides empirically validated items for assessing various outcomes, most validation studies of this type of instrument involve between-subject designs, so there is no guarantee that these measures are reliable and valid for assessing within-person variability. Borsboom, Mellenbergh, and van Heerden (2003) suggest that researchers adapting validated measures consider whether the items they propose using have a factor structure within subjects similar to that obtained between subjects. This is one of the reasons that SCEDs often use observational assessments from multiple sources and report the interrater reliability of the measure. Self-report measures are acceptable practice in some circles, but additional assessment methods or informants are generally necessary to uphold the highest methodological standards. The results of this review indicate that the majority of studies include observational measurement (76.0%). Within those studies, nearly all (97.1%) reported interrater reliability procedures and results. The results within each design were similar, with the exception of time-series designs, which used observer ratings in only half of the reviewed studies.
Time-series designs are defined by repeated measurement of variables of interest over a period of time ( Box & Jenkins, 1970 ). Time-series measurement most often occurs in uniform intervals; however, this is no longer a constraint of time-series designs (see Harvey, 2001 ). Although uniform interval reporting is not necessary in SCED research, repeated measures often occur at uniform intervals, such as once each day or each week, which constitutes a time-series design. The time-series design has been used in various basic science applications ( Scollon, Kim-Prieto, & Diener, 2003 ) across nearly all subspecialties in psychology (e.g., Bolger et al., 2003 ; Piasecki et al., 2007 ; for a review, see Reis & Gable, 2000 ; Soliday et al., 2002 ). The basic time-series formula for a two-phase (AB) data stream is presented in Equation 1 . In this formula, S represents the change between the first and second phases (the shift in level, or intercept, of the two-phase data stream); α represents a step function, being 0 at times i = 1, 2, 3…n1 and 1 at times i = n1+1, n1+2, n1+3…n; n1 is the number of observations in the baseline phase; n is the total number of data points in the data stream; i represents time; and ε_i = ρε_(i−1) + e_i, which expresses the relationship between the autoregressive function (ρ) and the distribution of the errors in the stream.
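To make the step-function model described above concrete, the two-phase data stream can be simulated in a few lines. This is an illustrative sketch, not code from any of the reviewed sources; the function and parameter names are invented here.

```python
import random

def simulate_ab_series(n1, n, level, S, rho, noise_sd, seed=None):
    """Simulate a two-phase (AB) data stream: y_i = level + S * alpha_i + eps_i,
    where alpha_i is a step function (0 during baseline, i <= n1; 1 thereafter)
    and eps_i = rho * eps_{i-1} + e_i is a first-order autoregressive error."""
    rng = random.Random(seed)
    eps = 0.0
    series = []
    for i in range(1, n + 1):
        alpha = 0 if i <= n1 else 1                  # step function
        e = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        eps = rho * eps + e                          # AR(1) error process
        series.append(level + S * alpha + eps)
    return series
```

With the error terms suppressed (noise_sd = 0), the series is simply the baseline level for the first n1 points and level + S thereafter; adding autoregressive noise produces the serially dependent data streams discussed later in this article.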
Time-series formulas become increasingly complex when seasonality and autoregressive processes are modeled in the analytic procedures, but these are rarely of concern for short time-series data streams in SCEDs. For a detailed description of other time-series design and analysis issues, see Borckardt et al. (2008) , Box and Jenkins (1970) , Crosbie (1993) , R. R. Jones et al. (1977) , and Velicer and Fava (2003) .
Time-series and other repeated-measures methodologies also enable examination of temporal effects. Borckardt et al. (2008) and others have noted that time-series designs have the potential to reveal how change occurs, not simply if it occurs. This distinction is what most interested Skinner (1938) , but it often falls outside the purview of today’s researchers in favor of group designs, which Skinner felt obscured the process of change. In intervention and psychopathology research, time-series designs can assess mediators of change ( Doss & Atkins, 2006 ), treatment processes ( Stout, 2007 ; Tschacher & Ramseyer, 2009 ), and the relationship between psychological symptoms (e.g., Alloy, Just, & Panzarella, 1997 ; Hanson & Chen, 2010 ; Oslin, Cary, Slaymaker, Colleran, & Blow, 2009 ), and might be capable of revealing mechanisms of change ( Kazdin, 2007 , 2009 , 2010 ). Between- and within-subject SCED designs with repeated measurements enable researchers to examine similarities and differences in the course of change, both during and as a result of manipulating an IV. Temporal effects have been largely overlooked in many areas of psychological science ( Bolger et al., 2003 ): Examining temporal relationships is sorely needed to further our understanding of the etiology and amplification of numerous psychological phenomena.
Time-series studies were very infrequently found in this literature search (2%). Time-series studies traditionally occur in subfields of psychology in which single-case research is not often used (e.g., personality, physiological/biological). Recent advances in methods for collecting and analyzing time-series data (e.g., Borckardt et al., 2008 ) could expand the use of time-series methodology in the SCED community. One problem with drawing firm conclusions from this particular review finding is semantic: Time-series is a specific term reserved for measurement occurring at a uniform interval, but SCED research appears not yet to have adopted this language when referring to data collected in this fashion. When time-series data analytic methods are not used, the measurement interval is of less importance and might not need to be specified or described as a time-series. An interesting extension of this work would be to examine SCED research that used time-series measurement strategies but did not label them as such, in order to determine how many SCEDs could be analyzed with time-series statistical methods.
EMA and daily diary approaches, also known collectively as experience sampling, are methodological procedures for collecting repeated measurements in time-series and non-time-series experiments. Presenting an in-depth discussion of the nuances of these sampling techniques is well beyond the scope of this paper; the reader is referred to the following review articles: daily diary ( Bolger et al., 2003 ; Reis & Gable, 2000 ; Thiele, Laireiter, & Baumann, 2002 ) and EMA ( Shiffman et al., 2008 ). Experience sampling in psychology has burgeoned in the past two decades as technological advances (e.g., Internet-based tools, two-way pagers, cellular telephones, handheld computers) have permitted more precise and immediate reporting by participants than paper and pencil methods allow (for reviews, see Barrett & Barrett, 2001 ; Shiffman & Stone, 1998 ). Both methods have practical limitations and advantages. For example, electronic methods are more costly and may exclude certain subjects from participating, either because they do not have access to the necessary technology or because they lack the familiarity or savvy to successfully complete reporting. Electronic data collection methods enable the researcher to prompt responses at random or predetermined intervals and to accurately assess compliance. Paper and pencil methods have been criticized for their inability to reliably track respondents’ compliance: Palermo, Valenzuela, and Stork (2004) found better compliance with electronic diaries than with paper and pencil. On the other hand, Green, Rafaeli, Bolger, Shrout, and Reis (2006) demonstrated the psychometric equivalence of the data structures produced by the two methods, suggesting that data collected by either method will yield similar statistical results given comparable compliance rates.
Daily diary/daily self-report and EMA measurement were rarely represented in this review, occurring in only 6.1% of the total studies; EMA methods were used in only one of the reviewed studies. The recent proliferation of EMA and daily diary studies in psychology reported by others ( Bolger et al., 2003 ; Piasecki et al., 2007 ; Shiffman et al., 2008 ) suggests that these methods have not yet reached SCED researchers, which could in part be a result of the long-held supremacy of observational measurement in fields that commonly practice single-case research.
As was previously mentioned, measurement in SCEDs requires the reliable assessment of change over time. As illustrated in Table 4 , DIV16 and the NRP explicitly require that reliability of all measures be reported. DIV12 provides little direction in the selection of the measurement instrument, except to require that three or more clinically important behaviors with relative independence be assessed. Similarly, the only item concerned with measurement on the Tate et al. scale specifies assessing behaviors consistent with the target of the intervention. The WWC and the Tate et al. scale require at least two independent assessors of the DV and that interrater reliability meeting minimum established thresholds be reported. Furthermore, WWC requires that interrater reliability be assessed on at least 20% of the data in each phase and in each condition. DIV16 expects that assessment of the outcome measures will be multisource and multimethod, when applicable. The interval of measurement is not specified by any of the reviewed sources. The WWC and the Tate et al. scale require that DVs be measured repeatedly across phases (e.g., baseline and treatment), which is a typical requirement of a SCED. The NRP asks that the time points at which DV measurement occurred be reported.
The baseline measurement represents one of the most crucial design elements of the SCED. Because subjects provide their own data for comparison, gathering a representative, stable sampling of behavior before manipulating the IV is essential to accurately inferring an effect. Researchers have reported the typical length of the baseline period to range from 3 to 12 observations in intervention research applications (e.g., Center et al., 1986 ; Huitema, 1985 ; R. R. Jones et al., 1977 ; Sharpley, 1987 ); Huitema’s (1985) review of 881 experiments published in the Journal of Applied Behavior Analysis found a modal number of three to four baseline points. Center et al. (1986) suggested five as the minimum number of baseline measurements needed to accurately estimate autocorrelation. Longer baseline periods increase the likelihood of a representative measurement of the DVs, which has been found to increase the validity of the effects and reduce bias resulting from autocorrelation ( Huitema & McKean, 1994 ). The results of this review are largely consistent with those of previous researchers: The mean number of baseline observations was 10.22 ( SD = 9.59), and 6 was the modal number of observations. Baseline data were available in 77.8% of the reviewed studies. Although the baseline assessment has tremendous bearing on the results of a SCED study, it was often difficult to locate the exact number of baseline data points; similarly, the number of data points assessed across all phases of a study was not easily identified.
The WWC, DIV12, and DIV16 agree that a minimum of three data points during the baseline is necessary. However, to receive the highest rating by the WWC, five data points are necessary in each phase, including the baseline and any subsequent withdrawal baselines as would occur in a reversal design. DIV16 explicitly states that more than three points are preferred and further stipulates that the baseline must demonstrate stability (i.e., limited variability), absence of overlap between the baseline and other phases, absence of a trend, and that the level of the baseline measurement is severe enough to warrant intervention; each of these aspects of the data is important in inferential accuracy. Detrending techniques can be used to address baseline data trend. The integration option in ARIMA-based modeling and the empirical mode decomposition method ( Wu, Huang, Long, & Peng, 2007 ) are two sophisticated detrending techniques. In regression-based analytic methods, detrending can be accomplished by simply regressing each variable in the model on time (i.e., the residuals become the detrended series), which is analogous to adding a linear, exponential, or quadratic term to the regression equation.
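The regression-based detrending described above can be sketched in a few lines: the series is regressed on time, and the residuals constitute the detrended series. This is illustrative code; the function name is invented here.

```python
def detrend(series):
    """Regress the series on time (ordinary least squares) and return the
    residuals, which constitute the detrended series."""
    n = len(series)
    t_mean = (n - 1) / 2.0                     # mean of time indices 0..n-1
    y_mean = sum(series) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) \
        / sum((t - t_mean) ** 2 for t in range(n))
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(series)]
```

Applied to a perfectly linear baseline, the residuals are all zero; applied to trending real data, the residuals retain the variability around the trend line while the trend itself is removed.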
NRP does not provide a minimum for data points, nor does the Tate et al. scale, which requires only a sufficient sampling of baseline behavior. Although the mean and modal number of baseline observations is well within these parameters, seven (1.7%) studies reported mean baselines of less than three data points.
Establishing a uniform minimum number of required baseline observations would provide researchers and reviewers with only a starting guide. The baseline phase is important in SCED research because it establishes a trend that can be compared with that of subsequent phases. Although a minimum number of observations might be required to meet standards, many more might be necessary to establish a stable trend when the baseline data are variable or already trending in the direction of the expected effect. The selected data analytic approach also has some bearing on the number of necessary baseline observations; this is discussed further in the Analysis section.
Stone and Shiffman (2002) provide a comprehensive set of guidelines for the reporting of EMA data, which can also be applied to other repeated-measurement strategies. Because the application of EMA is widespread and not confined to specific research designs, Stone and Shiffman intentionally place few restraints on researchers regarding selection of the DV and the reporter, which is determined by the research question under investigation. The methods of measurement, however, are specified in detail: Descriptions of prompting, recording of responses, participant-initiated entries, and the data acquisition interface (e.g., paper and pencil diary, PDA, cellular telephone) ought to be provided with sufficient detail for replication. Because EMA specifically, and time-series/daily diary methods similarly, are primarily concerned with the interval of assessment, Stone and Shiffman suggest reporting the density and schedule of assessment. The approach is generally determined by the nature of the research question and pragmatic considerations, such as access to electronic data collection devices at certain times of the day and participant burden. Compliance and missing data concerns are present in any longitudinal research design, but they are of particular importance in repeated-measurement applications with frequent measurement. When the research question pertains to temporal effects, compliance becomes paramount, and timely, immediate responding is necessary. For this reason, compliance decisions, rates of missing data, and missing data management techniques must be reported. The effect of missing data in time-series data streams has been the topic of recent research in the social sciences (e.g., Smith, Borckardt, & Nash, in press ; Velicer & Colby, 2005a , 2005b ). The results and implications of these and other missing data studies are discussed in the next section.
Visual analysis.
Experts in the field generally agree about the majority of critical single-case experiment design and measurement characteristics. Analysis, on the other hand, is an area of significant disagreement, yet it has also received extensive recent attention and advancement. Debate regarding the appropriateness and accuracy of various methods for analyzing SCED data, the interpretation of single-case effect sizes, and other concerns vital to the validity of SCED results has been ongoing for decades, and no clear consensus has been reached. Visual analysis, following systematic procedures such as those provided by Franklin, Gorman, Beasley, and Allison (1997) and Parsonson and Baer (1978) , remains the standard by which SCED data are most commonly analyzed ( Parker, Cryer, & Byrns, 2006 ). Visual analysis can arguably be applied to all SCEDs. However, a number of baseline data characteristics must be met for effects obtained through visual analysis to be valid and reliable. The baseline phase must be relatively stable; free of significant trend, particularly in the hypothesized direction of the effect; have minimal overlap of data with subsequent phases; and have a sufficient sampling of behavior to be considered representative ( Franklin, Gorman, et al., 1997 ; Parsonson & Baer, 1978 ). The effect of baseline trend on visual analysis, and a technique to control baseline trend, are offered by Parker et al. (2006) . Kazdin (2010) suggests using statistical analysis when a trend or significant variability appears in the baseline phase, two conditions that ought to preclude the use of visual analysis techniques. Visual analysis methods are especially adept at determining intervention effects and can be of particular relevance in real-world applications (e.g., Borckardt et al., 2008 ; Kratochwill, Levin, Horner, & Swoboda, 2011 ).
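The overlap between baseline and subsequent phases, one of the data characteristics visual analysts inspect, can also be quantified. The sketch below implements the percentage of non-overlapping data (PND), one common overlap index; it is offered only as an illustration and is not prescribed by the standards reviewed here.

```python
def pnd(baseline, treatment, increase_expected=True):
    """Percentage of non-overlapping data (PND): the percentage of
    treatment-phase observations that are more extreme, in the expected
    direction, than the most extreme baseline observation."""
    if increase_expected:
        extreme = max(baseline)
        nonoverlap = sum(1 for y in treatment if y > extreme)
    else:
        extreme = min(baseline)
        nonoverlap = sum(1 for y in treatment if y < extreme)
    return 100.0 * nonoverlap / len(treatment)
```

Like visual analysis itself, an index of this kind is sensitive to baseline trend and outliers: a single extreme baseline point can mask an otherwise clear effect.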
However, visual analysis has its detractors. It has been shown to be inconsistent, susceptible to autocorrelation, and prone to overestimating effects (e.g., Matyas & Greenwood, 1990 ). Relying on visual analysis to estimate an effect precludes the results of SCED research from inclusion in meta-analyses and makes it difficult to compare results with the effect sizes generated by statistical methods. Yet visual analysis proliferates, in large part because SCED researchers are familiar with these methods but are not only generally unfamiliar with statistical approaches, they also lack agreement about their appropriateness. Still, top experts in single-case analysis champion the use of statistical methods alongside visual analysis whenever it is appropriate to do so ( Kratochwill et al., 2011 ).
Statistical analysis of SCED data consists generally of an attempt to address one or more of three broad research questions: (1) Does introduction/manipulation of the IV result in statistically significant change in the level of the DV (level-change or phase-effect analysis)? (2) Does introduction/manipulation of the IV result in statistically significant change in the slope of the DV over time (slope-change analysis)? and (3) Do meaningful relationships exist between the trajectory of the DV and other potential covariates? Level- and slope-change analyses are relevant to intervention effectiveness studies and other research questions in which the IV is expected to result in changes in the DV in a particular direction. Visual analysis methods are most adept at addressing research questions pertaining to changes in level and slope (Questions 1 and 2), most often using some form of graphical representation and standardized computation of a mean level or trend line within and between each phase of interest (e.g., Horner & Spaulding, 2010 ; Kratochwill et al., 2011 ; Matyas & Greenwood, 1990 ). Research questions in other areas of psychological science might address the relationship between DVs or the slopes of DVs (Question 3). A number of sophisticated modeling approaches (e.g., cross-lag, multilevel, panel, growth mixture, latent class analysis) may be used for this type of question, and some are discussed in greater detail later in this section. However, a discussion about the nuances of this type of analysis and all their possible methods is well beyond the scope of this article.
The statistical analysis of SCEDs is a contentious issue in the field. Not only is there no agreed-upon statistical method, but the practice of statistical analysis in the context of the SCED is viewed by some as unnecessary (see Shadish, Rindskopf, & Hedges, 2008 ). Historical trends in the prevalence of statistical analysis usage by SCED researchers are revealing: Busk and Marascuilo (1992) found that only 10% of the published single-case studies they reviewed used statistical analysis; Brossart, Parker, Olson, and Mahadevan (2006) estimated that this figure had roughly doubled by 2006. A range of concerns regarding single-case effect size calculation and interpretation is discussed in significant detail elsewhere (e.g., Campbell, 2004 ; Cohen, 1994 ; Ferron & Sentovich, 2002 ; Ferron & Ware, 1995 ; Kirk, 1996 ; Manolov & Solanas, 2008 ; Olive & Smith, 2005 ; Parker & Brossart, 2003 ; Robey et al., 1999 ; Smith et al., in press ; Velicer & Fava, 2003 ). One concern is the lack of a clearly superior method across datasets. Although statistical methods for analyzing SCEDs abound, few studies have examined their comparative performance with the same dataset. The most recent studies of this kind, performed by Brossart et al. (2006) , Campbell (2004) , Parker and Brossart (2003) , and Parker and Vannest (2009) , found that the more promising available statistical analysis methods yielded moderately different results on the same data series, which led them to conclude that each available method is equipped to adequately address only a relatively narrow spectrum of data. Given these findings, analysts need to select an appropriate model for the research questions and data structure, being mindful of how modeling results can be influenced by extraneous factors.
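As one concrete example of the effect size computations at issue, the sketch below implements a simple standardized mean difference between phases: the change in level from baseline to treatment, scaled by the baseline standard deviation. It is illustrative only; as the comparative studies cited above suggest, no single index of this kind is adequate across all data structures, and this one assumes negligible trend and autocorrelation.

```python
from statistics import mean, stdev

def smd_effect_size(baseline, treatment):
    """Standardized mean difference between phases: the change in level from
    baseline (A) to treatment (B), scaled by the baseline standard deviation.
    Assumes negligible trend and autocorrelation in the data stream."""
    return (mean(treatment) - mean(baseline)) / stdev(baseline)
```

Because the denominator comes from a handful of baseline points, this index can be unstable with the short baselines typical of SCED research, which is one reason such estimates are difficult to interpret against group-design effect size conventions.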
The current standards unfortunately provide little guidance in the way of statistical analysis options. This article presents an admittedly cursory introduction to available statistical methods; many others are not covered in this review. The following articles provide more in-depth discussion and description of other methods: Barlow et al. (2008) ; Franklin et al., (1997) ; Kazdin (2010) ; and Kratochwill and Levin (1992 , 2010 ). Shadish et al. (2008) summarize more recently developed methods. Similarly, a Special Issue of Evidence-Based Communication Assessment and Intervention (2008, Volume 2) provides articles and discussion of the more promising statistical methods for SCED analysis. An introduction to autocorrelation and its implications for statistical analysis is necessary before specific analytic methods can be discussed. It is also pertinent at this time to discuss the implications of missing data.
Many repeated measurements within a single subject or unit create a situation that most psychological researchers are unaccustomed to dealing with: autocorrelated data, which is the nonindependence of sequential observations, also known as serial dependence. Basic and advanced discussions of autocorrelation in single-subject data can be found in Borckardt et al. (2008) , Huitema (1985) , and Marshall (1980) , and discussions of autocorrelation in multilevel models can be found in Snijders and Bosker (1999) and Diggle and Liang (2001) . Along with trend and seasonal variation, autocorrelation is one example of the internal structure of repeated measurements. In the social sciences, autocorrelated data occur most naturally in the fields of physiological psychology, econometrics, and finance, where each phase of interest has potentially hundreds or even thousands of observations that are tightly packed across time (e.g., electroencephalography data, financial market indices). Applied SCED research in most areas of psychology is more likely to have measurement intervals of a day, week, or hour.
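The serial dependence described above can be estimated directly from a data stream. The sketch below computes the standard lag-1 autocorrelation estimate (illustrative code; the function name is invented here):

```python
def lag1_autocorrelation(series):
    """Estimate the lag-1 autocorrelation (the serial dependence between
    consecutive observations) of a single data stream."""
    n = len(series)
    m = sum(series) / n
    # Sum of cross-products of consecutive deviations from the mean...
    num = sum((series[i] - m) * (series[i - 1] - m) for i in range(1, n))
    # ...scaled by the total sum of squared deviations.
    den = sum((y - m) ** 2 for y in series)
    return num / den
```

A smoothly drifting series yields a positive estimate, whereas a series that alternates around its mean yields a negative one; values near zero are consistent with independent observations.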
Autocorrelation is a direct result of the repeated-measurement requirements of the SCED, but its effect is most noticeable and problematic when one is attempting to analyze these data. Many commonly used data analytic approaches, such as analysis of variance, assume independence of observations and can produce spurious results when the data are nonindependent. Even statistically insignificant autocorrelation estimates are generally viewed as sufficient to cause inferential bias when conventional statistics are used (e.g., Busk & Marascuilo, 1988 ; R. R. Jones et al., 1977 ; Matyas & Greenwood, 1990 ). The effect of autocorrelation on statistical inference in single-case applications has also been known for quite some time (e.g., R. R. Jones et al., 1977 ; Kanfer, 1970 ; Kazdin, 1981 ; Marshall, 1980 ). The findings of recent simulation studies of single-subject data streams indicate that autocorrelation is a nontrivial matter. For example, Manolov and Solanas (2008) determined that calculated effect sizes were linearly related to the autocorrelation of the data stream, and Smith et al. (in press) demonstrated that autocorrelation estimates in the vicinity of 0.80 negatively affect the ability to correctly infer a significant level-change effect using a standardized mean differences method. Huitema and colleagues (e.g., Huitema, 1985 ; Huitema & McKean, 1994 ) argued that autocorrelation is rarely a concern in applied research. Huitema’s methods and conclusions have been questioned and opposing data have been published (e.g., Allison & Gorman, 1993 ; Matyas & Greenwood, 1990 ; Robey et al., 1999 ), resulting in abandonment of the position that autocorrelation can be conscionably ignored without compromising the validity of the statistical procedures. 
One option is to remove autocorrelation from the data stream before calculating effect sizes: one of the more promising analysis methods, autoregressive integrated moving averages (discussed later in this article), was specifically designed to remove the internal structure of time-series data, such as autocorrelation, trend, and seasonality (Box & Jenkins, 1970; Tiao & Box, 1981).
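As a rough illustration of what removing internal structure means, the integration ("I") step of an ARIMA model differences the series. The sketch below, with hypothetical data, shows that first differencing strips a roughly linear upward trend; fitting the autoregressive and moving-average components requires dedicated software and is not shown.

```python
# A rough sketch of the differencing step ("I" in ARIMA): taking
# first differences y[t] - y[t-1] removes a linear trend, one part
# of stripping the internal structure of a time series.

def first_difference(y):
    """Return the series of changes between adjacent observations."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

# Hypothetical series: a steady upward trend (~+2 per observation)
# with small bumps. The differenced series hovers around the slope
# and no longer trends upward.
trended = [2, 4, 7, 8, 11, 12, 15, 16]
diffed = first_difference(trended)
```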
Another concern inherent in repeated-measures designs is missing data. Daily diary and EMA methods are intended to reduce the risk of retrospection error by eliciting accurate, real-time information ( Bolger et al., 2003 ). However, these methods are subject to missing data as a result of honest forgetfulness, not possessing the diary collection tool at the specified time of collection, and intentional or systematic noncompliance. With paper and pencil diaries and some electronic methods, subjects might be able to complete missed entries retrospectively, defeating the temporal benefits of these assessment strategies ( Bolger et al., 2003 ). Methods of managing noncompliance through the study design and measurement methods include training the subject to use the data collection device appropriately, using technology to prompt responding and track the time of response, and providing incentives to participants for timely compliance (for additional discussion of this topic, see Bolger et al., 2003 ; Shiffman & Stone, 1998 ).
Even when efforts are made to maximize compliance during the conduct of the research, the problem of missing data is often unavoidable. Numerous approaches exist for handling missing observations in group multivariate designs (e.g., Horton & Kleinman, 2007; Ibrahim, Chen, Lipsitz, & Herring, 2005). Raghunathan (2004) and others concluded that full information and raw data maximum likelihood methods are preferable. Velicer and Colby (2005a, 2005b) established the superiority of maximum likelihood methods over listwise deletion, mean of adjacent observations, and series mean substitution in the estimation of various critical time-series data parameters. Smith et al. (in press) extended these findings regarding the effect of missing data on inferential precision. They found that managing missing data with the EM procedure (Dempster, Laird, & Rubin, 1977), a maximum likelihood algorithm, did not affect one’s ability to correctly infer a significant effect. However, lag-1 autocorrelation estimates in the vicinity of 0.80 resulted in insufficient power sensitivity (< 0.80), regardless of the proportion of missing data (10%, 20%, 30%, or 40%). 1 Although maximum likelihood methods have garnered some empirical support, methodological strategies that minimize missing data, particularly systematically missing data, are preferable to post hoc statistical remedies.
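For concreteness, the sketch below implements only the two simpler strategies that maximum likelihood methods were compared against: listwise deletion and mean-of-adjacent-observations substitution. Maximum likelihood estimation itself requires a statistical package and is not shown; the data stream is hypothetical, with `None` marking a missed entry.

```python
# Two simple (and, per Velicer & Colby, inferior) ways to handle a
# missing observation in a short data stream. None marks the gap.

def listwise_delete(y):
    """Drop missing observations entirely (distorts the time spacing)."""
    return [v for v in y if v is not None]

def mean_of_adjacent(y):
    """Replace each interior gap with the mean of its two neighbors."""
    out = list(y)
    for t, v in enumerate(out):
        if v is None and 0 < t < len(out) - 1:
            out[t] = (out[t - 1] + out[t + 1]) / 2
    return out

stream = [3, 4, None, 6, 5, 7]  # hypothetical daily observations
```

Note that listwise deletion silently shortens the series and breaks the equal spacing that time-series methods assume, which is one reason it produces inaccurate parameter estimates.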
In addition to the autocorrelated nature of SCED data, typical measurement methods also present analytic challenges. Many statistical methods, particularly those involving model finding, assume that the data are normally distributed. This is often not satisfied in SCED research when measurements involve count data, observer-rated behaviors, and other, similar metrics that result in skewed distributions. Techniques are available to manage nonnormal distributions in regression-based analysis, such as zero-inflated Poisson regression ( D. Lambert, 1992 ) and negative binomial regression ( Gardner, Mulvey, & Shaw, 1995 ), but many other statistical analysis methods do not include these sophisticated techniques. A skewed data distribution is perhaps one of the reasons Kazdin (2010) suggests not using count, categorical, or ordinal measurement methods.
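A quick diagnostic for the skew typical of count data is the gap between the mean and the median. With the hypothetical, zero-inflated daily problem-behavior counts below, the mean sits well above the median, the pattern that motivates Poisson-type models over normal-theory ones.

```python
# Hypothetical daily problem-behavior counts: mostly zeros with a
# few large values, i.e., zero-inflated and right-skewed.
counts = [0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 8]

mean = sum(counts) / len(counts)           # pulled upward by the tail
median = sorted(counts)[len(counts) // 2]  # upper-middle value; 0 here
skewed_right = mean > median               # True for these data
```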
Following is a basic introduction to the more promising and prevalent analytic methods for SCED research. Because there is little consensus regarding the superiority of any single method, the burden unfortunately falls on the researcher to select a method capable of addressing the research question and handling the data involved in the study. Some indications and contraindications are provided for each method presented here.
Multilevel modeling (MLM; e.g., Schmidt, Perels, & Schmitz, 2010) techniques represent the state of the art among parametric approaches to SCED analysis, particularly when synthesizing SCED results (Shadish et al., 2008). MLM and the related latent growth curve and factor mixture methods in structural equation modeling (SEM; e.g., Lubke & Muthén, 2005; B. O. Muthén & Curran, 1997) are particularly effective for evaluating trajectories and slopes in longitudinal data and for relating changes to potential covariates. MLM and related hierarchical linear models (HLM) can also illuminate the relationship between the trajectories of different variables under investigation and clarify whether these relationships differ among the subjects in the study. Time-series and cross-lag analyses can also be used in MLM and SEM (Chow, Ho, Hamaker, & Dolan, 2010; du Toit & Browne, 2007). However, these approaches generally require sophisticated model-fitting techniques, making them difficult for many social scientists to implement, and the structure (autocorrelation) and trend of the data can complicate many MLM methods. The short data streams and small numbers of subjects common in SCED research also present problems for MLM and SEM approaches, which were developed for data with many more observations per subject and, for model-fitting purposes, many more participants. Still, MLM and related techniques arguably represent the most promising analytic methods.
A number of software options 2 exist for SEM. Popular statistical packages in the social sciences provide SEM options, such as PROC CALIS in SAS (SAS Institute Inc., 2008), the AMOS module (Arbuckle, 2006) of SPSS (SPSS Statistics, 2011), and the sem package for R (R Development Core Team, 2005), the use of which is described by Fox (2006). A number of stand-alone software options are also available for SEM applications, including Mplus (L. K. Muthén & Muthén, 2010) and Stata (StataCorp, 2011). Each of these programs also provides options for estimating multilevel/hierarchical models (for a review of using these programs for MLM analysis, see Albright & Marinova, 2010). Hierarchical linear and nonlinear modeling can also be accomplished using the HLM 7 program (Raudenbush, Bryk, & Congdon, 2011).
Two primary points have been raised regarding ARMA modeling: the length of the data stream and the feasibility of the modeling technique. ARMA models generally require 30–50 observations in each phase when analyzing a single-subject experiment (e.g., Borckardt et al., 2008; Box & Jenkins, 1970), a requirement that is often difficult to satisfy in applied psychological research. However, ARMA models in an SEM framework, such as those described by du Toit and Browne (2001), are well suited for longitudinal panel data with few observations and many subjects. Autoregressive SEM models are also applicable under similar conditions. Model-fitting options are available in SPSS, R, and SAS via PROC ARIMA.
ARMA modeling also requires considerable training in the method and rather advanced knowledge about statistical methods (e.g., Kratochwill & Levin, 1992 ). However, Brossart et al. (2006) point out that ARMA-based approaches can produce excellent results when there is no “model finding” and a simple lag-1 model, with no differencing and no moving average, is used. This approach can be taken for many SCED applications when phase- or slope-change analyses are of interest with a single, or very few, subjects. As already mentioned, this method is particularly useful when one is seeking to account for autocorrelation or other over-time variations that are not directly related to the experimental or intervention effect of interest (i.e., detrending). ARMA and other time-series analysis methods require missing data to be managed prior to analysis by means of options such as full information maximum likelihood estimation, multiple imputation, or the Kalman filter (see Box & Jenkins, 1970 ; Hamilton, 1994 ; Shumway & Stoffer, 1982 ) because listwise deletion has been shown to result in inaccurate time-series parameter estimates ( Velicer & Colby, 2005a ).
Standardized mean differences approaches include the familiar Cohen’s d, Glass’s delta, and Hedges’s g used in the analysis of group designs. The computational properties of mean differences approaches for SCEDs are identical to those used for group comparisons, except that the results represent within-case variation rather than variation between groups, which means the obtained effect sizes are not interpretively equivalent. The advantages of the mean differences approach are its computational simplicity and its familiarity to social scientists; to the applied researcher it likely represents the most accessible analytic approach, because statistical software is not required to calculate these effect sizes. The primary drawback is that these methods were not developed to contend with autocorrelated data, although Manolov and Solanas (2008) reported that autocorrelation least affected effect sizes calculated using standardized mean differences approaches. The resultant effect sizes of single-subject standardized mean differences analysis must be interpreted cautiously because their relation to standard effect size benchmarks, such as those provided by Cohen (1988), is unknown. Standardized mean differences approaches are appropriate only for examining differences between phases of the study and cannot illuminate trajectories or relationships between variables.
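A minimal sketch of a within-case standardized mean difference follows, standardizing by a pooled sample standard deviation across phases. Published SCED variants differ in their choice of standardizer (some use the baseline standard deviation alone), so this is one illustrative choice, and the phase data are hypothetical.

```python
import math

def pooled_sd(a, b):
    """Pooled sample standard deviation of two phases."""
    def var(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / (len(x) - 1)
    return math.sqrt(((len(a) - 1) * var(a) + (len(b) - 1) * var(b))
                     / (len(a) + len(b) - 2))

def within_case_d(baseline, treatment):
    """Standardized mean difference between phases of a single case."""
    diff = sum(treatment) / len(treatment) - sum(baseline) / len(baseline)
    return diff / pooled_sd(baseline, treatment)

baseline = [2, 3, 2, 3, 2]     # hypothetical A-phase observations
treatment = [6, 7, 6, 8, 7]    # hypothetical B-phase observations
d = within_case_d(baseline, treatment)
```

With these data, d works out to roughly 6.2, far beyond Cohen’s group-design benchmarks, which illustrates concretely why within-case effect sizes should not be read against those conventions.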
Researchers have offered other analytic methods to deal with the characteristics of SCED data, and a number of methods for analyzing N-of-1 experiments have been developed. Borckardt’s (2006) Simulation Modeling Analysis (SMA) program provides a method for analyzing level and slope change in short (<30 observations per phase; see Borckardt et al., 2008), autocorrelated data streams that is statistically sophisticated yet accessible and freely available to typical psychological scientists and clinicians. A replicated single-case time-series design conducted by Smith, Handler, and Nash (2010) provides an example of SMA application. The Singwin package, described in Bloom et al. (2003), is another easy-to-use parametric approach for analyzing single-case experiments. A number of nonparametric approaches have also emerged from the visual analysis tradition, including percent nonoverlapping data (Scruggs, Mastropieri, & Casto, 1987) and nonoverlap of all pairs (Parker & Vannest, 2009); however, these methods have come under scrutiny, and Wolery, Busick, Reichow, and Barton (2010) have suggested abandoning them altogether. Each of these methods appears to be well suited to managing specific data characteristics, but they should not be used to analyze data streams beyond their intended purpose until additional empirical research is conducted.
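The two nonoverlap indices named above have simple computational definitions, sketched here with hypothetical phase data and under the assumption that the intervention is expected to increase the behavior (the comparisons would be reversed for an expected decrease).

```python
def percent_nonoverlapping_data(baseline, treatment):
    """PND: share of treatment points above the highest baseline point
    (assumes the intervention should increase the behavior)."""
    ceiling = max(baseline)
    above = sum(1 for v in treatment if v > ceiling)
    return 100 * above / len(treatment)

def nonoverlap_of_all_pairs(baseline, treatment):
    """NAP: share of all (baseline, treatment) pairs in which the
    treatment observation is higher; ties count as half an overlap."""
    pairs = [(a, b) for a in baseline for b in treatment]
    score = sum(1 if b > a else 0.5 if b == a else 0 for a, b in pairs)
    return score / len(pairs)

baseline = [2, 3, 4, 3]        # hypothetical A-phase observations
treatment = [4, 5, 6, 5, 6]    # hypothetical B-phase observations
pnd = percent_nonoverlapping_data(baseline, treatment)
nap = nonoverlap_of_all_pairs(baseline, treatment)
```

Note that neither index attends to trend or autocorrelation within phases, which is one basis for the scrutiny these methods have received.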
Beyond the issue of single-case analysis is the matter of integrating and meta-analyzing the results of single-case experiments. SCEDs have been given short shrift in the majority of the meta-analytic literature (Littell, Corcoran, & Pillai, 2008; Shadish et al., 2008), with only a few exceptions (Carr et al., 1999; Horner & Spaulding, 2010). Currently, few proven methods exist for integrating the results of multiple single-case experiments. Allison and Gorman (1993) and Shadish et al. (2008) present the problems associated with meta-analyzing single-case effect sizes, and W. P. Jones (2003), Manolov and Solanas (2008), Scruggs and Mastropieri (1998), and Shadish et al. (2008) offer four different potential statistical solutions, none of which appears to have achieved consensus among researchers. The ability to synthesize and compare single-case effect sizes, particularly with effect sizes garnered through group design research, is undoubtedly necessary to increase SCED proliferation.
The coding criteria for this review were quite stringent in terms of what was considered to be either visual or statistical analysis. For visual analysis to be coded as present, the authors had to self-identify as having used a visual analysis method. In many cases it could be inferred that visual analysis had been used, but it was often not specified. Similarly, statistical analysis was reserved for analytic methods that produced an effect. 3 Analyses that compared magnitude of change using raw count data or percentages were not considered rigorous enough. These two narrow definitions contributed to the high rate of unreported analytic methods shown in Table 1 (52.3%). A better representation of the use of visual and statistical analysis would likely be the percentage of studies among those that reported a method of analysis: under these parameters, 41.5% used visual analysis and 31.3% used statistical analysis. These figures include studies that used both visual and statistical methods (11%). These findings are slightly higher than Brossart et al.’s (2006) estimate that statistical analysis is used in about 20% of SCED studies. Visual analysis undoubtedly remains the most prevalent method, but there appears to be a trend toward increased use of statistical approaches, which is likely only to gain momentum as innovations continue.
The standards selected for inclusion in this review offer minimal direction in the way of analyzing the results of SCED research. Table 5 summarizes analysis-related information provided by the six reviewed sources for SCED standards. Visual analysis is acceptable to DIV12 and DIV16, along with unspecified statistical approaches. In the WWC standards, visual analysis is the acceptable method of determining an intervention effect, with statistical analyses and randomization tests permissible as complementary or supporting methods to the results of visual analysis. However, the authors of the WWC standards state, “As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed” (Kratochwill et al., 2010, p. 16). The NRP and DIV12 seem to prefer statistical methods when they are warranted. The Tate et al. scale accepts only statistical analysis with the reporting of an effect size. Only the WWC and DIV16 provide guidance in the use of statistical analysis procedures: the WWC “recommends” nonparametric and parametric approaches, multilevel modeling, and regression when statistical analysis is used, and DIV16 refers the reader to Wilkinson and the Task Force on Statistical Inference of the APA Board of Scientific Affairs (1999) for direction in this matter. Statistical analysis of daily diary and EMA methods is similarly unsettled. Stone and Shiffman (2002) ask for a detailed description of the statistical procedures used, so that the approach can be replicated and evaluated. They provide direction for analyzing aggregated and disaggregated data, and they aptly note that because many different modes of analysis exist, researchers must carefully match the analytic approach to the hypotheses being pursued.
This review has a number of limitations that leave the door open for future study of SCED methodology. Publication bias is a concern in any systematic review. This is particularly true for this review because the search was limited to articles published in peer-reviewed journals. This strategy was chosen in order to inform changes in the practice of reporting and of reviewing, but it also is likely to have inflated the findings regarding the methodological rigor of the reviewed works. Inclusion of book chapters, unpublished studies, and dissertations would likely have yielded somewhat different results.
A second concern is the stringent coding criteria regarding analytic methods and the broad categorization into visual and statistical approaches. The selection of an appropriate method for analyzing SCED data is perhaps the murkiest area of this type of research. Future reviews that evaluate the appropriateness of selected analytic strategies and provide specific decision-making guidelines for researchers would be a very useful contribution to the literature. Although the six sources of standards reviewed in this article all apply to SCED research, five of them were developed almost exclusively to inform psychological and behavioral intervention research. The principles of SCED research remain the same in different contexts, but there is a need for non–intervention scientists to weigh in on these standards.
Finally, this article provides a first step in the synthesis of the available SCED reporting guidelines. However, it does not resolve disagreements, nor does it purport to be a definitive source. In the future, an entity with the authority to construct such a document ought to convene and establish a foundational, adaptable, and agreed-upon set of guidelines that cuts across subspecialties but is applicable to many, if not all, areas of psychological research, which is perhaps an idealistic goal. Certain preferences will undoubtedly continue to dictate what constitutes acceptable practice in each subspecialty of psychology, but uniformity along critical dimensions will help advance SCED research.
The first decade of the twenty-first century has seen an upwelling of SCED research across nearly all areas of psychology. This article contributes updated benchmarks in terms of the frequency with which SCED design and methodology characteristics are used, including the number of baseline observations, assessment and measurement practices, and data analytic approaches, most of which are largely consistent with previously reported benchmarks. However, this review is much broader than those of previous research teams and also breaks down the characteristics of single-case research by the predominant design. With the recent SCED proliferation came a number of standards for the conduct and reporting of such research. This article also provides a much-needed synthesis of recent SCED standards that can inform the work of researchers, reviewers, and funding agencies conducting and evaluating single-case research, which reveals many areas of consensus as well as areas of significant disagreement. It appears that the question of where to go next is very relevant at this point in time. The majority of the research design and measurement characteristics of the SCED are reasonably well established, and the results of this review suggest general practice that is in accord with existing standards and guidelines, at least in regard to published peer-reviewed works. In general, the published literature appears to be meeting the basic design and measurement requirement to ensure adequate internal validity of SCED studies.
Consensus regarding the superiority of any one analytic method stands out as an area of divergence. Judging by the current literature and lack of consensus, researchers will need to carefully select a method that matches the research design, hypotheses, and intended conclusions of the study, while also considering the most up-to-date empirical support for the chosen analytic method, whether it be visual or statistical. In some cases the number of observations and subjects in the study will dictate which analytic methods can and cannot be used. In the case of the true N -of-1 experiment, there are relatively few sound analytic methods, and even fewer that are robust with shorter data streams (see Borckardt et al., 2008 ). As the number of observations and subjects increases, sophisticated modeling techniques, such as MLM, SEM, and ARMA, become applicable. Trends in the data and autocorrelation further obfuscate the development of a clear statistical analysis selection algorithm, which currently does not exist. Autocorrelation was rarely addressed or discussed in the articles reviewed, except when the selected statistical analysis dictated consideration. Given the empirical evidence regarding the effect of autocorrelation on visual and statistical analysis, researchers need to address this more explicitly. Missing-data considerations are similarly left out when they are unnecessary for analytic purposes. As newly devised statistical analysis approaches mature and are compared with one another for appropriateness in specific SCED applications, guidelines for statistical analysis will necessarily be revised. Similarly, empirically derived guidance, in the form of a decision tree, must be developed to ensure application of appropriate methods based on characteristics of the data and the research questions being addressed. Researchers could also benefit from tutorials and comparative reviews of different software packages: This is a needed area of future research. 
Powerful and reliable statistical analyses help move the SCED up the ladder of experimental designs and attenuate the view that the method applies primarily to pilot studies and idiosyncratic research questions and situations.
Another potential future advancement of SCED research comes in the area of measurement. Currently, SCED research gives significant weight to observer ratings and seems to discourage other forms of data collection. This is likely due to the origins of the SCED in behavioral assessment and applied behavior analysis, which remains a present-day stronghold. The dearth of EMA and diary-like sampling procedures within the SCED research reviewed, set against their ever-growing prevalence in the larger psychological research arena, highlights an area for potential expansion. Observational measurement, although reliable and valid in many contexts, is time and resource intensive and is not feasible in all areas in which psychologists conduct research. It seems that numerous untapped research questions are stifled because of this measurement constraint. SCED researchers developing updated standards should include guidelines for the appropriate measurement of non-observer-reported data. For example, the results of this review indicate that reporting of repeated measurements, particularly the high-density type found in diary and EMA sampling strategies, ought to be more clearly spelled out, with specific attention paid to autocorrelation and trend in the data streams. In the event that SCED researchers adopt self-reported assessment strategies as viable alternatives to observation, a set of standards explicitly identifying the necessary psychometric properties of the measures and specific items used would be in order.
Along similar lines, SCED researchers could take a page from other areas of psychology that champion multimethod and multisource evaluation of primary outcomes. In this way, the long-standing tradition of observational assessment and the cutting-edge technological methods of EMA and daily diary could be married with the goal of strengthening conclusions drawn from SCED research and enhancing the validity of self-reported outcome assessment. The results of this review indicate that they rarely intersect today, and I urge SCED researchers to adopt other methods of assessment informed by time-series, daily diary, and EMA methods. The EMA standards could serve as a jumping-off point for refined measurement and assessment reporting standards in the context of multimethod SCED research.
One limitation of the current SCED standards is their relatively limited scope. To clarify, with the exception of the Stone and Shiffman EMA reporting guidelines, the other five sources of standards were developed in the context of designing and evaluating intervention research. Although this is likely to remain their patent emphasis, SCEDs are capable of addressing other pertinent research questions in the psychological sciences, and the current standards only roughly approximate salient crosscutting SCED characteristics. I propose developing broad SCED guidelines that address specific design, measurement, and analysis issues in a manner that allows them to be useful across applications, as opposed to focusing solely on intervention effects. To accomplish this task, methodology experts across subspecialties in psychology would need to convene. Admittedly, this is no small task.
Perhaps funding agencies will also recognize the fiscal and practical advantages of SCED research in certain areas of psychology. One example is in the field of intervention effectiveness, efficacy, and implementation research. A few exemplary studies using robust forms of SCED methodology are needed in the literature. Case-based methodologies will never supplant the group design as the gold standard in experimental applications, nor should that be the goal. Instead, SCEDs provide a viable and valid alternative experimental methodology that could stimulate new areas of research and answer questions that group designs cannot. With the astonishing number of studies emerging every year that use single-case designs and explore the methodological aspects of the design, we are poised to witness and be a part of an upsurge in the sophisticated application of the SCED. When federal grant-awarding agencies and journal editors begin to use formal standards while making funding and publication decisions, the field will benefit.
Last, for the practice of SCED research to continue and mature, graduate training programs must provide students with instruction in all areas of the SCED. This is particularly true of statistical analysis techniques that are not often taught in departments of psychology and education, where the vast majority of SCED studies seem to be conducted. It is quite the conundrum that the best available statistical analytic methods are often cited as being inaccessible to social science researchers who conduct this type of research. This need not be the case. To move the field forward, emerging scientists must be able to apply the most state-of-the-art research designs, measurement techniques, and analytic methods.
Research support for the author was provided by research training grant MH20012 from the National Institute of Mental Health, awarded to Elizabeth A. Stormshak. The author gratefully acknowledges Robert Horner and Laura Lee McIntyre, University of Oregon; Michael Nash, University of Tennessee; John Ferron, University of South Florida; the Action Editor, Lisa Harlow, and the anonymous reviewers for their thoughtful suggestions and guidance in shaping this article; Cheryl Mikkola for her editorial support; and Victoria Mollison for her assistance in the systematic review process.
PsycINFO search conducted July 2011.
(* indicates inclusion in study: N = 409)
1 Autocorrelation estimates in this range can be caused by trends in the data streams, which creates complications in terms of detecting level-change effects. The Smith et al. (in press) study used a Monte Carlo simulation to control for trends in the data streams, but trends are likely to exist in real-world data with high lag-1 autocorrelation estimates.
2 The author makes no endorsement regarding the superiority of any statistical program or package over another by their mention or exclusion in this article. The author also has no conflicts of interest in this regard.
3 However, it should be noted that it was often very difficult to locate an actual effect size reported in studies that used statistical analysis. Although this issue would likely have added little to this review, it does inhibit the inclusion of the results in meta-analysis.
Before looking at any specific single-subject research designs, it will be helpful to consider some features that are common to most of them. Many of these features are illustrated in Figure 10.3 “Results of a Generic Single-Subject Study Illustrating Several Principles of Single-Subject Research” , which shows the results of a generic single-subject study. First, the dependent variable (represented on the y -axis of the graph) is measured repeatedly over time (represented by the x -axis) at regular intervals. Second, the study is divided into distinct phases, and the participant is tested under one condition per phase. The conditions are often designated by capital letters: A, B, C, and so on. Thus Figure 10.3 “Results of a Generic Single-Subject Study Illustrating Several Principles of Single-Subject Research” represents a design in which the participant was tested first in one condition (A), then tested in another condition (B), and finally retested in the original condition (A). (This is called a reversal design and will be discussed in more detail shortly.)
Figure 10.3 Results of a Generic Single-Subject Study Illustrating Several Principles of Single-Subject Research
Another important aspect of single-subject research is that the change from one condition to the next does not usually occur after a fixed amount of time or number of observations. Instead, it depends on the participant’s behavior. Specifically, the researcher waits until the participant’s behavior in one condition becomes fairly consistent from observation to observation before changing conditions. This is sometimes referred to as the steady state strategy (Sidman, 1960). The idea is that when the dependent variable has reached a steady state, then any change across conditions will be relatively easy to detect. Recall that we encountered this same principle when discussing experimental research more generally. The effect of an independent variable is easier to detect when the “noise” in the data is minimized.
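The steady state strategy is a judgment call rather than a formula, but a toy stability rule makes the idea concrete. The criterion sketched below (the last four observations falling within one point of their own mean) is an illustrative assumption for demonstration, not a standard from the single-subject literature.

```python
# A toy steady-state check: responding is treated as stable when the
# last k observations stay within a small band around their own mean.
# The values of k and tolerance are illustrative assumptions.

def is_steady(observations, k=4, tolerance=1.0):
    """True when each of the last k points lies within `tolerance`
    of the mean of those k points."""
    if len(observations) < k:
        return False
    tail = observations[-k:]
    center = sum(tail) / k
    return all(abs(v - center) <= tolerance for v in tail)

variable_phase = [2, 6, 3, 7, 2, 6]      # still swinging widely
settled_phase = [2, 6, 3, 5, 5, 6, 5]    # the last four points are tight
```

Under this rule the researcher would keep collecting baseline data in the first case but could change conditions in the second.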
The most basic single-subject research design is the reversal design , also called the ABA design . During the first phase, A, a baseline is established for the dependent variable. This is the level of responding before any treatment is introduced, and therefore the baseline phase is a kind of control condition. When steady state responding is reached, phase B begins as the researcher introduces the treatment. There may be a period of adjustment to the treatment during which the behavior of interest becomes more variable and begins to increase or decrease. Again, the researcher waits until that dependent variable reaches a steady state so that it is clear whether and how much it has changed. Finally, the researcher removes the treatment and again waits until the dependent variable reaches a steady state. This basic reversal design can also be extended with the reintroduction of the treatment (ABAB), another return to baseline (ABABA), and so on.
The study by Hall and his colleagues was an ABAB reversal design. Figure 10.4 “An Approximation of the Results for Hall and Colleagues’ Participant Robbie in Their ABAB Reversal Design” approximates the data for Robbie. The percentage of time he spent studying (the dependent variable) was low during the first baseline phase, increased during the first treatment phase until it leveled off, decreased during the second baseline phase, and again increased during the second treatment phase.
Figure 10.4 An Approximation of the Results for Hall and Colleagues’ Participant Robbie in Their ABAB Reversal Design
Why is the reversal—the removal of the treatment—considered to be necessary in this type of design? Why use an ABA design, for example, rather than a simpler AB design? Notice that an AB design is essentially an interrupted time-series design applied to an individual participant. Recall that one problem with that design is that if the dependent variable changes after the treatment is introduced, it is not always clear that the treatment was responsible for the change. It is possible that something else changed at around the same time and that this extraneous variable is responsible for the change in the dependent variable. But if the dependent variable changes with the introduction of the treatment and then changes back with the removal of the treatment, it is much clearer that the treatment (and removal of the treatment) is the cause. In other words, the reversal greatly increases the internal validity of the study.
There are close relatives of the basic reversal design that allow for the evaluation of more than one treatment. In a multiple-treatment reversal design, a baseline phase is followed by separate phases in which different treatments are introduced. For example, a researcher might establish a baseline of studying behavior for a disruptive student (A), then introduce a treatment involving positive attention from the teacher (B), and then switch to a treatment involving mild punishment for not studying (C). The participant could then be returned to a baseline phase before reintroducing each treatment—perhaps in the reverse order as a way of controlling for carryover effects. This particular multiple-treatment reversal design could also be referred to as an ABCACB design.
In an alternating treatments design, two or more treatments are alternated relatively quickly on a regular schedule. For example, positive attention for studying could be used one day and mild punishment for not studying the next, and so on. Or one treatment could be implemented in the morning and another in the afternoon. The alternating treatments design can be a quick and effective way of comparing treatments, but only when the treatments are fast acting.
There are two potential problems with the reversal design—both of which have to do with the removal of the treatment. One is that if a treatment is working, it may be unethical to remove it. For example, if a treatment seemed to reduce the incidence of self-injury in a developmentally disabled child, it would be unethical to remove that treatment just to show that the incidence of self-injury increases. The second problem is that the dependent variable may not return to baseline when the treatment is removed. For example, when positive attention for studying is removed, a student might continue to study at an increased rate. This could mean that the positive attention had a lasting effect on the student’s studying, which of course would be good. But it could also mean that the positive attention was not really the cause of the increased studying in the first place. Perhaps something else happened at about the same time as the treatment—for example, the student’s parents might have started rewarding him for good grades.
One solution to these problems is to use a multiple-baseline design, which is represented in Figure 10.5 “Results of a Generic Multiple-Baseline Study”. In one version of the design, a baseline is established for each of several participants, and the treatment is then introduced for each one. In essence, each participant is tested in an AB design. The key to this design is that the treatment is introduced at a different time for each participant. The idea is that if the dependent variable changes when the treatment is introduced for one participant, it might be a coincidence. But if the dependent variable changes when the treatment is introduced for multiple participants—especially when the treatment is introduced at different times for the different participants—then it is extremely unlikely to be a coincidence.
Figure 10.5 Results of a Generic Multiple-Baseline Study
The multiple baselines can be for different participants, dependent variables, or settings. The treatment is introduced at a different time on each baseline.
As an example, consider a study by Scott Ross and Robert Horner (Ross & Horner, 2009). They were interested in how a school-wide bullying prevention program affected the bullying behavior of particular problem students. At each of three different schools, the researchers studied two students who had regularly engaged in bullying. During the baseline phase, they observed the students for 10-minute periods each day during lunch recess and counted the number of aggressive behaviors they exhibited toward their peers. (The researchers used handheld computers to help record the data.) After 2 weeks, they implemented the program at one school. After 2 more weeks, they implemented it at the second school. And after 2 more weeks, they implemented it at the third school. They found that the number of aggressive behaviors exhibited by each student dropped shortly after the program was implemented at his or her school. Notice that if the researchers had only studied one school or if they had introduced the treatment at the same time at all three schools, then it would be unclear whether the reduction in aggressive behaviors was due to the bullying program or something else that happened at about the same time it was introduced (e.g., a holiday, a television program, a change in the weather). But with their multiple-baseline design, this kind of coincidence would have to happen three separate times—a very unlikely occurrence—to explain their results.
In another version of the multiple-baseline design, multiple baselines are established for the same participant but for different dependent variables, and the treatment is introduced at a different time for each dependent variable. Imagine, for example, a study on the effect of setting clear goals on the productivity of an office worker who has two primary tasks: making sales calls and writing reports. Baselines for both tasks could be established. For example, the researcher could measure the number of sales calls made and reports written by the worker each week for several weeks. Then the goal-setting treatment could be introduced for one of these tasks, and at a later time the same treatment could be introduced for the other task. The logic is the same as before. If productivity increases on one task after the treatment is introduced, it is unclear whether the treatment caused the increase. But if productivity increases on both tasks after the treatment is introduced—especially when the treatment is introduced at two different times—then it seems much clearer that the treatment was responsible.
In yet a third version of the multiple-baseline design, multiple baselines are established for the same participant but in different settings. For example, a baseline might be established for the amount of time a child spends reading during his free time at school and during his free time at home. Then a treatment such as positive attention might be introduced first at school and later at home. Again, if the dependent variable changes after the treatment is introduced in each setting, then this gives the researcher confidence that the treatment is, in fact, responsible for the change.
In addition to its focus on individual participants, single-subject research differs from group research in the way the data are typically analyzed. As we have seen throughout the book, group research involves combining data across participants. Group data are described using statistics such as means, standard deviations, Pearson’s r, and so on to detect general patterns. Finally, inferential statistics are used to help decide whether the result for the sample is likely to generalize to the population. Single-subject research, by contrast, relies heavily on a very different approach called visual inspection. This means plotting individual participants’ data as shown throughout this chapter, looking carefully at those data, and making judgments about whether and to what extent the independent variable had an effect on the dependent variable. Inferential statistics are typically not used.
In visually inspecting their data, single-subject researchers take several factors into account. One of them is changes in the level of the dependent variable from condition to condition. If the dependent variable is much higher or much lower in one condition than another, this suggests that the treatment had an effect. A second factor is trend, which refers to gradual increases or decreases in the dependent variable across observations. If the dependent variable begins increasing or decreasing with a change in conditions, then again this suggests that the treatment had an effect. It can be especially telling when a trend changes directions—for example, when an unwanted behavior is increasing during baseline but then begins to decrease with the introduction of the treatment. A third factor is latency, which is the time it takes for the dependent variable to begin changing after a change in conditions. In general, if a change in the dependent variable begins shortly after a change in conditions, this suggests that the treatment was responsible.
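Two of these factors, level and trend, can be summarized numerically alongside visual inspection. The sketch below is illustrative only: the function names are invented, and using an ordinary least-squares slope for trend is one common convention, not a fixed standard of single-subject analysis.

```python
def phase_level(data):
    """Level: the mean of the observations within one phase."""
    return sum(data) / len(data)

def phase_trend(data):
    """Trend: the least-squares slope of the observations across
    successive sessions within one phase (change per observation)."""
    n = len(data)
    x_mean = (n - 1) / 2           # sessions are numbered 0, 1, 2, ...
    y_mean = sum(data) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(data))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Hypothetical data: a flat baseline followed by a rising treatment phase.
baseline = [12, 11, 13, 12]
treatment = [20, 24, 27, 31]

print(phase_level(treatment) - phase_level(baseline))  # change in level: 13.5
print(phase_trend(baseline))   # near-zero baseline trend: 0.2
print(phase_trend(treatment))  # clear upward treatment trend: 3.6
```

A large change in level together with a new trend that begins immediately after the phase change is the numeric counterpart of the pattern in the top panel of Figure 10.6.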
In the top panel of Figure 10.6, there are fairly obvious changes in the level and trend of the dependent variable from condition to condition. Furthermore, the latencies of these changes are short; the change happens immediately. This pattern of results strongly suggests that the treatment was responsible for the changes in the dependent variable. In the bottom panel of Figure 10.6, however, the changes in level are fairly small. And although there appears to be an increasing trend in the treatment condition, it looks as though it might be a continuation of a trend that had already begun during baseline. This pattern of results strongly suggests that the treatment was not responsible for any changes in the dependent variable—at least not to the extent that single-subject researchers typically hope to see.
Figure 10.6
Visual inspection of the data suggests an effective treatment in the top panel but an ineffective treatment in the bottom panel.
The results of single-subject research can also be analyzed using statistical procedures—and this is becoming more common. There are many different approaches, and single-subject researchers continue to debate which are the most useful. One approach parallels what is typically done in group research. The mean and standard deviation of each participant’s responses under each condition are computed and compared, and inferential statistical tests such as the t test or analysis of variance are applied (Fisch, 2001). (Note that averaging across participants is less common.) Another approach is to compute the percentage of nonoverlapping data (PND) for each participant (Scruggs & Mastropieri, 2001). This is the percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition. In the study of Hall and his colleagues, for example, all measures of Robbie’s study time in the first treatment condition were greater than the highest measure in the first baseline, for a PND of 100%. The greater the percentage of nonoverlapping data, the stronger the treatment effect. Still, formal statistical approaches to data analysis in single-subject research are generally considered a supplement to visual inspection, not a replacement for it.
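The PND computation described above is simple enough to sketch directly. The function name and the sample numbers are illustrative (the data loosely mimic the Robbie example, in which every treatment point exceeded every baseline point, giving a PND of 100%).

```python
def percentage_nonoverlapping_data(baseline, treatment, increase_expected=True):
    """Percentage of treatment-phase points more extreme than the most
    extreme baseline point (Scruggs & Mastropieri, 2001)."""
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    if increase_expected:
        extreme = max(baseline)                       # highest baseline value
        nonoverlap = sum(1 for x in treatment if x > extreme)
    else:
        extreme = min(baseline)                       # lowest baseline value
        nonoverlap = sum(1 for x in treatment if x < extreme)
    return 100.0 * nonoverlap / len(treatment)

# Hypothetical study-time percentages: every treatment observation
# exceeds the highest baseline observation, so PND is 100%.
print(percentage_nonoverlapping_data([20, 25, 30, 25], [55, 60, 70, 75, 80]))  # 100.0

# Partial overlap: only 2 of 3 treatment points exceed the baseline maximum.
print(percentage_nonoverlapping_data([20, 25, 30], [25, 40, 50]))  # 66.67 (approx.)
```

The `increase_expected` flag handles treatments meant to reduce a behavior, where nonoverlap means falling below the lowest baseline point.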
Practice: Design a simple single-subject study (using either a reversal or multiple-baseline design) to answer the following questions. Be sure to specify the treatment, operationally define the dependent variable, decide when and where the observations will be made, and so on.
Fisch, G. S. (2001). Evaluating data from behavioral analysis: Visual inspection or statistical models. Behavioural Processes, 54, 137–154.
Ross, S. W., & Horner, R. H. (2009). Bully prevention in positive behavior support. Journal of Applied Behavior Analysis, 42, 747–759.
Scruggs, T. E., & Mastropieri, M. A. (2001). How to summarize single-participant research: Ideas and applications. Exceptionality, 9, 227–244.
Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. Boston, MA: Authors Cooperative.
Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Single-subject research is a type of quantitative research that involves studying in detail the behavior of each of a small number of participants. Note that the term single-subject does not mean that only one participant is studied; it is more typical for there to be somewhere between two and 10 participants. (This is why single-subject research designs are sometimes called small-n designs, where n is the statistical symbol for the sample size.) Single-subject research can be contrasted with group research, which typically involves studying large numbers of participants and examining their behavior primarily in terms of group means, standard deviations, and so on. The majority of this textbook is devoted to understanding group research, which is the most common approach in psychology. But single-subject research is an important alternative, and it is the primary approach in some more applied areas of psychology.
Before continuing, it is important to distinguish single-subject research from case studies and other more qualitative approaches that involve studying in detail a small number of participants. As described in Chapter 6, case studies involve an in-depth analysis and description of an individual, which is typically primarily qualitative in nature. More broadly speaking, qualitative research focuses on understanding people’s subjective experience by observing behavior and collecting relatively unstructured data (e.g., detailed interviews) and analyzing those data using narrative rather than quantitative techniques. Single-subject research, in contrast, focuses on understanding objective behavior through experimental manipulation and control, collecting highly structured data, and analyzing those data quantitatively.
Again, single-subject research involves studying a small number of participants and focusing intensively on the behavior of each one. But why take this approach instead of the group approach? There are several important assumptions underlying single-subject research, and it will help to consider them now.
First and foremost is the assumption that it is important to focus intensively on the behavior of individual participants. One reason for this is that group research can hide individual differences and generate results that do not represent the behavior of any individual. For example, a treatment that has a positive effect for half the people exposed to it but a negative effect for the other half would, on average, appear to have no effect at all. Single-subject research, however, would likely reveal these individual differences. A second reason to focus intensively on individuals is that sometimes it is the behavior of a particular individual that is primarily of interest. A school psychologist, for example, might be interested in changing the behavior of a particular disruptive student. Although previous published research (both single-subject and group research) is likely to provide some guidance on how to do this, conducting a study on this student would be more direct and probably more effective.
A second assumption of single-subject research is that it is important to discover causal relationships through the manipulation of an independent variable, the careful measurement of a dependent variable, and the control of extraneous variables. For this reason, single-subject research is often considered a type of experimental research with good internal validity. Recall, for example, that Hall and his colleagues measured their dependent variable (studying) many times—first under a no-treatment control condition, then under a treatment condition (positive teacher attention), and then again under the control condition. Because there was a clear increase in studying when the treatment was introduced, a decrease when it was removed, and an increase when it was reintroduced, there is little doubt that the treatment was the cause of the improvement.
A third assumption of single-subject research is that it is important to study strong and consistent effects that have biological or social importance. Applied researchers, in particular, are interested in treatments that have substantial effects on important behaviors and that can be implemented reliably in the real-world contexts in which they occur. This is sometimes referred to as social validity (Wolf, 1976) [1] . The study by Hall and his colleagues, for example, had good social validity because it showed strong and consistent effects of positive teacher attention on a behavior that is of obvious importance to teachers, parents, and students. Furthermore, the teachers found the treatment easy to implement, even in their often-chaotic elementary school classrooms.
Single-subject research has been around as long as the field of psychology itself. In the late 1800s, one of psychology’s founders, Wilhelm Wundt, studied sensation and consciousness by focusing intensively on each of a small number of research participants. Hermann Ebbinghaus’s research on memory and Ivan Pavlov’s research on classical conditioning are other early examples, both of which are still described in almost every introductory psychology textbook.
In the middle of the 20th century, B. F. Skinner clarified many of the assumptions underlying single-subject research and refined many of its techniques (Skinner, 1938) [2] . He and other researchers then used it to describe how rewards, punishments, and other external factors affect behavior over time. This work was carried out primarily using nonhuman subjects—mostly rats and pigeons. This approach, which Skinner called the experimental analysis of behavior, remains an important subfield of psychology and continues to rely almost exclusively on single-subject research. For excellent examples of this work, look at any issue of the Journal of the Experimental Analysis of Behavior. By the 1960s, many researchers were interested in using this approach to conduct applied research primarily with humans—a subfield now called applied behavior analysis (Baer, Wolf, & Risley, 1968) [3] . Applied behavior analysis plays an especially important role in contemporary research on developmental disabilities, education, organizational behavior, and health, among many other areas. Excellent examples of this work (including the study by Hall and his colleagues) can be found in the Journal of Applied Behavior Analysis.
Although most contemporary single-subject research is conducted from the behavioral perspective, it can in principle be used to address questions framed in terms of any theoretical perspective. For example, a studying technique based on cognitive principles of learning and memory could be evaluated by testing it on individual high school students using the single-subject approach. The single-subject approach can also be used by clinicians who take any theoretical perspective—behavioral, cognitive, psychodynamic, or humanistic—to study processes of therapeutic change with individual clients and to document their clients’ improvement (Kazdin, 1982) [4] .
Neag School of Education
Single Subject Research
Single subject research (also known as single case experiments) is popular in the fields of special education and counseling. This research design is useful when the researcher is attempting to change the behavior of an individual or a small group of individuals and wishes to document that change. Unlike true experiments where the researcher randomly assigns participants to a control and treatment group, in single subject research the participant serves as both the control and treatment group. The researcher uses line graphs to show the effects of a particular intervention or treatment. An important factor of single subject research is that only one variable is changed at a time. Single subject research designs are “weak when it comes to external validity….Studies involving single-subject designs that show a particular treatment to be effective in changing behavior must rely on replication–across individuals rather than groups–if such results are to be found worthy of generalization” (Fraenkel & Wallen, 2006, p. 318).
Suppose a researcher wished to investigate the effect of praise on reducing disruptive behavior over many days. First she would need to establish a baseline of how frequently the disruptions occurred. She would measure how many disruptions occurred each day for several days. In the example below, the target student was disruptive seven times on the first day, six times on the second day, and seven times on the third day. Note how the sequence of time is depicted on the x-axis (horizontal axis) and the dependent variable (outcome variable) is depicted on the y-axis (vertical axis).
Once a baseline of behavior has been established (when a consistent pattern emerges with at least three data points), the intervention begins. The researcher continues to plot the frequency of behavior while implementing the intervention of praise.
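One way to make the informal “consistent pattern with at least three data points” criterion concrete is to check that every baseline observation falls close to the baseline mean. The sketch below is a hypothetical illustration: the 20% tolerance is an assumption chosen for the example, not a published standard for baseline stability.

```python
def is_stable_baseline(points, tolerance=0.20, min_points=3):
    """Rough stability check: at least `min_points` observations, each
    within `tolerance` (as a proportion) of the baseline mean.
    The 20% default tolerance is an illustrative assumption."""
    if len(points) < min_points:
        return False
    mean = sum(points) / len(points)
    return all(abs(p - mean) <= tolerance * mean for p in points)

# The disruption counts from the example above (7, 6, 7) form a stable baseline.
print(is_stable_baseline([7, 6, 7]))   # True
# A highly variable baseline does not, so observation would continue.
print(is_stable_baseline([7, 3, 12]))  # False
```

In practice researchers usually judge stability by visual inspection of the plotted baseline rather than a numeric rule, but a check like this captures the same idea.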
In this example, we can see that the frequency of disruptions decreased once praise began. The design in this example is known as an A-B design. The baseline period is referred to as A and the intervention period is identified as B.
Another design is the A-B-A design. An A-B-A design (also known as a reversal design) involves discontinuing the intervention and returning to a nontreatment condition.
Sometimes an individual’s behavior is so severe that the researcher cannot wait to establish a baseline and must begin with an intervention. In this case, a B-A-B design is used. The intervention is implemented immediately (before establishing a baseline). This is followed by a measurement without the intervention and then a repeat of the intervention.
Multiple-Baseline Design
Sometimes, a researcher may be interested in addressing several issues for one student or a single issue for several students. In this case, a multiple-baseline design is used.
“In a multiple baseline across subjects design, the researcher introduces the intervention to different persons at different times. The significance of this is that if a behavior changes only after the intervention is presented, and this behavior change is seen successively in each subject’s data, the effects can more likely be credited to the intervention itself as opposed to other variables. Multiple-baseline designs do not require the intervention to be withdrawn. Instead, each subject’s own data are compared between intervention and nonintervention behaviors, resulting in each subject acting as his or her own control (Kazdin, 1982). An added benefit of this design, and all single-case designs, is the immediacy of the data. Instead of waiting until postintervention to take measures on the behavior, single-case research prescribes continuous data collection and visual monitoring of that data displayed graphically, allowing for immediate instructional decision-making. Students, therefore, do not linger in an intervention that is not working for them, making the graphic display of single-case research combined with differentiated instruction responsive to the needs of students.” (Geisler, Hessler, Gardner, & Lovelace, 2009)
Regardless of the research design, the line graphs used to illustrate the data contain a set of common elements.
Generally, in single subject research we count the number of times something occurs in a given time period and see if it occurs more or less often in that time period after implementing an intervention. For example, we might measure how many baskets someone makes while shooting for 2 minutes. We would repeat that at least three times to get our baseline. Next, we would test some intervention. We might play music while shooting, give encouragement while shooting, or video the person while shooting to see if our intervention influenced the number of shots made. After the 3 baseline measurements (3 sets of 2 minute shooting), we would measure several more times (sets of 2 minute shooting) after the intervention and plot the time points (number of baskets made in 2 minutes for each of the measured time points). This works well for behaviors that are distinct and can be counted.
Sometimes behaviors come and go over time (such as being off task in a classroom or not listening during a coaching session). The way we can record these is to select a period of time (say 5 minutes) and mark down every 10 seconds whether our participant is on task. We make a minimum of three sets of 5-minute observations for a baseline, implement an intervention, and then make more sets of 5-minute observations with the intervention in place. We use this method rather than counting how many times someone goes off task because a person who was continually off task would only receive a count of 1, since the off-task behavior occurred once, continuously. Someone else might be off task twice for 15 seconds each time and receive a score of 2, yet that second person is certainly not off task twice as much as the first. Recording whether the person is off task at 10-second intervals therefore gives a more accurate picture: the person continually off task would have a score of 30 (off task at every 10-second interval for 5 minutes), and the person off task twice for a short time would have a score of 2 (off task during only 2 of the 10-second intervals).
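The interval-recording scheme above reduces to counting marked intervals. This sketch is a hypothetical illustration of that scoring; the function name and data are invented, but the arithmetic matches the example (30 intervals in a 5-minute observation checked every 10 seconds).

```python
def interval_score(observations):
    """Count the intervals in which the target behavior was observed.
    `observations` is one boolean per 10-second check: True = off task."""
    return sum(1 for off_task in observations if off_task)

# A 5-minute observation at 10-second intervals yields 30 checks.
continually_off_task = [True] * 30          # off task the entire time
briefly_off_task = [False] * 30
briefly_off_task[4] = True                  # two brief off-task episodes,
briefly_off_task[20] = True                 # each caught at one check

print(interval_score(continually_off_task))  # 30
print(interval_score(briefly_off_task))      # 2
```

The two scores (30 versus 2) reflect the real difference in how much time each person spent off task, which a simple event count (1 versus 2) would have reversed.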
I also have additional information about how to record single-subject research data .
I hope this helps you better understand single subject research.
I have created a PowerPoint on Single Subject Research , which also available below as a video.
I have also created instructions for creating single-subject research design graphs with Excel .
Fraenkel, J. R., & Wallen, N. E. (2006). How to design and evaluate research in education (6th ed.). Boston, MA: McGraw Hill.
Geisler, J. L., Hessler, T., Gardner, R., III, & Lovelace, T. S. (2009). Differentiated writing interventions for high-achieving urban African American elementary students. Journal of Advanced Academics, 20, 214–247.
Del Siegle, Ph.D. University of Connecticut [email protected] www.delsiegle.info
Revised 02/02/2024
Target terms (or phrases, in this case): Individuals serve as their own controls, repeated measures, prediction, verification, control
Two important things to know about single subject experimental designs: (1) They are not the same as case studies. Case studies are clinical stories that just tell what happened without manipulation of the environment. Experimental designs involve deliberate manipulation of the environment to answer a particular question. (2) The logic behind single subject experimental designs applies to the everyday work of programming for behavior change with clients. Even if we never publish a research study in a peer reviewed journal, or participate in any kind of formal research, we still need to be very familiar with the methodology in order to evaluate our work as behavior analysts. This is not optional – it’s an integral part of our work.
Individuals serve as their own controls
Please note that “subject” and “participant” are two words that can be used to describe someone (or a unit of people or animals) in a research study. “Participant” is often a preferred term because it emphasizes that people who take part in studies have rights and are actively part of the process.
Definition: Individuals serve as their own controls in a research study when the effects of an intervention are measured on the person themselves, not between one person who got the intervention (treatment) and one person who didn’t (control). In single subject methodology, the individual is essentially assigned to both treatment and control, because the research question is answered differently from other kinds of research. (See D-4 for more about what this all means.)
Example in clinical context: Tami is designing an intervention for her client Ariel, who needs help with remembering to complete her homework. Tami takes baseline data until stability is achieved, then introduces an intervention (series of alarms on Ariel’s phone) and continues to take data until stability is once again achieved. She then returns Ariel to the baseline condition and then introduces the intervention a second time, following the same process as before. The data depicting the dependent variable (Ariel’s homework completion behavior) show a clear relationship between the presence of the alarms and the completion of homework. In this example, no other person was used as a “control” for Ariel. That would not have been a great way to answer the question about how to help Ariel do her homework, since she might have special considerations and circumstances that are unique. Instead, she was the only subject, and the intervention was evaluated using Ariel herself in all phases.
Why it matters: Using subjects as their own controls matters a lot in behavior analysis. It enables us to take into account the unique idiosyncrasies (“weirdness”) of each individual person. We’re all different, and we’re all weird in some way. When we use big groups to answer research questions, one of the goals is to make the groups so big that the numbers “drown out” the differences between people by statistically canceling each other out. There’s nothing inherently wrong with this, but it doesn’t work for us as behavior analysts. We are interested in functional relations between individual behavior and experimental conditions! To do that, we need to study the individual and their environment, and how the two interact. The vast majority of research published in behavior analytic journals was conducted using single subject methodology.
Repeated measures
Definition: When we use single subject experimental designs, we measure the dependent variable repeatedly, across every phase of the study, rather than just once before and once after the intervention. The thing we measure to see whether our intervention is working is called the dependent variable.
Examples in clinical context: Randi engages in swearing and property destruction. His team creates an intervention plan for him. In order to empirically answer questions about whether the intervention is working, the team carefully defines and records instances of Randi’s target behaviors.
This also works with skill acquisition (behaviors we are teaching so that they will increase). For example, say your client Tanisha needs more skills related to asking for help. We could use “asking for help” as the dependent variable and measure it multiple times throughout the baseline and intervention.
Why it matters: Using repeated measures is super important, because if we only measure the dependent variable once or twice, we won’t be able to thoroughly see what our data points are telling us. Take a look at C-11 (interpret graph data) to understand more about how repeated measures help us analyze level, trend, and variability of data.
Definition: Prediction is looking at the data we have and making an informed guess about where it would go if we kept all variables the same (i.e. if we didn’t change anything). Take a look at C-11 (interpret graph data) for more on how to predict where data will head next.
Example in clinical context: Johnny is a client who is being assessed at a severe behavior clinic due to self injury. His team conducts observations and they highly suspect that his self injury is maintained by access to attention (connection with other people). The team conducts a functional assessment (baseline) condition in which they give Johnny attention every time his self injury occurs. Team members observe that Johnny engages in self injury every single time attention is withdrawn, and stops once he receives attention. After observing this multiple times, graphing the results, and engaging in visual analysis, the team predicts that, if they keep providing attention contingent on self injury, the target behavior will continue as before.
Why it matters: Behavior analysts need to predict what the dependent variable would look like if everything else stayed the same in order to design experiments that demonstrate that an independent variable can change the otherwise predicted outcome.
Verification
Definition: Verification is demonstrating that baseline levels of behavior would have remained unchanged had the independent variable (intervention) not been introduced. Verification as a concept can take several forms within a research design, but the foundational idea is the same.
Example in clinical context: Let’s take the example of Johnny above. The team moved into the intervention phase. Now his team ignores self-injury and they have taught Johnny to use an “I want attention” button instead, which is always reinforced with attention. Johnny’s levels of self-injury are down significantly! To demonstrate that their intervention, rather than something else (such as medication), was responsible for the change in self-injury, Johnny’s team could take the button away and start reinforcing self-injury again. If Johnny’s behavior looks similar to what it was during the first baseline phase of the assessment, that is evidence that the intervention was responsible for the change in behavior.
Why it matters: It’s important to go beyond educated guesses and actually demonstrate in a logical manner that our interventions are responsible for the change in behavior that the client needs.
Replication
Definition: Replication is strengthening the case that the independent variable is responsible for changes in behavior by demonstrating it multiple times.
Example in clinical context: Let’s keep talking about Johnny from before. His team could strengthen the case that their intervention was responsible for the change in self-injury by implementing baseline and intervention/treatment conditions several more times. If the self-injury stays low in each intervention phase and high in each baseline phase, each repetition of that change lends further credibility to the functional relationship.
Replication can also happen in “smaller” or “bigger” ways. In terms of more micro replications, we often see within-session replication, such as Johnny engaging in the target behavior every time attention was withdrawn, over and over again, within a single assessment session (e.g., 10 minutes). We also see replication on a broader scale, such as when researchers use similar methodology with many other individuals who also engage in attention-maintained self-injury and find similar treatment results.
Why it matters: Every time a possible relationship between two variables is demonstrated, it becomes less and less likely that “chance” or some other factor was primarily responsible for the relationship between the dependent and independent variables. Replication as a concept can take several forms within a research design, but the foundational idea is the same. (See “D-2 distinguish between internal and external validity” for more on how replication ties into validity.)
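The replication logic above can also be summarized numerically. The sketch below uses invented ABAB session counts (not data from any real case): computing each phase's mean makes each baseline-to-treatment change, and therefore each replication, easy to see.

```python
# Hypothetical ABAB data: self-injury counts per session in each phase.
phases = {
    "A1 (baseline)":  [9, 10, 8, 11],
    "B1 (treatment)": [4, 2, 1, 1],
    "A2 (baseline)":  [7, 9, 10],
    "B2 (treatment)": [2, 1, 0, 1],
}

def phase_mean(data):
    """Mean of one phase's observations."""
    return sum(data) / len(data)

for name, data in phases.items():
    print(f"{name}: mean = {phase_mean(data):.1f}")

# Each drop to a low treatment-phase mean after a high baseline-phase mean
# is one replication of the effect.
```

Each additional A–B pair that reproduces the same high-then-low pattern makes a chance explanation less plausible.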
Thanks to remarkable methodological and statistical advances in recent years, single-case design (SCD) research has become a viable and often essential option for researchers in applied psychology, education, and related fields.
This text is a compendium of information and tools for researchers considering SCD research, a methodology in which one or several participants (or other units) serve as the subjects of a systematically controlled experimental intervention study. SCD is a highly flexible method of conducting applied intervention research where it is not feasible or practical to collect data from traditional groups of participants.
Initial chapters lay out the key components of SCDs, from articulating dependent variables to documenting methods for achieving experimental control and selecting an appropriate design model. Subsequent chapters show when and how to implement SCDs in a variety of contexts and how to analyze and interpret results.
Authors emphasize key design and analysis tactics, such as randomization, to help enhance the internal validity and scientific credibility of individual studies. This rich resource also includes in-depth descriptions of large-scale SCD research projects being undertaken at key institutions; practical suggestions from journal editors on how to get SCD research published; and detailed instructions for free, user-friendly, web-based randomization software.
Contributors
Series Foreword
Acknowledgements
Introduction: An Overview of Single-Case Intervention Research Thomas R. Kratochwill and Joel R. Levin
I. Methodologies and Analyses
II. Reactions From Leaders in the Field
About the Editors
Thomas R. Kratochwill, PhD, is Sears Roebuck Foundation–Bascom Professor at the University of Wisconsin–Madison, director of the School Psychology Program, and a licensed psychologist in Wisconsin.
He is the author of more than 200 journal articles and book chapters. He has written or edited more than 30 books and has made more than 300 professional presentations.
In 1977 he received the Lightner Witmer Award from APA Division 16 (School Psychology). In 1981 he received the Outstanding Research Contributions Award from the Arizona State Psychological Association and in 1995 received an award for Outstanding Contributions to the Advancement of Scientific Knowledge in Psychology from the Wisconsin Psychological Association. Also in 1995, he was the recipient of the Senior Scientist Award from APA Division 16, and the Wisconsin Psychological Association selected his research for its Margaret Bernauer Psychology Research Award.
In 1995, 2001, and 2002 the APA Division 16 journal School Psychology Quarterly selected one of his articles as the best of the year. In 2005 he received the Jack I. Bardon Distinguished Achievement Award from APA Division 16. He was selected as the founding editor of School Psychology Quarterly in 1984 and served as editor of the journal until 1992.
In 2011 Dr. Kratochwill received the Lifetime Achievement Award from the National Register of Health Service Providers in Psychology and the Nadine Murphy Lambert Lifetime Achievement Award from APA Division 16.
Dr. Kratochwill is a fellow of APA Divisions 15 (Educational Psychology), 16, and 53 (Society of Clinical Child and Adolescent Psychology). He is past president of the Society for the Study of School Psychology and was cochair of the Task Force on Evidence-Based Interventions in School Psychology. He was also a member of the APA Task Force on Evidence-Based Practice for Children and Adolescents and the recipient of the 2007 APA Distinguished Career Contributions to Education and Training of Psychologists.
He is the recipient of the University of Wisconsin–Madison Van Hise Outreach Teaching Award and a member of the University's teaching academy. Most recently he has chaired the What Works Clearinghouse Panel for the development of Standards for Single-Case Research Design for review of evidence-based interventions.
Joel R. Levin, PhD, is Professor Emeritus of Educational Psychology, University of Wisconsin–Madison and University of Arizona. He is internationally renowned for his research and writing on educational research methodology and statistical analysis as well as for his career-long program of research on students' learning strategies and study skills, with more than 400 scholarly publications in those domains. Within APA, he is a Fellow of Division 5 (Evaluation, Measurement and Statistics) and Division 15 (Educational Psychology).
From 1986 to 1988 Dr. Levin was head of the Learning and Instruction division of the American Educational Research Association (AERA), from 1991 to 1996 he was editor of APA's Journal of Educational Psychology , and from 2001 to 2003 he was coeditor of the journal Issues in Education: Contributions From Educational Psychology . During 1994–1995 he served as chair of APA's Council of Editors, and from 1993 to 1995 he was an ex-officio representative on APA's Publications and Communications Board.
Dr. Levin chaired an editors' committee that revised the statistical-reporting guidelines sections for the fourth (1994) edition of the APA Publication Manual , and he served on a similar committee that revised the fifth (2001) and sixth (2010) editions of the manual. From 2003 to 2008 he was APA's chief editorial advisor, a position in which he was responsible for mediating editor–author conflicts, managing ethical violations, and making recommendations bearing on all aspects of the scholarly research and publication process.
Dr. Levin has received two article-of-the-year awards from AERA (1972, with Leonard Marascuilo; 1973, with William Rohwer and Anne Cleary) as well as awards from the University of Wisconsin–Madison for both his teaching and his research (1971 and 1980). In 1992 he was presented with a University of Wisconsin–Madison award for his combined research, teaching, and professional service contributions, followed in 1996 by a prestigious University of Wisconsin–Madison named professorship (Julian C. Stanley Chair).
In 1997 the University of Wisconsin–Madison's School of Education honored Dr. Levin with a distinguished career award, and in 2002 he was accorded APA Division 15's highest research recognition, the E. L. Thorndike Award, for his professional achievements. In 2010 AERA's Educational Statisticians Special Interest Group presented him with an award for exceptional contributions to the field of educational statistics, and most recently, in 2013 the editorial board of the Journal of School Psychology selected his 2012 publication (with John Ferron and Thomas Kratochwill) as the Journal's outstanding article of the year.
A well-written and meaningfully structured compendium that includes the foundational and advanced guidelines for conducting accurate single-case intervention designs. Whether you are an undergraduate or a graduate student, or an applied researcher anywhere along the novice-to-expert column, this book promises to be an invaluable addition to your library. —PsycCRITIQUES
Provides valuable information about single case research design for researchers and graduate students, including methodology, statistical analyses, and the opinions of researchers who have been using it. —Doody's Review Service
This is a welcome addition to the libraries of behavioral researchers interested in knowing more about the lives of children inside and outside of school. Kratochwill and Levin and their contributing authors blend the sometimes esoteric issues of the philosophy of science, experimental design, and statistics with the real-life issues of how to get grant funding and publish research. This volume is useful for new and experienced researchers alike. —Ilene S. Schwartz, PhD, professor, University of Washington, Seattle, and director, Haring Center for Research on Inclusive Education, Seattle, WA
Kevin P. Kearns
The following slides accompanied a presentation delivered at ASHA’s Clinical Practice Research Institute.
Data speak, not men…
“Designs have inherent rigor but not all studies using a design are rigorous” — Randy; yesterday
“Illusion of strong evidence…”– Gilbert, McPeek & Mosteller, 1977
Effects of Interpretation Bias on Research Evidence (Kaptchuk, 2003)
Common Single-Subject Design Strategies
Treatment vs No-treatment comparisons
Component Assessment
Successive Level Analysis
Treatment – Treatment Comparisons
Internal Validity
Research on Visual Inspection of Single-Subject Data (Franklin et al., 1996; Robey et al., 1999)
S1: ITSACORR results were non-significant
S2: ITSACORR results were significant (p < .05)
Too few data points for valid analysis
Failure to establish and make explicit the criteria guiding procedural and methodological decisions, prior to making changes, is a serious threat to internal validity that is difficult to remedy after the fact.
Clinical significance cannot be assumed from our perspective alone.
Change in level of performance on any outcome measure, even when effects are large and visually obvious or significant, is an insufficient metric of the impact of experimental treatment on our participants/ patients.
Minimal Clinically Important Difference (MCID) : “the smallest difference in a score that is considered worthwhile or important” (Hays & Woolley, 2000)
Responsiveness of Health Measures (Husted et al., 2000): 1. Distribution-based approaches examine internal responsiveness, using the distribution/variability of initial (baseline) scores to examine differences (e.g., effect size).
2. Anchor based approaches examine external responsiveness by comparing change detected by a dependent measure with an external criterion. For example, specify a level of change that meets “minimal clinically important difference” (MCID).
Anchor-based responsiveness measures (see Beninato et al., Archives of PMR, 2006) use an external criterion as the “anchor.”
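The two approaches can be sketched side by side. Everything below is a hypothetical illustration: the data, the MCID value, and the simple Cohen's-d-style index are invented for demonstration, not formulas taken from the cited papers.

```python
import statistics

def effect_size(baseline, treatment):
    """Distribution-based responsiveness: mean change scaled by the
    variability of baseline scores (a simple d-style index)."""
    change = statistics.mean(treatment) - statistics.mean(baseline)
    return change / statistics.stdev(baseline)

def exceeds_mcid(baseline, treatment, mcid):
    """Anchor-based responsiveness: did the mean change meet a minimal
    clinically important difference specified in advance?"""
    change = statistics.mean(treatment) - statistics.mean(baseline)
    return abs(change) >= mcid

# Hypothetical outcome scores before and after treatment.
baseline = [42, 45, 40, 43, 44]
treatment = [55, 58, 60, 57]
print(effect_size(baseline, treatment))        # large distribution-based effect
print(exceeds_mcid(baseline, treatment, 10))   # True: change clears the anchor
```

Note the design difference: the first index depends only on the data's own variability, while the second requires an external judgment (the MCID) made before looking at the results.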
Revisiting Clinically Important Change (Social Validation)
When the perceived change is important to the patient, clinician, researcher, payor or society (Beaton et al., 2001)
Requires that we extend our conceptual frame of reference beyond typical outcome measures and distribution based measures of responsiveness
“Time will tell” — (M. Planck, 1950)
“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die.” — (Kaptchuk, 2003)
Franklin, R. D., Gorman, B. S., Beasley, T. M., & Allison, D. B. (1996). Graphical display and visual analysis. In Design and Analysis of Single-Case Research (pp. 119–158). Lawrence Erlbaum Associates.
Kearns, K. P., & Thompson, C. K. (1991). Technical drift and conceptual myopia: The Merlin effect. Clinical Aphasiology, 19, 31–40.
Kevin P Kearns SUNY Fredonia
Presented at the Clinical Practice Research Institute (CPRI). Hosted by the American Speech-Language-Hearing Association Research Mentoring Network.
Copyrighted Material. Reproduced by the American Speech-Language-Hearing Association in the Clinical Research Education Library with permission from the author or presenter.
Rajiv S. Jhangiani; I-Chant A. Chiang; Carrie Cuttler; and Dana C. Leighton
Before looking at any specific single-subject research designs, it will be helpful to consider some features that are common to most of them. Many of these features are illustrated in Figure 10.1, which shows the results of a generic single-subject study. First, the dependent variable (represented on the y -axis of the graph) is measured repeatedly over time (represented by the x -axis) at regular intervals. Second, the study is divided into distinct phases, and the participant is tested under one condition per phase. The conditions are often designated by capital letters: A, B, C, and so on. Thus Figure 10.1 represents a design in which the participant was tested first in one condition (A), then tested in another condition (B), and finally retested in the original condition (A). (This is called a reversal design and will be discussed in more detail shortly.)
Another important aspect of single-subject research is that the change from one condition to the next does not usually occur after a fixed amount of time or number of observations. Instead, it depends on the participant’s behavior. Specifically, the researcher waits until the participant’s behavior in one condition becomes fairly consistent from observation to observation before changing conditions. This is sometimes referred to as the steady state strategy (Sidman, 1960) [1] . The idea is that when the dependent variable has reached a steady state, then any change across conditions will be relatively easy to detect. Recall that we encountered this same principle when discussing experimental research more generally. The effect of an independent variable is easier to detect when the “noise” in the data is minimized.
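The steady state strategy amounts to a waiting rule, which can be expressed as a simple stability check. The sketch below is illustrative only — the exact criterion (how many recent points, what tolerance) varies across studies and is not Sidman's specific rule:

```python
def is_steady(observations, k=5, tolerance=0.10):
    """One illustrative stability criterion: the last k observations all
    fall within +/- tolerance of their own mean. (Hypothetical rule for
    demonstration; real studies set their own criteria.)"""
    if len(observations) < k:
        return False
    recent = observations[-k:]
    mean = sum(recent) / len(recent)
    return all(abs(x - mean) <= tolerance * mean for x in recent)

# A series that settles down versus one that stays variable:
print(is_steady([30, 22, 18, 20, 19, 21, 20, 20]))  # True: stable tail
print(is_steady([30, 22, 18, 12, 25, 9, 30, 5]))    # False: still noisy
```

Under a rule like this, the researcher would keep observing until `is_steady` holds before changing conditions, so that any change across conditions stands out from the remaining "noise."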
The most basic single-subject research design is the reversal design , also called the ABA design . During the first phase, A, a baseline is established for the dependent variable. This is the level of responding before any treatment is introduced, and therefore the baseline phase is a kind of control condition. When steady state responding is reached, phase B begins as the researcher introduces the treatment. There may be a period of adjustment to the treatment during which the behavior of interest becomes more variable and begins to increase or decrease. Again, the researcher waits until that dependent variable reaches a steady state so that it is clear whether and how much it has changed. Finally, the researcher removes the treatment and again waits until the dependent variable reaches a steady state. This basic reversal design can also be extended with the reintroduction of the treatment (ABAB), another return to baseline (ABABA), and so on.
The study by Hall and his colleagues employed an ABAB reversal design. Figure 10.2 approximates the data for Robbie. The percentage of time he spent studying (the dependent variable) was low during the first baseline phase, increased during the first treatment phase until it leveled off, decreased during the second baseline phase, and again increased during the second treatment phase.
Why is the reversal—the removal of the treatment—considered to be necessary in this type of design? Why use an ABA design, for example, rather than a simpler AB design? Notice that an AB design is essentially an interrupted time-series design applied to an individual participant. Recall that one problem with that design is that if the dependent variable changes after the treatment is introduced, it is not always clear that the treatment was responsible for the change. It is possible that something else changed at around the same time and that this extraneous variable is responsible for the change in the dependent variable. But if the dependent variable changes with the introduction of the treatment and then changes back with the removal of the treatment (assuming that the treatment does not create a permanent effect), it is much clearer that the treatment (and removal of the treatment) is the cause. In other words, the reversal greatly increases the internal validity of the study.
There are close relatives of the basic reversal design that allow for the evaluation of more than one treatment. In a multiple-treatment reversal design , a baseline phase is followed by separate phases in which different treatments are introduced. For example, a researcher might establish a baseline of studying behavior for a disruptive student (A), then introduce a treatment involving positive attention from the teacher (B), and then switch to a treatment involving mild punishment for not studying (C). The participant could then be returned to a baseline phase before reintroducing each treatment—perhaps in the reverse order as a way of controlling for carryover effects. This particular multiple-treatment reversal design could also be referred to as an ABCACB design.
In an alternating treatments design , two or more treatments are alternated relatively quickly on a regular schedule. For example, positive attention for studying could be used one day and mild punishment for not studying the next, and so on. Or one treatment could be implemented in the morning and another in the afternoon. The alternating treatments design can be a quick and effective way of comparing treatments, but only when the treatments are fast acting.
There are two potential problems with the reversal design—both of which have to do with the removal of the treatment. One is that if a treatment is working, it may be unethical to remove it. For example, if a treatment seemed to reduce the incidence of self-injury in a child with an intellectual delay, it would be unethical to remove that treatment just to show that the incidence of self-injury increases. The second problem is that the dependent variable may not return to baseline when the treatment is removed. For example, when positive attention for studying is removed, a student might continue to study at an increased rate. This could mean that the positive attention had a lasting effect on the student’s studying, which of course would be good. But it could also mean that the positive attention was not really the cause of the increased studying in the first place. Perhaps something else happened at about the same time as the treatment—for example, the student’s parents might have started rewarding him for good grades. One solution to these problems is to use a multiple-baseline design , which is represented in Figure 10.3. There are three different types of multiple-baseline designs which we will now consider.
In one version of the design, a baseline is established for each of several participants, and the treatment is then introduced for each one. In essence, each participant is tested in an AB design. The key to this design is that the treatment is introduced at a different time for each participant. The idea is that if the dependent variable changes when the treatment is introduced for one participant, it might be a coincidence. But if the dependent variable changes when the treatment is introduced for multiple participants—especially when the treatment is introduced at different times for the different participants—then it is unlikely to be a coincidence.
As an example, consider a study by Scott Ross and Robert Horner (Ross & Horner, 2009) [2] . They were interested in how a school-wide bullying prevention program affected the bullying behavior of particular problem students. At each of three different schools, the researchers studied two students who had regularly engaged in bullying. During the baseline phase, they observed the students for 10-minute periods each day during lunch recess and counted the number of aggressive behaviors they exhibited toward their peers. After 2 weeks, they implemented the program at one school. After 2 more weeks, they implemented it at the second school. And after 2 more weeks, they implemented it at the third school. They found that the number of aggressive behaviors exhibited by each student dropped shortly after the program was implemented at the student’s school. Notice that if the researchers had only studied one school or if they had introduced the treatment at the same time at all three schools, then it would be unclear whether the reduction in aggressive behaviors was due to the bullying program or something else that happened at about the same time it was introduced (e.g., a holiday, a television program, a change in the weather). But with their multiple-baseline design, this kind of coincidence would have to happen three separate times—a very unlikely occurrence—to explain their results.
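The staggered logic of a multiple-baseline design across participants can be sketched with hypothetical numbers. The counts and start weeks below are invented, loosely patterned on the Ross and Horner study rather than taken from it:

```python
# Hypothetical multiple-baseline data: aggressive acts per recess at three
# schools, with the program introduced at week 3, 5, and 7 respectively.
data = {
    "school_1": {"start": 3, "counts": [8, 9, 8, 3, 2, 2, 1, 2, 1]},
    "school_2": {"start": 5, "counts": [7, 8, 9, 8, 7, 2, 1, 2, 1]},
    "school_3": {"start": 7, "counts": [9, 8, 8, 9, 8, 9, 8, 2, 1]},
}

def mean(xs):
    return sum(xs) / len(xs)

for school, d in data.items():
    pre = d["counts"][: d["start"]]    # baseline weeks
    post = d["counts"][d["start"]:]    # weeks after the program began
    print(f"{school}: baseline mean {mean(pre):.1f}, "
          f"treatment mean {mean(post):.1f}")
```

The point of the stagger is visible in the data themselves: each school's counts drop only after its own start week, so a single coincidental event cannot explain all three changes.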
In another version of the multiple-baseline design, multiple baselines are established for the same participant but for different dependent variables, and the treatment is introduced at a different time for each dependent variable. Imagine, for example, a study on the effect of setting clear goals on the productivity of an office worker who has two primary tasks: making sales calls and writing reports. Baselines for both tasks could be established. For example, the researcher could measure the number of sales calls made and reports written by the worker each week for several weeks. Then the goal-setting treatment could be introduced for one of these tasks, and at a later time the same treatment could be introduced for the other task. The logic is the same as before. If productivity increases on one task after the treatment is introduced, it is unclear whether the treatment caused the increase. But if productivity increases on both tasks after the treatment is introduced—especially when the treatment is introduced at two different times—then it seems much clearer that the treatment was responsible.
In yet a third version of the multiple-baseline design, multiple baselines are established for the same participant but in different settings. For example, a baseline might be established for the amount of time a child spends reading during his free time at school and during his free time at home. Then a treatment such as positive attention might be introduced first at school and later at home. Again, if the dependent variable changes after the treatment is introduced in each setting, then this gives the researcher confidence that the treatment is, in fact, responsible for the change.
In addition to its focus on individual participants, single-subject research differs from group research in the way the data are typically analyzed. As we have seen throughout the book, group research involves combining data across participants. Group data are described using statistics such as means, standard deviations, correlation coefficients, and so on to detect general patterns. Finally, inferential statistics are used to help decide whether the result for the sample is likely to generalize to the population. Single-subject research, by contrast, relies heavily on a very different approach called visual inspection . This means plotting individual participants’ data as shown throughout this chapter, looking carefully at those data, and making judgments about whether and to what extent the independent variable had an effect on the dependent variable. Inferential statistics are typically not used.
In visually inspecting their data, single-subject researchers take several factors into account. One of them is changes in the level of the dependent variable from condition to condition. If the dependent variable is much higher or much lower in one condition than another, this suggests that the treatment had an effect. A second factor is trend , which refers to gradual increases or decreases in the dependent variable across observations. If the dependent variable begins increasing or decreasing with a change in conditions, then again this suggests that the treatment had an effect. It can be especially telling when a trend changes directions—for example, when an unwanted behavior is increasing during baseline but then begins to decrease with the introduction of the treatment. A third factor is latency , which is the time it takes for the dependent variable to begin changing after a change in conditions. In general, if a change in the dependent variable begins shortly after a change in conditions, this suggests that the treatment was responsible.
In the top panel of Figure 10.4, there are fairly obvious changes in the level and trend of the dependent variable from condition to condition. Furthermore, the latencies of these changes are short; the change happens immediately. This pattern of results strongly suggests that the treatment was responsible for the changes in the dependent variable. In the bottom panel of Figure 10.4, however, the changes in level are fairly small. And although there appears to be an increasing trend in the treatment condition, it looks as though it might be a continuation of a trend that had already begun during baseline. This pattern of results strongly suggests that the treatment was not responsible for any changes in the dependent variable—at least not to the extent that single-subject researchers typically hope to see.
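Two of these factors, level and trend, can also be computed directly as a supplement to visual inspection. A minimal sketch with invented data (level as the phase mean, trend as an ordinary least-squares slope):

```python
def level(phase):
    """Level: the mean of the observations in a phase."""
    return sum(phase) / len(phase)

def trend(phase):
    """Trend: slope of a least-squares line fit through the phase data,
    with observations indexed 0, 1, 2, ..."""
    n = len(phase)
    mx = (n - 1) / 2          # mean of the x-indices
    my = level(phase)          # mean of the observations
    num = sum((x - mx) * (y - my) for x, y in enumerate(phase))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

baseline = [10, 11, 10, 12, 11]
treatment = [18, 21, 24, 26, 29]
print(level(baseline), trend(baseline))    # lower level, nearly flat trend
print(level(treatment), trend(treatment))  # higher level, increasing trend
```

A clear jump in level and a change in trend direction at the phase boundary, with short latency, is the pattern in the top panel of Figure 10.4; values like the bottom panel's would show small level differences and a trend continuing from baseline.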
The results of single-subject research can also be analyzed using statistical procedures—and this is becoming more common. There are many different approaches, and single-subject researchers continue to debate which are the most useful. One approach parallels what is typically done in group research. The mean and standard deviation of each participant’s responses under each condition are computed and compared, and inferential statistical tests such as the t test or analysis of variance are applied (Fisch, 2001) [3] . (Note that averaging across participants is less common.) Another approach is to compute the percentage of non-overlapping data (PND) for each participant (Scruggs & Mastropieri, 2001) [4] . This is the percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition. In the study of Hall and his colleagues, for example, all measures of Robbie’s study time in the first treatment condition were greater than the highest measure in the first baseline, for a PND of 100%. The greater the percentage of non-overlapping data, the stronger the treatment effect. Still, formal statistical approaches to data analysis in single-subject research are generally considered a supplement to visual inspection, not a replacement for it.
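PND is straightforward to compute. A minimal sketch, assuming a behavior we want to increase (the data are invented, loosely patterned on the Robbie example):

```python
def pnd(baseline, treatment, desired="increase"):
    """Percentage of non-overlapping data: the share of treatment-phase
    points more extreme than the most extreme baseline point."""
    if desired == "increase":
        cutoff = max(baseline)
        nonoverlap = [x for x in treatment if x > cutoff]
    else:
        cutoff = min(baseline)
        nonoverlap = [x for x in treatment if x < cutoff]
    return 100 * len(nonoverlap) / len(treatment)

# Hypothetical study-time percentages:
baseline = [20, 25, 30, 25]
treatment = [55, 60, 70, 65, 75]
print(pnd(baseline, treatment))  # 100.0: every treatment point beats baseline
```

For a behavior we want to decrease, `desired="decrease"` compares against the baseline minimum instead; either way, a higher percentage indicates a stronger treatment effect.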
Figure 10.2 long description: Line graph showing the results of a study with an ABAB reversal design. The dependent variable was low during first baseline phase; increased during the first treatment; decreased during the second baseline, but was still higher than during the first baseline; and was highest during the second treatment phase. [Return to Figure 10.2]
Figure 10.3 long description: Three line graphs showing the results of a generic multiple-baseline study, in which different baselines are established and treatment is introduced to participants at different times.
For Baseline 1, treatment is introduced one-quarter of the way into the study. The dependent variable ranges between 12 and 16 units during the baseline, but drops down to 10 units with treatment and mostly decreases until the end of the study, ranging between 4 and 10 units.
For Baseline 2, treatment is introduced halfway through the study. The dependent variable ranges between 10 and 15 units during the baseline, then has a sharp decrease to 7 units when treatment is introduced. However, the dependent variable increases to 12 units soon after the drop and ranges between 8 and 10 units until the end of the study.
For Baseline 3, treatment is introduced three-quarters of the way into the study. The dependent variable ranges between 12 and 16 units for the most part during the baseline, with one drop down to 10 units. When treatment is introduced, the dependent variable drops down to 10 units and then ranges between 8 and 9 units until the end of the study. [Return to Figure 10.3]
Figure 10.4 long description: Two graphs showing the results of a generic single-subject study with an ABA design. In the first graph, under condition A, level is high and the trend is increasing. Under condition B, level is much lower than under condition A and the trend is decreasing. Under condition A again, level is about as high as the first time and the trend is increasing. For each change, latency is short, suggesting that the treatment is the reason for the change.
In the second graph, under condition A, level is relatively low and the trend is increasing. Under condition B, level is a little higher than during condition A and the trend is increasing slightly. Under condition A again, level is a little lower than during condition B and the trend is decreasing slightly. It is difficult to determine the latency of these changes, since each change is rather minute, which suggests that the treatment is ineffective. [Return to Figure 10.4]
When the researcher waits until the participant’s behavior in one condition becomes fairly consistent from observation to observation before changing conditions.
The most basic single-subject research design in which the researcher measures the dependent variable in three phases: Baseline, before a treatment is introduced (A); after the treatment is introduced (B); and then a return to baseline after removing the treatment (A). It is often called an ABA design.
Another term for reversal design.
The beginning phase of an ABA design, which acts as a kind of control condition by establishing the level of responding before any treatment is introduced.
In this design the baseline phase is followed by separate phases in which different treatments are introduced.
In this design two or more treatments are alternated relatively quickly on a regular schedule.
In this design, multiple baselines are either established for one participant or one baseline is established for many participants.
This means plotting individual participants’ data, looking carefully at those plots, and making judgments about whether and to what extent the independent variable had an effect on the dependent variable.
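Visual analysis of level and trend can be supported by simple descriptive statistics. The sketch below is a minimal, hypothetical illustration (the function name, the example series, and the phase boundaries are invented for this example): for each phase of an ABA series it computes the mean (level) and the least-squares slope (trend), the same quantities a researcher judges by eye.

```python
import numpy as np

def phase_summary(data, phases):
    """Summarize level (mean) and trend (least-squares slope) per phase.

    `data` is the full series of observations; `phases` maps a label such
    as "A1" to the (start, stop) slice of observations in that phase.
    """
    summary = {}
    for label, (start, stop) in phases.items():
        y = np.asarray(data[start:stop], dtype=float)
        x = np.arange(len(y))
        # Degree-1 polynomial fit; the first coefficient is the slope.
        slope = float(np.polyfit(x, y, 1)[0]) if len(y) > 1 else 0.0
        summary[label] = {"level": float(y.mean()), "trend": slope}
    return summary

# Hypothetical ABA series: high responding, a drop under treatment, recovery.
series = [10, 11, 12, 5, 4, 4, 9, 10, 11]
phases = {"A1": (0, 3), "B": (3, 6), "A2": (6, 9)}
for label, stats in phase_summary(series, phases).items():
    print(label, stats)
```

A clear drop in level from A1 to B, followed by a return toward baseline in A2, is the numeric counterpart of the visual pattern described for Figure 10.4.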
This is the percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition.
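This statistic, often called the percentage of nonoverlapping data (PND), is straightforward to compute. The sketch below is a minimal, hypothetical implementation; the function name and the example session counts are invented, and it assumes lower scores indicate improvement unless told otherwise.

```python
def pnd(baseline, treatment, improvement="decrease"):
    """Percentage of nonoverlapping data (PND).

    The percentage of treatment-phase observations that are more extreme
    than the most extreme baseline observation, in the direction of
    improvement.
    """
    if improvement == "decrease":
        extreme = min(baseline)
        nonoverlap = [x for x in treatment if x < extreme]
    else:
        extreme = max(baseline)
        nonoverlap = [x for x in treatment if x > extreme]
    return 100.0 * len(nonoverlap) / len(treatment)

# Hypothetical disruptive-behaviour counts per session.
baseline = [12, 14, 13, 15, 12]
treatment = [9, 7, 13, 6, 5, 8]
print(pnd(baseline, treatment))  # 5 of 6 treatment points fall below the baseline minimum of 12
```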
Single-Subject Research Designs Copyright © by Rajiv S. Jhangiani; I-Chant A. Chiang; Carrie Cuttler; and Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
This chapter addresses the peculiarities, characteristics, and major fallacies of single case research designs. A single case study research design is a collective term for an in-depth analysis of a small non-random sample. The defining characteristic of this design is its depth, which distinguishes case study research from designs that treat the individual case as a rather insignificant and interchangeable aspect of a population or sample. Researchers will also find relevant information on how to write a single case research design paper and learn about typical methodologies used for this research design. The chapter closes by referring to overlapping and adjacent research designs.
Authors and affiliations: Stefan Hunziker & Michael Blankenagel, Wirtschaft/IFZ – Campus Zug-Rotkreuz, Hochschule Luzern, Zug-Rotkreuz, Zug, Switzerland
© 2021 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature
Hunziker, S., Blankenagel, M. (2021). Single Case Research Design. In: Research Design in Business and Management. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-34357-6_8
DOI: https://doi.org/10.1007/978-3-658-34357-6_8
Published: 10 November 2021
Publisher: Springer Gabler, Wiesbaden
Print ISBN: 978-3-658-34356-9
Online ISBN: 978-3-658-34357-6
eBook Packages: Business and Economics (German Language)
Chapter 10: Single-Subject Research
Learning Objectives
Single-subject research is a type of quantitative research that involves studying in detail the behaviour of each of a small number of participants. Note that the term single-subject does not mean that only one participant is studied; it is more typical for there to be somewhere between two and 10 participants. (This is why single-subject research designs are sometimes called small-n designs, where n is the statistical symbol for the sample size.) Single-subject research can be contrasted with group research, which typically involves studying large numbers of participants and examining their behaviour primarily in terms of group means, standard deviations, and so on. The majority of this textbook is devoted to understanding group research, which is the most common approach in psychology. But single-subject research is an important alternative, and it is the primary approach in some areas of psychology.
Before continuing, it is important to distinguish single-subject research from two other approaches, both of which involve studying in detail a small number of participants. One is qualitative research, which focuses on understanding people’s subjective experience by collecting relatively unstructured data (e.g., detailed interviews) and analyzing those data using narrative rather than quantitative techniques. Single-subject research, in contrast, focuses on understanding objective behaviour through experimental manipulation and control, collecting highly structured data, and analyzing those data quantitatively.
It is also important to distinguish single-subject research from case studies. A case study is a detailed description of an individual, which can include both qualitative and quantitative analyses. (Case studies that include only qualitative analyses can be considered a type of qualitative research.) The history of psychology is filled with influential case studies, such as Sigmund Freud’s description of “Anna O.” (see Note 10.5 “The Case of “Anna O.””) and John Watson and Rosalie Rayner’s description of Little Albert (Watson & Rayner, 1920) [1] , who learned to fear a white rat—along with other furry objects—when the researchers made a loud noise while he was playing with the rat. Case studies can be useful for suggesting new research questions and for illustrating general principles. They can also help researchers understand rare phenomena, such as the effects of damage to a specific part of the human brain. As a general rule, however, case studies cannot substitute for carefully designed group or single-subject research studies. One reason is that case studies usually do not allow researchers to determine whether specific events are causally related, or even related at all. For example, if a patient is described in a case study as having been sexually abused as a child and then as having developed an eating disorder as a teenager, there is no way to determine whether these two events had anything to do with each other. A second reason is that an individual case can always be unusual in some way and therefore be unrepresentative of people more generally. Thus case studies have serious problems with both internal and external validity.
The Case of “Anna O.”
Sigmund Freud used the case of a young woman he called “Anna O.” to illustrate many principles of his theory of psychoanalysis (Freud, 1961) [2] . (Her real name was Bertha Pappenheim, and she was an early feminist who went on to make important contributions to the field of social work.) Anna had come to Freud’s colleague Josef Breuer around 1880 with a variety of odd physical and psychological symptoms. One of them was that for several weeks she was unable to drink any fluids. According to Freud,
She would take up the glass of water that she longed for, but as soon as it touched her lips she would push it away like someone suffering from hydrophobia.…She lived only on fruit, such as melons, etc., so as to lessen her tormenting thirst. (p. 9)
But according to Freud, a breakthrough came one day while Anna was under hypnosis.
[S]he grumbled about her English “lady-companion,” whom she did not care for, and went on to describe, with every sign of disgust, how she had once gone into this lady’s room and how her little dog—horrid creature!—had drunk out of a glass there. The patient had said nothing, as she had wanted to be polite. After giving further energetic expression to the anger she had held back, she asked for something to drink, drank a large quantity of water without any difficulty, and awoke from her hypnosis with the glass at her lips; and thereupon the disturbance vanished, never to return. (p.9)
Freud’s interpretation was that Anna had repressed the memory of this incident along with the emotion that it triggered and that this was what had caused her inability to drink. Furthermore, her recollection of the incident, along with her expression of the emotion she had repressed, caused the symptom to go away.
As an illustration of Freud’s theory, the case study of Anna O. is quite effective. As evidence for the theory, however, it is essentially worthless. The description provides no way of knowing whether Anna had really repressed the memory of the dog drinking from the glass, whether this repression had caused her inability to drink, or whether recalling this “trauma” relieved the symptom. It is also unclear from this case study how typical or atypical Anna’s experience was.
Again, single-subject research involves studying a small number of participants and focusing intensively on the behaviour of each one. But why take this approach instead of the group approach? There are several important assumptions underlying single-subject research, and it will help to consider them now.
First and foremost is the assumption that it is important to focus intensively on the behaviour of individual participants. One reason for this is that group research can hide individual differences and generate results that do not represent the behaviour of any individual. For example, a treatment that has a positive effect for half the people exposed to it but a negative effect for the other half would, on average, appear to have no effect at all. Single-subject research, however, would likely reveal these individual differences. A second reason to focus intensively on individuals is that sometimes it is the behaviour of a particular individual that is primarily of interest. A school psychologist, for example, might be interested in changing the behaviour of a particular disruptive student. Although previous published research (both single-subject and group research) is likely to provide some guidance on how to do this, conducting a study on this student would be more direct and probably more effective.
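The averaging problem described above is easy to demonstrate with hypothetical numbers: a treatment that produces symmetric positive and negative change scores yields a group mean of zero, even though every individual changed.

```python
# Hypothetical change scores: the treatment helps half the participants
# (positive change) and harms the other half (negative change).
changes = [4, 5, 3, 4, -4, -5, -3, -4]

# The group average suggests no effect at all.
group_mean = sum(changes) / len(changes)
print(group_mean)  # 0.0

# Inspecting individuals instead reveals two distinct response patterns.
improved = [c for c in changes if c > 0]
worsened = [c for c in changes if c < 0]
print(len(improved), len(worsened))  # 4 4
```

This is exactly the kind of individual-level pattern that plotting each participant's data, as single-subject researchers do, would make visible.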
A second assumption of single-subject research is that it is important to discover causal relationships through the manipulation of an independent variable, the careful measurement of a dependent variable, and the control of extraneous variables. For this reason, single-subject research is often considered a type of experimental research with good internal validity. Recall, for example, that Hall and his colleagues measured their dependent variable (studying) many times—first under a no-treatment control condition, then under a treatment condition (positive teacher attention), and then again under the control condition. Because there was a clear increase in studying when the treatment was introduced, a decrease when it was removed, and an increase when it was reintroduced, there is little doubt that the treatment was the cause of the improvement.
A third assumption of single-subject research is that it is important to study strong and consistent effects that have biological or social importance. Applied researchers, in particular, are interested in treatments that have substantial effects on important behaviours and that can be implemented reliably in the real-world contexts in which they occur. This is sometimes referred to as social validity (Wolf, 1976) [3] . The study by Hall and his colleagues, for example, had good social validity because it showed strong and consistent effects of positive teacher attention on a behaviour that is of obvious importance to teachers, parents, and students. Furthermore, the teachers found the treatment easy to implement, even in their often-chaotic elementary school classrooms.
Single-subject research has been around as long as the field of psychology itself. In the late 1800s, one of psychology’s founders, Wilhelm Wundt, studied sensation and consciousness by focusing intensively on each of a small number of research participants. Hermann Ebbinghaus’s research on memory and Ivan Pavlov’s research on classical conditioning are other early examples, both of which are still described in almost every introductory psychology textbook.
In the middle of the 20th century, B. F. Skinner clarified many of the assumptions underlying single-subject research and refined many of its techniques (Skinner, 1938) [4] . He and other researchers then used it to describe how rewards, punishments, and other external factors affect behaviour over time. This work was carried out primarily using nonhuman subjects—mostly rats and pigeons. This approach, which Skinner called the experimental analysis of behaviour —remains an important subfield of psychology and continues to rely almost exclusively on single-subject research. For excellent examples of this work, look at any issue of the Journal of the Experimental Analysis of Behaviour . By the 1960s, many researchers were interested in using this approach to conduct applied research primarily with humans—a subfield now called applied behaviour analysis (Baer, Wolf, & Risley, 1968) [5] . Applied behaviour analysis plays an especially important role in contemporary research on developmental disabilities, education, organizational behaviour, and health, among many other areas. Excellent examples of this work (including the study by Hall and his colleagues) can be found in the Journal of Applied Behaviour Analysis .
Although most contemporary single-subject research is conducted from the behavioural perspective, it can in principle be used to address questions framed in terms of any theoretical perspective. For example, a studying technique based on cognitive principles of learning and memory could be evaluated by testing it on individual high school students using the single-subject approach. The single-subject approach can also be used by clinicians who take any theoretical perspective—behavioural, cognitive, psychodynamic, or humanistic—to study processes of therapeutic change with individual clients and to document their clients’ improvement (Kazdin, 1982) [6] .
Key Takeaways
A type of quantitative research that involves studying in detail the behaviour of each of a small number of participants.
Research that involves studying large numbers of participants and examining their behaviour primarily in terms of group means, standard deviations, and so on.
A detailed description of an individual, which can include both qualitative and quantitative analyses.
The extent to which a treatment has strong and consistent effects on behaviours of real-world importance and can be implemented reliably in the contexts in which those behaviours occur.
Laboratory methods that rely on single-subject research; based upon B. F. Skinner’s philosophy of behaviourism, which posits that everything organisms do is behaviour.
Starting in the 1960s, researchers began using single-subject techniques to conduct applied research with human subjects.
Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.