PLOS ONE

Measuring attitude towards mathematics using Likert scale surveys: The weighted average

Carmen León-Mantero

1 Department of Mathematics, University of Córdoba, Córdoba, Spain

José Carlos Casas-Rosal

2 Department of Statistics, Econometrics, Operational Research, Business Organization and Applied Economics, University of Córdoba, Córdoba, Spain

Cristina Pedrosa-Jesús

Alexander Maz-Machado

Associated Data

All relevant data are within the manuscript.

Abstract

In research on mathematics education, Likert-type instruments estimating attitudes toward mathematics are sometimes composed of highly correlated factors, which can make it difficult to assign the statements of the scale to each estimated factor. Further, the measurement of attitudes is usually done by adding the scores, ignoring possible differences in the importance that each item has in the estimation of its factor. In this research, the methodology for the study of attitudes toward mathematics is revised for the correct validation of the instrument, and an estimation of the attitude factors is proposed using a weighted average of the scores, with weights based on the importance that each item has in the explanation of its factor, as given by a structural equation model. This methodology has been applied to Auzmendi's scale of attitudes toward mathematics in a sample of 1293 university students. The factors were estimated using simple and weighted averages; significant differences were found between the two estimations, as well as between the estimations obtained with the item organization proposed by Auzmendi and the one proposed here.

Introduction

Among researchers who focus on the influence of the affective domain in the teaching and learning of mathematics, it is assumed that cognitive factors are not the only determining ones. When students and teachers work on mathematics or other subjects, their interests, beliefs, feelings, and attitudes play an important role in this process, which justifies an in-depth analysis of these factors [ 1 – 3 ].

Researchers’ interest in measuring the influence that certain factors have on the attitudes of students or teachers has given rise to numerous investigations into which measurement instruments, such as Thurstone-type scales, Likert scales, or questionnaires, are most adequately designed and validated. In the analysis of social behavior such as attitudes toward mathematics, it is necessary to apply a correct sampling methodology, to use a reliable instrument to measure the behavior, and to interpret the results carefully.

The simplicity and the homogeneity of the Likert scale have made it the most frequently used instrument in the measurement of attitudes and beliefs toward mathematics. Due to the impossibility of measuring the factors that constitute these attitudes and beliefs directly, it is common to construct the variables that represent these factors by adding or averaging the valuations obtained from the items used to measure them.

This scale has been widely used since its publication in 1932 by Likert [ 4 ] and is supported by numerous handbooks, such as [ 5 ], which are important references in the study of attitudes. There are numerous quantitative studies of attitudes using a Likert scale [ 6 – 8 ], and, for purposes of this study, attitudes regarding mathematics [ 9 – 12 ]. However, when constructing the factors to be measured, the linearity of the scale is implicitly imposed, and it is assumed that all the items have the same weight.

When the Likert scale is designed to explain a set of factors that can explain the objective factor of study, like attitude in the scale of Auzmendi [ 13 ], there is usually a high association between them, which can make it difficult to design and assign the items to the factors. In these cases, validation is usually carried out by consulting a panel of experts and analysing the reliability by calculating parameters such as Cronbach's alpha coefficient. However, the high association among the factors can generate an overestimation of these parameters and induce an error in the assignment of the items to the dimensional factors of attitude and, therefore, their subsequent estimation.

In this research, a Likert-scale survey that is often used in the study of attitudes is analysed in order to improve the estimation of attitudes students have toward mathematics. Additionally, a new distribution of the items around the attitude factors is proposed based on the conclusions of a panel of experts, with subsequent confirmation by an exploratory factor analysis that confirms that the results found through this methodology cohere with the existing knowledge of the subject.

As mentioned above, it is usual to estimate the factors of an attitude scale by adding or averaging the valuations given to the items that define them [ 4 ]. This paper aims to give the keys to the correct design of research using a Likert-scale survey to estimate attitudes toward mathematics and propose a way of estimating the factors under study through a weighted average using the weights obtained in the estimation of the standardized coefficients of the model of structural equations that is used to validate the instrument. This will allow the assignment of greater importance to the items that contain a greater percentage of explained variability for a given factor. As will be seen later, the factors thus constructed are significantly different from those constructed by addition or simple average.

The following section presents the theoretical framework, introducing the summative character of the Likert scale in estimating attitudes toward mathematics as well as the definition of attitude. A brief review of the most frequently used instruments for estimating it is also made, as well as its use in the scientific literature. Next, the objectives of this work are indicated, the methodology followed is shown, and the samples and the instrument to which it has been applied are presented. Finally, the most important results and conclusions are detailed.

Theoretical framework

The elaboration of instruments for the measurement of attitudes through ordinal items, as proposed by Likert [ 4 ], has been widely used in the scientific literature in the social sciences and psychology for the estimation of constructs that are not directly measurable. Many handbooks establish the summative construction of a Likert scale in order to estimate complex variables like attitude [ 5 ].

Likert proposed estimating the objective factors of study in the scale through the sum of the scores of the different items. This is assumed by researchers who use such scales; proof of this is that authors such as Cooper, Blackman and Keller [ 14 ] or Krosnick, Judd, and Wittenbrink [ 15 ] refer to it as "Likert's method of summated ratings".

Since the middle of the past century, the construct known as “attitude” has been widely studied in various lines of research in the fields of psychology and education. However, McLeod [ 16 ] brought a new vision to the field and contributed to the concept of attitude being part of the affective factors considered in the study of the affective domain in mathematics education. Traditionally, the objective of these investigations was to develop quantitative methods for measuring attitudes toward mathematics and analysing the relationship among these and other characteristics of the participants such as academic performance, gender, age, or level of education. This was the reason these studies are characterized by not specifying the definition of attitude or, in many cases, by defining it in terms of the instrument that is being used to measure it [ 2 ].

Although there is no unanimity with regard to the definition of attitude, there is certain agreement to consider it as consisting of modifiable mental states which, therefore, can be influenced. Hart [ 17 ] and Gómez-Chacón [ 18 ] agree in understanding attitude as an evaluative predisposition that determines personal intentions and influences behavior. Attitudes can appear at any age, although they tend to be positive at younger ages; they can be positive toward one part of the subject and negative toward another; they vary in intensity; and they are reflected in the predisposition toward the subject through feelings toward the teacher or toward a specific type of activity.

The studies of Auzmendi [ 13 ] and Gómez-Chacón [ 18 ] defend the position that attitude consists of three components: cognitive, affective, and behavioral. The cognitive component refers to conceptions and beliefs with respect to the people, objects, or contexts in which we live; the affective component refers to the emotions and feelings that these people, objects, or contexts awaken—they can be both pleasant and unpleasant. Finally, the behavioral component is related to the behavior shown in reaction to certain stimuli.

The review of the literature gives us a list of numerous validated instruments for measuring attitudes toward mathematics such as the Thurstone type, Likert scales and other questionnaires. Among the most frequently used Likert scales for the measurement of attitudes toward mathematics we can find:

  • The E (Enjoyment of mathematics) and V (Value of mathematics) scales designed and validated by Aiken [ 19 ], which respectively measure the enjoyment of mathematical concepts, symbols, and applications using a computer, and the importance of the subject. They consist of 12 and 11 items, respectively.
  • The Scale of Attitudes toward mathematics designed and validated by Fennema and Sherman [ 20 ], formed initially by four subscales that contain 12 items each and that try to measure, among others, the confidence, utility, and perception of the teacher toward the subject. Later studies have updated the expressions and vocabulary of the items and reduced them from 48 to 47.

At the Latin American level, the most widely used scale for measuring the attitudes of secondary school students [ 13 , 21 ], university students [ 22 – 26 ], and teachers [ 27 ] is the scale that is analyzed in this study: the Likert-type scale of attitudes toward mathematics developed by Auzmendi [ 13 ].

For the development of a measurement of attitudes regarding mathematics and the validation of the results, it is important, in the first place, to use a panel of experts to create the items. Next, the instrument should be applied to a pilot sample in order to analyse its validity and reliability. In the event that the instrument serves to estimate multiple factors of attitude, an exploratory factor analysis [EFA] should be carried out on the pilot sample to allow a definition of the grouping of the items. This distribution of items must be verified by a confirmatory factor analysis [CFA] on a larger sample [ 28 ]. This analysis is even more important if, as in this case, there is a high association among the different factors estimated with the instrument.

The purpose of this study is to obtain a better estimation of the factors of attitude toward mathematics, substituting the usual sum of the item scores by their weighted average with weights depending on the importance the item has for the factor it measures. This methodology was used to analyse the results obtained through Auzmendi’s scale of attitudes toward mathematics [ 13 ]. The specific objectives are:

  • Obtaining a more adequate reordering of the items that make up the attitudes scale, motivated by the application of an EFA and the analysis made of the statements of the items in the expert panel consultation.
  • Measurement of the degree of association among the five factors proposed by Auzmendi for the measurement of attitudes toward mathematics, since a high degree of association would explain the consistency value obtained by the author as well as its possible overestimation.
  • Obtaining the relevance that each item has in the explanation of its corresponding factor so that the weighted means of factor valuation can be calculated.
  • Comparison of the values obtained for the simple average of the factors with the original and the new ordering, as well as with the weighted average.

This will allow obtaining more adequate estimates of the dimensions of the attitude toward mathematics using the Auzmendi scale. However, this methodology could also be applied to improving the estimates of other attitude scales and other complex behavioural variables.

Participants

The participants in this research were students of different degrees at the University of Cordoba (n = 1293) during the 2014/2015, 2015/2016 and 2016/2017 academic years. The sampling was carried out in two phases: in the first, 408 students were surveyed to obtain a pilot sample, and in the second, a further 885 students were surveyed, for a total of 1293 students.

The total sample was composed of more women (64.70%) than men, and the average age was 20.36 years (s = 3.34). All participants were adults when they completed the questionnaire. The surveys were conducted randomly, anonymously, and voluntarily in the facilities where the students usually receive instruction, and no personal data were requested. Each answer sheet was coded with an alphanumeric code identifying the degree and the questionnaire number.

Firstly, and according to the guidelines of the World Health Organization, students were given an informed consent document. This document was read aloud and all doubts and comments raised by the students were resolved. They were then given the questionnaire and asked whether they wished to complete it; they had all the time they needed and were assured that non-participation would not have any negative consequence.

Data collection instruments

The instrument that was applied was Auzmendi’s scale of attitudes toward mathematics [ 13 ]. This questionnaire was originally validated on a sample of 1221 Spanish students and consists of 25 questions on a Likert scale, with 15 affirmative and 10 negative statements that allow the following scoring options: Strongly disagree = 1, Disagree = 2, Neutral (Neither agree nor disagree) = 3, Agree = 4, and Strongly agree = 5. They were also asked for information regarding their age, gender, and degree.

Auzmendi [ 13 ] selected the items that constituted each of the attitude factors:

  • enjoyment, relative to how pleasant working with the subject is;
  • anxiety, referring to the feeling of fear and discomfort that the student manifests vis-à-vis the subject;
  • utility of mathematics, or value that the student considers the acquisition of mathematical knowledge will have for their academic or professional future;
  • motivation regarding the study and use of the subject in their studies or in daily life;
  • confidence in their own mathematical ability.

The items are grouped into these factors as shown in Table 1 .

Dimensional factor | Items
Utility | 1, 6, 15, 16, 19 and 21
Anxiety | 2, 3, 7, 8, 12, 13, 17, 18 and 22
Enjoyment | 4, 9, 14 and 24
Motivation | 5, 10 and 25
Confidence | 11, 20 and 23

A preliminary analysis carried out by eight experts (two in the development of data collection tools, three in mathematics didactics, one in psychology, and two in pedagogy) revealed that some of the items contained statements that were not clearly associated with the factors proposed by Auzmendi. Examples include:

  • Item 15, placed by Auzmendi in the Utility factor, "I hope to use little mathematics in my professional life", refers to the interviewee's expectation of not using mathematics rather than to a belief about its utility. Therefore, this item could be included more adequately in the Anxiety factor.
  • Item 19 can be analysed in a similar way: "I would like to have a job in which I had to use mathematics", which, once again, was included in the Utility factor, can motivate a response more related to the enjoyment that the interviewee experiences when working with mathematics than to the perceived utility of the subject.

These findings motivate the need to restructure the items so that the information collected is directed at explaining the underlying factors. A strong relationship among the five dimensional factors that explain the attitudes toward mathematics would explain why the consistency of the questionnaire obtained by the author was high even though some items might not be correctly associated with their factor.

Statistical analysis

Once the data were collected, they were processed: the scores of the statements with a negative sense (items 2, 5, 7, 10, 12, 15, 16, 17, 22 and 25) were reversed, and the data were analyzed with SPSS [ 29 ] for the estimation of the factor analysis and the calculation of factor values, AMOS [ 30 ] for the estimation of the structural equation model, and R for the calculation of the omega consistency coefficients.
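
A minimal sketch of the reversal step in R follows; the data frame resp and the column names P01 to P25 are assumptions made for this illustration, not the authors' own code.

```r
# Hypothetical column names P01..P25; scores assumed to range from 1
# (Strongly disagree) to 5 (Strongly agree), as in Auzmendi's scale.
negative_items <- c("P02", "P05", "P07", "P10", "P12",
                    "P15", "P16", "P17", "P22", "P25")

# Reverse-score the negatively worded statements: 1<->5, 2<->4, 3 unchanged.
resp[negative_items] <- 6 - resp[negative_items]
```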

To analyze the internal consistency, both Cronbach's alpha coefficient and the omega coefficient were calculated globally for the instrument, with values of 0.896 and 0.901, and subsequently for each factor originally defined by Auzmendi [ 13 ] ( Table 2 ). Omega coefficients were calculated because the number of items affects Cronbach's alpha values [ 31 ]; the omega coefficient is less sensitive to the number of items in each dimension [ 32 ]. We can observe that the global internal consistency is very high, close to 0.9 for both coefficients. All the constructs can be considered consistent with the exception of Confidence, whose value is excessively low.

Component | Cronbach’s Alpha | Omega
Enjoyment | 0.827 | 0.857
Anxiety | 0.866 | 0.881
Motivation | 0.678 | 0.723
Utility | 0.706 | 0.740
Confidence | 0.555 | 0.594
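
A hedged sketch of how these consistency coefficients can be obtained with the psych package in R is shown below, carrying over the assumed data frame resp from the previous sketch and using the Utility grouping of Table 1 as an example.

```r
library(psych)  # provides alpha() and omega()

# One of the original Auzmendi groupings (Utility), with hypothetical column names.
utility_items <- c("P01", "P06", "P15", "P16", "P19", "P21")

# Cronbach's alpha for the factor.
psych::alpha(resp[utility_items])$total$raw_alpha

# McDonald's omega (total) from a single-factor model; less sensitive than
# alpha to the number of items in the dimension.
psych::omega(resp[utility_items], nfactors = 1)$omega.tot
```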

Once the sample had been analyzed, the statistical techniques required to achieve the objectives pursued in this research were organized. The methodology followed three steps:

  • Step 1. Due to the possible inconsistencies found in the relationship among some items and their factors, this assignment was improved with a reordering based on a double validation—on the one hand, an EFA on a pilot sample of 408 students and, on the other hand, consultation with a panel of experts.
  • Step 2. The sample was expanded to 1293 students and a CFA was applied and validated, estimating a model of structural equations as proposed by authors such as Worthington and Whittaker [ 28 ]. This technique improves the EFA since relationships among the different factors measured through the instrument can be imposed in their definition.
  • Step 3. The creation of the SEM allowed us to estimate, on the one hand, the degree of association among the different factors and, on the other, the weight that the response to each item has on the estimation of its corresponding factor in the following way:

The equations of a SEM can be denoted as

η = Bη + Γξ + ζ

where η represents the factors in the model, B is the matrix of coefficients that link factors with other factors, Γ is the corresponding matrix of coefficients linking the exogenous variables (ξ), which are the items, with the factors, and ζ represents the error in the prediction of η. Therefore, the matrix Γ measures the relevance that each item has for the factor that it estimates, and its values are comparable across items due to the homogeneity of the scale. However, the standardized coefficients were used to construct the weights.

The latter allowed us to compare the estimate obtained for each factor through the simple average of the scores of the items that explain it with the estimate obtained by weighting the response to each item by its relevance in that explanation [ 33 ].

In order to obtain a more adequate classification of the items, a factor analysis was carried out using the principal component method with Oblimin rotation, extracting the components whose associated eigenvalue was greater than one.
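
A minimal sketch of this exploratory step with the psych package in R follows; the data frame resp22 (holding the retained items) and the object names are illustrative assumptions, not the authors' code.

```r
library(psych)        # KMO(), cortest.bartlett(), principal()
library(GPArotation)  # required for the "oblimin" rotation

# Suitability checks reported below: Kaiser-Meyer-Olkin measure and
# Bartlett's sphericity test.
psych::KMO(resp22)
psych::cortest.bartlett(cor(resp22), n = nrow(resp22))

# Principal component extraction with Oblimin rotation; five components are
# requested here, consistent with the eigenvalue-greater-than-one criterion.
efa <- psych::principal(resp22, nfactors = 5, rotate = "oblimin")
print(efa$loadings, cutoff = 0.3)  # pattern loadings above |0.3|, as in Table 3
efa$communality                    # communalities used to screen weak items
```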

The model of structural equations was defined from the configuration obtained in the factor analysis including all the possible correlations among factors, which allowed us to determine which are relevant [ 34 ].

Finally, after making the two estimations of the factors (the weighted average and the simple average), their normality was analyzed using the Kolmogorov-Smirnov test and, after discarding normality, the Wilcoxon signed-rank test was applied, which allowed us to decide whether the differences between the factors estimated in the two ways were significant. The effect size was calculated with Cliff's delta, a method suitable for ordinal variables [ 35 ].
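
The comparison step could look like the following R sketch, where conf_simple and conf_weighted are hypothetical vectors holding the two estimations of one factor for the same students, and the effsize package supplies Cliff's delta.

```r
library(effsize)  # cliff.delta()

# Normality check for one of the constructed variables.
ks.test(conf_simple, "pnorm", mean(conf_simple), sd(conf_simple))

# Paired non-parametric comparison of the two estimations of the factor.
wilcox.test(conf_simple, conf_weighted, paired = TRUE)

# Effect size of the difference (very small / small / medium / large bands).
cliff.delta(conf_simple, conf_weighted)
```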

Exploratory factor analysis

To analyze the applicability of the factor analysis to the pilot sample of 408 students, the skewness and kurtosis coefficients of the variables were calculated, resulting in values between -1.135 and 0.385 for the former and between -1.228 and 1.654 for the latter, so the distributions do not depart excessively from normality. It should be remembered that, given the ordinal character of the variables, they cannot strictly follow this distribution; even so, the skewness and kurtosis values obtained, together with the large sample, support the applicability of the technique. Due to the nature of the variables, an analysis of the presence of atypical values is not necessary.

The model finally considered in the analysis was constructed from 22 items, since items 11, 16, and 20 were removed due to their reduced communalities, which affected the model negatively. The Kaiser-Meyer-Olkin measure, with a value of 0.901, and the Bartlett sphericity test, with a p-value lower than 0.001, demonstrate the existence of an association among the items and, therefore, the suitability of applying the EFA.

The 22 variables finally considered were grouped into five dimensional factors with a total explained variance of 61.94% by means of the principal component method, to which an Oblimin rotation was applied. Table 3 shows the communalities associated with these variables, as well as the factor loadings whose absolute value is greater than 0.3, associated with each of the five factors.

Item | Communalities | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5
– | 0.722 | 0.834 | – | – | – | –
– | 0.666 | 0.782 | – | – | – | –
– | 0.743 | 0.695 | – | – | – | –
– | 0.534 | 0.617 | – | – | – | –
– | 0.553 | 0.594 | – | – | – | –
– | 0.704 | – | 0.796 | – | – | –
– | 0.683 | – | 0.794 | – | – | –
– | 0.726 | – | 0.774 | – | – | –
– | 0.645 | – | 0.761 | – | – | –
– | 0.597 | – | 0.626 | – | – | –
– | 0.555 | – | 0.626 | – | – | –
– | 0.686 | – | – | 0.858 | – | –
– | 0.607 | – | – | 0.676 | – | –
– | 0.581 | – | – | 0.668 | – | –
– | 0.698 | – | – | – | 0.747 | –
– | 0.693 | – | – | – | 0.741 | –
– | 0.581 | – | – | – | 0.681 | –
– | 0.440 | – | – | – | 0.510 | –
– | 0.471 | – | – | – | 0.471 | –
– | 0.612 | – | – | – | – | -0.803
– | 0.548 | – | – | – | – | -0.667
– | 0.569 | – | – | – | – | -0.649

As can be seen, the items have been ranked in decreasing order based on their factor loadings with respect to their corresponding factors. These loadings exceed 0.6 except for the last items of factors 1 and 4. All communalities are also higher than 0.4, and most of them exceed 0.6.

In this way, the new assignment of items to factors resulting from the EFA is as shown in Table 4 , which also specifies the consistency values given by Cronbach's alpha and the omega coefficient, as well as the corresponding values presented previously for the Auzmendi scale for comparison.

Factor | Auzmendi distribution: Items | Alpha | Omega | Proposed distribution: Items | Alpha | Omega
Utility | 1, 6, 15, 16, 19 and 21 | 0.706 | 0.740 | 1, 6 and 21 | 0.650 | 0.685
Anxiety | 2, 3, 7, 8, 12, 13, 17, 18 and 22 | 0.866 | 0.881 | 2, 7, 12, 15, 17 and 22 | 0.868 | 0.891
Enjoyment | 4, 9, 14 and 24 | 0.827 | 0.857 | 4, 9, 14, 19 and 24 | 0.841 | 0.868
Motivation | 5, 10 and 25 | 0.678 | 0.723 | 5, 10 and 25 | 0.678 | 0.723
Confidence | 11, 20 and 23 | 0.555 | 0.594 | 3, 8, 13, 18 and 23 | 0.802 | 0.833
Global | – | 0.896 | 0.901 | – | 0.909 | 0.910

Six of the 25 items have been redistributed by the EFA: four of them go from explaining the anxiety that the interviewees show toward mathematics to explaining their confidence. In addition, items 15 and 19 were taken from the group that explains the utility of the subject and included in Anxiety and Enjoyment, respectively. Finally, items 11, 16, and 20 were excluded from the analysis due to their low explanatory capacity.

When the items are reorganized in this way, the global consistency remains practically invariant, with only a slight improvement, while the values of the alpha and omega coefficients improve for all the factors except Motivation, which remains unchanged since this factor keeps the same items in the transformation, and Utility, for which they are slightly reduced (from 0.706 to 0.650 in the case of the alpha coefficient, and from 0.740 to 0.685 in the case of the omega coefficient). It is worth highlighting the increase in the consistency of the Confidence factor, which, as happened in Auzmendi [ 13 ], showed an excessively low alpha value of 0.555, which has risen to 0.802 (from 0.594 to 0.833 in the case of the omega coefficient).

The results support the hypothesis regarding the necessary redistribution of the items of the questionnaire both from the point of view of the interpretation of the statements by the panel of experts and due to the improvement obtained for the internal consistency of the instrument for this sample.

Next, we show the estimation of the structural equation model created for the total sample of 1293 students, which confirms the factor analysis estimated in the previous step and from which the weights of each item are estimated for the calculation of the weighted averages used to estimate the factors.

As previously mentioned, the estimation of the standardized regression coefficients between the items and the factors to which they are associated according to the previous classification allows us to estimate these factors as the weighted average of the values given to each of the items in its category.

Structural equation

Fig 1 shows the structural equation model estimated by the maximum likelihood method for the 22 items considered, the five dimensional factors, and the organization found in the previous section. The item labels correspond to their numbering in Auzmendi's original scale.

[Fig 1: pone.0239626.g001.jpg]

The absolute, incremental, and parsimony fit measures obtained for this model, as well as the widely accepted goodness-of-fit criteria, are shown in Table 5 . All of them, together with the relevance of all the variables considered and the value of the residuals, show the validity of our model.

Adjustment measure | Value | Optimal value
Goodness of fit index (GFI) | 0.957 | > 0.9
Root mean square error of approximation (RMSEA) | 0.042 | < 0.05
P-value of close fit (PCLOSE) | 1.000 | > 0.05
Standardized root mean square residual (SRMR) | 0.039 | < 0.08
Normed fit index (NFI) | 0.948 | > 0.90
Comparative fit index (CFI) | 0.963 | > 0.90
Normed chi-square (NCS) | 3.312 | Values between 1 and 3
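
A hedged sketch of this validation step with the lavaan package in R is shown below; the authors used AMOS, so this is only an approximate reproduction, and the data frame resp and its column names are assumptions. lavaan reports most of the indices in Table 5 directly, and its standardized solution provides the loadings and factor correlations discussed next.

```r
library(lavaan)

# CFA/SEM with the proposed item distribution; factor covariances are freely
# estimated, mirroring the correlations reported in Table 6.
model <- '
  Enjoyment  =~ P04 + P09 + P14 + P19 + P24
  Anxiety    =~ P02 + P07 + P12 + P15 + P17 + P22
  Motivation =~ P05 + P10 + P25
  Confidence =~ P03 + P08 + P13 + P18 + P23
  Utility    =~ P01 + P06 + P21
'
fit <- cfa(model, data = resp, estimator = "ML")

# Fit indices comparable to Table 5.
fitMeasures(fit, c("gfi", "rmsea", "srmr", "nfi", "cfi", "chisq", "df"))

# Standardized loadings (used later as weights) and factor correlations.
standardizedSolution(fit)
```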

Once the structural equation was estimated and validated, we analyzed the correlations among the five factors: Enjoyment, Confidence, Utility, Anxiety, and Motivation. The results are shown in Table 6 . All the relationships are significant; the strongest relationship is observed between the Enjoyment and Confidence factors, which indicates that a greater enjoyment for mathematics facilitates greater confidence in the subject and vice versa. At the other extreme are Motivation and Confidence, which, while significant, have a reduced value that indicates that, although motivation by itself could generate confidence, other factors can influence it.

Relation | Estimated value
Enjoyment <-> Anxiety | 0.330
Enjoyment <-> Motivation | 0.158
Enjoyment <-> Confidence | 0.786
Enjoyment <-> Utility | 0.698
Anxiety <-> Motivation | 0.520
Anxiety <-> Confidence | 0.368
Anxiety <-> Utility | 0.189
Motivation <-> Confidence | 0.095
Motivation <-> Utility | 0.253
Confidence <-> Utility | 0.488

Also noteworthy, due to their intensity, are the relationships that indicate that a high perception of mathematics utility increases the enjoyment level shown for this subject. In the same way, a lower level of anxiety is associated with a greater feeling of confidence. It is important to remember that the responses to the items, presented in a negative way, were inverted at the beginning of the study to represent positive values for the attitudes toward mathematics.

Estimation of factors

Once the model of structural equations was built, the five factors, with their proposed distribution, were estimated in two ways: first, through the simple arithmetic mean of the values obtained in each item; second, through means weighted by the estimated standardized regression coefficients shown in Fig 1 . These coefficients are shown in Table 7 . The differences between the standardized coefficients within some factors are high, as with items 15 and 22 of the Anxiety factor, or items 8 and 23 of Confidence.

Item | Estimation | Factor
P04 | 0.764 | Enjoyment
P09 | 0.706 | Enjoyment
P14 | 0.853 | Enjoyment
P24 | 0.637 | Enjoyment
P19 | 0.744 | Enjoyment
P17 | 0.845 | Anxiety
P02 | 0.695 | Anxiety
P22 | 0.886 | Anxiety
P07 | 0.641 | Anxiety
P12 | 0.649 | Anxiety
P15 | 0.560 | Anxiety
P05 | 0.682 | Motivation
P10 | 0.714 | Motivation
P25 | 0.600 | Motivation
P08 | 0.767 | Confidence
P18 | 0.619 | Confidence
P03 | 0.655 | Confidence
P23 | 0.574 | Confidence
P13 | 0.694 | Confidence
P06 | 0.683 | Utility
P01 | 0.724 | Utility
P21 | 0.646 | Utility

The high variability of the coefficients shows the need to assign different weights to the items of the same factor. Once the five factors were estimated, a transformation was applied to present them on a scale between 0 and 100 for a better interpretation of the results. A descriptive study is shown in Table 8 , in which there are large differences between the values calculated with the scale originally given by Auzmendi and the one proposed in this research, except for the Motivation factor, which, as seen above, contains the same items in both distributions and whose weights do not show strong discrepancies among the items.
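
A hedged sketch of this estimation step in R follows, using the Utility factor as an example with the standardized coefficients of Table 7 as weights; the data frame resp, the 1-5 scoring, and the particular linear 0-100 rescaling are assumptions of this sketch.

```r
# Standardized coefficients from Table 7 for the Utility items.
w_utility <- c(P01 = 0.724, P06 = 0.683, P21 = 0.646)

items <- resp[, names(w_utility)]

# Simple and weighted estimations of the factor for each respondent.
utility_simple   <- rowMeans(items)
utility_weighted <- as.matrix(items) %*% (w_utility / sum(w_utility))

# One plausible linear transformation of the 1-5 range to a 0-100 scale.
rescale_0_100 <- function(x) (x - 1) / 4 * 100
utility_simple   <- rescale_0_100(utility_simple)
utility_weighted <- rescale_0_100(utility_weighted)
```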

Factor | Original distribution, simple mean (Mean / SD) | Proposed distribution, simple mean (Mean / SD) | Proposed distribution, weighted mean (Mean / SD)
Enjoyment | 38.7950 / 22.3796 | 39.2537 / 22.0119 | 39.5607 / 22.1178
Anxiety | 50.9070 / 19.2630 | 51.3631 / 23.4481 | 51.3515 / 23.7481
Motivation | 58.1132 / 23.3320 | 58.1013 / 23.3077 | 58.2485 / 23.5198
Confidence | 71.1404 / 17.2243 | 54.8879 / 20.4933 | 54.4266 / 20.7072
Utility | 56.3684 / 16.5903 | 63.3217 / 20.2998 | 63.6929 / 20.2701

When comparing the values of the original distribution factors and the simple means with those of the proposed distribution and weighted averages, it was observed that the mean values of the factors Enjoyment, Anxiety, and Utility were being underestimated with the previous methodology, while the average value of the Confidence factor was clearly overestimated.

Finally, we analysed the existence of significant differences among the factors through the Wilcoxon signed-rank comparison. The results are shown in Table 9 . We opted for a non-parametric test due to the absence of normality of some of the constructed variables, analysed with the Kolmogorov-Smirnov test. The effect size of the differences was computed through Cliff's delta statistic: a value smaller than 0.11 is considered very small; values between 0.11 and 0.28 are considered small; values between 0.28 and 0.43 are considered medium; and values greater than 0.43 are considered large [ 35 ].

Factor | Original distribution vs. proposed distribution and weighted mean (p-value / effect size) | Original distribution vs. proposed distribution and simple mean (p-value / effect size) | Proposed distribution, simple mean vs. weighted mean (p-value / effect size)
Enjoyment | 0.019 / 0.021 | 0.019 / 0.013 | <0.001 / 0.025
Anxiety | 0.029 / 0.007 | 0.055 / 0.006 | 0.806 / 0.000
Motivation | <0.001 / 0.017 | – / – | <0.001 / 0.017
Confidence | <0.001 / 0.475 | <0.001 / 0.469 | <0.001 / 0.026
Utility | <0.001 / 0.286 | <0.001 / 0.247 | <0.001 / 0.075

Looking at the size of the effect, the differences are large for the Confidence factor and medium for the Utility factor.

All the factors estimated with the weighted mean and the proposed distribution of the instrument are significantly different from those obtained with the simple mean and the distribution of Auzmendi except Motivation. Significant differences have also been found between the two simple averages of the factors Confidence and Utility and, to a lesser extent, Enjoyment and Anxiety. Finally, the differences between the two means calculated with the proposed distribution are also significant when measuring Enjoyment, Motivation, Confidence, and Utility. In this last case, the calculation of the simple mean underestimates the values of Enjoyment, Motivation, and Utility, while Confidence was overestimated.

Conclusions

When carrying out any attitudes study with the use of an information collection instrument, both the correct calibration of the measurement instrument and the calculation of the factors to be estimated from the items are of vital importance. The incorrect use of any of them can lead to erroneous conclusions.

This paper reviewed the methodology for the treatment of a widely used instrument for measuring attitudes toward mathematics, analysing the items in relation to the five factors that define those attitudes. First, a new ordering of the items was proposed on the basis of the analysis of internal consistency and of an EFA on a pilot sample, which justified the initially proposed changes. Then, a CFA confirmed the consistency of the results obtained. This last step is especially important when attitude is measured with a multi-component instrument, given the possible relationships among the components, which are usually not addressed in the EFA. The existence of such relationships has also been confirmed in the case developed here.

The suitability of constructing attitude factors as the sum or simple average of the responses to the items, a method proposed by Likert [ 4 ] and widely used in the literature, needs to be questioned not only for attitudes toward mathematics but also for other factors estimated with Likert-type scales. Significant differences have been demonstrated between the factors estimated through the simple average of the item scores and those estimated through a weighted average based on the relevance of the items in the estimation of each factor.

The estimation of non-measurable factors such as confidence or utility through the sum of the values of each item, or through their simple average, imposes the condition that all the responses to the items have the same relevance in the explanation of the factor regardless of the content of the question. The coefficients of the structural equation between the factors and the items, or the communalities of the factor analysis, can serve as measures of the importance that each item has.

This proposed methodology for improving the estimation of attitudes toward mathematics was applied to the survey developed by Auzmendi, finding significant differences in the estimation of some of the factors, which are largest for the Confidence and Utility factors. In the first, an overestimation is generated and, in the second, an underestimation.

The above leads to the conclusion that, in investigations carried out in recent years on attitudes toward mathematics, the evaluations obtained for the feeling of confidence in one's own mathematical ability may have been higher than those the participants in these studies actually possess. Examples in which this could have occurred are the studies by Auzmendi [ 13 ] (83.93 points out of 100), Flores and Auzmendi [ 25 ] (72.20 points out of 100), and Nortes Martínez and Nortes Checa [ 26 ] (78.80 points out of 100).

In the same way, the interviewees could perceive mathematics as more useful for their professional and academic life than the values obtained in these previous investigations suggest (67.93 [ 13 ], 66.13 [ 25 ], and 67.80 [ 26 ] points out of 100, respectively).

For example, in the works mentioned above, a poor estimate of these factors could have led these studies to draw erroneous conclusions about the correlations among levels of attitude and variables such as gender, course, or high school modality, as well as about the differences found between men and women or among students who study different university subjects. Although the effect size is small for the rest of the factors, the differences are nevertheless significant and cause important variations in the estimation of the mean and the standard deviation of the factors.

Descriptive studies that explore the attitudes toward mathematics of students of different gender, ethnicity, geographical situation, or who are studying different subjects, aim to provide information to mathematics teachers so that they can carry out interventions that help their students improve. Some works have already been developed along this line such as Hannula et al. [ 2 ], but it is necessary to continue working on this because very little progress has been made and poor results have been obtained. It is especially important that each of the attitude factors are estimated correctly to avoid reaching incorrect conclusions.

Funding Statement

The author(s) received no specific funding for this work.

Data Availability

All relevant data are within the manuscript.

REVIEW article

A review of key Likert scale development advances: 1995–2019

Andrew T. Jebb*

  • 1 Department of Psychological Sciences, Purdue University, West Lafayette, IN, United States
  • 2 Department of Psychology, University of Houston, Houston, TX, United States

Developing self-report Likert scales is an essential part of modern psychology. However, it is hard for psychologists to remain apprised of best practices as methodological developments accumulate. To address this, the current paper offers a selective review of advances in Likert scale development that have occurred over the past 25 years. We reviewed six major measurement journals (e.g., Psychological Methods, Educational and Psychological Measurement) between the years 1995–2019 and identified key advances, ultimately including 40 papers and offering written summaries of each. We supplemented this review with an in-depth discussion of five particular advances: (1) conceptions of construct validity, (2) creating better construct definitions, (3) readability tests for generating items, (4) alternative measures of precision [e.g., coefficient omega and item response theory (IRT) information], and (5) ant colony optimization (ACO) for creating short forms. The Supplementary Material provides further technical details on these advances and offers guidance on software implementation. This paper is intended to be a resource for psychological researchers to be informed about more recent psychometric progress in Likert scale creation.

Introduction

Psychological data are diverse and range from observations of behavior to face-to-face interviews. However, in modern times, one of the most common measurement methods is the self-report Likert scale ( Baumeister et al., 2007 ; Clark and Watson, 2019 ). Likert scales provide a convenient way to measure unobservable constructs, and published tutorials detailing the process of their development have been highly influential, such as Clark and Watson (1995) and Hinkin (1998) (being cited over 6,500 and 3,000 times, respectively, according to Google scholar).

Notably, however, it has been roughly 25 years since these seminal papers were published, and specific best-practices have changed or evolved since then. Recently, Clark and Watson (2019) gave an update to their 1995 article, integrating some newer topics into a general tutorial of Likert scale creation. However, scale creation—from defining the construct to testing nomological relationships—is such an extensive process that it is challenging for any paper to give full coverage to each of its stages. The authors were quick to note this themselves several times, e.g., “[w]e have space only to raise briefly some key issues” and “unfortunately we do not have the space to do justice to these developments here” (p. 5). Therefore, a contribution to psychology would be a paper that provides a review of advances in Likert scale development since classic tutorials were published. This paper would not be a general tutorial in scale development like Clark and Watson (1995 , 2019) , Hinkin (1998) , or others. Instead, it would focus on more recent advances and serve as a complement to these broader tutorials.

The present paper seeks to serve as such a resource by reviewing developments in Likert scale creation from the past 25 years. However, given that scale development is such an extensive topic, the limitations of this review should be made very explicit. The first limitations are with regard to scope. This is not a review of psychometrics, which would be impossibly broad, or of advances in self-report in general, which would also be unwieldy (e.g., including measurement techniques like implicit measures and forced choice scales). This is a review of the initial development and validation of self-report Likert scales. Therefore, we also excluded measurement topics related to the use of self-report scales, like identifying and controlling for response biases. 1 Although this scope obviously omits many important aspects of measurement, it was necessary to make the review feasible.

Importantly, like Clark and Watson (1995 , 2019) and Hinkin (1998) , this paper was written at the level of the general psychologist, not the methodologist, in order to benefit the field of psychology most broadly. This also meant that our scope was to find articles that were broad enough to apply to most cases of Likert scale development. As a result, we omitted articles, for example, that only discussed measuring certain types of constructs [e.g., Haynes and Lench’s (2003) paper on the incremental validation of new clinical measures].

The second major limitation concerns its objectivity. Performing any review of what is “significant” requires, at some point, making subjective judgment calls. The majority of the papers we reviewed were fairly easy to decide on. For example, we included Simms et al. (2019) because they tackled a major Likert scale issue: the ideal number of response options (as well as the comparative performance of visual analog scales). By contrast, we excluded Permut et al. (2019) because their advance was about monitoring the attention of subjects taking surveys online, not about scale development per se. However, other papers were more difficult to decide on. Our method of handling this ambiguity is described below, but we do not claim that subjectivity did not play a part in the review process in some way.

Additionally, (a) we did not survey every single journal where advances may have been published 2 and (b) articles published after 2019 were not included. Despite all these limitations, this review was still worth performing. Self-report Likert scales are an incredibly dominant source of data in psychology and the social sciences in general. The divide between methodological and substantive literatures, and between methodologists and substantive researchers ( Sharpe, 2013 ), can increase over time, but it can also be reduced by good communication and dissemination ( Sharpe, 2013 ). The current review is our attempt to bridge, in part, that gap.

To conduct this review, we examined every issue of six major journals related to psychological measurement from January 1995 to December 2019 (inclusive), screening out articles by either title and/or abstract. The full text of any potentially relevant article was reviewed by either the first or second author, and any borderline cases were discussed until a consensus was reached. A PRISMA flowchart of the process is shown in Figure 1 . The journals we surveyed were: Applied Psychological Measurement , Psychological Assessment , Educational and Psychological Measurement , Psychological Methods , Advances in Methods and Practices in Psychological Science , and Organizational Research Methods . For inclusion, our criteria were that the advance had to be: (a) related to the creation of self-report Likert scales (seven excluded), (b) broad and significant enough for a general psychological audience (23 excluded), and (c) not superseded or encapsulated by newer developments (11 excluded). The advances we included are shown in Table 1 , along with a short descriptive summary of each. Scale developers should not feel compelled to use all of these techniques, just those that contribute to better measurement in their context. More specific contexts (e.g., measuring socially sensitive constructs) can utilize additional resources.

Figure 1. PRISMA flowchart of review process.

Table 1. Summary of Likert scale creation developments from 1995–2019.

To supplement this literature review, the remainder of the paper provides a more in-depth discussion of five of these advances that span a range of topics. These were chosen due to their importance, uniqueness, or ease-of-use, and lack of general coverage in classic scale creation papers. These are: (1) conceptualizations of construct validity, (2) approaches for creating more precise construct definitions, (3) readability tests for generating items, (4) alternative measures of precision (e.g., coefficient omega), and (5) ant colony optimization (ACO) for creating short forms. These developments are presented in roughly the order of what stage they occur in the process of scale creation, a schematic diagram of which is shown in Figure 2 .

Figure 2. Schematic diagram of Likert scale development (with advances in current paper, bolded).

Conceptualizing Construct Validity

Two Views of Validity

Psychologists recognize validity as the fundamental concept of psychometrics and one of the most critical aspects of psychological science ( Hood, 2009 ; Cizek, 2012 ). However, what is “validity?” Despite the widespread agreement about its importance, there is disagreement about how validity should be defined ( Newton and Shaw, 2013 ). In particular, there are two divergent perspectives on the definition. The first major perspective defines validity not as a property of tests but as a property of the interpretations of test scores ( Messick, 1989 ; Kane, 1992 ). This view can be therefore called the interpretation camp ( Hood, 2009 ) or validity as construct validity ( Cronbach and Meehl, 1955 ), which is the perspective endorsed by Clark and Watson (1995 , 2019) and standards set forth by governing agencies for the North American educational and psychological measurement supracommunity ( Newton and Shaw, 2013 ). Construct validity is based on a synthesis and analysis of the evidence that supports a certain interpretation of test scores, so validity is a property of interpretive inferences about test scores ( Messick, 1989 , p. 13), especially interpreting score meaning ( Messick, 1989 , p. 17). Because the context of measurement affects test scores ( Messick, 1989 , pp. 14–15), the results of any validation effort are conditional upon the context in and group characteristics of the sample with which the studies were done, as are claims of validity drawn from these empirical results ( Newton, 2012 ; Newton and Shaw, 2013 ).

The other major perspective ( Borsboom et al., 2004 ) revivifies one of the oldest and most intuitive definitions of validity: “…whether or not a test measures what it purports to measure” ( Kelley, 1927 , p. 14). In other words, on this view, validity is a property of tests rather than interpretations. Validity is simply whether or not the statement, “test X measures attribute Y,” is true. To be true, it requires (a) that Y exists and (b) that variations in Y cause variations in X ( Borsboom et al., 2004 ). This definition can be called the test validity view and finds ample precedent in psychometric texts ( Hood, 2009 ). However, Clark and Watson (2019) , citing the Standards for Educational and Psychological Testing ( American Educational Research Association et al., 2014 ), reject this conception of validity.

Ultimately, this disagreement does not show any signs of resolving, and interested readers can consult papers that have attempted to integrate or adjudicate on the two views ( Lissitz and Samuelson, 2007 ; Hood, 2009 ; Cizek, 2012 ).

There Aren’t “Types” of Validity; Validity Is “One”

Even though there are stark differences between these two definitions of validity, one thing they do agree on is that there are not different “types” of validity ( Newton and Shaw, 2013 ). Language like “content validity” and “criterion-related validity” is misleading because it implies that their typical analytic procedures produce empirical evidence that does not bear on the central inference of interpreting the score’s meaning (i.e., construct validity; Messick, 1989 , pp. 13–14, 17, 19–21). Rather, there is only (construct) validity, and different validation procedures and types of evidence all contribute to making inferences about score meaning ( Messick, 1980 ; Binning and Barrett, 1989 ; Borsboom et al., 2004 ).

Despite the agreement that validity is a unitary concept, psychologists seem to disagree in practice; as of 2013, there were 122 distinct subtypes of validity ( Newton and Shaw, 2013 ), many of them named after the fourth edition of the Standards that stated that validity-type language was inappropriate ( American Educational Research Association et al., 1985 ). A consequence of speaking this way is that it perpetuates the view (a) that there are independent “types” of validity (b) that entail different analytic procedures to (c) produce corresponding types of evidence that (d) themselves correspond to different categories of inference ( Messick, 1989 ). This is why to even speak of content, construct, and criterion-related “analyses” (e.g., Lawshe, 1985 ; Landy, 1986 ; Binning and Barrett, 1989 ) can be problematic, since this misleads researchers into thinking that these produce distinct kinds of empirical evidence that have a direct, one-to-one correspondence to the three broad categories of inferences with which they are typically associated ( Messick, 1989 ).

However, an analytic procedure traditionally associated with a certain “type” of validity can be used to produce empirical evidence for another “type” of validity not typically associated with it. For instance, showing that the focal construct is empirically discriminable from similar constructs would constitute strong evidence for the inference of discriminability ( Messick, 1989 ). However, the researcher could use analyses typically associated with “criterion and incremental validity” ( Sechrest, 1963 ) to investigate discriminability as well (e.g., Credé et al., 2017 ). Thus, the key takeaway is to think not of “discriminant validity” or distinct “types” of validity, but to use a wide variety of research designs and statistical analyses to potentially provide evidence that may or may not support a given inference under investigation (e.g., discriminability). This demonstrates that thinking about validity “types” can be unnecessarily restrictive because it misleads researchers into thinking about validity as a fragmented concept ( Newton and Shaw, 2013 ), leading to negative downstream consequences in validation practice.

Creating Clearer Construct Definitions

Ensuring Concept Clarity

Defining the construct one is interested in measuring is a foundational part of scale development; failing to do so properly undermines every scientific activity that follows (E. L. Thorndike, 1904 ; Kelley, 1927 ; Mackenzie, 2003 ; Podsakoff et al., 2016 ). However, there are lingering issues with conceptual clarity in the social sciences. Locke (2012) noted that “As someone who has been reviewing journal articles for more than 30 years, I estimate that about 90% of the submissions I get suffer from problems of conceptual clarity” (p. 146), and Podsakoff et al. (2016) stated that, “it is…obvious that the problem of inadequate conceptual definitions remains an issue for scholars in the organizational, behavioral, and social sciences” (p. 160). To support this effort, we surveyed key papers on construct clarity and integrated their recommendations into Table 2 , adding our own comments where appropriate. We cluster this advice into three “aspects” of formulating a construct definition, each of which contains several specific strategies.

Table 2. Integrative summary of advice for defining constructs.

Specifying the Latent Continuum

In addition to clearly articulating the concept, there are other parts to defining a psychological construct for empirical measurement. Another recent development demonstrates the importance of incorporating the latent continuum in measurement ( Tay and Jebb, 2018 ). Briefly, many psychological concepts like emotion and self-esteem are conceived as having degrees of magnitudes (e.g., “low,” “moderate,” and “high”), and these degrees can be represented by a construct continuum. The continuum was originally a primary focus in early psychological measurement, but the advent of the convenient Likert(-type) scaling ( Likert, 1932 ) pushed it into the background.

However, defining the characteristics of this continuum is needed for proper measurement. For instance, what do the poles (i.e., endpoints) of the construct represent? Is the lower pole its absence , or is it the presence of an opposing construct (i.e., a unipolar or bipolar continuum)? And, what do the different continuum degrees actually represent? If the construct is a positive emotion, do they represent the intensity of experience or the frequency of experience? Quite often, scale developers do not define these aspects but leave them implicit. Tay and Jebb (2018) discuss different problems that can arise from this.

In addition to defining the continuum, there is also the practical issue of fully operationalizing the continuum ( Tay and Jebb, 2018 ). This involves ensuring that the whole continuum is well-represented when creating items. It also means being mindful when including reverse-worded items in their scales. These items may measure an opposite construct , which is desirable if the construct is bipolar (e.g., positive emotions as including happy and sad), but contaminates measurement if the construct is unipolar (e.g., positive emotions as only including feeling happy). Finally, developers should choose a response format that aligns with whether the continuum has been specified as unipolar or bipolar. For example, the numerical rating of 0–4 typically implies a unipolar scale to the respondent, whereas a −3-to-3 response scale implies a bipolar scale. Verbal labels like “Not at all” to “Extremely” imply unipolarity, whereas formats like “Strongly disagree” to “Strongly agree” imply bipolarity. Tay and Jebb (2018) also discuss operationalizing the continuum with regard to two other issues, assessing dimensionality of the scale and assuming the correct response process.

Readability Tests for Items

The current psychometric practice is to keep item statements short and simple with language that is familiar to the target respondents ( Hinkin, 1998 ). Instructions like these alleviate readability problems because psychologists are usually good at identifying and revising difficult items. However, professional psychologists also have a much higher degree of education compared to the rest of the population. In the United States, less than 2% of adults have doctorates, and a majority do not have a degree past high school ( U.S. Census Bureau, 2014 ). The average United States adult has an estimated 8th-grade reading level, with 20% of adults falling below a 5th-grade level ( Doak et al., 1998 ). Researchers can probably catch and remove scale items that are extremely verbose (e.g., “I am garrulous”), but items that might not be easily understood by target respondents may slip through the item creation process. Social science samples frequently consist of university students ( Henrich et al., 2010 ), but this subpopulation has a higher reading level than the general population ( Baer et al., 2006 ), and issues that would manifest for other respondents might not be evident when using such samples.

In addition to asking respondents directly (see Parrigon et al., 2017 for an example), another tool to assess readability is to use readability tests , which have already been used by scale developers in psychology (e.g., Lubin et al., 1990 ; Ravens-Sieberer et al., 2014 ). Readability tests are formulas that score the readability of some piece of writing, often as a function of the number of words per sentence and number of syllables per word. These tests only take seconds to implement and can serve as an additional way to check item language beyond the intuitions of scale developers. When these tests are used, scale items should only be analyzed individually , as testing the readability of the whole scale together can hide one or more difficult items. If an item receives a low readability score, the developer can revise the item.

There are many different readability tests available, such as the Flesch Reading Ease test, the Flesch-Kincaid Grade Level test, the Gunning fog index, the SMOG index, the Automated Readability Index, and the Coleman-Liau Index. Most operate in much the same way, outputting an estimated grade level based on sentence and word length.

We reviewed their formulas and reviews on the topic (e.g., Benjamin, 2012 ). At the outset, we state that no statistic is unequivocally superior to all the others. It is possible to implement several tests and compare the results. However, we recommend the Flesch-Kincaid Grade Level test because it (a) is among the most commonly used, (b) is expressed in grade school levels, and (c) is easily implemented in Microsoft Word. The score indicates the United States grade level for which the text’s readability is suited. Given average reading grade levels in the United States, researchers can aim for a readability score of 8.0 or below for their items. There are several examples of scale developers using this reading test. Lubin et al. (1990) found that 80% of the Depression Adjective Check Lists was at an eighth-grade reading level. Ravens-Sieberer et al. (2014) used the test to check whether a measure of subjective well-being was suitable for children. As our own exercise, we took three recent instances of scale development in the Journal of Applied Psychology and ran readability tests on their items. This analysis is presented in the Supplementary Material .
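To make the recommended test concrete, the following is a minimal Python sketch of the published Flesch-Kincaid Grade Level formula, 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. The syllable counter is deliberately naive (it counts vowel groups), and the two items scored are hypothetical; dedicated tools such as Microsoft Word count syllables more accurately, so treat the output as an approximation.

```python
import re

def count_syllables(word):
    """Very rough syllable count: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / max(1, len(sentences))
            + 11.8 * syllables / max(1, len(words))
            - 15.59)

# Score each item individually; averaging over the whole scale can hide difficult items.
for item in ["I am the life of the party.", "I am garrulous."]:
    print(f"{item!r}: estimated grade level = {flesch_kincaid_grade(item):.1f}")
```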

Alternative Estimates of Measurement Precision

Alpha and omega.

A major focus of scale development is demonstrating its reliability, defined formally as the proportion of true score variance to total score variance ( Lord and Novick, 1968 ). The most common estimator of reliability in psychology is coefficient alpha ( Cronbach, 1951 ). However, alpha is sometimes a less-than-ideal measure because it assumes that all scale items have the same true score variance ( Novick and Lewis, 1967 ; Sijtsma, 2009 ; Dunn et al., 2014 ; McNeish, 2018 ). Put in terms of latent variable modeling, this means that alpha estimates true reliability only if the factor loadings across items are the same ( Graham, 2006 ), 3 something that is “rare for psychological scales” ( Dunn et al., 2014 , p. 409). Violating this assumption makes alpha underestimate true reliability. Often, this underestimation may be small, but it will increase for scales with fewer items and with greater differences in population factor loadings ( Raykov, 1997 ; Graham, 2006 ).
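For reference, alpha itself is simple to compute from a respondents-by-items score matrix using the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The sketch below applies it to a small made-up data set; any statistics package will report the same value.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items matrix of Likert scores."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale
scores = np.array([[4, 5, 4],
                   [2, 3, 2],
                   [5, 5, 4],
                   [3, 3, 3],
                   [1, 2, 2]], dtype=float)
print(round(cronbach_alpha(scores), 3))
```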

One proposed solution is to relax this assumption and adopt the less stringent congeneric model of measurement. The most prominent estimator in this group is coefficient omega ( McDonald, 1999 ), 4 which uses a factor model to obtain reliability estimates. Importantly, omega performs at least as well as alpha if alpha’s assumptions hold ( Zinbarg et al., 2005 ). However, one caveat is that the estimator requires a good-fitting factor model for estimation. Omega and its confidence interval can be computed with the psych package in R (for unidimensional scales, the “omega.tot” statistic from the function “omega”; Revelle, 2008 ). McNeish (2018) provides a software tutorial in R and Excel [see also Dunn et al. (2014) and Revelle and Condon (2019) ].
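For intuition, the omega-total quantity itself is a one-liner once a one-factor model has been fitted: omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]. The sketch below applies this formula to hypothetical standardized loadings; it is not a substitute for the psych package mentioned above, which also fits the factor model and provides confidence intervals.

```python
import numpy as np

def omega_total(loadings, error_variances):
    """McDonald's omega for a one-factor (congeneric) model."""
    common = loadings.sum() ** 2
    return common / (common + error_variances.sum())

# Hypothetical standardized loadings from an already-fitted one-factor model
loadings = np.array([0.75, 0.68, 0.80, 0.55])
error_variances = 1 - loadings ** 2   # uniquenesses for standardized items
print(round(omega_total(loadings, error_variances), 3))
```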

Reliability vs. IRT Information

Alpha, omega, and other reliability estimators stem from the classical test theory paradigm of measurement, where the focus is on the overall reliability of the psychological scale. The other measurement paradigm, item response theory (IRT), focuses on the “reliability” of the scale at a given level of the latent trait or at the level of the item ( DeMars, 2010 ). In IRT, this is operationalized as information IRT ( Mellenbergh, 1996 ) 5 .

Although they are analogous concepts, information IRT and reliability are different. Whereas traditional reliability is only assessed at the scale level, information IRT can be assessed at three levels: the response category, the item, and the test. Information IRT is also a full mathematical function that shows how precision changes across latent trait levels. These features translate into several advantages for the scale developer.

First, items can be evaluated for how much precision they have. Items that are not informative can be eliminated in favor of items that are (for a tutorial, see Edelen and Reeve, 2007 ). Second, the test information function shows how precisely the full scale measures each region of the latent trait. If a certain region is deficient, items can be added to better capture that region (or removed, if the region has been measured enough). Finally, if the scale developer is only interested in measuring a certain region of the latent trait range, such as middle performers or high and low performers, information IRT can help them do so. Further details are provided in the Supplementary Material .
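As a rough illustration, the sketch below computes item and test information for three hypothetical dichotomous 2PL items, where I(theta) = a^2 * P(theta) * (1 - P(theta)). Polytomous Likert items are usually handled with a graded response model instead, but the logic of locating where the scale is most and least precise is the same.

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information for a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

theta = np.linspace(-3, 3, 121)                    # latent trait range
items = [(1.8, -1.0), (1.2, 0.0), (2.0, 1.5)]      # hypothetical (a, b) parameters

item_info = np.array([item_information_2pl(theta, a, b) for a, b in items])
test_info = item_info.sum(axis=0)                  # test information function

print("Peak precision at theta =", round(float(theta[test_info.argmax()]), 2))
print("Information at theta = -2:", round(float(np.interp(-2.0, theta, test_info)), 3))
```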

Maximizing Validity in Short Forms Using Ant Colony Optimization

Increasingly, psychologists wish to use short scales in their work ( Leite et al., 2008 ), 6 as they reduce respondent time, fatigue, and required financial compensation. To date, the most common approaches aim to maintain reliability ( Leite et al., 2008 ; Kruyen et al., 2013 ) and include retaining items with the highest factor loadings and item-total correlations. However, these strategies can incidentally impair measurement ( Janssen et al., 2015 ; Olaru et al., 2015 ; Schroeders et al., 2016 ), as items with higher intercorrelations will usually have more similar content, resulting in less scale content (i.e., the attenuation paradox ; Loevinger, 1954 ).

A more recent method for constructing short forms is a computational algorithm called ACO ( Dorigo, 1992 ; Dorigo and Stützle, 2004 ). Instead of just maximizing reliability, this method can incorporate any number of evaluative criteria, such as associations with variables, factor model fit, and others. When reducing a Big 5 personality scale, Olaru et al. (2015) found that, for a mixture of criteria (e.g., CFA fit indices, latent correlations), ACO either equaled or surpassed the alternative methods for creating short forms, such as maximizing factor loadings, minimizing modification indices, a genetic algorithm, and the PURIFY algorithm (see also Schroeders et al., 2016 ). Since ACO has been introduced to psychology, it has been used in the creation of real psychological scales for proactive personality and supervisor support ( Janssen et al., 2015 ), psychological situational characteristics ( Parrigon et al., 2017 ), and others ( Olaru et al., 2015 ; Olderbak et al., 2015 ).

The logic of ACO comes from how ants resolve the problem of determining the shortest path to their hive when they find food ( Deneubourg et al., 1983 ). The ants solve it by (a) randomly sampling different paths toward the food and (b) laying down chemical pheromones that attract other ants. The paths that provide quicker solutions acquire pheromones more rapidly, attracting more ants, and thus more pheromone. Ultimately, a positive feedback loop is created until the ants converge on the best path (the solution).

The ACO algorithm works similarly. When creating a short form of N items, ACO first randomly samples N items from the full scale (the N “paths”). Next, the performance of that short form is evaluated by one or more statistical measures, such as the association with another variable, reliability, and/or factor model fit. Based on these measures, if the sampled items performed well, their probability weight is increased (the amount of “pheromone”). Over repeated iterations, the items that led to good performance will become increasingly weighted for selection, creating a positive feedback loop that eventually converges to a final solution. Thus, ACO, like the ants, does not search and test all possible solutions. Instead, it uses some criterion for evaluating the items and then uses this to update the probability of selecting those items.
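A minimal, hypothetical sketch of that loop is given below. Item weights play the role of pheromone, and coefficient alpha of the sampled subset stands in for whatever evaluative criteria a developer would actually use (model fit, external correlations, and so on); it illustrates the logic only and is not the implementation used in the studies cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_alpha(data, idx):
    """Evaluation criterion for a candidate short form: coefficient alpha."""
    sub = data[:, idx]
    k = sub.shape[1]
    return (k / (k - 1)) * (1 - sub.var(axis=0, ddof=1).sum()
                            / sub.sum(axis=1).var(ddof=1))

def aco_short_form(data, n_short, n_iterations=200, evaporation=0.95):
    n_items = data.shape[1]
    pheromone = np.ones(n_items)                 # selection weights ("pheromone")
    best_idx, best_score = None, -np.inf
    for _ in range(n_iterations):
        probs = pheromone / pheromone.sum()
        idx = rng.choice(n_items, size=n_short, replace=False, p=probs)
        score = subset_alpha(data, idx)
        pheromone *= evaporation                 # old pheromone evaporates
        pheromone[idx] += max(score, 0.0)        # good solutions deposit more pheromone
        if score > best_score:
            best_idx, best_score = idx, score
    return np.sort(best_idx), best_score

# Simulated 1-5 responses: 300 respondents, 12 items; select a 5-item short form
data = rng.integers(1, 6, size=(300, 12)).astype(float)
items, alpha = aco_short_form(data, n_short=5)
print("Selected items:", items, "criterion (alpha) =", round(alpha, 3))
```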

ACO is an automated procedure, but this does not mean that researchers should accept its results automatically. Foremost, ACO does not guarantee that the shortened scale has satisfactory content ( Kruyen et al., 2013 ). Therefore, the items that comprise the final scale should always be examined to see if their content is sufficient.

We also strongly recommend that authors using ACO be explicit about the specifications of the algorithm. Authors should always report (a) what criteria they are using to evaluate short form performance and (b) how these are mathematically translated into pheromone weights. Authors should also report all the other relevant details of conducting the algorithm (e.g., the software package, the number of total iterations). In the Supplementary Material , we provide further details and a full R software walkthrough. For more information, the reader can consult additional resources ( Marcoulides and Drezner, 2003 ; Leite et al., 2008 ; Janssen et al., 2015 ; Olaru et al., 2015 ; Schroeders et al., 2016 ).

Measurement in psychology comes in many forms, and for many constructs, one of the best methods is the psychological Likert scale. A recent review suggests that, in the span of just a few years, dozens of scales are added to the psychological science literature ( Colquitt et al., 2019 ). Thus, psychologists must have a clear understanding of the proper theory and procedures for scale creation. The present article aims to increase this clarity by offering a selective review of Likert scale development advances over the past 25 years. Classic papers delineating the process of Likert scale development have proven immensely useful to the field ( Clark and Watson, 1995 , 2019 ; Hinkin, 1998 ), but it is difficult to do justice to this whole topic in a single paper, especially as methodological developments accumulate.

Though this paper reviewed past work, we end with some notes about the future. First, as methods progress, they become more sophisticated, but sophistication should not be mistaken for accuracy. This applies even to some of the techniques discussed here, such as ACO, which has crucial limitations (e.g., it depends on what predicted external variable is chosen and requires a subjective examination of sufficient content).

Second, we are concerned with the problem of construct proliferation , as are other social scientists (e.g., Shaffer et al., 2016 ; Colquitt et al., 2019 ). Solutions to this problem include paying close attention to the constructs that have already been established in the literature, as well as engaging in a critical and honest reflection on whether one’s target construct is meaningfully different. In cases of scale development, the developer should provide sufficient arguments for these two criteria: the construct’s (a) importance and (b) distinctiveness. Although scholars are quite adept at theoretically distinguishing a “new” construct from a prior one ( Harter and Schmidt, 2008 ), empirical methods should only be enlisted after this has been established.

Finally, as psychological theory progresses, it tends to become more complex. One issue with this increasing complexity is the danger of creating incoherent constructs. Borsboom (2005 , p. 33) provides an example of a scale with three items: (1) “I would like to be a military leader,” (2) “.10 sqrt (0.05+0.05) = …,” and (3) “I am over six feet tall.” Although no common construct exists among these items, the scale can certainly be scored and will probably even be reliable, as the random error variance will be low ( Borsboom, 2005 ). Therefore, measures of such incoherent constructs can display good psychometric properties, and psychologists cannot merely rely on empirical evidence for justifying them. Thus, the challenges of scale development of the present and future are equally empirical and theoretical.

Author Contributions

LT conceived the idea for the manuscript and provided feedback and editing. AJ conducted most of the literature review and wrote much of the manuscript. VN assisted with the literature review and contributed writing. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.637547/full#supplementary-material

Footnotes

1. We also do not include the topic of measurement invariance, as this is typically done to validate a Likert scale with regard to a new population.
2. Nor is it true that just because a paper has been published it is a significant advance. A good example is Westen and Rosenthal’s (2003) two coefficients for quantifying construct validity, which were shown to be severely limited by Smith (2005).
3. Alpha also assumes normal and uncorrelated errors.
4. There are several versions of omega, such as hierarchical omega for multidimensional scales. McNeish (2018) provides an exceptional discussion of alternatives to alpha, including software tutorials in R and Excel.
5. There are two uses of the word “information” in this section: the formal IRT statistic and the general, everyday sense of the word (“We don’t have enough information.”). For the technical term, we will use information IRT; the latter we will leave simply as “information.”
6. One important distinction is between short scales and short forms. Short forms are a type of short scale, but of course, not all short scales were taken from a larger measure. In this section, we are concerned only with the process of developing a short form from an original scale.

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1985). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

American Educational Research Association, American Psychological Association, National Council on Measurement in Education, and Joint Committee on Standards for Educational and Psychological Testing (AERA, APA, & NCME) (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

Anderson, J. C., and Gerbing, D. W. (1991). Predicting the performance of measures in a confirmatory factor analysis with a pretest assessment of their substantive validities. J. Appl. Psychol. 76, 732–740. doi: 10.1037/0021-9010.76.5.732

Baer, J. D., Baldi, S., and Cook, S. L. (2006). The Literacy of America’s College Students. Washington, DC: American Institutes for Research.

Barchard, K. A. (2012). Examining the reliability of interval level data using root mean square differences and concordance correlation coefficients. Psychol. Methods 17, 294–308. doi: 10.1037/a0023351

Baumeister, R. F., Vohs, K. D., and Funder, D. C. (2007). Psychology as the science of self-reports and finger movements: whatever happened to actual behavior? Perspect. Psychol. Sci. 2, 396–403. doi: 10.1111/j.1745-6916.2007.00051.x

Benjamin, R. G. (2012). Reconstructing readability: recent developments and recommendations in the analysis of text difficulty. Educ. Psychol. Rev. 24, 63–88. doi: 10.1007/s10648-011-9181-8

Binning, J. F., and Barrett, G. V. (1989). Validity of personnel decisions: a conceptual analysis of the inferential and evidential bases. J. Appl. Psychol. 74, 478–494. doi: 10.1037/0021-9010.74.3.478

Borsboom, D. (2005). Measuring the Mind: Conceptual Issues in Contemporary Psychometrics. Cambridge: Cambridge University Press.

Borsboom, D., Mellenbergh, G. J., and van Heerden, J. (2004). The concept of validity. Psychol. Rev. 111, 1061–1071. doi: 10.1037/0033-295X.111.4.1061

Calderón, J. L., Morales, L. S., Liu, H., and Hays, R. D. (2006). Variation in the readability of items within surveys. Am. J. Med. Qual. 21, 49–56. doi: 10.1177/1062860605283572

Cizek, G. J. (2012). Defining and distinguishing validity: interpretations of score meaning and justifications of test use. Psychol. Methods 17, 31–43. doi: 10.1037/a0026975

Clark, L. A., and Watson, D. (1995). Constructing validity: basic issues in objective scale development. Psychol. Assess. 7, 309–319. doi: 10.1037/1040-3590.7.3.309

Clark, L. A., and Watson, D. (2019). Constructing validity: new developments in creating objective measuring instruments. Psychol. Assess. 31:1412. doi: 10.1037/pas0000626

Colquitt, J. A., Sabey, T. B., Rodell, J. B., and Hill, E. T. (2019). Content validation guidelines: evaluation criteria for definitional correspondence and definitional distinctiveness. J. Appl. Psychol. 104, 1243–1265. doi: 10.1037/apl0000406

Cooksey, R. W., and Soutar, G. N. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organ. Res. Methods 9, 78–98. doi: 10.1177/1094428105283939

Credé, M., Tynan, M. C., and Harms, P. D. (2017). Much ado about grit: a meta-analytic synthesis of the grit literature. J. Pers. Soc. Psychol. 113, 492–511. doi: 10.1093/oxfordjournals.bmb.a072872

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334. doi: 10.1007/BF02310555

Cronbach, L. J., and Meehl, P. E. (1955). Construct validity in psychological tests. Psychol. Bull. 52, 281–302. doi: 10.1037/h0040957

Cronbach, L. J., and Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educ. Psychol. Meas. 64, 391–418. doi: 10.1177/0013164404266386

DeMars, C. (2010). Item Response Theory. Oxford: Oxford University Press.

Deneubourg, J. L., Pasteels, J. M., and Verhaeghe, J. C. (1983). Probabilistic behaviour in ants: a strategy of errors? J. Theor. Biol. 105, 259–271. doi: 10.1016/s0022-5193(83)80007-1

DeSimone, J. A. (2015). New techniques for evaluating temporal consistency. Organ. Res. Methods 18, 133–152. doi: 10.1177/1094428114553061

Doak, C., Doak, L., Friedell, G., and Meade, C. (1998). Improving comprehension for cancer patients with low literacy skills: strategies for clinicians. CA Cancer J. Clin. 48, 151–162. doi: 10.3322/canjclin.48.3.151

Dorigo, M. (1992). Optimization, Learning, and Natural Algorithms . Ph.D. thesis. Milano: Politecnico di Milano.

Dorigo, M., and Stützle, T. (2004). Ant Colony Optimization. Cambridge, MA: MIT Press.

Dunn, T. J., Baguley, T., and Brunsden, V. (2014). From alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. Br. J. Psychol. 105, 399–412. doi: 10.1111/bjop.12046

Edelen, M. O., and Reeve, B. B. (2007). Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual. Life Res. 16(Suppl. 1) 5–18. doi: 10.1007/s11136-007-9198-0

Ferrando, P. J., and Lorenzo-Seva, U. (2018). Assessing the quality and appropriateness of factor solutions and factor score estimates in exploratory item factor analysis. Educ. Pyschol. Meas. 78, 762–780. doi: 10.1177/0013164417719308

Ferrando, P. J., and Lorenzo-Seva, U. (2019). An external validity approach for assessing essential unidimensionality in correlated-factor models. Educ. Psychol. Meas. 79, 437–461. doi: 10.1177/0013164418824755

Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: what they are and how to use them. Educ. Psychol. Meas. 66, 930–944. doi: 10.1177/0013164406288165

Green, S. B. (2003). A coefficient alpha for test-retest data. Psychol. Methods 8, 88–101. doi: 10.1037/1082-989X.8.1.88

Hardy, B., and Ford, L. R. (2014). It’s not me, it’s you: miscomprehension in surveys. Organ. Res. Methods 17, 138–162. doi: 10.1177/1094428113520185

Harter, J. K., and Schmidt, F. L. (2008). Conceptual versus empirical distinctions among constructs: Implications for discriminant validity. Ind. Organ. Psychol. 1, 36–39. doi: 10.1111/j.1754-9434.2007.00004.x

Haynes, S. N., and Lench, H. C. (2003). Incremental validity of new clinical assessment measures. Psychol. Assess. 15, 456–466. doi: 10.1037/1040-3590.15.4.456

Haynes, S. N., Richard, D. C. S., and Kubany, E. S. (1995). Content validity in psychological assessment: a functional approach to concepts and methods. Psychol. Assess. 7, 238–247. doi: 10.1037/1040-3590.7.3.238

Henrich, J., Heine, S. J., and Norenzayan, A. (2010). The weirdest people in the world? Behav. Brain Sci. 33, 61–135. doi: 10.1017/S0140525X0999152X

Henson, R. K., and Roberts, J. K. (2006). Use of exploratory factor analysis in published research: common errors and some comment on improved practice. Educ. Psychol. Meas. 66, 393–416. doi: 10.1177/0013164405282485

Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organ. Res. Methods 1, 104–121. doi: 10.1177/109442819800100106

Hinkin, T. R., and Tracey, J. B. (1999). An analysis of variance approach to content validation. Organ. Res. Methods 2, 175–186. doi: 10.1177/109442819922004

Hood, S. B. (2009). Validity in psychological testing and scientific realism. Theory Psychol. 19, 451–473. doi: 10.1177/0959354309336320

Hunsley, J., and Meyer, G. J. (2003). The incremental validity of psychological testing and assessment: conceptual, methodological, and statistical issues. Psychol. Assess. 15, 446–455. doi: 10.1037/1040-3590.15.4.446

Janssen, A. B., Schultze, M., and Grotsch, A. (2015). Following the ants: development of short scales for proactive personality and supervisor support by ant colony optimization. Eur. J. Psychol. Assess. 33, 409–421. doi: 10.1027/1015-5759/a000299

Johanson, G. A., and Brooks, G. P. (2010). Initial scale development: sample size for pilot studies. Educ. Psychol. Meas. 70, 394–400. doi: 10.1177/0013164409355692

Kane, M. T. (1992). An argument-based approach to validity in evaluation. Psychol. Bull. 112, 527–535. doi: 10.1177/1356389011410522

Kelley, K. (2016). MBESS (Version 4.0.0) [Computer Software and Manual].

Kelley, K., and Pornprasertmanit, S. (2016). Confidence intervals for population reliability coefficients: Evaluation of methods, recommendations, and software for composite measures. Psychological Methods 21, 69–92. doi: 10.1037/a0040086

Kelley, T. L. (1927). Interpretation of Educational Measurements. New York, NY: World Book Company.

Knowles, E. S., and Condon, C. A. (2000). Does the rose still smell as sweet? Item variability across test forms and revisions. Psychol. Assess. 12, 245–252. doi: 10.1037/1040-3590.12.3.245

Kruyen, P. M., Emons, W. H. M., and Sijtsma, K. (2013). On the shortcomings of shortened tests: a literature review. Int. J. Test. 13, 223–248. doi: 10.1080/15305058.2012.703734

Landy, F. J. (1986). Stamp collecting versus science: validation as hypothesis testing. Am. Psychol. 41, 1183–1192. doi: 10.1037/0003-066X.41.11.1183

Lawshe, C. H. (1985). Inferences from personnel tests and their validity. J. Appl. Psychol. 70, 237–238. doi: 10.1037/0021-9010.70.1.237

Leite, W. L., Huang, I.-C., and Marcoulides, G. A. (2008). Item selection for the development of short forms of scales using an ant colony optimization algorithm. Multivariate Behav. Res. 43, 411–431. doi: 10.1080/00273170802285743

Li, X., and Sireci, S. G. (2013). A new method for analyzing content validity data using multidimensional scaling. Educ. Psychol. Meas. 73, 365–385. doi: 10.1177/0013164412473825

Likert, R. (1932). A technique for the measurement of attitudes. Arch. Psychol. 140, 5–53.

Lissitz, R. W., and Samuelson, K. (2007). A suggested change in the terminology and emphasis regarding validity and education. Educ. Res. 36, 437–448. doi: 10.3102/0013189X0731

Locke, E. A. (2012). Construct validity vs. concept validity. Hum. Resour. Manag. Rev. 22, 146–148. doi: 10.1016/j.hrmr.2011.11.008

Loevinger, J. (1954). The attenuation paradox in test theory. Pschol. Bull. 51, 493–504. doi: 10.1037/h0058543

Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

Lubin, B., Collins, J. E., Seever, M., Van Whitlock, R., and Dennis, A. J. (1990). Relationships among readability, reliability, and validity in a self-report adjective check list. Psychol. Assess. J. Consult. Clin. Psychol. 2, 256–261. doi: 10.1037/1040-3590.2.3.256

Mackenzie, S. B. (2003). The dangers of poor construct conceptualization. J. Acad. Mark. Sci. 31, 323–326. doi: 10.1177/0092070303254130

Marcoulides, G. A., and Drezner, Z. (2003). Model specification searches using ant colony optimization algorithms. Struct. Equ. Modeling 10, 154–164. doi: 10.1207/S15328007SEM1001

McDonald, R. (1999). Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum.

McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychol. Methods 23, 412–433. doi: 10.1037/met0000144

McPherson, J., and Mohr, P. (2005). The role of item extremity in the emergence of keying-related factors: an exploration with the life orientation test. Psychol. Methods 10, 120–131. doi: 10.1037/1082-989X.10.1.120

Mellenbergh, G. J. (1996). Measurement precision in test score and item response models. Psychol. Methods 1, 293–299. doi: 10.1037/1082-989X.1.3.293

Messick, S. (1980). Test validity and the ethics of assessment. Am. Psychol. 35, 1012–1027. doi: 10.1037/0003-066X.35.11.1012

Messick, S. (1989). “Validity,” in Educational Measurement , 3rd Edn, ed. R. L. Linn (New York, NY: American Council on Education and Macmillan), 13–103.

Newton, P. E. (2012). Questioning the consensus definition of validity. Measurement 10, 110–122. doi: 10.1080/15366367.2012.688456

Newton, P. E., and Shaw, S. D. (2013). Standards for talking and thinking about validity. Psychol. Methods 18, 301–319. doi: 10.1037/a0032969

Novick, M. R., and Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika 32, 1–13. doi: 10.1007/BF02289400

Olaru, G., Witthöft, M., and Wilhelm, O. (2015). Methods matter: testing competing models for designing short-scale big-five assessments. J. Res. Pers. 59, 56–68. doi: 10.1016/j.jrp.2015.09.001

Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., and Roberts, R. D. (2015). A psychometric analysis of the reading the mind in the eyes test: toward a brief form for research and applied settings. Front. Psychol. 6:1503. doi: 10.3389/fpsyg.2015.01503

Parrigon, S., Woo, S. E., Tay, L., and Wang, T. (2017). CAPTION-ing the situation: a lexically-derived taxonomy of psychological situation characteristics. J. Pers. Soc. Psychol. 112, 642–681. doi: 10.1037/pspp0000111

Permut, S., Fisher, M., and Oppenheimer, D. M. (2019). TaskMaster: a tool for determining when subjects are on task. Adv. Methods Pract. Psychol. Sci. 2, 188–196. doi: 10.1177/2515245919838479

Peter, S. C., Whelan, J. P., Pfund, R. A., and Meyers, A. W. (2018). A text comprehension approach to questionnaire readability: an example using gambling disorder measures. Psychol. Assess. 30, 1567–1580. doi: 10.1037/pas0000610

Podsakoff, P. M., Mackenzie, S. B., and Podsakoff, N. P. (2016). Recommendations for creating better concept definitions in the organizational, behavioral, and social sciences. Organ. Res. Methods 19, 159–203. doi: 10.1177/1094428115624965

Ravens-Sieberer, U., Devine, J., Bevans, K., Riley, A. W., Moon, J., Salsman, J. M., et al. (2014). Subjective well-being measures for children were developed within the PROMIS project: Presentation of first results. J. Clin. Epidemiol. 67, 207–218. doi: 10.1016/j.jclinepi.2013.08.018

Raykov, T. (1997). Scale reliability, Cronbach’s coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behav. Res. 32, 329–353. doi: 10.1207/s15327906mbr3204_2

Raykov, T., Marcoulides, G. A., and Tong, B. (2016). Do two or more multicomponent instruments measure the same construct? Testing construct congruence using latent variable modeling. Educ. Psychol. Meas. 76, 873–884. doi: 10.1177/0013164415604705

Raykov, T., and Pohl, S. (2013). On studying common factor variance in multiple-component measuring instruments. Educ. Psychol. Meas. 73, 191–209. doi: 10.1177/0013164412458673

Reise, S. P., Ainsworth, A. T., and Haviland, M. G. (2005). Item response theory: fundamentals, applications, and promise in psychological research. Curr. Dir. Psychol. Sci. 14, 95–101. doi: 10.1016/B978-0-12-801504-9.00010-6

Reise, S. P., Waller, N. G., and Comrey, A. L. (2000). Factor analysis and scale revision. Psychol. Assess. 12, 287–297. doi: 10.1037/1040-3590.12.3.287

Revelle, W. (1978). ICLUST: a cluster analytic approach for exploratory and confirmatory scale construction. Behav. Res. Methods Instrum. 10, 739–742. doi: 10.3758/bf03205389

Revelle, W. (2008). psych: Procedures for Personality and Psychological Research (R package version 1.0-51).

Revelle, W., and Condon, D. M. (2019). Reliability from α to ω: a tutorial. Psychol. Assess. 31, 1395–1411. doi: 10.1037/pas0000754

Schmidt, F. L., Le, H., and Ilies, R. (2003). Beyond alpha: an empirical examination of the effects of different sources of measurement error on reliability estimates for measures of individual differences constructs. Psychol. Methods 8, 206–224. doi: 10.1037/1082-989X.8.2.206

Schroeders, U., Wilhlem, O., and Olaru, G. (2016). Meta-heuristics in short scale construction: ant colony optimization and genetic algorithm. PLoS One 11:e0167110. doi: 10.5157/NEPS

Sechrest, L. (1963). Incremental validity: a recommendation. Educ. Psychol. Meas. 23, 153–158. doi: 10.1177/001316446302300113

Sellbom, M., and Tellegen, A. (2019). Factor analysis in psychological assessment research: common pitfalls and recommendations. Psychol. Assess. 31, 1428–1441. doi: 10.1037/pas0000623

Shaffer, J. A., DeGeest, D., and Li, A. (2016). Tackling the problem of construct proliferation: a guide to assessing the discriminant validity of conceptually related constructs. Organ. Res. Methods 19, 80–110. doi: 10.1177/1094428115598239

Sharpe, D. (2013). Why the resistance to statistical innovations? Bridging the communication gap. Psychol. Methods 18, 572–582. doi: 10.1037/a0034177

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of cronbach’s alpha. Psychometrika 74, 107–120. doi: 10.1007/s11336-008-9101-0

Simms, L. J., Zelazny, K., Williams, T. F., and Bernstein, L. (2019). Does the number of response options matter? Psychometric perspectives using personality questionnaire data. Psychol. Assess. 31, 557–566. doi: 10.1037/pas0000648.supp

Smith, G. T. (2005). On construct validity: issues of method and measurement. Psychol. Assess. 17, 396–408. doi: 10.1037/1040-3590.17.4.396

Smith, G. T., Fischer, S., and Fister, S. M. (2003). Incremental validity principles in test construction. Psychol. Assess. 15, 467–477. doi: 10.1037/1040-3590.15.4.467

Tay, L., and Jebb, A. T. (2018). Establishing construct continua in construct validation: the process of continuum specification. Ad. Methods Pract. Psychol. Sci. 1, 375–388. doi: 10.1177/2515245918775707

Thorndike, E. L. (1904). An Introduction to the Theory of Mental and Social Measurements. New York, NY: Columbia University Press, doi: 10.1037/13283-000

U.S. Census Bureau (2014). Educational Attainment in the United States: 2014. Washington, DC: U.S. Census Bureau.

Vogt, D. S., King, D. W., and King, L. A. (2004). Focus groups in psychological assessment: enhancing content validity by consulting members of the target population. Psychol. Assess. 16, 231–243. doi: 10.1037/1040-3590.16.3.231

Weijters, B., De Beuckelaer, A., and Baumgartner, H. (2014). Discriminant validity where there should be none: positioning same-scale items in separated blocks of a questionnaire. Appl. Psychol. Meas. 38, 450–463. doi: 10.1177/0146621614531850

Weng, L. J. (2004). Impact of the number of response categories and anchor labels on coefficient alpha and test-retest reliability. Educ. Psychol. Meas. 64, 956–972. doi: 10.1177/0013164404268674

Westen, D., and Rosenthal, R. (2003). Quantifying construct validity: two simple measures. J. Pers. Soc. Psychol. 84, 608–618. doi: 10.1037/0022-3514.84.3.608

Zhang, X., and Savalei, V. (2016). Improving the factor structure of psychological scales: the expanded format as an alternative to the Likert scale format. Educ. Psychol. Meas. 76, 357–386. doi: 10.1177/0013164415596421

Zhang, Z., and Yuan, K. H. (2016). Robust coefficients alpha and omega and confidence intervals with outlying observations and missing data: methods and software. Educ. Psychol. Meas. 76, 387–411. doi: 10.1177/0013164415594658

Zijlmans, E. A. O., van der Ark, L. A., Tijmstra, J., and Sijtsma, K. (2018). Methods for estimating item-score reliability. Appl. Psychol. Meas. 42, 553–570. doi: 10.1177/0146621618758290

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach’s, α Revelle’s β and McDonald’s ωH: their relations with each other and two alternative conceptualizations of reliability. Psychometrika 70, 123–133. doi: 10.1007/s11336-003-0974-7

Keywords : measurement, psychometrics, validation, Likert, reliability, scale development

Citation: Jebb AT, Ng V and Tay L (2021) A Review of Key Likert Scale Development Advances: 1995–2019. Front. Psychol. 12:637547. doi: 10.3389/fpsyg.2021.637547

Received: 03 December 2020; Accepted: 12 April 2021; Published: 04 May 2021.

Copyright © 2021 Jebb, Ng and Tay. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Andrew T. Jebb, [email protected]

Likert Scale


Takashi Yamashita and Roberto J. Millar

Synonyms: Likert-type scale; Rating scale

Likert scaling is one of the most fundamental and frequently used assessment strategies in social science research (Joshi et al. 2015 ). A social psychologist, Rensis Likert ( 1932 ), developed the Likert scale to measure attitudes. Although attitudes and opinions had been popular research topics in the social sciences, the measurement of these concepts was not established until this time. In a groundbreaking study, Likert ( 1932 ) introduced this new approach of measuring attitudes toward internationalism with a 5-point scale – (1) strongly approve, (2) approve, (3) undecided, (4) disapprove, and (5) strongly disapprove. For example, one of nine internationalism scale items measured attitudes toward statements like, “All men who have the opportunity should enlist in the Citizen’s Military Training Camps.” Based on the survey of 100 male students from one university, Likert showed the sound psychometric properties (i.e., validity and...


Baker TA, Buchanan NT, Small BJ, Hines RD, Whitfield KE (2011) Identifying the relationship between chronic pain, depression, and life satisfaction in older African Americans. Res Aging 33(4):426–443

Bishop PA, Herron RL (2015) Use and misuse of the Likert item responses and other ordinal measures. Int J Exerc Sci 8(3):297

Carifio J, Perla R (2008) Resolving the 50-year debate around using and misusing Likert scales. Med Educ 42(12):1150–1152

DeMaris A (2004) Regression with social data: modeling continuous and limited response variables. Wiley, Hoboken

Femia EE, Zarit SH, Johansson B (1997) Predicting change in activities of daily living: a longitudinal study of the oldest old in Sweden. J Gerontol 52B(6):P294–P302. https://doi.org/10.1093/geronb/52B.6.P294

Gomez RG, Madey SF (2001) Coping-with-hearing-loss model for older adults. J Gerontol 56(4):P223–P225. https://doi.org/10.1093/geronb/56.4.P223

Joshi A, Kale S, Chandel S, Pal D (2015) Likert scale: explored and explained. Br J Appl Sci Technol 7(4):396

Kong J (2017) Childhood maltreatment and psychological well-being in later life: the mediating effect of contemporary relationships with the abusive parent. J Gerontol 73(5):e39–e48. https://doi.org/10.1093/geronb/gbx039

Kuzon W, Urbanchek M, McCabe S (1996) The seven deadly sins of statistical analysis. Ann Plast Surg 37:265–272

Likert R (1932) A technique for the measurement of attitudes. Arch Psychol 22(140):55–55

Pruchno RA, McKenney D (2002) Psychological well-being of black and white grandmothers raising grandchildren: examination of a two-factor model. J Gerontol 57(5):P444–P452. https://doi.org/10.1093/geronb/57.5.P444

Sullivan GM, Artino AR Jr (2013) Analyzing and interpreting data from Likert-type scales. J Grad Med Educ 5(4):541–542

Trochim WM, Donnelly JP, Arora K (2016) Research methods: the essential knowledge base. Cengage Learning, Boston

Author information

Authors and affiliations.

Department of Sociology, Anthropology, and Health Administration and Policy, University of Maryland Baltimore County, Baltimore, MD, USA

Takashi Yamashita

Gerontology Doctoral Program, University of Maryland Baltimore, Baltimore, MD, USA

Roberto J. Millar

Corresponding author

Correspondence to Takashi Yamashita .

Editor information

Editors and affiliations.

Population Division, Department of Economics and Social Affairs, United Nations, New York, NY, USA

Department of Population Health Sciences, Department of Sociology, Duke University, Durham, NC, USA

Matthew E. Dupre

Section Editor information

Department of Sociology and Center for Population Health and Aging, Duke University, Durham, NC, USA

Kenneth C. Land

Department of Sociology, University of Kentucky, Lexington, KY, USA

Anthony Bardo

Copyright information

© 2021 Springer Nature Switzerland AG

About this entry

Cite this entry.

Yamashita, T., Millar, R.J. (2021). Likert Scale. In: Gu, D., Dupre, M.E. (eds) Encyclopedia of Gerontology and Population Aging. Springer, Cham. https://doi.org/10.1007/978-3-030-22009-9_559

DOI : https://doi.org/10.1007/978-3-030-22009-9_559

Published : 24 May 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-22008-2

Online ISBN : 978-3-030-22009-9


What Is a Likert Scale? | Guide & Examples

Published on July 3, 2020 by Pritha Bhandari and Kassiani Nikolopoulou. Revised on June 22, 2023.

A Likert scale is a rating scale used to measure opinions, attitudes, or behaviors.

It consists of a statement or a question, followed by a series of five or seven answer statements. Respondents choose the option that best corresponds with how they feel about the statement or question.

Because respondents are presented with a range of possible answers, Likert scales are well suited to capturing levels of agreement or feelings about a topic in a nuanced way. However, Likert scales are prone to response bias : respondents may agree or disagree with every statement because of fatigue or social desirability, may tend toward extreme responding, or may react to other demand characteristics .

Likert scales are common in survey research , as well as in fields like marketing, psychology, or other social sciences.


Table of contents

  • What are Likert scale questions?
  • When to use Likert scale questions
  • How to write strong Likert scale questions
  • How to write Likert scale responses
  • How to analyze data from a Likert scale
  • Advantages and disadvantages of Likert scales
  • Frequently asked questions about Likert scales

Likert scale questions commonly offer either five or seven response options. The options at each end are called response anchors . The midpoint is often a neutral option, with positive options on one side and negative options on the other. Each option is assigned a score, from 1 to 5 or from 1 to 7.

The format of a typical five-level Likert question, for example, could be:

  • Strongly disagree
  • Disagree
  • Neither agree nor disagree
  • Agree
  • Strongly agree

In addition to measuring the level of agreement or disagreement, Likert scales can also measure other spectrums, such as frequency, satisfaction, or importance.

Researchers use Likert scale questions when they are seeking a greater degree of nuance than possible from a simple “yes or no” question.

For example, let’s say you are conducting a survey about customer views on a pair of running shoes. You ask survey respondents “Are you satisfied with the shoes you purchased?”

A dichotomous question like the above gives you very limited information. There is no way you can tell how satisfied or dissatisfied customers really are. You get more specific and interesting information by asking a Likert scale question instead:

“How satisfied are you with the shoes you purchased?”

  • 1 – Very dissatisfied
  • 2 – Dissatisfied
  • 3 – Neither satisfied nor dissatisfied
  • 4 – Satisfied
  • 5 – Very satisfied

Likert scales are most useful when you are measuring unobservable individual characteristics , or characteristics that have no concrete, objective measurement. These can be elements like attitudes, feelings, or opinions that cause variations in behavior.

Each Likert scale–style question should assess a single attitude or trait. In order to get accurate results, it is important to word your questions precisely. As a rule of thumb, make sure each question only measures one aspect of your topic.

For example, if you want to assess attitudes towards environmentally friendly behaviors, you can design a Likert scale with a variety of questions that measure different aspects of this topic.

Here are a few pointers:

  • Include both questions and statements
  • Use both positive and negative framing
  • Avoid double negatives
  • Ask about only one thing at a time
  • Be crystal clear

A good rule of thumb is to use a mix of both to keep your participants engaged during the survey. When deciding how to phrase questions and statements, it’s important that they are easily understood and do not bias your respondents in one way or another.

If all of your questions only ask about things in socially desirable ways, your participants may be biased towards agreeing with all of them to show themselves in a positive light.

Positive framing: Environmental damage caused by single-use water bottles is a serious problem. (Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree)

Negative framing: Banning single-use water bottles is pointless for reducing environmental damage. (Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree)

Respondents who agree with the first statement should also disagree with the second. By including both of these statements in a long survey, you can also check whether the participants’ responses are reliable and consistent.

Double negatives can lead to confusion and misinterpretations, as respondents may be unsure of what they are agreeing or disagreeing with.

Bad example (double negative): I never buy non-organic products. (Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree)

Good example: I try to buy organic products whenever possible. (Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree)

Avoid double-barreled questions (asking about two different topics within the same question). When faced with such questions, your respondents may selectively answer about one topic and ignore the other. Questions like this may also confuse respondents, leading them to choose a neutral but inaccurate answer in an attempt to answer both questions simultaneously.

Bad example (double-barreled): How would you rate your knowledge of climate change and food systems? (Very poor / Poor / Fair / Good / Excellent)

Good example (one topic per question): How would you rate your knowledge of climate change? How would you rate your knowledge of food systems? (each rated Very poor / Poor / Fair / Good / Excellent)

The accuracy of your data also relies heavily on word choice:

  • Pose your questions clearly, leaving no room for misunderstanding.
  • Make language and stylistic choices that resonate with your target demographic.
  • Stay away from jargon that could discourage or confuse your respondents.

When using Likert scales, how you phrase your response options is just as crucial as how you phrase your questions.

Here are a few tips to keep in mind.

  • Decide on the number of response options
  • Choose the type of response option
  • Choose between unipolar and bipolar options
  • Make sure that you use mutually exclusive options

More options give you deeper insights but can make it harder for participants to decide on one answer. Fewer options mean you capture less detail, but the scale is more user-friendly.

Usually, researchers include five or seven response options. It’s a good idea to include an odd number so that there is a midpoint. However, if you want to force your respondents to choose, an even number of responses removes the neutral option.

Five-point example: How frequently do you buy biodegradable products? (Never / Occasionally / Sometimes / Often / Always)

Seven-point example: How frequently do you buy biodegradable products? (Never / Rarely / Occasionally / Sometimes / Often / Very often / Always)

You can measure a wide range of perceptions, motivations, and intentions using Likert scales. Response options should strive to cover the full range of opinions you anticipate a participant can have.

Some of the most common types of items include:

  • Agreement: Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree
  • Quality: Very Poor, Poor, Fair, Good, Excellent
  • Likelihood: Extremely Unlikely, Somewhat Unlikely, Neither Likely nor Unlikely, Somewhat Likely, Extremely Likely
  • Experience: Very Negative, Somewhat Negative, Neutral, Somewhat Positive, Very Positive

Some researchers also include a “don’t know” option. This allows them to distinguish between respondents who do not feel sufficiently informed to give an opinion and those who are “neutral” on the topic. However, including a “don’t know” option may trigger unmotivated respondents to select that for every question.

On a unipolar scale, you measure only one attribute (e.g., satisfaction). On a bipolar scale, you can measure two attributes (e.g., satisfaction or dissatisfaction) along a continuum.

How satisfied are you with the range of organic products available?
Not at all satisfied Somewhat satisfied Satisfied Very satisfied Extremely satisfied
How satisfied are you with the range of organic products available?
Extremely dissatisfied Dissatisfied Neither dissatisfied nor satisfied Satisfied Extremely satisfied

Your choice depends on your research questions and aims. If you want finer-grained details about one attribute, select unipolar items. If you want to allow a broader range of responses, select bipolar items.

Unipolar scales are most accurate when five-point scales are used. Conversely, bipolar scales are most accurate when a seven-point scale is used (with three scale points on each side of a truly neutral midpoint.)

Avoid overlaps in the response items. If two items have similar meanings, it risks making your respondent’s choice random.

Bad example (overlapping options): Environmental damage caused by single-use water bottles is a serious problem. (Strongly agree / Agree / Neither agree nor disagree / Indifferent / Disagree / Strongly disagree)

Good example: Environmental damage caused by single-use water bottles is a serious problem. (Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree)

Before analyzing your data, it’s important to consider what type of data you are dealing with. Likert-derived data can be treated either as ordinal-level or interval-level data . However, most researchers treat Likert-derived data as ordinal, assuming the distances between response categories are not equal.

Furthermore, you need to decide which descriptive statistics and/or inferential statistics may be used to describe and analyze the data obtained from your Likert scale.

You can use descriptive statistics to summarize the data you collected in simple numerical or visual form.

  • Ordinal data: To get an overall impression of your sample, you find the mode, or most common score, for each question. You also create a bar chart for each question to visualize the frequency of each item choice.
  • Interval data: You add up the scores from each question to get the total score for each participant. You find the mean , or average, score and the standard deviation , or spread, of the scores for your sample.

You can use inferential statistics to test hypotheses , such as correlations between different responses or patterns in the whole dataset.

  • Ordinal data: You hypothesize that knowledge of climate change is related to belief that environmental damage is a serious problem. You use a chi-square test of independence to see if these two attributes are correlated.
  • Interval data: You investigate whether age is related to attitudes towards environmentally friendly behavior. Using a Pearson correlation test, you assess whether the overall score for your Likert scale is related to age.

Lastly, be sure to clearly state in your analysis whether you treat the data at interval level or at ordinal level.

Analyzing data at the ordinal level

Researchers usually treat Likert-derived data as ordinal . Here, response categories are presented in a ranking order, but the distances between the categories cannot be presumed to be equal.

For example, consider a scale where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, and 5 = strongly disagree.

In this scale, 4 is more negative than 3, 2, or 1. However, it cannot be inferred that a response of 4 is twice as negative as a response of 2.

Treating Likert-derived data as ordinal, you can use descriptive statistics to summarize the data you collected in simple numerical or visual form. The median or mode generally is used as the measure of central tendency . In addition, you can create a bar chart for each question to visualize the frequency of each item choice.

Appropriate inferential statistics for ordinal data are, for example, Spearman’s correlation or a chi-square test for independence .
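For example, with two hypothetical items coded 1-5 and treated as ordinal, the mode, median, a frequency table (the basis of a bar chart), and Spearman's correlation could be obtained as follows using pandas and SciPy; the data are invented for illustration.

```python
import pandas as pd
from scipy import stats

# Invented responses to two Likert items, coded 1-5
df = pd.DataFrame({
    "climate_knowledge": [4, 5, 3, 4, 2, 5, 4, 3, 4, 5],
    "damage_is_serious": [5, 5, 3, 4, 2, 4, 5, 3, 4, 4],
})

print(df.mode().iloc[0])   # most common response per item
print(df.median())         # median response per item

# Frequency table per response category (what a bar chart would show)
print(df["damage_is_serious"].value_counts().sort_index())

# Spearman's rank correlation between the two items
rho, p = stats.spearmanr(df["climate_knowledge"], df["damage_is_serious"])
print(f"Spearman's rho = {rho:.2f}, p = {p:.3f}")
```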

Analyzing data at the interval level

However, you can also choose to treat Likert-derived data at the interval level . Here, response categories are presented in a ranking order, and the distance between categories is presumed to be equal.

Appropriate inferential statistics used here are an analysis of variance (ANOVA) or Pearson’s correlation . Such analysis is legitimate, provided that you state the assumption that the data are at interval level.

In terms of descriptive statistics, you add up the scores from each question to get the total score for each participant. You find the mean , or average, score and the standard deviation , or spread, of the scores for your sample.
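Continuing the same kind of example, if summed scale scores are treated as interval data, the total score, its mean and standard deviation, and a Pearson correlation with another variable such as age could be computed as follows (again with invented data).

```python
import pandas as pd
from scipy import stats

# Invented data: a 3-item scale (1-5) plus respondent age
df = pd.DataFrame({
    "item1": [4, 2, 5, 3, 1, 4, 5, 2],
    "item2": [5, 3, 5, 3, 2, 4, 4, 3],
    "item3": [4, 2, 4, 3, 2, 5, 4, 2],
    "age":   [23, 34, 29, 41, 52, 37, 26, 45],
})

# Total score per participant, then sample mean and standard deviation
df["total"] = df[["item1", "item2", "item3"]].sum(axis=1)
print(df["total"].mean(), df["total"].std())

# Pearson correlation between the summed Likert score and age
r, p = stats.pearsonr(df["total"], df["age"])
print(f"Pearson's r = {r:.2f}, p = {p:.3f}")
```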

Likert scales are a practical and accessible method of collecting data.

  • Quantitative: Likert scales easily operationalize complex topics by breaking down abstract phenomena into recordable observations. This enables statistical testing of your hypotheses.
  • Fine-grained: Because Likert-type questions aren’t binary (yes/no, true/false, etc.), you can get detailed insights into perceptions, opinions, and behaviors.
  • User-friendly: Unlike open-ended questions, Likert scales are closed-ended and don’t ask respondents to generate ideas or justify their opinions. This makes them quick for respondents to fill out and ensures they can easily yield data from large samples.

Problems with Likert scales often come from inappropriate design choices.

  • Response bias: Due to social desirability bias , people often avoid selecting the extreme items or disagreeing with statements to seem more “normal” or show themselves in a favorable light.
  • Fatigue/inattention: In Likert scales with many questions, respondents can get bored and lose interest. They may absent-mindedly select responses regardless of their true feelings. This results in invalid responses.
  • Subjective interpretation: Some items can be vague and interpreted very differently by respondents. Words like “somewhat” or “fair” don’t have precise or narrow definitions.
  • Restricted choice: Since Likert-type questions are closed-ended, respondents sometimes have to choose the most relevant answer even if it may not accurately reflect reality.

Frequently asked questions about Likert scales

A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of 4 or more questions that measure a single attitude or trait when response scores are combined.

To use a Likert scale in a survey , you present participants with Likert-type questions or statements, and a continuum of items, usually with 5 or 7 possible responses, to capture their degree of agreement.

Individual Likert-type questions are generally considered ordinal data , because the items have clear rank order, but don’t have an even distribution.

Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.

The type of data determines what statistical tests you should use to analyze your data.

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.

Cite this Scribbr article

Bhandari, P. & Nikolopoulou, K. (2023, June 22). What Is a Likert Scale? | Guide & Examples. Scribbr. Retrieved August 14, 2024, from https://www.scribbr.com/methodology/likert-scale/


Everything You Need to Know About the Likert Scale 


Measuring subjective experiences such as happiness and satisfaction can be a challenging task, especially in the context of market research.

How do we accurately gauge the level of satisfaction that a person feels towards a particular subject?

The Likert scale was developed in the 1930s to address this very problem.

In this article, we'll explore everything you need to know about the Likert scale and how it can be used to collect meaningful data in market research.


What is a Likert scale?

Named after the renowned organizational psychologist Rensis Likert, the Likert scale was first introduced in 1932.

This scale provides an objective way to quantify attitudes and feelings , as well as the degree of consensus on certain subjects or objects.

Some researchers refer to the Likert scale as a satisfaction scale, as it is commonly used to measure satisfaction levels on a given topic.

How is it defined?

A Likert scale is an ordered scale that allows respondents to select an option that aligns with their opinion.

This scale is commonly used in market research to obtain a valuation value and quantify intangible or abstract concepts.

In many ways, it's similar to a multiple-choice question (MC), but with the key difference that it restricts responses to a single set of logical values.

For instance, a respondent might be asked to indicate their level of satisfaction on a scale ranging from "not at all satisfied" to "very satisfied."

Furthermore, when using a Likert scale, respondents are required to make specific choices based on whether they "agree" or "disagree" with a particular statement, rather than simply answering "yes" or "no."

Why use a Likert scale?

The Likert scale is a popular tool for market research due to its reliability in measuring opinions , perceptions , and behaviors objectively.

It is widely used by researchers to understand opinions and views about a brand, product, target market, employee satisfaction, and more.

For instance, if you want to evaluate the success of a recent work-from-home policy, you can easily create an HR questionnaire using the Likert scale to assess employee satisfaction.

Some of the advantages of using the Likert scale include:

  • Easy-to-understand questions that require no additional guidance
  • Raw answers that offer precise data and contribute to faster problem understanding and deeper insights into underlying reasons
  • Easy to create and set up
  • Easy analysis of quantitative data, making it easier to understand and analyze the results
  • Comprehensive approach that allows collection of various opinions on topics that are difficult to measure, such as customer feelings and perceptions
  • Clear range of opinions offered to respondents, making it easy for them to select one option per question

Overall, the Likert scale is a versatile and valuable tool for market research, enabling the collection of quantitative data and providing deeper insights into opinions and attitudes towards a particular subject.

Use cases for Likert scales

The Likert scale is most suitable for evaluating individual attitudes and opinions on a particular subject.

However, it is not appropriate for assessing attributes like age or gender.

Here are some of the areas where the scale can be used to obtain more precise information:

  • Evaluating customer feedback on new products
  • Measuring employee satisfaction levels
  • Assessing customer satisfaction levels
  • Collecting feedback on events
  • Studying target markets
  • Conducting public health assessments
  • Evaluating partnerships
  • Conducting needs assessments linked to guidelines

In general, these are the variables that can be measured using the Likert scale:

  • Agreement 
  • Importance 
  • Frequency 
  • Interest 

How to create a Likert scale step by step

Creating a Likert scale question is super easy; just follow these steps:

Start with a declarative statement that measures a certain attitude or opinion instead of asking a direct question, e.g. " I am satisfied with the equipment provided to me to work from home ".

Create a series of response categories that are logically distributed along a scale. For example, you could use a five-point scale ranging from "Strongly Disagree" to "Strongly Agree" or a seven-point scale ranging from "Very Dissatisfied" to "Very Satisfied".

Assign a numeric value to each option in the response categories. The values usually range from 1 to 5, 1 to 7, or some other range, depending on the number of response categories and the scale used.

Make sure the response categories are mutually exclusive and cover the entire range of possible responses. This means that each response category should be distinct and that there should be no overlap between them.

Test your Likert scale with a pilot study to ensure that it is effective in measuring the intended attitude or opinion. This can help you identify any issues with the scale and make any necessary adjustments before using it in a larger study or survey.
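As a quick illustration of the numeric coding step, the sketch below (in Python, with invented answers to the work-from-home statement used above) maps labelled responses on a five-point agreement scale to the values 1–5:

```python
# Numeric coding for a five-point agreement scale
scale = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Invented answers to: "I am satisfied with the equipment provided to me to work from home."
answers = ["Agree", "Strongly Agree", "Neither Agree nor Disagree", "Agree", "Disagree"]

codes = [scale[a] for a in answers]
print(codes)  # [4, 5, 3, 4, 2]
```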

What types of Likert scales can you create for your own research?

A popular variation of the traditional Likert scale is known as the " Likert-type scale ".

Like the traditional Likert scale, Likert-type scales have an ordered set of response categories with a balanced number of positive and negative options. However, there are a few key differences:

Likert-type scales can have labels for each answer option, or only for the final categories (also known as anchor categories).

Likert-type scales do not use the traditional spectrum of "Strongly disagree" to "Strongly agree" responses. Instead, they use other categories of ordered responses to measure different variables, such as:

Frequency (e.g. Never, Sometimes, Often, Always)

Intensity (e.g. Mild, Moderate, Severe)

Quantity (e.g. Not at all, A little, A lot)

There are other types of Likert-type scales, such as the Semantic Differential Scale , which use bipolar adjectives (such as "good" vs. "bad" or "happy" vs. "sad") to measure attitudes or opinions. When creating a Likert-type scale, it is important to carefully consider the response options and labels to ensure they accurately measure the variable of interest.

Even or Odd Likert scales?

The number of answer options in a Likert scale can vary from five to nine, with more options generally leading to more precise results.

However, it's important to choose the right number of options depending on the situation.

Here's a list of pros and cons for both, even and odd scales, to help you choose which one is best for your use case.

Even Likert scale

Pros:

  • It eliminates the possibility of a neutral answer, because the neutral option is removed.
  • It is often called a "forced choice" scale, as it forces respondents to take a side.
  • Respondents can be more decisive.

Cons:

  • Respondents may get frustrated, as they are not given the choice to pick a neutral option.

Odd Likert scale

Pros:

  • Gives respondents the chance to also pick a neutral statement.
  • Appropriate when tackling sensitive topics, as it provides a midpoint.

Cons:

  • Gives respondents an easy way out (i.e. the midpoint).
  • The midpoint could be misinterpreted, leading to errors and confusion.

What is the best type of Likert scale to use?

Choosing the best type of Likert scale can be a challenging task, as each has its pros and cons.

However, there are some things you should keep in mind to make the best decision:

Remember the main objective of the survey : the choice of the scale should align with the research questions and what you want to measure.

Consider the subject of the survey : if the subject is not controversial, you can use an even scale without a neutral point. However, if the topic is more sensitive, it's better to use an odd scale that includes a midpoint.

Know your respondents : it's important to understand your target audience and determine whether they prefer having a neutral response option or not.

In any case, keep in mind that the Likert scale should never be used to elicit answers or force respondents to form an opinion.

A well-structured survey always starts with a clear objective and aims to obtain honest and meaningful responses.

What are the advantages and disadvantages of the Likert scale?

The Likert scale has its advantages and disadvantages that every researcher should keep in mind before creating a survey. Here are the benefits and drawbacks of using a Likert scale.

  • Provides a quantifiable measure of attitudes and opinions
  • Ease of response for participants
  • Easy to code and analyze answers
  • Fast and efficient method for data collection
  • Inexpensive compared to other methods of data collection

Disadvantages

  • The scale is one-dimensional and only provides a limited number of options, which may not be equally spaced
  • It may not capture the complexity of people's attitudes and opinions
  • There is a risk of response bias and order effect
  • Respondents may avoid choosing the extreme options on the scale, leading to less accurate results.
Research experts answer: What makes the Likert scale such a good rating scale? The Likert scale is a highly effective rating scale because it provides nuanced insights into people's attitudes and opinions. Its range of options captures varying degrees of agreement or disagreement, enabling you to have access to more detailed data analysis and a better understanding of the respondent’s true sentiments.

Best Practices when creating Likert Scales

Use wide scales : It is recommended to use a Likert scale with as wide a range as possible. Responses can always be grouped later for analysis.

Choose a scale : The scale should have at least two extreme positions and an intermediate response option that serves as a graduation between them.

Be specific : Questions should be clear and specific to avoid confusion. The more precise the question, the more valuable the data will be.

Limit options : Keep in mind that using too many options can result in respondents choosing an option randomly, leading to inaccurate data. We recommend using around or fewer than 7 options on your Likert scale.

Avoid generalization : Instead of asking general questions such as "Do you like our products?", ask more specific questions such as "Are you satisfied with the quality of our products?" or "Do you think our products are good value for money?"

Cover all bases : The Likert scale should cover the full range of answers, including a midpoint. If the answers only range from "Extremely satisfied" to "Fairly satisfied," respondents who are not satisfied will not know which answer to choose, resulting in skewed results.

Use labels instead of numbers : Numbered scales, such as 1 to 5, can be confusing and lead to inaccurate data. It is best to indicate the scale options using words, e.g. extremely satisfied.

Clear and concise labels : Make sure the labels are clear and concise.

Always indicate the midpoint : The midpoint should always be labeled, as it gives respondents a helpful reference point when choosing their reply.

Provide balanced response options : Response options should be balanced to avoid bias.

Conclusion for Likert Scales

In conclusion, the Likert scale remains a popular and effective tool for measuring people's attitudes and satisfaction levels. By providing a quantifiable and easily understandable system of responses, Likert scales offer a fast and cost-effective way to gather valuable data.

However, it is important to recognize the potential downsides of using these scales and take steps to mitigate them.

By following the guidelines provided in this article, researchers can create well-structured Likert scale surveys that yield accurate and meaningful results.



Using a Likert Scale in Psychology


A Likert scale is a type of psychometric scale frequently used in psychology questionnaires. It was developed by and named after organizational psychologist Rensis Likert. Self-report inventories are one of the most widely used tools in psychological research.

On a Likert scale, respondents are asked to rate the level to which they agree with a statement. Such scales are often used to assess personality , attitudes , and behaviors.

At a Glance

While you might not have known what they were called, you've probably encountered many different Likert scales. Simply put, a Likert scale is a type of assessment item that asks you to rate your agreement with a statement (often from "Strongly Agree" to "Strongly Disagree.") Such scales can be a great way to get a nuanced look at how people feel about a particular topic, which is why you'll often see this type of item on political surveys and psychological questionnaires.

On a survey or questionnaire, a typical Likert item usually takes the following format:

  • Strongly disagree
  • Disagree
  • Neither agree nor disagree
  • Agree
  • Strongly agree

It is important to note that the individual questions that take this format are known as Likert items, while the Likert scale is the format of these items.

Other Items on a Likert Scale

In addition to looking at how much respondents agree with a statement, Likert items may also focus on likelihood, frequency, or importance. In such cases, survey takers would be asked to identify:

  • How likely they believe something to be true (Always true, Usually true, Sometimes true, Usually not true, Never true)
  • How frequently they engage in a behavior or experience a particular thought (Very frequently, Frequently, Occasionally, Rarely, or Never)
  • How important they feel something is to them (Very important, Important, Somewhat important, Not very important, Not important)

A Note on Pronunciation

If you've ever taken a psychology course, you've probably heard the term pronounced "lie-kurt." Since the term is named after Rensis Likert, the correct pronunciation should be "lick-urt."

In some cases, experts who are very knowledgeable about the subject matter might develop items on their own. Oftentimes, it is helpful to have a group of experts help brainstorm different ideas to include on a scale.

  • Start by creating a large pool of potential items to draw from.
  • Select a group of judges to score the items.
  • Sum the item scores given by the judges.
  • Calculate intercorrelations between paired items.
  • Eliminate items that have a low correlation between the summed scores.
  • Find averages for the top quarter and the lowest quarter of judges and do a t-test of the means between the two. Eliminate questions with low t-values, which indicates that they score low in the ability to discriminate.

After weeding out the questions that have been deemed irrelevant or not relevant enough to include, the Likert scale is then ready to be administered.
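For readers who prefer to see the item-analysis steps in code, here is a rough Python sketch (using invented judge ratings and the pandas/scipy libraries) of the correlation and discrimination checks described above; it is a simplified illustration, not a canonical implementation of Likert's procedure:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
# Invented ratings from 40 judges on 6 candidate items (1-5)
ratings = pd.DataFrame(rng.integers(1, 6, size=(40, 6)),
                       columns=[f"item{i}" for i in range(1, 7)])

total = ratings.sum(axis=1)  # summed score per judge

for item in ratings.columns:
    # Correlation of each item with the summed scores; drop items with low correlations
    r = ratings[item].corr(total)

    # Compare the top and bottom quarters of judges; low t-values indicate poor discrimination
    top = ratings.loc[total >= total.quantile(0.75), item]
    bottom = ratings.loc[total <= total.quantile(0.25), item]
    t, p = stats.ttest_ind(top, bottom)
    print(f"{item}: item-total r = {r:.2f}, discrimination t = {t:.2f} (p = {p:.3f})")
```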

Experts suggest that when creating Likert scale items, survey creators should pay careful attention to wording and clearly define target constructs.

Some researchers have questioned whether having an even or odd number of response options might influence the usefulness of such data. Some research has found that having five options increases psychometric precision but found no advantages to having six or more response options.

Advantages of a Likert Scale

Because Likert items are not simply yes or no questions, researchers are able to look at the degree to which people agree or disagree with a statement.

Research suggests that Likert scales are a valuable and convenient way for psychologists to measure characteristics that cannot be readily observed.

Likert scales are often used in political polling in order to obtain a more nuanced look at how people feel about particular issues or certain candidates.

Disadvantages of a Likert Scale

Likert scales are convenient and widely used, but that doesn't mean that they don't have some drawbacks. As with other assessment forms, Likert scales can also be influenced by the need to appear socially desirable or acceptable.

People may not be entirely honest or forthright in their answers or may even answer items in ways that make themselves appear better than they are. This effect can be particularly pronounced when looking at behaviors that are viewed as socially unacceptable.

What This Means For You

The next time you fill out a questionnaire or survey, notice if they use Likert scales to evaluate your feelings about a subject. Such surveys are common in doctor's offices to help assess your symptoms and their severity. They are also often used in political or consumer polls to judge your feelings about a particular issue, candidate, or product.

Joshi A, Kale S, Chandel S, Pal DK. Likert scale: Explored and explained . British Journal of Applied Science & Technology. 2015;7(4):396-403. doi:10.9734/BJAST/2015/14975

East Carolina University Psychology Department. How do you pronounce "Likert?" What is a Likert scale?

Clark LA, Watson D. Constructing validity: New developments in creating objective measuring instruments .  Psychol Assess . 2019;31(12):1412-1427. doi:10.1037/pas0000626

Simms LJ, Zelazny K, Williams TF, Bernstein L. Does the number of response options matter? Psychometric perspectives using personality questionnaire data .  Psychol Assess . 2019;31(4):557-566. doi:10.1037/pas0000648

Jebb AT, Ng V, Tay L. A review of key Likert scale development advances: 1995-2019 .  Front Psychol . 2021;12:637547. doi:10.3389/fpsyg.2021.637547

Sullman MJM, Taylor JE. Social desirability and self-reported driving behaviours: Should we be worried? Transportation Research Part F: Traffic Psychology and Behavior. 2010;13(3):215-221. doi:10.1016/j.trf.2010.04.004

Likert R. A technique for the measurement of attitudes . Archives of Psychology. 1932;22(140):1–55.

By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Likert scale


Likert scale , rating system, used in questionnaires, that is designed to measure people’s attitudes, opinions, or perceptions. Subjects choose from a range of possible responses to a specific question or statement; responses typically include “strongly agree,” “agree,” “neutral,” “disagree,” and “strongly disagree.” Often, the categories of response are coded numerically, in which case the numerical values must be defined for that specific study, such as 1 = strongly agree, 2 = agree, and so on. The Likert scale is named for American social scientist Rensis Likert , who devised the approach in 1932.

Likert scales are widely used in social and educational research. When using Likert scales, the researcher must consider issues such as categories of response (values in the scale), size of the scale, direction of the scale, the ordinal nature of Likert-derived data, and appropriate statistical analysis of such data.

Generally, a Likert scale presents the respondent with a statement and asks the respondent to rate the extent to which he or she agrees with it. Variations include presenting the subject with a question rather than a statement. The categories of response are mutually exclusive and usually cover the full range of opinion. Some researchers include a “don’t know” option, to distinguish between respondents who do not feel sufficiently informed to give an opinion and those who are “neutral” on the topic.

The size of a Likert scale may vary. Traditionally, researchers have employed a five-point scale (e.g., strongly agree, agree, neutral, disagree, strongly disagree). A larger scale (e.g., seven categories) could offer more choices to respondents, but it has been suggested that people tend not to select the extreme categories in large rating scales, perhaps not wanting to appear extreme in their view. Moreover, it may not be easy for subjects to discriminate between categories that are only subtly different. On the other hand , rating scales with just three categories (e.g., poor, satisfactory, good) may not afford sufficient discrimination . An even number of categories, as in a four-point or six-point Likert scale, forces respondents to come down broadly “for” or “against” a statement.

A feature of Likert scales is their directionality: the categories of response may be increasingly positive or increasingly negative. While interpretation of a category may vary among respondents (e.g., one person’s “agree” is another’s “strongly agree”), all respondents should nevertheless understand that “strongly agree” is a more positive opinion than “agree.” One important consideration in the design of questionnaires is the use of reverse scoring on some items. Imagine a questionnaire with positive statements about the benefits of public health education programs (e.g., “TV campaigns are a good way to persuade people to stop smoking in the presence of children”). A subject who strongly agreed with all such statements would be presumed to have a very positive view about the benefits of this method of health education. However, perhaps the subject was not participating wholeheartedly and simply checked the same response category for each item. To ensure that respondents are reading and evaluating statements carefully, a few negative statements may be included (e.g., “Money spent on public health education programs would be better spent on research into new therapies”). If a respondent answers positively to positive statements and negatively to negative statements, the researcher may have increased confidence in the data .

Likert scales fall within the ordinal level of measurement: the categories of response have directionality, but the intervals between them cannot be presumed equal. Thus, for a scale where 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, and 5 = strongly disagree, a mark of 4 would be more negative than either 3, 2, or 1 (directionality). However, it cannot be inferred that a response of 4 is twice as negative as a response of 2.

Deciding which descriptive and inferential statistics may legitimately be used to describe and analyze the data obtained from a Likert scale is a controversial issue. Treating Likert-derived data as ordinal, the median or mode generally is used as the measure of central tendency. In addition, for responses in each category, one may state the frequency or percentage frequency. The appropriate inferential statistics for ordinal data are those employing nonparametric tests, such as the chi-square test or the Mann-Whitney U test.
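For example, two groups of respondents can be compared on an ordinal item with the Mann-Whitney U test. The short Python sketch below uses scipy and invented 5-point ratings:

```python
from scipy.stats import mannwhitneyu

# Invented 5-point ratings (1 = strongly agree ... 5 = strongly disagree) from two groups
group_a = [1, 2, 2, 3, 1, 2, 4, 2]
group_b = [3, 4, 2, 5, 4, 3, 4, 5]

# Mann-Whitney U compares the two distributions without assuming interval-level data
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(u_stat, p_value)
```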

However, many researchers treat Likert-derived data as if it were at the interval level (where numbers on the scale not only have directionality but also are an equal distance apart). They analyze their data using parametric tests, such as analysis of variance (ANOVA) or Pearson’s product-moment correlation, arguing that such analysis is legitimate , provided that one states the assumption that the data are interval level. Calculating the mean, standard deviation , and parametric statistics requires arithmetic manipulation of data (e.g., addition and multiplication).

Since numerical values in Likert scales represent verbal statements, one might question whether it makes sense to perform such manipulations. Moreover, Likert-derived data may fail to meet other assumptions for parametric tests (e.g., a normal distribution). Thus, careful consideration must also be given to the appropriate descriptive and inferential statistics, and the researcher must be explicit about any assumptions made.

Likert scale interpretation: How to analyze the data with examples


What are Likert scales and Likert scale questionnaires?

Likert scale examples: the types and uses of satisfaction scale questions, likert scale interpretation: analyzing likert scale/type data, how to use filtering and cross tabulation for your likert scale analysis, 1. compare new and old information to ensure a better understanding of progress, 2. compare information with other types of data and objective indicators, 3. make a visual representation: help the audience understand the data better, 4. focus on insights instead of just the numbers, how to analyze likert scale data, likert scale interpretation example overview, interpreting likert scale results, explore useful surveyplanet features for data analyzing.

Likert scaling consists of questions answered by rating a statement on a scale with 5 or 7 options from which the respondent can choose.

Have you ever answered a survey question that asks to what extent you agree with a statement? The answers were probably: strongly disagree, disagree, neither disagree nor agree, agree, or strongly agree. Well, that’s a Likert question.

Regardless of the name—a satisfaction scale, an agree-disagree scale, or a strongly agree scale—the format is pretty powerful and a widely used means of survey measurement, primarily used in customer experience and employee satisfaction surveys.

In this article, we’ll answer some common questions about Likert scales and how they are used, though most importantly Likert scale scoring and interpretation. Learn our advice about how to benefit from conclusions drawn from satisfaction surveys and how to use them to implement changes that will improve your business!

A Likert scale usually contains 5 or 7 response options—ranging from strongly agree to strongly disagree—with differing nuances between these and a mandatory mid-point of neither agree nor disagree (for those who hold no opinion). The Likert-type scale got its name from psychologist Rensis Likert, who developed it in 1932.

Likert scales are a type of closed-ended question: like common yes-or-no questions, they allow participants to choose from a predefined set of answers, as opposed to being able to phrase their opinions in their own words. But unlike yes-or-no questions, satisfaction-scale questions allow for the measurement of people’s views on a specific topic with a greater degree of nuance.

Since these questions are predefined, it’s essential to include questions that are as specific and understandable as possible.

Answer presets can be numerical, descriptive, or a combination of both numbers and words. Responses range from one extreme attitude to the other, while always including a neutral opinion in the middle.

A Likert scale question is one of the most commonly used in surveys to measure how satisfied a customer or employee is. The most common example of their use is in customer satisfaction surveys , which are an integral part of market research .

Are satisfaction-scale questions the best survey questions?

Maybe you’ve answered one too many customer satisfaction surveys with Likert scales in your lifetime and now consider them way too generic and bland. But, the fact is they are one of the most popular types of survey questions.

First of all, they are pretty appealing to respondents because they are easy to understand and do not require too much thinking to answer.

And, while binary (yes-or-no) questions offer only two response options (i.e., if a customer is satisfied with your products and services or not), satisfaction-scale questions provide a clearer understanding of customers’ thoughts and opinions.

By using well-prepared additional questions, questions about particular products or service segments can be asked. That way, getting to the bottom of customer dissatisfaction is possible, making it easier to find a way to address their complaints and improve their experience.

Such surveys enable figuring out why customers are satisfied with one product but not another. This empowers the recognition of products and service areas that customers are confident in while helping to find ways to improve others.

When it comes to analyzing and interpreting survey scale results, Likert questions are helpful because they provide quantitative data that is easy to code and interpret. Results can also be analyzed through cross-tabulation analysis (we’ll get back to that later).

Likert questions can be used for many kinds of research. For example, determine the level of customer satisfaction with the latest product, assess employee satisfaction, or get post-event feedback from attendees after a specific event.

Questions can take different forms, but the most common is the 5-point or 7-point Likert scale question. There are 4-point and even 10-point Likert scale questions as well.

How to choose from these options?

The most common is the 5-point question. Most researchers advise the use of at least five response options (if not more). This ensures that respondents have enough choices to express their opinion as accurately as possible.

Some researchers suggest always using an even number of responses so respondents are not presented with a neutral answer, therefore having to “choose a side.” This is to avoid a tepid response even when respondents have an opinion, which is one of the most common types of errors in surveying .

Likert scale interpretation involves analyzing the responses to understand the participants’ attitudes toward the statements.

It’s important to note that Likert scales provide a quantitative representation of attitudes but do not necessarily capture underlying reasoning or motivations. Qualitative methods, such as interviews or open-ended questions, are often used in conjunction with Likert scales to gain a deeper understanding of participants’ perspectives.

Overall, Likert scale interpretation of data involves analyzing the numerical ratings, considering the directionality of the scale, examining central tendency and variability, identifying response patterns, and conducting comparative analyses to draw meaningful conclusions about people’s attitudes or opinions.

How to analyze satisfaction survey scale questions

For a survey to be its best , how gathered information is analyzed is as important as the gathering itself. That’s why we’ll now turn to the most effective ways of analyzing responses from satisfaction survey scales.

When using Likert scale questions, the analysis tools used are mean, median, and mode. These help better understand the information collected.

The mean (or average) is the average value of the data, calculated by adding up all the responses and dividing this sum by the number of responses. The median is the middle value of a data set, while the mode is the number that occurs most often.
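A minimal Python sketch of these three summaries, using invented coded responses:

```python
import pandas as pd

ratings = pd.Series([5, 4, 4, 3, 5, 2, 4, 5, 3, 4])  # responses coded 1-5

print("mean:", ratings.mean())            # 3.9
print("median:", ratings.median())        # 4.0
print("mode:", ratings.mode().tolist())   # [4]
```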

Some other useful ways of analyzing information are filtering and cross tabulation.

Using a filter, the responses of one particular group of respondents are focused upon and the rest filtered out. For example, how female customers rate a product can be determined by filtering out male respondents, while concentrating on customers aged 20 to 30 can be gleaned by filtering out older respondents.

Cross tabulation, on the other hand, is a method to compare two sets of information in one chart and analyze the relationship between multiple variables. In other words, it can show the responses of a particular subgroup while it can also be combined with other subgroups.

Say you want to look at the responses of unemployed female respondents aged 20 to 30. By using cross tabulation, all three parameters—gender, age, and employment status—can be combined and correlation calculated.
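In code, filtering and cross tabulation might look like the following Python sketch (the column names and values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["female", "female", "male", "female", "male", "female"],
    "age_group": ["20-30", "20-30", "31-40", "20-30", "20-30", "31-40"],
    "employed": [False, False, True, False, True, True],
    "satisfaction": [4, 5, 3, 4, 2, 5],  # responses coded 1-5
})

# Filtering: focus on unemployed women aged 20 to 30
subset = df[(df["gender"] == "female") & (df["age_group"] == "20-30") & (~df["employed"])]
print(subset)

# Cross tabulation: satisfaction ratings broken down by age group
print(pd.crosstab(df["age_group"], df["satisfaction"]))
```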

If this all sounds confusing, SurveyPlanet luckily doesn’t just offer great examples of surveys and the ability to create custom themes , but also the power to export survey results into several different formats, such as Microsoft Excel and Word, CSV, PDF, and JSON files.

How to interpret Likert scale data?

When information has been gathered and analyzed, it’s time to present it to stakeholders. This is the final stage of research. Analyzing the results of Likert scale questionnaires is a vital way to improve services and grow a business. Presenting the results correctly is a key step.

Here’s how to develop a clear goal and present it understandably and engagingly.

Compare the newly obtained information with data gathered from previous surveys. Sure, information gathered from the latest research is valuable on its own, but not helpful enough. For example, it tells you if customers are currently satisfied with products or services, but not whether things are better or worse than last year.

The key to improving customer service—and thus developing a business—is comparing current responses with previous ones. This is called longitudinal analysis. It can provide valuable insights about how a business is developing, if things are improving or declining, and what issues need to be solved.

If there is no previous data, then start collecting feedback immediately in order to compare results with future surveys. This is called benchmarking. It helps keep track of progress and how products, services, and overall customer satisfaction changes over time.

The most crucial information to compare new findings with is previous surveys. But it is highly recommended to constantly compare findings with other types of information, such as Google Analytics, sales data, and other objective indicators.

Another good practice is comparing qualitative with quantitative data . The more information, the more accurate the research results, which will help better convey findings to stakeholders. This will also improve business decision-making, strengthening the experiences of customers and employees.

Numbers are easier to understand when suitable visual representation is provided. However, it is essential to use a medium that adequately highlights key findings.

Line graphs, pie charts, bar charts, histograms, scatterplots, infographics, and many more techniques can be used.

But don’t forget good old tables. Even if they’re not so visually dynamic and a little harder on the eyes, some information is simply best presented in tables, especially numerical data.

Working with all of these options, more satisfactory presentations can be created.

When presenting findings to stakeholders, don’t just focus on the numbers. Instead, highlight the conclusions about customer or employee satisfaction drawn from the research. That way, everyone present at the meeting will gain a deeper understanding of what you’re trying to convey.

A valuable and exciting piece of advice is to focus on the story the numbers tell. Don’t simply list the numbers collected. Instead, use relevant examples and connect all the information, building on each dataset to make a meaningful whole.

Define and describe problems that need to be solved in engaging and easy-to-understand terms so that listeners don’t have a hard time understanding what is being shared. Include suggestions that could improve, for example, customer experience outcomes. It is also important to share findings with the relevant teams, listen to their perspectives, and find solutions together.

An example of Likert scale data analysis and interpretation

Let’s consider an example scenario and go through the steps of analyzing and interpreting Likert scale data.

Scenario: A company conducts an employee satisfaction survey using a Likert scale to measure employees’ attitudes toward various aspects of their work environment. The scale ranges from 1 (Strongly Disagree) to 5 (Strongly Agree).

Item 1: “I feel valued and appreciated at work.”

Item 2: “My workload is manageable.”

Item 3: “I receive adequate training and support.”

Item 4: “I have opportunities for growth and advancement.”

Item 5: “My supervisor provides constructive feedback.”

Step 1: Calculate mean scores by summing up the responses and dividing by the number of respondents.

Item 1: Mean score = (4+5+5+4+3)/5 = 4.2

Item 2: Mean score = (3+4+3+3+4)/5 = 3.4

Item 3: Mean score = (4+4+5+4+3)/5 = 4.0

Item 4: Mean score = (3+4+3+2+4)/5 = 3.2

Item 5: Mean score = (4+3+4+3+5)/5 = 3.8

Step 2: Assess central tendency by looking at the distribution of responses to identify the most frequent response or central point.

Item 1: 4 (Agree) is the most frequent response.

Item 2: 3 (Neutral) is the most frequent response.

Item 3: 4 (Agree) is the most frequent response.

Item 4: 3 (Neutral) is the most frequent response.

Item 5: 4 (Agree) is the most frequent response.

Step 3: Consider Variability by assessing the range or spread of responses to understand the diversity of opinions.

Item 1: Range = 5-3 = 2 (relatively low variability)

Item 2: Range = 4-3 = 1 (low variability)

Item 3: Range = 5-3 = 2 (relatively low variability)

Item 4: Range = 4-2 = 2 (relatively low variability)

Item 5: Range = 5-3 = 2 (relatively low variability)

Step 4: Identify response patterns by looking for consistent agreement or disagreement across items, or clusters of responses.

Step 5: Conduct a comparative analysis of responses among different groups, such as other departments or job positions, to identify attitude variations.

In this example, there is a pattern of agreement on items related to feeling valued at work (Item 1), receiving training and support (Item 3), and receiving constructive feedback (Item 5). However, there is a relatively neutral response pattern for workload manageability (Item 2) and growth opportunities (Item 4).

For example, you could compare responses between different departments to see if there are significant differences in employee satisfaction levels.

Based on the analysis, employees feel valued and appreciated at work (Item 1) and perceive adequate training and support (Item 3). However, there may be room for improvement regarding workload manageability (Item 2), opportunities for growth (Item 4), and the provision of constructive feedback (Item 5).

The relatively low variability across items suggests moderate agreement within the group. However, the neutral response pattern for workload manageability and opportunities for growth may indicate areas that require attention to enhance employee satisfaction.
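To reproduce the item means and ranges from this worked example, a few lines of Python suffice (the five responses per item are the ones shown in Step 1 above):

```python
import numpy as np

responses = {
    "Item 1": [4, 5, 5, 4, 3],
    "Item 2": [3, 4, 3, 3, 4],
    "Item 3": [4, 4, 5, 4, 3],
    "Item 4": [3, 4, 3, 2, 4],
    "Item 5": [4, 3, 4, 3, 5],
}

for item, scores in responses.items():
    mean = np.mean(scores)               # Step 1: mean score
    spread = max(scores) - min(scores)   # Step 3: range of responses
    print(f"{item}: mean = {mean:.1f}, range = {spread}")
# Item 1: mean = 4.2, range = 2, and so on, matching the values reported above
```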

Likert scales are a highly effective way of collecting quantitative data. They help you gain a deeper understanding of customers’ or employees’ opinions and needs.

Make this kind of vital research easier. Discover our unique features —like exporting and printing results —that will save time and energy. Let SurveyPlanet take care of your surveys!


Likert Scale Questionnaire: Examples & Analysis

By Saul McLeod, PhD (Editor-in-Chief for Simply Psychology), and Olivia Guy-Evans, MSc (Associate Editor for Simply Psychology).

Various kinds of rating scales have been developed to measure attitudes directly (i.e., the person knows their attitude is being studied).  The most widely used is the Likert scale (1932).

In its final form, the Likert scale is a five (or seven) point scale that is used to allow an individual to express how much they agree or disagree with a particular statement.

The Likert scale (typically) provides five possible answers to a statement or question that allows respondents to indicate their positive-to-negative strength of agreement or strength of feeling regarding the question or statement.

For example, a Likert item might present the statement “I believe that ecological questions are the most important issues facing human beings today” and ask respondents to rate their agreement from Strongly Agree to Strongly Disagree.

A Likert scale assumes that the strength/intensity of an attitude is linear, i.e., on a continuum from strongly agree to strongly disagree, and makes the assumption that attitudes can be measured.

For example, each of the five (or seven) responses would have a numerical value that would be used to measure the attitude under investigation.

Examples of Items for Surveys

In addition to measuring statements of agreement, Likert scales can measure other variations such as frequency, quality, importance, and likelihood, etc.
  • Agreement: Strongly Agree / Agree / Undecided / Disagree / Strongly Disagree
  • Frequency: Always / Often / Sometimes / Rarely / Never
  • Importance: Very Important / Important / Moderately Important / Slightly Important / Unimportant
  • Quality: Excellent / Good / Fair / Poor / Very Poor
  • Truth: Almost Always True / Usually True / Occasionally True / Usually Not True / Rarely True
  • Likelihood: Definitely / Probably / Possibly / Probably Not / Definitely Not

Analyzing Data

The response categories in the Likert scales have a rank order, but the intervals between values cannot be presumed equal. Therefore, the mean (and standard deviation) are inappropriate for ordinal data (Jamieson, 2004).

Statistics you can use are:

  • Summarize using a median or a mode (not a mean as it is ordinal scale data ); the mode is probably the most suitable for easy interpretation.
  • Display the distribution of observations in a bar chart (it can’t be a histogram because the data is not continuous).
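As a small example, the response distribution can be plotted as a bar chart with matplotlib (the coded responses below are invented):

```python
import matplotlib.pyplot as plt
import pandas as pd

ratings = pd.Series([5, 4, 4, 3, 5, 2, 4, 5, 3, 4])  # responses coded 1-5

counts = ratings.value_counts().sort_index()  # frequency of each response category
counts.plot(kind="bar")                       # a bar chart, not a histogram: the data are not continuous
plt.xlabel("Response category (1 = strongly disagree ... 5 = strongly agree)")
plt.ylabel("Number of respondents")
plt.show()
```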

Critical Evaluation

Likert Scales have the advantage that they do not expect a simple yes / no answer from the respondent but rather allow for degrees of opinion and even no opinion at all.

Therefore, quantitative data is obtained, which means that the data can be analyzed relatively easily.

Offering anonymity on self-administered questionnaires should further reduce social pressure and thus may likewise reduce social desirability bias.

Paulhus (1984) found that more desirable personality characteristics were reported when people were asked to write their names, addresses, and telephone numbers on their questionnaire than when they were told not to put identifying information on the questionnaire.

Limitations

However, like all surveys, the validity of the Likert scale attitude measurement can be compromised due to social desirability.

This means that individuals may lie to put themselves in a positive light.  For example, if a Likert scale was measuring discrimination, who would admit to being racist?

Bowling, A. (1997). Research Methods in Health . Buckingham: Open University Press.

Burns, N., & Grove, S. K. (1997). The Practice of Nursing Research Conduct, Critique, & Utilization . Philadelphia: W.B. Saunders and Co.

Jamieson, S. (2004). Likert scales: how to (ab) use them . Medical Education, 38(12) , 1217-1218.

Likert, R. (1932). A Technique for the Measurement of Attitudes. Archives of Psychology , 140, 1–55.

Paulhus, D. L. (1984). Two-component models of socially desirable responding . Journal of personality and social psychology, 46(3) , 598.


  • Open access
  • Published: 16 August 2024

Validity of the Musculoskeletal Tumor Society Score for lower extremity in patients with bone sarcoma or giant cell tumour of bone undergoing bone resection and reconstruction surgery in hip and knee

  • Nikolai Sherling 1 ,
  • Müjgan Yilmaz 2 ,
  • Christina Enciso Holm 2 ,
  • Michael Mørk Petersen 2 , 3 &
  • Linda Fernandes 1  

BMC Cancer volume  24 , Article number:  1019 ( 2024 ) Cite this article


The Musculoskeletal Tumor Society Score (MSTS) is widely used to evaluate functioning following surgery for bone and soft-tissue sarcoma. However, concerns have been raised about its content validity due to the lack of patient involvement during item development. Additionally, literature reports inconsistent results regarding data quality and structural validity. This study aimed to evaluate content, structural and construct validity of the Danish version of the MSTS for lower extremity (MSTS-LE).

The study included patients from three complete cohorts ( n  = 87) with bone sarcoma or giant cell tumour of bone who underwent bone resection and reconstruction surgery in hip and knee. Content validity was evaluated by linking MSTS items to frameworks of functioning , core outcome sets and semi-structured interviews. Data quality, internal consistency and factor analysis were used to assess the underlying structure of the MSTS. Construct validity was based on predefined hypotheses of correlation between the MSTS and concurrent measurements.

Content validity analysis revealed concerns regarding the MSTS. The MSTS did not sufficiently cover patient-important functions, the item Emotional acceptance could not be linked to the framework of functioning , the items Pain and Emotional acceptance pertained to domains beyond functioning and items’ response options did not match items. A two-factor solution emerged, with the items Pain and Emotional acceptance loading highly on a second factor distinct from functioning . Internal consistency and construct validity showed values below accepted levels.

The Danish MSTS-LE demonstrated inadequate content validity, internal consistency, and construct validity. In addition, our analyses did not support unidimensionality of the MSTS. Consequently, the MSTS-LE is not a simple reflection of the construct of functioning and the interpretation of a sum score is problematic. Clinicians and researchers should exercise caution when relying solely on MSTS scores for assessing lower extremity function. Alternative outcome measurements of functioning should be considered for the evaluation of postoperative function in this patient group.


One commonly used outcome measurement in patients treated surgically for soft tissue or bone sarcoma is the Musculoskeletal Tumor Society Score (MSTS) [ 1 ]. The MSTS was developed to evaluate postoperative function aiming to permit comparisons of end-results from different surgical treatments [ 1 , 2 ]. To be useable, an outcome measurement should demonstrate content validity, i.e. include items important to the patient group and items relevant for the construct to be measured [ 3 ]. No study has asked patients with bone sarcoma what functions and activities in daily life they consider important and compared that to the MSTS. Additionally, the construct of the MSTS has not been compared to established frameworks. Therefore, it is unknown whether the items of the MSTS measure functions and activities that are important to the patient group or whether they reflect the construct of functioning . Despite the lack of evidence for content validity, the MSTS for the lower extremity (MSTS-LE) has shown consistent results for construct validity, with moderate to high correlations to other measurements of functioning , for example to the Toronto Extremity Salvage Score (TESS) and the Short Form 36 physical function [ 4 , 5 , 6 , 7 ]. Conversely, the results for internal structure of the MSTS are inconsistent. Some studies have found ceiling effects for both MSTS single items and sum score [ 4 , 6 , 7 ] while others did not find ceiling effects [ 5 , 8 ]. Three studies have tested the MSTS for structural validity, with the conclusion of a one-factor solution, i.e., unidimensionality, for the MSTS [ 5 , 6 , 9 ]. However, the studies showed eigenvalues close to cut-off for a two-factor solution and all three studies showed moderate to low factor loadings for the items Pain and Emotional acceptance [ 5 , 6 , 9 ]. Based on results of low factor loadings for Pain and Emotional acceptance, one could question their reflection of functioning . If the MSTS is to be used in future research and clinical practice for the evaluation of functioning , further evidence of its ability to reflect functioning is needed.

The aim of this study was therefore to evaluate the validity of the MSTS-LE, more specifically its content validity, data quality, internal consistency, structural and construct validity.

Design, inclusion, and patient characteristics

Data for this project was extracted from three cohorts including patients with bone sarcoma or giant cell tumour of bone in the lower extremity going through bone tumour resection and reconstruction with a tumour prosthesis in the hip or knee (Table  1 ). Assessments were completed once for each patient, i.e., the study design was cross-sectional. All three cohorts ( n  = 87) were used for analyses of internal structure. Cohort one ( n  = 30) was in addition tested for content and construct validity.

Cohort one included 30 patients enrolled from a complete cohort of 72 patients [ 10 ]. The patients had undergone surgery between 2006 and 2016 at the Musculoskeletal Tumor Section, Department of Orthopedic Surgery, University Hospital Rigshospitalet, Copenhagen. The included patients ( n  = 30) were interviewed using the Patient Specific Functional Scale (PSFS) and assessed using the MSTS and concurrent outcome measurements at mean 7 (range, 2–12) years after surgery.

Cohort two included 24 patients enrolled from a complete cohort of 50 patients [ 11 ]. The patients had undergone surgery between 1985 and 2005 at the Musculoskeletal Tumor Section, Rigshospitalet, Copenhagen. The included patients ( n  = 24) were assessed using the MSTS at mean 15 (range, 4–29) years after surgery.

Cohort three included 33 patients enrolled from a national cohort using the Global Modular Replacement System (GMRS) only as tumour prosthesis for reconstruction of bone [ 12 ]. The patients had undergone surgery between 2005 and 2013 at the Musculoskeletal Tumor Sections at Rigshospitalet, Copenhagen, and Aarhus University Hospital, Aarhus. The included patients ( n  = 33) were assessed using the MSTS at mean 5 (range, 1–11) years after surgery.

Musculoskeletal Tumour Society Score (MSTS)

The Danish MSTS-LE was used [ 1 , 7 ]. It comprises six items (Pain, Function, Emotional acceptance, Supports, Walking ability and Gait) and is administered by a clinician. Each item is scored on a Likert scale ranging from 0 (worst possible score) to 5 (best possible score) [ 1 ]. The items have unique response options for 0 through 5 (Table  2 ). A sum score for the six items is calculated (maximum 30 points) and normalised to a 0–100 score.
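As a simple illustration of this scoring scheme (the six item scores below are invented, not taken from the study data):

```python
# Invented MSTS-LE item scores (0-5 each) for one patient, in the order
# Pain, Function, Emotional acceptance, Supports, Walking ability, Gait
item_scores = [4, 3, 5, 4, 3, 4]

raw_sum = sum(item_scores)            # maximum possible sum is 30
normalised = raw_sum / 30 * 100       # normalised to a 0-100 score
print(raw_sum, round(normalised, 1))  # 23 76.7
```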

Semi-structured interview used for the evaluation of content validity

Patient Specific Functional Scale (PSFS) is designed to identify patient-important functions and activities. It is valid for use in numerous diseases and conditions and can be administered as a semi-structured interview or as a patient reported outcome (PRO) [ 13 , 14 ]. We chose the semi-structured interview modality, carried out by a physiotherapist (LF). The patients were asked to identify up to five important functions or activities they were unable to perform or had difficulties with because of the condition. Once identified, the activities were categorised and listed. Activities that included the same type of movement were categorised into a meaningful concept [ 15 ]. For example, “walking on uneven surfaces”, “walking fast” or “walking long distances” were categorised into Walking. Sport represented any sports-related activity, for example playing golf, water polo, or swimming. Running was a separate category, as it could be either sports related or a means of moving quickly from one place to another, e.g., run to catch a bus. After identifying important activities, the patients were asked to score them for level of difficulty on a 11-point scale (0 = unable to perform the activity, 10 = able to perform activity at the same level as before surgery) [ 13 , 16 ]. For each category, the number of patients identifying the activity and the median level of difficulty was presented in a chart. Individual mean PSFS scores were also used for the evaluation of construct validity.

Concurrent outcome measurements

Numeric Rating Scale (NRS) is a valid and widely used tool for the measurements of pain intensity among patients with varying conditions [ 17 , 18 ]. The patients were asked to score current pain intensity (0 = no pain, 10 = worst pain imaginable).

Toronto Extremity Salvage Score (TESS) is a patient-specific PRO developed to account for the heterogeneity of functioning in patients with bone and soft-tissue sarcoma [ 19 , 20 ]. It is unidimensional and comprises 30 questions about daily tasks, work/school and leisure time [ 19 ]. Difficulties performing the activities are scored on a 5-point Likert scale (1 = impossible to do, 5 = not at all difficult). The total score is calculated as a percentage of the maximum score. The Danish version has shown acceptable comprehensibility, test-retest reliability and construct validity [ 21 ].

The EORTC QLQ-C30 is a multidimensional PRO measuring quality of life (QoL) in patients with cancer [ 22 ]. The QLQ-C30 is widely used, shows robust psychometric properties, has population-based reference data and is translated into Danish [ 22 , 23 , 24 ]. It consists of 30 questions scored on Likert scales [ 4 ]. We used the sum score and sub scores for physical functioning, emotional functioning and pain in the analyses, normalised to 0–100-points. A high sum score represents a high QoL, a high functioning score represents high levels of functioning and a high pain score represents high levels of pain [ 22 , 24 ].

30-second chair stand test (CST) assesses muscle power and strength of the lower extremity, it can predict deterioration of function and can be used in people with different diseases and ages [ 25 , 26 , 27 , 28 , 29 , 30 ]. It has shown good reliability (ICC > 0.80) and a measurement error of 1 repetition [ 26 ]. The patients were asked to stand up and sit down from a 45 cm chair as many times as possible in 30 s. A standardised protocol from the Association of Danish Physiotherapists was used.

6-minute walk test (6MWT) assesses walking capacity and has been used on patients with numerous diagnoses, including bone sarcoma [ 10 , 26 , 31 , 32 , 33 , 34 , 35 ]. It has shown ICCs of > 0.90 and measurement errors between 14 and 30 m [ 26 , 34 ]. The patients were asked to walk as fast as possible back and forth on a 20-meter walking track in an enclosed corridor at the hospital. A standardised protocol from the Association of Danish Physiotherapists was used.

Demographic data, PRO scores and physical test results were presented as number (%), mean (SD) or median (range), as appropriate for the scale. A sample size of at least five observations per item and at least 100 observations in total has been suggested for determining structural validity [36, 37, 38]. We were able to include 87 patients from three complete cohorts between 1985 and 2016. The MSTS scorings had no missing data. For the concurrent measurements, one patient in Cohort 1 declined the physical tests (CST, 6MWT) at the hospital for logistical reasons, and one patient had missing data on the QLQ-C30 physical functioning scale. Different statistical analyses were applied for the different psychometric evaluations. Analyses were performed using IBM SPSS v.29.

Content validity is defined as the degree to which the content of a PRO is an adequate reflection of the construct to be measured [36, 39]. Since the MSTS intends to measure functioning [1], its six items and their response options should be a reflection of functioning. An international consensus for the quality rating of PROs has recommended three overarching criteria for the evaluation of content validity: relevance, comprehensiveness, and comprehensibility [3]. Relevance includes an evaluation of the items' relevance for the construct and for the population of interest. To evaluate the items' relevance for the construct of functioning, the MSTS items were listed and, wherever possible, linked to codes of the International Classification of Functioning, Disability and Health (ICF) [40]. To evaluate the MSTS items' relevance for the population of interest, we linked MSTS items to activities identified in the PSFS. Comprehensiveness includes an evaluation of whether key concepts are included in an outcome measurement. Key concepts can be found in core outcome sets [41, 42, 43, 44, 45]. Since there is no specific core outcome set for patients undergoing bone sarcoma surgery, we chose to link MSTS items to key concepts defined in core outcome sets for cancer and for primary total knee and hip joint replacement [43, 44]. Comprehensibility was evaluated by linking the response options of the MSTS to the ICF and the PSFS; response options should match their items to meet quality standards [3]. The linking processes were done independently by two of the authors (NS, LF) following recommendations for ICF linking of outcome measures [15].

Data quality. Missing data for individual items, central tendency, the distribution of item scores, and floor and ceiling effects were described. Floor and ceiling effects were defined as present if > 15% of patients scored the lowest or highest possible score, respectively [46].
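A short sketch of this floor/ceiling check, assuming the MSTS items are held in a pandas DataFrame (one column per item, scores 0 to 5) and using the > 15% rule above:

```python
import pandas as pd

def floor_ceiling(items: pd.DataFrame, min_score=0, max_score=5, threshold=0.15):
    """Proportion of patients at the lowest/highest possible score per item,
    flagging floor and ceiling effects (> threshold)."""
    out = pd.DataFrame({
        "pct_floor": (items == min_score).mean(),
        "pct_ceiling": (items == max_score).mean(),
    })
    out["floor_effect"] = out["pct_floor"] > threshold
    out["ceiling_effect"] = out["pct_ceiling"] > threshold
    return out

# items: one row per patient, one column per MSTS item
# print(floor_ceiling(items))
```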

Internal consistency has been defined as the degree of interrelatedness amongst the individual items [39]. The analysis requires a unidimensional scale of at least three items [39]. If our analysis of structural validity suggested more than one dimension, internal consistency was tested separately for each dimension [46]. Inter-item correlations, item-total correlations, and Cronbach's alpha if item deleted were determined [47]. Inter-item correlations between 0.20 and 0.50 are recommended [36]. Item-total correlations assume that patients with a high total score also have high scores on the individual items [36]; an item with an item-total correlation of < 0.30 does not help greatly in distinguishing between patients with high and low scores and can be removed. Cronbach's alpha if item deleted shows the alpha value for the items remaining in the analysis: a high value indicates that the deleted item is redundant, and a low value indicates that there is room for more items under the same construct. A Cronbach's alpha between 0.70 and 0.90 is commonly considered acceptable interrelatedness [48].
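These internal-consistency statistics follow from standard formulas; a minimal sketch with a hypothetical item DataFrame:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def item_analysis(items: pd.DataFrame) -> pd.DataFrame:
    rows = {}
    for col in items.columns:
        rest = items.drop(columns=col)
        rows[col] = {
            # corrected item-total correlation: item vs. sum of the other items
            "item_total_r": items[col].corr(rest.sum(axis=1)),
            "alpha_if_deleted": cronbach_alpha(rest),
        }
    return pd.DataFrame(rows).T

# inter-item correlations (0.20-0.50 recommended):
# print(items.corr())
# print(cronbach_alpha(items), item_analysis(items))
```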

Structural validity has been defined as the degree to which the scores of a PRO are an adequate reflection of the dimensionality of the construct to be measured [39]. Initially, the data were tested for suitability for factor analysis. Inter-item correlation coefficients between 0.20 and 0.80, an overall Kaiser-Meyer-Olkin (KMO) measure of > 0.50 (ideally > 0.80) and a significant Bartlett's test of sphericity have been recommended as prerequisites for factor analysis [36, 37]. We applied a principal component analysis (PCA). The number of latent factors extracted was based on the shape of the scree plot (elbow and levelling), Kaiser's criterion (eigenvalue > 1) and the cumulative percentage of explained variance after each factor (ideally 70–80%) [37, 49, 50]. An oblique rotation (direct oblimin) was applied, since our factor correlation matrix showed a coefficient above the suggested cut-off of 0.32 [37, 49, 50]. There is no consensus on the threshold for sufficient loading of an item on a factor, but with a sample size of at least 100 patients a loading of > 0.30 is usually considered significant [50]. Items that load substantially (> 0.3) on more than one factor are called complex variables and need to be taken into consideration [50].
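A hedged sketch of this factor-analytic workflow. It assumes the Python factor_analyzer package (the study itself used SPSS), so the names calculate_kmo, calculate_bartlett_sphericity and FactorAnalyzer should be checked against the installed version; the item DataFrame is hypothetical:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

def structural_validity(items: pd.DataFrame, n_factors: int = 2) -> dict:
    """KMO, Bartlett's test, eigenvalues and oblimin-rotated loadings for the items."""
    chi2, p = calculate_bartlett_sphericity(items)      # should be significant
    kmo_per_item, kmo_total = calculate_kmo(items)      # > 0.50, ideally > 0.80

    fa = FactorAnalyzer(n_factors=n_factors, rotation="oblimin", method="principal")
    fa.fit(items)

    eigenvalues, _ = fa.get_eigenvalues()                # Kaiser criterion: > 1
    cum_var = eigenvalues.cumsum() / eigenvalues.sum()   # cumulative proportion explained
    loadings = pd.DataFrame(fa.loadings_, index=items.columns)  # > 0.30 = salient

    return {"bartlett_p": p, "kmo": kmo_total,
            "eigenvalues": eigenvalues, "cum_var": cum_var, "loadings": loadings}

# results = structural_validity(items)
```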

Construct validity is defined as the degree to which the score of an outcome measurement is consistent with hypotheses of expected relationships to other PROs [39]. High correlations are expected when measurements of the same construct and with the same mode of administration are compared (convergent validity); conversely, lower correlations are expected when different constructs are compared (divergent validity). Previously published correlations between the MSTS and concurrent outcome measures were used as guidance when formulating the predefined hypotheses [19]. The MSTS sum score was expected to correlate highly with scores from the TESS, the PSFS and QLQ-C30 physical functioning, as they all measure functioning subjectively [7, 51]. The MSTS sum score was expected to correlate at a moderate level with the QLQ-C30 sum score, since the latter is a multidimensional measurement [52]. Concurrent measurements of narrower constructs (e.g., pain, walking capacity, emotional function) were expected to correlate highly with single MSTS items but to have low correlations with the MSTS sum score [53]. The research group formulated the hypotheses prior to the analyses. Cut-offs for high (≥ 0.60), moderate (> 0.30 to < 0.60) and low (≤ 0.30) correlations were applied [40]. For a positive rating of the hypothesis testing, at least 75% of the predefined hypotheses should be confirmed [46]. Spearman's rank correlation coefficient was used.
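A small sketch of this hypothesis-testing logic: compute Spearman's rho for each concurrent measure, classify its magnitude with the cut-offs above, and count the proportion of predefined hypotheses that are confirmed. Score vectors and measure names are hypothetical:

```python
from scipy.stats import spearmanr

def magnitude(rho, high=0.60, low=0.30):
    """Classify the absolute correlation with the cut-offs used in the text."""
    r = abs(rho)
    return "high" if r >= high else "moderate" if r > low else "low"

def test_hypotheses(msts_scores, concurrent, expected):
    """concurrent: dict name -> score vector; expected: dict name -> 'high'/'moderate'/'low'."""
    confirmed = 0
    for name, scores in concurrent.items():
        rho, _ = spearmanr(msts_scores, scores, nan_policy="omit")
        if magnitude(rho) == expected[name]:
            confirmed += 1
    return confirmed / len(expected)   # >= 0.75 required for a positive rating

# e.g. test_hypotheses(msts, {"TESS": tess, "PSFS": psfs},
#                      {"TESS": "high", "PSFS": "high"})
```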

Content validity

Semi-structured interview. The patients (n = 30) identified a total of 94 important activities which they found impossible or difficult to perform. These single activities were categorised into 12 meaningful concepts (Fig. 1). The three most frequently identified activities were Walking (n = 14), Sports (n = 19) and Running (n = 20), with median (min–max) difficulty levels of 3.5 (0–5) points, 1 (0–7) point, and 0 (0–6) points, respectively.

Fig. 1

Number of activities (dark grey bar) the patients found important and were unable to perform or had difficulties with because of the condition. Median score (light grey bar) of the level of difficulty ranging from 0 to 10 points (0 = unable to perform the activity, 10 = able to perform the activity at the same level as before surgery). ***Squatting: includes the isometric position in a squat and the dynamic squat. **Walking: a summary of walking at various speeds and distances in diverse terrain. *Sports: includes various sports activities such as soccer, swimming, golf, tennis, badminton, dancing, water polo and skiing.

Items' relevance for the construct of functioning. All MSTS items, except Emotional acceptance, could be linked to ICF codes (Table 3). The item Function was considered a broad concept and could be linked to any ICF code under the Body Functions (b) and Activities and Participation (d) domains.

Items' relevance for the included sample. Two of the six MSTS items could be linked to activities identified in the PSFS (Table 3). The MSTS item Function could be linked to any activity identified in the PSFS.

Key concepts. The MSTS items Pain and Function were linked to the domains Pain and Function defined in both core outcome sets [43, 44]. The domain 'patient satisfaction' in the core outcome set for joint replacement was partly linked to the MSTS item Emotional acceptance, since one of its response options includes the word 'satisfied'.

Comprehensibility. The response options for the items Pain, Function and Walking ability changed content throughout the scale (Table 3). The response options 'disabling' and 'disability' could be linked to several ICF codes, and the response option 'recreational' could be linked to several activities identified in the PSFS (Table 3).

Data quality

Item median values ranged from 3 to 5 and all response options were used (Table  4 ). None of the items showed floor effects, but all items, except for Function, showed ceiling effects (Table  4 ). There were no internal missing values.

Internal consistency

Three inter-item correlation coefficients exceeded 0.50 (Supports and Walking ability, r = 0.60; Supports and Gait, r = 0.55; Walking ability and Gait, r = 0.55) (Table 5). As our PCA did not support unidimensionality but a two-factor solution, item-total correlations and Cronbach's alpha were only calculated for Factor 1. The item Function showed the lowest item-total correlation (r = 0.45) but did not fall below the limit of 0.30 (Table 5). The items Supports and Walking ability showed Cronbach's alpha if item deleted below the accepted range of 0.70 to 0.90 (Table 5).

Structural validity

The inter-item correlation between Pain and Gait was low (r = 0.19). Since this study was not a data-reduction exercise and the two items had acceptable correlations with the remaining items, both were retained. The KMO was 0.79 and Bartlett's test was significant (p < 0.001), suggesting that the data were adequate for factor analysis.

The scree plot showed a steep slope for Factor 1 (eigenvalue 2.904), an intermediate slope for Factor 2 (eigenvalue 1.017) and an almost flat slope for Factor 3 (eigenvalue 0.685) (Fig. 2). The cumulative percentage of total variance explained was 48.4% for Factor 1 and 65.4% for Factors 1 and 2 combined. Based on the eigenvalues, the cumulative percentages and the scree plot, a two-factor solution was chosen for the analysis of the factor-loading pattern.

Fig. 2

Scree plot of the principal component analysis

The factor-loading pattern for the two-factor solution showed high loadings on Factor 1 for Supports, Gait, Walking ability and Function, but not for Pain and Emotional acceptance (Table 6). The items Walking ability and Function loaded > 0.30 on both factors and were thus identified as complex variables.

Construct validity

Six out of 13 (46%) predefined hypotheses were confirmed (Table 7). The TESS, the QLQ-C30 sum score, the QLQ-C30 physical functioning subscore and the pain ratings showed high correlations with the MSTS (Table 7). The QLQ-C30 sum score was not expected to correlate highly with the MSTS, since it measures QoL and not functioning only. The MSTS showed a low correlation with the PSFS, which was unexpected since both should reflect the construct of functioning. Also, the MSTS item Walking ability had an unexpectedly low correlation with walking capacity (6MWT).

Discussion

The MSTS-LE showed insufficient content validity. The internal consistency and hypothesis testing were below acceptable levels. We found ceiling effects in five of six items and, in contrast to other studies, our analyses supported a two-factor solution.

The evaluation of content validity showed concerns with all three quality criteria: relevance, comprehensiveness, and comprehensibility. The item Emotional acceptance was not relevant to the construct of functioning, and the item Function was relevant to the construct but its content was too broad and unspecific. Pain and Function should pertain to separate constructs. Three items did not have matching response options, and many patient-important activities identified in the interview were not represented in the MSTS.

The MSTS has been criticised for not involving patients' perception of function in the development of items and response options [19]. We used a semi-structured interview to evaluate the MSTS items' relevance to the population of interest. Our results showed that the patients reported many more functions and activities that were important to them than those included in the MSTS. For example, recreational activities such as gardening, bicycling, hiking, and different sports activities were considered important but are not specifically named in the MSTS. For the measurement of functioning, an alternative to the MSTS could be the TESS [54]. The items of the TESS were developed based on input from patients with bone and soft-tissue sarcoma [19]. Comparing the TESS to our interview, the TESS includes kneeling, walking, gardening, and recreational activities also found in our interviews, suggesting that the TESS has more relevant content than the MSTS for this patient group.

The evaluation of the items' relevance to the construct of functioning showed that the item Emotional acceptance could not be linked to the ICF. This suggests that Emotional acceptance does not reflect functioning and should not be part of PROs with functioning as the construct of interest. Further, the linking process for the item Function was of concern, as it could be linked to many ICF codes, reflecting several functions and resulting in a very broad and unspecific content. This was supported by a relatively low item-total correlation for the item Function, suggesting that the content is unspecific and that there is scope for more items under the same construct [48]. An unspecific content makes interpretation difficult. The items Pain and Function were linked to important key concepts but were defined as two separate domains in the core outcome sets, suggesting that they reflect different constructs [43, 44]. When separate constructs are measured, they should either pertain to different PROs, or they should be treated separately in multidimensional scales [36]. Based on the unspecific content of the item Function and the potential mix of different constructs within the MSTS-LE, a sum score should be interpreted with caution.

Moreover, the evaluation of the comprehensibility of the response options showed results similar to those of Lee et al., with concerns about the formulations for Pain, Function and Walking ability [4]. The item Pain relates to the intake of analgesics rather than the perception of pain, and the items Function and Walking ability change content throughout the scale. One main requirement in formulating items and their response options is that they should be simple and easy to understand, and the response options should match their items [3]. Since the response options for three items of the MSTS-LE change in content, they are difficult to interpret and do not match the items.

The factor analysis in our study supported a two-factor solution. In contrast to our results, two earlier studies considered the MSTS to be unidimensional, i.e. consisting of one factor only [5, 6]. The scree plots in those studies showed elbow shapes located at the second factor, similar to our study, but their eigenvalues for the second factor were just below 1, whereas ours was just above 1. Determining the number of factors, and thereby the dimensionality of a measurement, can be difficult when scree plots do not take a characteristic sharp elbow shape and eigenvalues are close to the cut-off value of 1. One of the earlier studies discussed the possibility of a two-factor solution but let the eigenvalue of < 1 for a second factor determine the unidimensionality of the MSTS [5]. Values close to cut-offs can lead to different conclusions in different studies, which in this case indicates that the MSTS is not sufficiently robust between samples.

Further, looking at the factor-loading patterns, it is doubtful whether the MSTS can be supported as a unidimensional measurement of functioning. Our study clearly showed that the items Pain and Emotional acceptance had low loadings on Factor 1 and high loadings on Factor 2, indicating that Pain and Emotional acceptance are explained by another underlying construct than functioning. This is supported by the three earlier studies showing lower loadings for Pain and Emotional acceptance compared with the other items of the MSTS, although they never tested a two-factor solution or investigated whether Pain and Emotional acceptance had a better fit to a second factor [5, 6, 9]. Since Pain and Emotional acceptance are only vaguely explained by the underlying construct of functioning, they should be treated as a separate factor. The MSTS-LE should therefore not be considered a unidimensional measurement of functioning, but rather a multidimensional measurement where the dimensions are treated separately with separate subscores rather than a sum score, as is current practice.

One limitation of our study was the sample size. It is recommended that at least 100 patients are included when performing factor analyses [36, 37, 38]. With the data available (n = 87), one could consider increasing the threshold for an item to contribute sufficiently to a factor from > 0.30 to > 0.50 [36]. By doing so, the item Function would not load sufficiently on Factor 1, leaving Function a complex variable only, not pertaining clearly to either of the two factors, which complicates the interpretation of the MSTS even further. Another limitation is the time from surgery to the assessment point. In all three cohorts the time from surgery varied widely, and for most included patients many years had elapsed. Time from surgery can affect which patients could be included from the complete cohorts. Because around 60–80% of the patients in the three cohorts were alive at inclusion [11, 12, 55], the cohorts could comprise patients with a better outcome of physical function than the background population. Including a subgroup with better function from the total population has presumably biased the results towards better MSTS scores and may explain the high ceiling effects we observed.

Conclusions

The MSTS showed insufficient content validity, and when patients were asked, functions other than those included in the MSTS were important to them. Our findings do not support the MSTS as a unidimensional measurement of functioning; instead, they support a two-factor solution. MSTS sum scores should therefore be interpreted with caution. We suggest that alternative outcomes, such as the TESS and objective measurements, are considered for the evaluation of functioning in clinical practice and future research.

Data availability

The datasets used during the current study are available from the corresponding author on reasonable request.

Abbreviations

MSTS: Musculoskeletal Tumor Society Score
MSTS-LE: Musculoskeletal Tumor Society Score – Lower Extremity
TESS: Toronto Extremity Salvage Score
GMRS: Global Modular Replacement System
PSFS: Patient Specific Functional Scale
PRO: Patient Reported Outcome
NRS: Numeric Rating Scale
QLQ-C30: The EORTC Quality of Life Questionnaire Core 30, v. 3
QoL: Quality of Life
CST: 30-second chair stand test
6MWT: 6-minute walk test
ICF: International Classification of Functioning, Disability and Health
KMO: Kaiser-Meyer-Olkin
PCA: Principal Component Analysis

References

Enneking WF, Dunham W, Gebhardt MC, Malawar M, Pritchard DJ. A system for the functional evaluation of reconstructive procedures after surgical treatment of tumors of the musculoskeletal system. Clin Orthop Relat Res 1993:241–6.

Amino K, Kawaguchi N, Matsumoto S, Manabe J, Furuya K, Isobe Y. Functional Evaluation of Limb Salvage Operation for Malignant Bone and Soft Tissue Tumors Using the Evaluation System of the Musculoskeletal Tumor Society. New Developments for Limb Salvage in Musculoskeletal Tumors. 1989:27–30.

Terwee CB, Prinsen CAC, Chiarotto A, Westerman MJ, Patrick DL, Alonso J, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018;27:1159–70.

Lee SH, Kim DJ, Oh JH, Yoo KH, Kim HS. Validation of a functional evaluation system in patients with musculoskeletal tumors. Clin Orthop Relat Res 2003:217–26.

Rebolledo DCS, Vissoci JRN, Pietrobon RP, de Camargo OP, Baptista AM. Validation of the Brazilian version of the musculoskeletal tumor society rating scale for lower extremity bone sarcoma. Clin Orthop Relat Res. 2013;471:4020–6.

Iwata S, Uehara K, Ogura K, Akiyama T, Shinoda Y, Yonemoto T, et al. Reliability and validity of a Japanese-language and culturally adapted version of the Musculoskeletal Tumor Society scoring system for the lower extremity. Clin Orthop Relat Res. 2016;474:2044–52.

Saebye CKP, Keller J, Baad-Hansen T. Validation of the Danish version of the musculoskeletal tumour society score questionnaire. World J Orthop. 2019;10:23–32.

Janssen SJ, Pereira NRP, Raskin KA, Ferrone ML, Hornicek FJ, van Dijk CN, et al. A comparison of questionnaires for assessing physical function in patients with lower extremity bone metastases. J Surg Oncol. 2016;114:691–6.

Mallet J, El Kinani M, Crenn V, Ageneau P, Berchoud J, Varenne Y et al. French translation and validation of the cross-cultural adaptation of the MSTS functional assessment questionnaire completed after tumor surgery. Orthopaedics & Traumatology: Surgery & Research. 2023;109(3):103574.

Fernandes L, Holm CE, Villadsen A, Sørensen MS, Zebis MK, Petersen MM. Clinically important reductions in physical function and quality of life in adults with Tumor prostheses in the hip and knee: a cross-sectional study. Clin Orthop Relat Res. 2021;479:2306–19.

Holm CE, Bardram C, Riecke AF, Horstmann P, Petersen MM. Implant and limb survival after resection of primary bone tumors of the lower extremities and reconstruction with mega-prostheses fifty patients followed for a mean of fourteen years. Int Orthop. 2018;42:1175–81.

Yilmaz M, Sørensen MS, Saebye CKP, Baad-Hansen T, Petersen MM. Long-term results of the global modular replacement system tumor prosthesis for reconstruction after limb-sparing bone resections in orthopedic oncologic conditions: results from a national cohort. J Surg Oncol. 2019;120:183–92.

Stratford P, Gill C, Westaway M, Binkley J. Assessing disability and change on individual patients: a report of a patient specific measure. Physiotherapy Can. 1995;47:258–63.

Barten JA, Pisters MF, Huisman P, Takken T, Veenhof C. Measurement properties of patient-specific instruments measuring physical function. J Clin Epidemiol. 2012;65:590–601.

Cieza A, Geyh S, Chatterji S, Kostanjsek N, Ustün B, Stucki G. ICF linking rules: an update based on lessons learned. J Rehabil Med. 2005;37:212–8.

Berghmans DDP, Lenssen AF, van Rhijn LW, de Bie RA. The patient-specific functional scale: its reliability and responsiveness in patients undergoing a total knee arthroplasty. J Orthop Sports Phys Ther. 2015;45(7):550–6.

Hawker GA, Mian S, Kendzerska T, French M. Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale (CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). Arthritis Care Res (Hoboken). 2011;63(Suppl 1):S240–52.

Hjermstad MJ, Fayers PM, Haugen DF, Caraceni A, Hanks GW, Loge JH, et al. Studies comparing numerical rating scales, verbal rating scales, and visual analogue scales for assessment of pain intensity in adults: a systematic literature review. J Pain Symptom Manage. 2011;41:1073–93.

Davis AM, Wright JG, Williams JI, Bombardier C, Griffin A, Bell RS. Development of a measure of physical function for patients with bone and soft tissue sarcoma. Qual Life Res. 1996;5:508–16.

Willeumier JJ, van der Wal CWPG, van der Wal RJP, Dijkstra PDS, Vliet Vlieland TPM, van de Sande MAJ. Cross-cultural adaptation, translation, and Validation of the Toronto Extremity Salvage Score for Extremity Bone and soft tissue tumor patients in Netherlands. Sarcoma. 2017;2017:6197525.

Saebye CKP, Safwat A, Kaa AK, Pedersen NA, Keller J. Validation of a Danish version of the Toronto Extremity Salvage Score questionnaire for patients with sarcoma in the extremities. Dan Med J. 2014;61:A4734.

Fayers PM, Aaronson NK, Bjordal K, Groenvold M, Curran D, Bottomley A. The EORTC QLQ-C30 Scoring Manual. Vol. (3rd Edition). 2001.

Juul T, Petersen MA, Holzner B, Laurberg S, Christensen P, Grønvold M. Danish population-based reference data for the EORTC QLQ-C30: associations with gender, age and morbidity. Qual Life Res. 2014;23:2183–93.

Koller M, Aaronson NK, Blazeby J, Bottomley A, Dewolf L, Fayers PM, et al. Translation procedures for standardised quality of life questionnaires: the European Organisation for Research and Treatment of Cancer (EORTC) approach. Eur J Cancer. 2007;43(12):1810–20.

Orange ST, Marshall P, Madden LA, Vince RV. Can sit-to-stand muscle power explain the ability to perform functional tasks in adults with severe obesity? J Sports Sci. 2019;37:1227–34.

Dobson F, Hinman RS, Hall M, Terwee CB, Roos EM, Bennell KL. Measurement properties of performance-based measures to assess physical function in hip and knee osteoarthritis: a systematic review. Osteoarthritis Cartilage. 2012;20:1548–62.

Crockett K, Ardell K, Hermanson M, Penner A, Lanovaz J, Farthing J, et al. The relationship of knee-extensor strength and rate of torque development to sit-to-stand performance in older adults. Physiotherapy Can. 2013;65:229–35.

Slaughter SE, Wagg AS, Jones CA, Schopflocher D, Ickert C, Bampton E, et al. Mobility of vulnerable elders study: Effect of the sit-to-stand activity on mobility, function, and quality of life. J Am Med Dir Assoc. 2015;16:138–43.

Tveter AT, Dagfinrud H, Moseng T, Holm I. Health-related physical fitness measures: reference values and reference equations for Use in Clinical Practice. Arch Phys Med Rehabil. 2014;95:1366–73.

Jones CJ, Rikli RE, Beam WC. A 30-s chair-stand test as a measure of lower body strength in community-residing older adults. Res Q Exerc Sport. 1999;70:113–9.

Burr JF, Bredin SSD, Faktor MD, Warburton DER. The 6-Minute Walk Test as a predictor of objectively measured aerobic fitness in healthy working-aged adults. Phys Sportsmed. 2011;39:133–9.

Dam JC, van Bekkering E, Bramer WP, Beishuizen JAM, Fiocco A, Dijkstra M. Functional outcome after surgery in patients with bone sarcoma around the knee; results from a long-term prospective study. J Surg Oncol. 2017;115:1028–32.

Galiano-Castillo N, Arroyo-Morales M, Ariza-Garcia A, Sánchez-Salado C, Fernández-Lao C, Cantarero-Villanueva I, et al. The six-minute walk test as a measure of health in breast cancer patients. J Aging Phys Act. 2016;24:508–15.

Bohannon RW, Crouch R. Minimal clinically important difference for change in 6-minute walk test distance of adults with pathology: a systematic review. J Eval Clin Pract. 2017;23:377–81.

Schmidt K, Vogt L, Thiel C, Jäger E, Banzer W. Validity of the six-minute walk test in cancer patients. Int J Sports Med. 2013;34(7):631–6.

de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in Medicine. 1st ed. New York: Cambridge University Press; 2011.

Park JH, Kim JI. Practical Consideration of Factor Analysis for the Assessment of Construct Validity. J Korean Acad Nurs. 2021;51(6):643–7.

Suhr DD. Principal Component Analysis vs. exploratory factor analysis. Stat Data Anal. 2005;30:203–30.

Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford P, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–45.

Andresen EM. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil. 2000;81:S15–20.

Chiarotto A, Ostelo RW, Turk DC, Buchbinder R, Boers M. Core outcome sets for research and clinical practice. Braz J Phys Ther. 2017;21:77–84.

Ramsey I, Eckert M, Hutchinson AD, Marker J, Corsini N. Core outcome sets in cancer and their approaches to identifying and selecting patient-reported outcome measures: a systematic review. J Patient Rep Outcomes 2020;4.

Ramsey I, Corsini N, Hutchinson AD, Marker J, Eckert M. A core set of patient-reported outcomes for population-based cancer survivorship research: a consensus study. J Cancer Surviv. 2021;15:201–12.

Singh JA, Dowsey MM, Dohm M, Goodman SM, Leong AL, Voshaar MMJHS, et al. Achieving consensus on total joint replacement trial outcome reporting using the OMERACT filter: endorsement of the final core domain set for total hip and total knee replacement trials for endstage arthritis. J Rheumatol. 2017;44:1723–6.

Singh JA, Dowsey MM, Choong PF. Patient endorsement of the Outcome measures in Rheumatology (OMERACT) total joint replacement (TJR) clinical trial draft core domain set. BMC Musculoskelet Disord 2017;18.

Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.

Scholtes VA, Terwee CB, Poolman RW. What makes a measurement instrument valid and reliable? Injury. 2011;42(3):236–40.

Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–5.

Brown JD. Choosing the Right Number of Components or factors in PCA and EFA. JALT Testing&Evaluation SIG Newsl. 2009;2009(13):3–19.

Brown JD. Choosing the right type of Rotation in PCA and EFA. JALT Testing&Evaluation SIG Newsl. 2009;13:3.

Kim HS, Yun J, Kang S, Han I. Cross-cultural adaptation and validation of the Korean Toronto Extremity Salvage Score for extremity sarcoma. J Surg Oncol. 2015;112(1):93–7.

Saebye CKP, Fugloe M, Nymark H, Safwat T, Petersen A, Baad-Hansen MM. Factors associated with reduced functional outcome and quality of life in patients having limb-sparing surgery for soft tissue sarcomas - a national multicenter study of 128 patients. Acta Oncol. 2017;56(2):239–44.

Marchese VG, Rai SN, Carlson CA, Pamela SH, Spearing EM. Assessing functional mobility in survivors of lower-extremity sarcoma: reliability and validity of a new assessment tool. Pediatr Blood Cancer. 2007;49(2):183–9.

Kask G, Barner-Rasmussen I, Repo JP, Kjäldman M, Kilk K, Blomqvist C, et al. Functional outcome measurement in patients with lower-extremity soft tissue sarcoma: a systematic literature review. Ann Surg Oncol. 2019;26(13):4707–22.

Holm CE, Soerensen MS, Yilmaz M, Petersen MM. Evaluation of tumor-prostheses over time: Complications, functional outcome, and comparative statistical analysis after resection and reconstruction in orthopedic oncologic conditions in the lower extremities. SAGE Open Med [Internet]. 2022 Apr 21 [cited 2023 Aug 8];10:20503121221094190. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9047786/

Bekendtgørelse af lov om videnskabsetisk behandling af sundhedsvidenskabelige forskningsprojekter og sundhedsdatavidenskabelige forskningsprojekter [Consolidated Act on research ethics review of health research projects and health data science research projects] [Internet]. LBK nr. 1338 af 01/09/2020. http://www.retsinformation.dk/eli/lta/2020/1338

Acknowledgements

The authors thank the patients participating in the three cohorts included.

LF received funding from Vissing Fonden, Aalborg, Denmark (grant number 85969).

Author information

Authors and Affiliations

Department of Midwifery, Physiotherapy, Occupational Therapy, and Psychomotor Therapy, Faculty of Health, University College Copenhagen, Copenhagen, Denmark

Nikolai Sherling & Linda Fernandes

Musculoskeletal Tumor Section, Department of Orthopedic Surgery, University Hospital Rigshospitalet, Copenhagen, Denmark

Müjgan Yilmaz, Christina Enciso Holm & Michael Mørk Petersen

Institute of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

Michael Mørk Petersen

Contributions

N.S., M.M.P. and L.F. contributed to concept and design. M.Y., C.E.H. and L.F. collected the data. N.S. and L.F. contributed to data analysis, statistical analysis, and manuscript preparation. N.S. and L.F. contributed to literature search, manuscript editing, and manuscript review. M.Y., C.E.H. and M.M.P. revised the manuscript. All authors reviewed and approved the manuscript.

Corresponding author

Correspondence to Linda Fernandes .

Ethics declarations

Ethics approval and consent to participate

The study was carried out according to the Helsinki Declaration. Approvals from the Danish Data Protection Agency (VD-2018-20-6594, 2013-41-2591, 2013-41‐2591), the Capital Regional Committee on Health Research Ethics (H-18032141) and the Danish Health and Medicines Authority (no. 3-3013-894/1, 3‐3013‐1045/1/) were obtained prior to inclusion. Patients in Cohort 1 and 2 received oral and written information and signed informed consent prior to inclusion. In case of minors under the age of 16, informed consent to participate was obtained from the parents. Cohort 3 was a retrospective cohort, using data from registers. In Denmark, informed consent in registered based studies is deemed unnecessary according to national legislation [ 56 ].

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Sherling, N., Yilmaz, M., Holm, C.E. et al. Validity of the Musculoskeletal Tumor Society Score for lower extremity in patients with bone sarcoma or giant cell tumour of bone undergoing bone resection and reconstruction surgery in hip and knee. BMC Cancer 24 , 1019 (2024). https://doi.org/10.1186/s12885-024-12686-9

Download citation

Received : 29 August 2023

Accepted : 24 July 2024

Published : 16 August 2024

DOI : https://doi.org/10.1186/s12885-024-12686-9


Keywords
  • Musculoskeletal tumor society score
  • Bone sarcoma
  • Physical function
  • Outcome measures

ISSN: 1471-2407


Adapting Harvests: A Comprehensive Study of Farmers' Perceptions, Adaptation Strategies, and Climatic Trends in Dera Ghazi Khan, Pakistan

1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data Collection
2.2.1. Climate Data
2.2.2. Survey Data

  • Selection of the District Dera Ghazi Khan, Punjab, Pakistan as the study area.
  • Selection of two tehsils, D.G. Khan and Taunsa (sub-administrative unit), from the study area.
  • Selection of three union councils (smallest administrative unit in the country) from each chosen tehsil.
  • Selection of three villages from each chosen union council using simple random sampling.
  • A sample of 10 farmers from each village was taken randomly for interview.

2.3. Data Analysis
2.3.1. Descriptive Statistics
2.3.2. Mann–Kendall Trend Test
2.3.3. Ordinal Logistic Regression
2.3.4. Binary Logistic Regression
3.1. Climate Change Trends
3.2. Demographic Characteristics of Farmers
3.3. Farmers' Perceptions of Climate Change
3.4. Socio-Economic Factors Affecting Farmers' Perceptions Regarding Climate Change
3.5. Climate Change Effects on Farmer's Life and Income Sources
3.6. Farmer's Adaptation Strategies to Climate Change
3.7. Factors Affecting Farmers' Adaptation Strategies
4. Discussion
5. Conclusions and Implications
Author Contributions; Institutional Review Board Statement; Informed Consent Statement; Data Availability Statement; Conflicts of Interest



Variable Name | Variable Description
Dependent Variables (Ordinal)
Perceived weather uncertainty | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Perceived pollution | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Perceived soil erosion | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Perceived floods | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Perceived heatwave | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Perceived rain | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Perceived drought | Likert Scale: ranges from 1 (Very low) to 5 (Very High)
Dependent Variables (Logit)
Change in planting dates | 1 = Adopted, 0 = Not adopted
Change crop varieties | 1 = Adopted, 0 = Not adopted
Use of water conservation techniques | 1 = Adopted, 0 = Not adopted
Implementation of soil conservation techniques | 1 = Adopted, 0 = Not adopted
Use of shades and shelters | 1 = Adopted, 0 = Not adopted
Migration | 1 = Adopted, 0 = Not adopted
Insurance | 1 = Adopted, 0 = Not adopted
Search for off-farming jobs | 1 = Adopted, 0 = Not adopted
Religious beliefs or prayers | 1 = Adopted, 0 = Not adopted
Change the use of chemical fertilizers, pesticides, and insecticides | 1 = Adopted, 0 = Not adopted
Independent Variables
Age | Continuous
Education | Continuous
Land in acres | Continuous
Experience | Continuous
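
The variable list above pairs binary adoption outcomes and ordinal perception outcomes with four continuous predictors (age, education, land, experience). As a rough sketch of how one of the binary "1 = Adopted, 0 = Not adopted" outcomes could be related to those predictors, the following Python example fits a plain logistic regression on synthetic records; the data, the effect sizes, and the choice of the water-conservation outcome are illustrative assumptions, not the study's survey data or results.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic farm-level records mirroring the variable list above
# (illustrative only; not the survey data)
rng = np.random.default_rng(1)
n = 180
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "education": rng.integers(0, 16, n),   # years of schooling
    "land_acres": rng.uniform(1, 25, n),
    "experience": rng.integers(1, 40, n),
})

# Illustrative binary adoption outcome (1 = adopted, 0 = not adopted)
logit_index = -2 + 0.15 * df["education"] + 0.05 * df["land_acres"]
p_adopt = 1 / (1 + np.exp(-logit_index))
df["water_conservation"] = rng.binomial(1, p_adopt.to_numpy())

# Binary logit of adoption on the continuous predictors
X = sm.add_constant(df[["age", "education", "land_acres", "experience"]])
fit = sm.Logit(df["water_conservation"], X).fit(disp=0)
print(np.exp(fit.params))  # odds ratios for each predictor
```
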
Climatic Parameters | Sum of Ranks | Kendall's Tau | p-Value (Two-Tailed) | Var (S) | Sen's Slope | Hypothesis
Annual Rainfall | 64 | 0.337 | 0.041 | 950 | 6.592 | H Accept
Annual Maximum Temperature | 1 | 0.005 | 1.000 | 933 | 0 | H Accept
Annual Minimum Temperature | −66 | −0.357 | 0.034 | 938.6 | −0.065 | H Accept
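
The columns above are the standard Mann-Kendall trend statistics: S (the "Sum of Ranks" column), Kendall's tau, the variance of S under the no-trend null hypothesis, and Sen's slope as the trend magnitude. The sketch below shows how these quantities are commonly computed for an annual series; the 20-year synthetic series is an assumption made for illustration (n = 20 reproduces Var(S) = 950, as in the rainfall row), not the study's climate records.

```python
import numpy as np
from scipy.stats import norm

def mann_kendall_sen(series):
    """Classic Mann-Kendall statistics and Sen's slope for a 1-D annual series
    (no tie correction; a sketch for illustration only)."""
    x = np.asarray(series, dtype=float)
    n = len(x)

    # S: sum of signs of all pairwise differences (later years minus earlier years)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))

    # Kendall's tau and the variance of S under the null hypothesis of no trend
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18

    # Two-tailed p-value from the normal approximation with continuity correction
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * norm.sf(abs(z))

    # Sen's slope: median of all pairwise slopes
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return {"S": s, "tau": tau, "Var(S)": var_s, "p": p, "sen_slope": np.median(slopes)}

# Illustrative 20-year series with an upward trend (synthetic, not the study's rainfall)
rng = np.random.default_rng(0)
annual_rainfall = 400 + 6.5 * np.arange(20) + rng.normal(0, 40, size=20)
print(mann_kendall_sen(annual_rainfall))
```
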
Environmental Issues | Very Low (%) | Low (%) | Moderate (%) | High (%) | Very High (%) | Mean
Weather Uncertainty | 15.0 | 27.8 | 27.2 | 22.2 | 7.8 | 2.80
Floods | 3.3 | 24.4 | 38.9 | 27.8 | 5.6 | 3.07
Rain | 1.1 | 15.6 | 28.3 | 43.9 | 11.1 | 3.48
Drought | 7.8 | 32.8 | 37.2 | 20.6 | 1.7 | 2.75
Heat waves | 9.4 | 29.4 | 35.0 | 21.1 | 5.0 | 2.82
Soil erosion | 25.6 | 22.8 | 28.3 | 18.9 | 4.4 | 2.53
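
The Mean column is simply the 1-to-5 Likert codes averaged with the response percentages as weights; for Weather Uncertainty, (1·15.0 + 2·27.8 + 3·27.2 + 4·22.2 + 5·7.8)/100 = 2.80. A quick check of that arithmetic in Python, using only the values shown in the table:

```python
# Mean Likert score from a percentage distribution over the codes 1..5
codes = [1, 2, 3, 4, 5]                                  # Very Low ... Very High
weather_uncertainty_pct = [15.0, 27.8, 27.2, 22.2, 7.8]  # first row of the table

mean = sum(c * p for c, p in zip(codes, weather_uncertainty_pct)) / sum(weather_uncertainty_pct)
print(round(mean, 2))  # -> 2.8, the 2.80 reported for Weather Uncertainty
```
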
Variables | Perceived Weather Uncertainty | Perceived Rain | Perceived Soil Erosion | Perceived Floods | Perceived Heatwave | Perceived Drought
(each cell: Estimate, with the odds ratio OR in parentheses)
Age | −0.017 ** (0.984) | 0.007 ** (1.007) | −0.026 ** (0.974) | 0.006 ** (1.006) | −0.193 * (0.825) | −0.031 ** (0.969)
Education | 0.226 * (1.254) | 0.180 * (1.197) | 0.109 * (1.115) | 0.326 * (1.386) | 0.272 * (1.313) | 0.220 * (1.246)
Land | −0.016 ** (0.984) | 0.002 ** (1.002) | 0.279 * (1.321) | 0.117 * (1.124) | 0.000 ** (1.000) | 0.005 ** (1.005)
Experience in farming | −0.012 ** (0.988) | −0.066 * (0.936) | −0.012 ** (0.988) | 0.041 * (1.042) | 0.025 ** (1.025) | 0.072 * (1.075)
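
For an ordered-logit model, each odds-ratio (OR) column is the exponential of the corresponding coefficient, so an OR above 1 means higher odds of reporting a more severe perception category per unit increase in the predictor. A minimal check against the education row, with the coefficients copied from the table above (agreement with the printed ORs is up to rounding):

```python
import math

# Education coefficients for the six perception models (copied from the table above)
education_estimates = {
    "weather uncertainty": 0.226,
    "rain": 0.180,
    "soil erosion": 0.109,
    "floods": 0.326,
    "heatwave": 0.272,
    "drought": 0.220,
}

for outcome, beta in education_estimates.items():
    # OR = exp(beta): multiplicative change in the odds of a higher perception category
    print(f"{outcome}: OR = {math.exp(beta):.3f}")
```
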

Source: Shah, S.A.A.; Mehmood, M.S.; Muhammad, I.; Ahamad, M.I.; Wu, H. Adapting Harvests: A Comprehensive Study of Farmers’ Perceptions, Adaptation Strategies, and Climatic Trends in Dera Ghazi Khan, Pakistan. Sustainability 2024, 16, 7070. https://doi.org/10.3390/su16167070




Doctoral Dissertations and Projects

Purpose, Performance, and Process Influence on Airline Pilot Trust in Automation Technology: A Quantitative Study

Thomas Robert Meyer, Liberty University

School of Aeronautics

Doctor of Philosophy

Julie Speakes

Keywords: automation technology, purpose, performance, process, trust, System Trustworthiness Scale


Recommended Citation

Meyer, Thomas Robert, "Purpose, Performance, and Process Influence on Airline Pilot Trust in Automation Technology: A Quantitative Study" (2024). Doctoral Dissertations and Projects . 5842. https://digitalcommons.liberty.edu/doctoral/5842

The purpose of this quantitative, descriptive survey study was to determine if purpose, performance, and process influence airline pilot trust in automation technology. The role of a tool is an extension of human capabilities. Initially limited to mechanical extensions of arms and legs, tools are more sophisticated and extend into mental abilities. Quantum leaps in computer and automation technology mitigate repetitive or complex calculations using developed cognitive processes. Through the Trust in Automation theoretical lens, this study used the Likert-based System Trustworthiness Scale offered online to approximately 3,000 airline pilots using simple random sampling methods comprising voluntary submissions. Data was analyzed using multiple linear regression. The findings of this research indicated that airline pilots generally trust automation technology. Further, airline pilot trust in automation technology is influenced by system performance, purpose, and process. Despite the benefits of this study, there remains vast potential for unlimited future research into variations in pilot demographics, diverse technologies, and differing flight deck automation technology design philosophies. This study was intended as a generalized overview. A more granular and specific study may provide profound insight.
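
The abstract states only that responses to the Likert-based System Trustworthiness Scale were analyzed with multiple linear regression. The following is a generic sketch of what such an analysis typically looks like; the variable names (purpose, performance, process, trust), the 7-point coding, and the synthetic data are illustrative assumptions, not the dissertation's actual instrument scoring or findings.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic example: subscale scores built from Likert-type items (illustrative only)
rng = np.random.default_rng(42)
n = 300
purpose = rng.integers(1, 8, n).astype(float)       # assumed 7-point coding
performance = rng.integers(1, 8, n).astype(float)
process = rng.integers(1, 8, n).astype(float)

# Illustrative outcome: overall trust driven by the three predictors plus noise
trust = 0.5 + 0.3 * purpose + 0.4 * performance + 0.2 * process + rng.normal(0, 1, n)

df = pd.DataFrame({"purpose": purpose, "performance": performance,
                   "process": process, "trust": trust})

# Multiple linear regression of trust on purpose, performance, and process
X = sm.add_constant(df[["purpose", "performance", "process"]])
model = sm.OLS(df["trust"], X).fit()
print(model.summary())  # coefficients, t-tests, and R^2 for the three predictors
```
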


Included in: Aviation Commons



COMMENTS

  1. Analyzing and Interpreting Data From Likert-Type Scales

    A sizable percentage of the educational research manuscripts submitted to the Journal of Graduate Medical Education employ a Likert scale for part or all of the outcome assessments. Thus, understanding the interpretation and analysis of data derived from Likert scales is imperative for those working in medical education and education research.

  2. Measuring attitude towards mathematics using Likert scale ...

    In the research on mathematics education, numerous Likert-type instruments estimating attitudes toward mathematics are sometimes composed of factors with a high correlation, which can make it difficult to assign the statements from the scale to each estimated factor. Further, the measurement of attitudes is usually done by adding the scores but ignoring the existence of possible differences in ...

  3. A Review of Key Likert Scale Development Advances: 1995-2019

    Abstract. Developing self-report Likert scales is an essential part of modern psychology. However, it is hard for psychologists to remain apprised of best practices as methodological developments accumulate. To address this, this current paper offers a selective review of advances in Likert scale development that have occurred over the past 25 ...

  4. (PDF) Likert Scale: Explored and Explained

    model (used for estimation of ability) and the Likert scale (which measures human attitude) are examples of such scales in psychometrics, widely used in social science and educational research [3,4,5 ...

  5. A descriptive analysis and interpretation of data from Likert scales in

    The study used descriptive research because the variables of interest in this study are not directly observed and as such, they are assessed by self-report measures using Likert rating scales ...

  6. Likert Scale in Social Sciences Research: Problems and Difficulties

    A Likert-type scale using a total score of all items is an interval scale. By contrast, items using the Likert scale are ordinal scales (Carifio & Perla, 2008).

  7. Measuring attitude towards mathematics using Likert scale surveys: The

    In this research, a Likert-scale survey that is often used in the study of attitudes is analysed in order to improve the estimation of attitudes students have toward mathematics. Additionally, a new distribution of the items around the attitude factors is proposed based on the conclusions of a panel of experts, with subsequent confirmation by ...

  8. A Review of Key Likert Scale Development Advances: 1995-2019

    Introduction. Psychological data are diverse and range from observations of behavior to face-to-face interviews. However, in modern times, one of the most common measurement methods is the self-report Likert scale (Baumeister et al., 2007; Clark and Watson, 2019). Likert scales provide a convenient way to measure unobservable constructs, and published tutorials detailing the process of their ...

  9. Likert Scale

    Likert scaling is one of the most fundamental and frequently used assessment strategies in social science research (Joshi et al. 2015). A social psychologist, Rensis Likert, developed the Likert scale to measure attitudes. Although attitudes and opinions had been popular research topics in the social sciences, the measurement of these concepts was not established until this time.

  10. Likert Scale: Survey Use & Examples

    The Likert scale is a well-loved tool in the realm of survey research. Named after psychologist Rensis Likert, it measures attitudes or feelings towards a topic on a continuum, typically from one extreme to the other. The scale provides quantitative data about qualitative aspects, such as attitudes, satisfaction, agreement, or likelihood.

  11. Examining Perceptions and Attitudes: A Review of Likert-Type Scales

    The purpose of this article is to compare and discuss the use of Likert-type scales and Q-methodology to examine perceptions and attitudes in nursing research. ... Q-methodology in nursing research: A promising method for the study of subjectivity. Western Journal of Nursing Research, 30, 759-773. Crossref. PubMed. Web of Science. Google ...

  12. What Is a Likert Scale?

    Revised on June 22, 2023. A Likert scale is a rating scale used to measure opinions, attitudes, or behaviors. It consists of a statement or a question, followed by a series of five or seven answer statements. Respondents choose the option that best corresponds with how they feel about the statement or question.

  13. PDF International Journal of Educational Methodology

    The findings show that only 10% of studies use a measurement scale with an even number of answer categories (4, 6, 8, or 10 choices). In general, 90% of studies use an instrument with an odd number of Likert response options (5, 7, 9, or 11), and the 5-point Likert scale is the most popular choice.

  14. The Likert Scale: Definition, Examples and Use Cases

    The Likert scale is a popular tool for market research due to its reliability in measuring opinions, perceptions, and behaviors objectively. It is widely used by researchers to understand opinions and views about a brand, product, target market, employee satisfaction, and more. For instance, if you want to evaluate the success of a recent work ...

  15. Likert scale

    A Likert scale (/ˈlɪkərt/ LIK-ərt, [1] [note 1]) is a psychometric scale named after its inventor, American social psychologist Rensis Likert, [2] which is commonly used in research questionnaires. It is the most widely used approach to scaling responses in survey research, such that the term (or more fully the Likert-type scale) is often used interchangeably with rating scale ...

  16. Likert Scale 101: Everything You Need to Know (With Examples)

    A Likert scale is a rating scale used in survey research to measure attitudes, beliefs, opinions, or perceptions about a particular topic. The name comes from the inventor, psychologist Rensis Likert, who developed the concept in the 1930s. The scale consists of a series of numbered response options, ranging from strongly disagree to strongly ...

  17. Using a Likert Scale in Psychology

    Using a Likert Scale in Psychology. A Likert scale is a type of psychometric scale frequently used in psychology questionnaires. It was developed by and named after organizational psychologist Rensis Likert. Self-report inventories are one of the most widely used tools in psychological research.

  18. (PDF) THE LIKERT SCALE: EXPLORING THE UNKNOWNS AND THEIR ...

    4 Department of Statistics, Faculty of Physical Sciences, University for Development Studies, Ghana. *Author's Email address: [email protected]. Abstract. The Likert scale is such an ...

  19. Measuring attitude towards mathematics using Likert scale surveys: The

    In this research, a Likert-scale survey that is often used in the study of attitudes is analysed in order to improve the estimation of attitudes students have toward mathematics. Additionally, a new distribution of the items around the attitude factors is proposed based on the conclusions of a panel of experts, with subsequent confirmation by ...

  20. How to Analyze Likert Scale Data

    Likert scales are the most broadly used method for scaling responses in survey studies. Survey questions that ask you to indicate your level of agreement, from strongly agree to strongly disagree, use the Likert scale. The data in the worksheet are five-point Likert scale data for two groups. Likert data seem ideal for survey items, but there ...

  21. Likert scale

    Likert scale, rating system, used in questionnaires, that is designed to measure people's attitudes, opinions, or perceptions. Subjects choose from a range of possible responses to a specific question or statement; responses typically include "strongly agree," "agree," "neutral," "disagree," and "strongly disagree.".

  22. Likert scale interpretation of the results w/ examples

    An example of Likert scale data analysis and interpretation. Let's consider an example scenario and go through the steps of analyzing and interpreting Likert scale data. Scenario: A company conducts an employee satisfaction survey using a Likert scale to measure employees' attitudes toward various aspects of their work environment.

  23. Likert Scale Questionnaire: Examples & Analysis

    A Likert scale assumes that the strength/intensity of an attitude is linear, i.e., on a continuum from strongly agree to strongly disagree, and makes the assumption that attitudes can be measured. For example, each of the five (or seven) responses would have a numerical value that would be used to measure the attitude under investigation.

  24. Likert Scale

    The Likert scale is typically a 5-point scale ranging from strongly agree to strongly disagree. A Likert scale is typically used on surveys or questionnaires; it begins with a statement and ...

  25. Validity of the Musculoskeletal Tumor Society Score for lower extremity

    Each item is scored on a 5-point Likert scale, ranging from 0 (worst possible score) to 5 (best possible score). The items have unique response options for 0 through 5 (Table ... a Delphi study. Quality of Life Research 2018;March. Lee SH, Kim DJ, Oh JH, Yoo KH, Kim HS. Validation of a functional evaluation system in patients with ...

  26. Adapting Harvests: A Comprehensive Study of Farmers' Perceptions

    The questions were designed based on a 5-point Likert scale to identify varying degrees of an individual's subjective risk assessment ... and water conservation techniques. The findings of the study correlate with research indicating that a greater level of education can enhance farmers' level of information and understanding regarding ...

  27. "Purpose, Performance, and Process Influence on Airline Pilot Trust in

    The purpose of this quantitative, descriptive survey study was to determine if purpose, performance, and process influence airline pilot trust in automation technology. The role of a tool is an extension of human capabilities. Initially limited to mechanical extensions of arms and legs, tools are more sophisticated and extend into mental abilities. Quantum leaps in computer and automation ...
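
Several of the entries above (for example, the two-group five-point data in "How to Analyze Likert Scale Data" and the walkthrough in "Likert scale interpretation of the results w/ examples") come down to the same basic operation: summarizing or comparing ordinal Likert responses. As a generic illustration, not an analysis drawn from any of the linked sources, the sketch below compares two groups of synthetic five-point responses with a Mann-Whitney U test, a common choice when the response codes are treated as ordinal rather than interval data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic five-point Likert responses for two groups (illustrative only)
rng = np.random.default_rng(7)
group_a = rng.choice([1, 2, 3, 4, 5], size=60, p=[0.05, 0.15, 0.30, 0.35, 0.15])
group_b = rng.choice([1, 2, 3, 4, 5], size=60, p=[0.15, 0.30, 0.30, 0.15, 0.10])

# Mann-Whitney U does not assume the Likert codes are interval-scaled
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```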