University Library, University of Illinois at Urbana-Champaign

SPSS Tutorial: General Statistics and Hypothesis Testing

This section and the "Graphics" section provide a quick tutorial for a few common functions in SPSS, primarily to give the reader a feel for the SPSS user interface. This is not a comprehensive tutorial, but SPSS itself provides comprehensive tutorials and case studies through its help menu. SPSS's help menu is more than a quick reference: it provides detailed information on how and when to use SPSS's various menu options. See the "Further Resources" section for more information.

To perform a one-sample t-test, click "Analyze"→"Compare Means"→"One-Sample T-Test" and the following dialog box will appear:

[Screenshot: the One-Sample T Test dialog box]

The dialogue allows selection of any scale variable from the box at the left and a test value that represents a hypothetical mean. Select the test variable, set the test value, and press "OK." Three tables will appear in the Output Viewer:

[Screenshot: one-sample t-test output tables in the Output Viewer]

The first table gives descriptive statistics about the variable. The second shows the results of the t-test, including the "t" statistic, the degrees of freedom ("df"), the p-value ("Sig."), the difference of the test value from the variable mean, and the upper and lower bounds for a 95% confidence interval. The final table shows one-sample effect sizes.
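For readers who prefer a command line, the same test takes one line in R (a minimal sketch with simulated data; the variable name and test value are hypothetical, not taken from the tutorial's dataset):

    set.seed(1)
    income <- rnorm(200, mean = 52000, sd = 9000)  # hypothetical scale variable

    # One-sample t-test against a hypothetical mean of 50000,
    # mirroring the "Test Value" field in the SPSS dialog
    t.test(income, mu = 50000)

The printout reports the same quantities as SPSS's output: t, df, the p-value, and a 95% confidence interval (R gives the interval for the mean itself, whereas SPSS gives it for the difference from the test value).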

One-Way ANOVA

In the Data Editor, select "Analyze"→"Compare Means"→"One-Way ANOVA..." to open the dialog box shown below.

[Screenshot: the One-Way ANOVA dialog box]

To generate the ANOVA statistic, the variables chosen cannot have a "Nominal" level of measurement; they must be "Ordinal."

Once the nominal variables have been changed to ordinal, select the dependent variable and the factor, then click "OK." The following output will appear in the Output Viewer:

[Screenshot: one-way ANOVA output in the Output Viewer]
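For comparison, here is a minimal one-way ANOVA sketch in R, using simulated data (the factor, group names, and group means are all hypothetical):

    set.seed(2)
    group <- factor(rep(c("A", "B", "C"), each = 30))           # hypothetical factor
    score <- rnorm(90, mean = c(10, 12, 15)[as.integer(group)]) # dependent variable

    # One-way ANOVA: do mean scores differ across the three groups?
    summary(aov(score ~ group))

The resulting table mirrors the SPSS ANOVA output: sums of squares, degrees of freedom, mean squares, the F statistic, and its p-value.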

Linear Regression

To obtain a linear regression, select "Analyze"→"Regression"→"Linear" from the menu, calling up the dialog box shown below:

[Screenshot: the Linear Regression dialog box]

The output for this most basic case comprises a Model Summary table showing R, R Square, and the standard error of the estimate; an ANOVA table; and a Coefficients table providing statistics on the model coefficients:

[Screenshot: linear regression output tables in the Output Viewer]

For multiple regression, simply add more independent variables in the "Linear Regression" dialog box. To plot a regression line, see the "Legacy Dialogues" section of the "Graphics" tab.
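As a command-line analogue, here is a minimal sketch in R showing how adding an independent variable turns a simple regression into a multiple regression (all data simulated, names hypothetical):

    set.seed(3)
    dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(100)

    fit1 <- lm(y ~ x1, data = dat)    # simple linear regression
    fit2 <- update(fit1, . ~ . + x2)  # "add more independent variables"
    summary(fit2)                     # R-squared, overall F-test, coefficient table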



Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

Step 3: Perform a statistical test

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data. Continuing the height example, a t test comparing the two groups would give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
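For instance, a minimal sketch of this t test in R, with simulated heights (all values made up for illustration):

    set.seed(4)
    height_men   <- rnorm(50, mean = 178, sd = 7)  # hypothetical heights in cm
    height_women <- rnorm(50, mean = 165, sd = 7)

    # One-sided test of Ha: men are, on average, taller than women
    t.test(height_men, height_women, alternative = "greater")

The output contains exactly the two quantities listed above: the estimated difference in means and the p-value.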

Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05, that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).


Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don't say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.


Frequently asked questions about hypothesis testing

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.



Statistics LibreTexts

12.5: Hypothesis Tests for Regression Models


  • Danielle Navarro
  • University of New South Wales

So far we’ve talked about what a regression model is, how the coefficients of a regression model are estimated, and how we quantify the performance of the model (the last of these, incidentally, is basically our measure of effect size). The next thing we need to talk about is hypothesis tests. There are two different (but related) kinds of hypothesis tests that we need to talk about: those in which we test whether the regression model as a whole is performing significantly better than a null model; and those in which we test whether a particular regression coefficient is significantly different from zero.

At this point, you’re probably groaning internally, thinking that I’m going to introduce a whole new collection of tests. You’re probably sick of hypothesis tests by now, and don’t want to learn any new ones. Me too. I’m so sick of hypothesis tests that I’m going to shamelessly reuse the F-test from Chapter 14 and the t-test from Chapter 13. In fact, all I’m going to do in this section is show you how those tests are imported wholesale into the regression framework.

Testing the model as a whole

Okay, suppose you’ve estimated your regression model. The first hypothesis test you might want to try is one in which the null hypothesis is that there is no relationship between the predictors and the outcome, and the alternative hypothesis is that the data are distributed in exactly the way that the regression model predicts. Formally, our “null model” corresponds to the fairly trivial “regression” model in which we include 0 predictors and only include the intercept term \(b_0\):

\(H_0: Y_i = b_0 + \epsilon_i\)

If our regression model has K predictors, the “alternative model” is described using the usual formula for a multiple regression model:

\(H_{1}: Y_{i}=\left(\sum_{k=1}^{K} b_{k} X_{i k}\right)+b_{0}+\epsilon_{i}\)

How can we test these two hypotheses against each other? The trick is to understand that, just like we did with ANOVA, it’s possible to divide up the total variance \(\mathrm{SS}_{tot}\) into the sum of the residual variance \(\mathrm{SS}_{res}\) and the regression model variance \(\mathrm{SS}_{mod}\). I’ll skip over the technicalities, since we covered most of them in the ANOVA chapter, and just note that:

\(\mathrm{SS}_{mod} = \mathrm{SS}_{tot} - \mathrm{SS}_{res}\)

And, just like we did with the ANOVA, we can convert the sums of squares into mean squares by dividing by the degrees of freedom.

\(\mathrm{MS}_{mod} = \frac{\mathrm{SS}_{mod}}{df_{mod}}\) \(\qquad \mathrm{MS}_{res} = \frac{\mathrm{SS}_{res}}{df_{res}}\)

So, how many degrees of freedom do we have? As you might expect, the df associated with the model is closely tied to the number of predictors that we’ve included. In fact, it turns out that \(df_{mod} = K\). For the residuals, the total degrees of freedom is \(df_{res} = N - K - 1\). Now that we have our mean squares, we can calculate an F-statistic like this:

\(F = \frac{\mathrm{MS}_{mod}}{\mathrm{MS}_{res}}\)

and the degrees of freedom associated with this are K and N−K−1. This F statistic has exactly the same interpretation as the one we introduced in Chapter 14. Large F values indicate that the null hypothesis is performing poorly in comparison to the alternative hypothesis. And since we already did some tedious “do it the long way” calculations back then, I won’t waste your time repeating them. In a moment I’ll show you how to do the test in R the easy way, but first, let’s have a look at the tests for the individual regression coefficients.

Tests for individual coefficients

The F-test that we’ve just introduced is useful for checking that the model as a whole is performing better than chance. This is important: if your regression model doesn’t produce a significant result for the F-test then you probably don’t have a very good regression model (or, quite possibly, you don’t have very good data). However, while failing this test is a pretty strong indicator that the model has problems, passing the test (i.e., rejecting the null) doesn’t imply that the model is good! Why is that, you might be wondering? The answer to that can be found by looking at the coefficients for the regression.2 model.

I can’t help but notice that the estimated regression coefficient for the baby.sleep variable is tiny (0.01), relative to the value that we get for dan.sleep (-8.95). Given that these two variables are absolutely on the same scale (they’re both measured in “hours slept”), I find this suspicious. In fact, I’m beginning to suspect that it’s really only the amount of sleep that I get that matters in order to predict my grumpiness.

Once again, we can reuse a hypothesis test that we discussed earlier, this time the t-test. The test that we’re interested in has a null hypothesis that the true regression coefficient is zero (b = 0), which is to be tested against the alternative hypothesis that it isn’t (b ≠ 0). That is:

\(H_0: b = 0\)
\(H_1: b \neq 0\)

How can we test this? Well, if the central limit theorem is kind to us, we might be able to guess that the sampling distribution of \(\hat{b}\), the estimated regression coefficient, is a normal distribution with mean centred on b. What that would mean is that if the null hypothesis were true, then the sampling distribution of \(\hat{b}\) has mean zero and unknown standard deviation. Assuming that we can come up with a good estimate for the standard error of the regression coefficient, \(\mathrm{SE}(\hat{b})\), then we’re in luck. That’s exactly the situation for which we introduced the one-sample t-test way back in Chapter 13. So let’s define a t-statistic like this,

\(t = \frac{\hat{b}}{\mathrm{SE}(\hat{b})}\)

I’ll skip over the reasons why, but our degrees of freedom in this case are df = N−K−1. Irritatingly, the estimate of the standard error of the regression coefficient, \(\mathrm{SE}(\hat{b})\), is not as easy to calculate as the standard error of the mean that we used for the simpler t-tests in Chapter 13. In fact, the formula is somewhat ugly, and not terribly helpful to look at. For our purposes it’s sufficient to point out that the standard error of the estimated regression coefficient depends on both the predictor and outcome variables, and is somewhat sensitive to violations of the homogeneity of variance assumption (discussed shortly).

In any case, this t-statistic can be interpreted in the same way as the t-statistics that we discussed in Chapter 13. Assuming that you have a two-sided alternative (i.e., you don’t really care if b>0 or b<0), then it’s the extreme values of t (i.e., a lot less than zero or a lot greater than zero) that suggest that you should reject the null hypothesis.

Running the hypothesis tests in R

To compute all of the quantities that we have talked about so far, all you need to do is ask for a summary() of your regression model. Since I’ve been using regression.2 as my example, let’s do that:
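The original R output is not reproduced in this excerpt, so here is a hedged reconstruction of the setup. The model formula and the coefficient values follow the chapter's running example (grumpiness predicted from dan.sleep and baby.sleep), but the data below are simulated stand-ins, not the actual parenthood data:

    set.seed(5)
    # Simulated stand-in for the chapter's parenthood data (N = 100)
    parenthood <- data.frame(dan.sleep  = rnorm(100, 7, 1),
                             baby.sleep = rnorm(100, 8, 2))
    parenthood$dan.grump <- 125.96 - 8.95 * parenthood$dan.sleep +
                              0.01 * parenthood$baby.sleep + rnorm(100, 0, 6)

    regression.2 <- lm(dan.grump ~ dan.sleep + baby.sleep, data = parenthood)
    summary(regression.2)  # coefficient t-tests, residual df, F-test, R-squared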

The output that this command produces is pretty dense, but we’ve already discussed everything of interest in it, so what I’ll do is go through it line by line. The first line reminds us of what the actual regression model is:

You can see why this is handy, since it was a little while back when we actually created the regression.2 model, and so it’s nice to be reminded of what it was we were doing. The next part provides a quick summary of the residuals (i.e., the \(\epsilon_i\) values),

which can be convenient as a quick and dirty check that the model is okay. Remember, we did assume that these residuals were normally distributed, with mean 0. In particular it’s worth quickly checking to see if the median is close to zero, and to see if the first quartile is about the same size as the third quartile. If they look badly off, there’s a good chance that the assumptions of regression are violated. These ones look pretty nice to me, so let’s move on to the interesting stuff. The next part of the R output looks at the coefficients of the regression model:

Each row in this table refers to one of the coefficients in the regression model. The first row is the intercept term, and the later ones look at each of the predictors. The columns give you all of the relevant information. The first column is the actual estimate of b (e.g., 125.96 for the intercept, and -8.9 for the dan.sleep predictor). The second column is the standard error estimate \(\hat{\sigma}_b\). The third column gives you the t-statistic, and it’s worth noticing that in this table \(t = \hat{b} / \mathrm{SE}(\hat{b})\) every time. Finally, the fourth column gives you the actual p value for each of these tests. The only thing that the table itself doesn’t list is the degrees of freedom used in the t-test, which is always N−K−1 and is listed immediately below, in this line:

The value of df = 97 is equal to N−K−1, so that’s what we use for our t-tests. In the final part of the output we have the F-test and the \(R^2\) values, which assess the performance of the model as a whole.

So in this case, the model performs significantly better than you’d expect by chance (F(2,97) = 215.2, p < .001), which isn’t all that surprising: the \(R^2 = .812\) value indicates that the regression model accounts for 81.2% of the variability in the outcome measure. However, when we look back up at the t-tests for each of the individual coefficients, we have pretty strong evidence that the baby.sleep variable has no significant effect; all the work is being done by the dan.sleep variable. Taken together, these results suggest that regression.2 is actually the wrong model for the data: you’d probably be better off dropping the baby.sleep predictor entirely. In other words, the regression.1 model that we started with is the better model.
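One way to check that conclusion directly (a sketch, continuing the simulated setup above) is to fit the one-predictor model and compare the two with a hierarchical F-test:

    regression.1 <- lm(dan.grump ~ dan.sleep, data = parenthood)

    # Does adding baby.sleep significantly improve the fit? (It shouldn't.)
    anova(regression.1, regression.2)

A non-significant F here says that baby.sleep adds nothing beyond dan.sleep, matching the coefficient-level t-test.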

One-Sample T-Test using SPSS Statistics

Introduction.

The one-sample t-test is used to determine whether a sample comes from a population with a specific mean. This population mean is not always known, but is sometimes hypothesized. For example, you want to show that a new teaching method for pupils struggling to learn English grammar can improve their grammar skills to the national average. Your sample would be pupils who received the new teaching method and your population mean would be the national average score. Alternatively, you believe that doctors who work in Accident and Emergency (A & E) departments work 100 hours per week despite the dangers (e.g., tiredness) of working such long hours. You sample 1000 doctors in A & E departments and see if their hours differ from 100 hours.

This "quick start" guide shows you how to carry out a one-sample t-test using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a one-sample t-test to give you a valid result. We discuss these assumptions next.


Assumptions.

When you choose to analyse your data using a one-sample t-test, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using a one-sample t-test. You need to do this because it is only appropriate to use a one-sample t-test if your data "passes" four assumptions that are required for a one-sample t-test to give you a valid result. In practice, checking for these four assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce you to these four assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out a one-sample t-test when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let’s take a look at these four assumptions:

  • Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., continuous ). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. You can learn more about interval and ratio variables in our article: Types of Variable .
  • Assumption #2: The data are independent (i.e., not correlated/related ), which means that there is no relationship between the observations. This is more of a study design issue than something you can test for, but it is an important assumption of the one-sample t-test.
  • Assumption #3: There should be no significant outliers . Outliers are data points within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the one-sample t-test, reducing the accuracy of your results. Fortunately, when using SPSS Statistics to run a one-sample t-test on your data, you can easily detect possible outliers. In our enhanced one-sample t-test guide, we: (a) show you how to detect outliers using SPSS Statistics; and (b) discuss some of the options you have in order to deal with outliers.
  • Assumption #4: Your dependent variable should be approximately normally distributed . We talk about the one-sample t-test only requiring approximately normal data because it is quite "robust" to violations of normality, meaning that the assumption can be a little violated and still provide valid results. You can test for normality using the Shapiro-Wilk test of normality, which is easily tested for using SPSS Statistics. In addition to showing you how to do this in our enhanced one-sample t-test guide, we also explain what you can do if your data fails this assumption (i.e., if it fails it more than a little bit).

You can check assumptions #3 and #4 using SPSS Statistics. Before doing this, you should make sure that your data meets assumptions #1 and #2, although you don't need SPSS Statistics to do this. When moving on to assumptions #3 and #4, we suggest testing them in this order because it represents an order where, if a violation to the assumption is not correctable, you will no longer be able to use a one-sample t-test. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a one-sample t-test might not be valid. This is why we dedicate a number of sections of our enhanced one-sample t-test guide to help you get this right. You can find out about our enhanced content on our Features: Overview page.

In the section, Procedure , we illustrate the SPSS Statistics procedure required to perform a one-sample t-test assuming that no assumptions have been violated. First, we set out the example we use to explain the one-sample t-test procedure in SPSS Statistics.


Example and Setup in SPSS Statistics

A researcher is planning a psychological intervention study, but before he proceeds he wants to characterise his participants' depression levels. He tests each participant on a particular depression index, where anyone who achieves a score of 4.0 is deemed to have 'normal' levels of depression. Lower scores indicate less depression and higher scores indicate greater depression. He has recruited 40 participants to take part in the study. Depression scores are recorded in the variable dep_score . He wants to know whether his sample is representative of the normal population (i.e., do they score statistically significantly differently from 4.0).

For a one-sample t-test, there will only be one variable's data to be entered into SPSS Statistics: the dependent variable, dep_score , which is the depression score.
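For comparison, the equivalent analysis is a one-liner in R. A minimal sketch, simulating 40 hypothetical depression scores (chosen to roughly match the summary statistics reported later in this guide):

    set.seed(6)
    dep_score <- rnorm(40, mean = 3.72, sd = 0.74)  # hypothetical scores

    # One-sample t-test against the 'normal' population value of 4.0
    t.test(dep_score, mu = 4)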

Test Procedure in SPSS Statistics

The 5-step Compare Means > One-Sample T Test... procedure below shows you how to analyse your data using a one-sample t-test in SPSS Statistics when the four assumptions in the previous section, Assumptions , have not been violated. At the end of these five steps, we show you how to interpret the results from this test. If you are looking for help to make sure your data meets assumptions #3 and #4, which are required when using a one-sample t-test, and can be tested using SPSS Statistics, you can learn more in our enhanced guides on our Features: Overview page.

Since some of the options in the Compare Means > One-Sample T Test... procedure changed in SPSS Statistics version 27 , we show how to carry out a one-sample t-test depending on whether you have SPSS Statistics versions 27 or 28 (or the subscription version of SPSS Statistics) or version 26 or an earlier version of SPSS Statistics . The latest versions of SPSS Statistics are version 28 and the subscription version . If you are unsure which version of SPSS Statistics you are using, see our guide: Identifying your version of SPSS Statistics .

SPSS Statistics versions 27 and 28 and the subscription version of SPSS Statistics

[Screenshot: the SPSS Statistics menu path for the one-sample t-test]

Published with written permission from SPSS Statistics, IBM Corporation.

[Screenshot: the One-Sample T Test dialogue box with the dependent variable, dep_score, in the box on the left]

Note 1: By default, SPSS Statistics uses 95% confidence intervals (labelled as the Confidence Interval Percentage in SPSS Statistics). This equates to declaring statistical significance at the p < .05 level. If you wish to change this you can enter any value from 1 to 99. For example, entering "99" into this box would result in a 99% confidence interval and equate to declaring statistical significance at the p < .01 level. For this example, keep the default 95% confidence intervals.

Note 2: If you are testing more than one dependent variable and you have any missing values in your data, you need to think carefully about whether to select Exclude cases analysis by analysis or Exclude cases listwise in the –Missing Values– area. Selecting the incorrect option could mean that SPSS Statistics removes data from your analysis that you wanted to include. We discuss this further and what options to select in our enhanced one-sample t-test guide.


Now that you have run the Compare Means > One-Sample T Test... procedure to carry out a one-sample t-test, go to the Interpreting Results section. You can ignore the section below, which shows you how to carry out a one-sample t-test if you have SPSS Statistics version 26 or an earlier version of SPSS Statistics.

SPSS Statistics version 26 and earlier versions of SPSS Statistics

[Screenshot: the SPSS Statistics menu path for the one-sample t-test in version 26 and earlier]

Interpreting the SPSS Statistics output of the one-sample t-test

SPSS Statistics generates two main tables of output for the one-sample t-test that contain all the information you require to interpret the results of a one-sample t-test.

If your data passed assumption #3 (i.e., there were no significant outliers) and assumption #4 (i.e., your dependent variable was approximately normally distributed), which we explained earlier in the Assumptions section, you will only need to interpret these two main tables. However, since you should have tested your data for these assumptions, you will also need to interpret the SPSS Statistics output that was produced when you tested for them (i.e., you will have to interpret: (a) the boxplots you used to check if there were any significant outliers; and (b) the output SPSS Statistics produces for your Shapiro-Wilk test of normality). If you do not know how to do this, we show you in our enhanced one-sample t-test guide. Remember that if your data failed any of these assumptions, the output that you get from the one-sample t-test procedure (i.e., the tables we discuss below) will no longer be relevant, and you will need to interpret these tables differently.

However, in this "quick start" guide, we take you through each of the two main tables in turn, assuming that your data met all the relevant assumptions:

Descriptive statistics

You can make an initial interpretation of the data using the One-Sample Statistics table, which presents relevant descriptive statistics:

[Screenshot: the One-Sample Statistics table with columns N, Mean, Std. Deviation and Std. Error Mean shown for the dependent variable]

It is more common than not to present your descriptive statistics using the mean and standard deviation (" Std. Deviation " column) rather than the standard error of the mean (" Std. Error Mean " column), although both are acceptable. You could report the results, using the standard deviation, as follows:

Mean depression score (3.72 ± 0.74) was lower than the population 'normal' depression score of 4.0.

Mean depression score ( M = 3.72, SD = 0.74) was lower than the population 'normal' depression score of 4.0.

However, by running a one-sample t-test, you are really interested in knowing whether the sample you have ( dep_score ) comes from a 'normal' population (which has a mean of 4.0). This is discussed in the next section.

One-sample t-test

The One-Sample Test table reports the result of the one-sample t-test. The top row provides the value of the known or hypothesized population mean you are comparing your sample data to, as highlighted below:

[Screenshot: the Test Value of 4 highlighted in the One-Sample Test table in SPSS Statistics]

In this example, you can see the 'normal' depression score value of "4" that you entered in earlier. You now need to consult the first three columns of the One-Sample Test table, which provides information on whether the sample is from a population with a mean of 4 (i.e., are the means statistically significantly different), as highlighted below:

[Screenshot: the t, df and Sig. (2-tailed) values for the dependent variable, dep_score, highlighted in the One-Sample Test table]

Moving from left-to-right, you are presented with the observed t -value (" t " column), the degrees of freedom (" df "), and the statistical significance ( p -value) (" Sig. (2-tailed) ") of the one-sample t-test. In this example, p < .05 (it is p = .022). Therefore, it can be concluded that the population means are statistically significantly different. If p > .05, the difference between the sample-estimated population mean and the comparison population mean would not be statistically significantly different.

Note: If you see SPSS Statistics state that the " Sig. (2-tailed) " value is ".000", this actually means that p < .0005. It does not mean that the significance level is actually zero.

SPSS Statistics also reports that t = -2.381 (" t " column) and that there are 39 degrees of freedom (" df " column). You need to know these values in order to report your results, which you could do as follows:

Depression score was statistically significantly lower than the population normal depression score, t (39) = -2.381, p = .022.

The breakdown of the last part (i.e., t(39) = -2.381, p = .022) is as follows:

  • t: indicates that a t-test was carried out.
  • 39: the degrees of freedom.
  • -2.381: the obtained value of the t-statistic.
  • p = .022: the probability of obtaining the observed result if the null hypothesis were true.

You can also include measures of the difference between the two population means in your written report. This information is included in the columns on the far-right of the One-Sample Test table, as highlighted below:

[Screenshot: the Mean Difference and 95% Confidence Interval of the Difference values highlighted for the dependent variable, dep_score]

This section of the table shows that the mean difference in the population means is -0.28 (" Mean Difference " column) and the 95% confidence intervals (95% CI) of the difference are -0.51 to -0.04 (" Lower " to " Upper " columns). For the measures used, it will be sufficient to report the values to 2 decimal places. You could write these results as:

Depression score was statistically significantly lower by 0.28 (95% CI, 0.04 to 0.51) than a normal depression score of 4.0, t (39) = -2.381, p = .022.

Depression score was statistically significantly lower by a mean of 0.28, 95% CI [0.04 to 0.51], than a normal depression score of 4.0, t (39) = -2.381, p = .022.

Standardised effect sizes

After reporting the unstandardised effect size, we might also report a standardised effect size such as Cohen's d (Cohen, 1988) or Hedges' g (Hedges, 1981). In our example, this may be useful for future studies where researchers want to compare the "size" of the effect in their studies to the size of the effect in this study.

There are many different types of standardised effect size, with different types often trying to "capture" the importance of your results in different ways. In SPSS Statistics versions 18 to 26 , SPSS Statistics did not automatically produce a standardised effect size as part of a one-sample t-test analysis. However, it is easy to calculate a standardised effect size such as Cohen's d (Cohen, 1988) using the results from the one-sample t-test analysis. In SPSS Statistics versions 27 and 28 (and the subscription version of SPSS Statistics), two standardised effect sizes are automatically produced: Cohen's d and Hedges' g , as shown in the One-Sample Effect Sizes table below:

[Screenshot: the One-Sample Effect Sizes table showing Cohen's d and Hedges' g for the one-sample t-test in SPSS]
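For versions that do not produce this table automatically, both measures are easy to compute by hand: the one-sample Cohen's d is the mean difference from the test value divided by the sample standard deviation, and Hedges' g applies a small-sample correction to d. A sketch in R, reusing the simulated dep_score from the earlier sketch (so the numbers are illustrative only):

    n <- length(dep_score)
    d <- (mean(dep_score) - 4) / sd(dep_score)  # one-sample Cohen's d
    g <- d * (1 - 3 / (4 * (n - 1) - 1))        # Hedges' g (small-sample correction)
    c(cohens_d = d, hedges_g = g)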

Reporting the SPSS Statistics output of the one-sample t-test

You can report the findings, without the tests of assumptions, as follows:

Mean depression score (3.73 ± 0.74) was lower than the normal depression score of 4.0, a statistically significant difference of 0.28 (95% CI, 0.04 to 0.51), t (39) = -2.381, p = .022.

Mean depression score ( M = 3.73, SD = 0.74) was lower than the normal depression score of 4.0, a statistically significant mean difference of 0.28, 95% CI [0.04 to 0.51], t (39) = -2.381, p = .022.

Adding in the information about the statistical test you ran, including the assumptions, you have:

A one-sample t-test was run to determine whether depression score in recruited subjects was different to normal, defined as a depression score of 4.0. Depression scores were normally distributed, as assessed by Shapiro-Wilk's test ( p > .05) and there were no outliers in the data, as assessed by inspection of a boxplot. Mean depression score (3.73 ± 0.74) was lower than the normal depression score of 4.0, a statistically significant difference of 0.28 (95% CI, 0.04 to 0.51), t (39) = -2.381, p = .022.

A one-sample t-test was run to determine whether depression score in recruited subjects was different to normal, defined as a depression score of 4.0. Depression scores were normally distributed, as assessed by Shapiro-Wilk's test ( p > .05) and there were no outliers in the data, as assessed by inspection of a boxplot. Mean depression score ( M = 3.73, SD = 0.74) was lower than the normal depression score of 4.0, a statistically significant mean difference of 0.28, 95% CI [0.04 to 0.51], t (39) = -2.381, p = .022.

Null hypothesis significance testing

You can write the result in respect of your null and alternative hypothesis as:

There was a statistically significant difference between means ( p < .05). Therefore, we can reject the null hypothesis and accept the alternative hypothesis.

Practical vs. statistical significance

Although a statistically significant difference was found between the depression scores in the recruited subjects vs. the normal depression score, it does not necessarily mean that the difference encountered, 0.28 (95% CI, 0.04 to 0.51), is enough to be practically significant. Indeed, the researcher might accept that although the difference is statistically significant (and would report this), the difference is not large enough to be practically significant (i.e., the subjects can be treated as normal).

In our enhanced one-sample t-test guide, we show you how to write up the results from your assumptions tests and one-sample t-test procedure if you need to report this in a dissertation/thesis, assignment or research report. We do this using the Harvard and APA styles. We also explain how to interpret the results from the One-Sample Effect Sizes table, which include the two standardised effect sizes: Cohen's d and Hedges' g . You can learn more about our enhanced content in our Features: Overview section.

Hypothesis test in SPSS

April 16, 2019

For the purpose of this tutorial, I’m going to be using the sample data set demo.sav, available under installdir/IBM/SPSS/Statistics/[version]/Samples/[lang]; in my case, on Windows, that would be C:\Program Files\IBM\SPSS\Statistics\25\Samples\English.

  • If you haven’t already, make sure to open the sample data set demo.sav (this data set is incidentally available in many different formats, such as txt and xlsx ).
  • Click on Analyze>>Nonparametric Tests>>One Sample…
  • In the resulting window, choose Automatically compare observed data to hypothesized .
  • Click on the tab Fields .
  • Depending on the version of SPSS, either all variables or just the categorical ones are available in the right column, Test Fields . However, for the purpose of this tutorial we’ll perform a one-sample binomial test, so keep Gender, which is a nominal variable, and remove the rest (if the column Test Fields isn’t populated, just add Gender and you’re good to go). The following hypothesis test will consequently answer the question: what proportion of this sample is male or female?
  • Under the next tab, Settings , there is the possibility to customize Significance level and Confidence interval. However the defaults are already at 0.05 and 95% respectively which will do just fine.
  • Click Run .
  • The result is a single nonparametric test. In the resulting table the null hypothesis is stated as The categories defined by Gender = Female and Male occur with probabilities 0.5 and 0.5 . SPSS calculated the significance for this test as 0.608, which is quite high; consequently, the recommendation is to retain the null hypothesis (at the 0.05 significance level), which in this case means that the proportions of males and females are about equal.

[Table: hypothesis test summary for the one-sample nonparametric test]
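Outside SPSS, the same one-sample binomial test can be run in R. A minimal sketch with hypothetical counts (the actual gender counts in demo.sav are not reproduced here):

    males <- 3210  # hypothetical count of males
    n     <- 6400  # hypothetical sample size

    # H0: the categories male and female occur with probabilities 0.5 and 0.5
    binom.test(males, n, p = 0.5)

As in SPSS, a large p-value would lead you to retain the null hypothesis that the two proportions are about equal.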


Quantitative Data Analysis With SPSS

10 Quantitative Analysis with SPSS: Getting Started

Mikaila Mariel Lemonik Arthur

This chapter focuses on getting started with SPSS. Note that before you can start to work with SPSS, you need to get your data into an appropriate format, as discussed in the chapter on Preparing Quantitative Data and Data Management . It is possible to enter data directly into SPSS, but the interface is not conducive to data entry and so researchers are better off entering their data using a spreadsheet program and then importing it.

Importing Data Into SPSS

In some cases, existing data can be downloaded in SPSS format (*.sav is the file extension for an SPSS datafile), in which case it can be opened in SPSS by going to File → Open → Data and then browsing to the location of the file. However, in most cases, researchers will need to import data stored in another file format into SPSS. To import data, go to the file menu, then select import data. Next, choose the type of data you wish to import from the menu that appears. In most cases, researchers will be importing Excel or CSV data (when they have entered it themselves or are downloading it from a general-purpose site like the Census Bureau) or SAS or Stata data (when they are downloading it from a site that makes prepared statistical data files available).

[Figure: the File → Import Data menu in SPSS. Keyboard navigation: Alt+F opens the File menu; Alt+D opens the Import Data menu; then Alt+B runs a query on database data, Alt+E imports Excel, Alt+C CSV, Alt+T text data, Alt+S SAS, Alt+A Stata, Alt+B dBase (a duplicate shortcut), Alt+L Lotus, Alt+Y SYLK, Alt+M Cognos TM1, and Alt+O Cognos Business Intelligence.]

Once you click on a data type, a window will pop up for you to select the file you wish to import. Be sure it is of the file type you have chosen. If you import a file in a format that is already designed to work with statistical software, such as Stata, the importation process will be as seamless as opening a file. Researchers should be sure that immediately after importing, they save their file (File → Save As) so that it is stored in SPSS format and can be opened in SPSS, rather than imported, in the future. It is essential to remember that SPSS is not cloud-resident software and does not have an autosave function, so any time a file is changed, it must be manually saved.

[Figure 2: the popup window for importing an Excel file. Keyboard navigation: Alt+K selects the worksheet; Alt+N selects the range within the worksheet; Alt+E sets the percentage of values that determine data type (default 95); Alt+I toggles ignoring hidden rows and columns (greyed out if none are hidden); Alt+M removes leading spaces from string values; Alt+G removes trailing spaces from string values.]

If you import a file in Excel, CSV (comma-separated values) or text format, SPSS will open an import wizard with a number of steps. The steps vary slightly depending on which file type you are importing. For instance, to import an Excel file, as shown in Figure 2, you first need to specify the worksheet (if the file has multiple worksheets—SPSS can only import one worksheet at a time). You can choose to specify a limited range of cells. Checking the checkbox next to “Read variable names from first row of data” will replace the V1, V2, V3, and so on column headers with whatever appears in the top row of data in the Excel file. You can also choose to change the percentage of values that are used to determine data type, remove leading and trailing spaces from string values, and—if your Excel file has hidden rows or columns—you can choose to ignore them. Below the options, a preview of your Excel file will be shown; you can scroll through the preview to see that data is being displayed correctly. Clicking OK will finalize the import.

[Figure 3: the import CSV popup. Keyboard navigation: Alt+V toggles whether the first line contains variable names; Alt+M removes leading spaces from string values; Alt+G removes trailing spaces; Alt+D sets the delimiter (comma, semicolon, or tab); Alt+S sets the decimal symbol (period or comma); Alt+T sets the text qualifier (double quote, single quote, or none); Alt+C toggles caching data locally; Alt+O opens the text wizard discussed under importing text.]

A different set of options appears when you import a CSV file, as shown in Figure 3. The top of the popup window shows a preview of the data in CSV format. While toggles related to whether the first line contains variable names, removing leading and trailing spaces, and indicating the percentage of values that determine the data type are the same as for importing data from Excel, there are additional options that are important for the proper importing of CSV data. First of all, the user must specify whether values are delimited by a comma, a semicolon, or a tab. While commas are the most common delimiters in CSV files, the other delimiters are possible, and looking at the preview should make clear which of the delimiters is being used in a given file, as shown in the example below.

Second, the user must specify whether the period or the comma is the decimal symbol. Data produced in the United States typically uses the period (as in 1238.67), as does data produced in many other English-speaking countries, while most of Europe and Latin America use the comma. Third, the user must specify the text qualifier (single quotes, double quotes, or none). This is the character used to note that the contents of a particular entry in the CSV file are textual (string variables) in nature, not numerical. If your data includes text, it should be clear from the preview which qualifier is being used. Users can also toggle whether data is cached locally or not; caching locally speeds the importation process.
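The same three choices (delimiter, decimal symbol, and text qualifier) appear as explicit arguments in other tools' CSV readers, which can make the options easier to see. For comparison, a sketch in R (the file names are hypothetical):

    # Comma-delimited, period decimals, double-quote text qualifier (US style)
    d1 <- read.csv("survey.csv", header = TRUE, sep = ",", dec = ".", quote = "\"")

    # Semicolon-delimited with comma decimals (common in Europe and Latin America)
    d2 <- read.csv2("survey_eu.csv")  # shorthand for sep = ";", dec = ","

    # Tab-delimited text
    d3 <- read.delim("survey.tsv")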

Finally, there is a button for Advanced Options (Text Wizard). The text wizard offers the same window and options that users see if they are importing a text file directly, and this wizard offers more direct control over the importation process over a series of six steps. First, users can specify a predefined format if they have a *.tpf file on their computers (this is rare) and see a preview of what the data in the file looks like. In step two, they can indicate if the file is delimited (as above) or fixed-width (where values are stored in columns of constant size specified within the file); which—if any—row contains the variable names; and the decimal symbol. Note that some forms of fixed-width files may not be supported. Third, they indicate which line of the file contains the first line of data, whether each line represents a case or a specific given number of variables represents a case, and how many cases to import. This last choice includes the option to import a random sample of cases. Fourth, users specify the delimiter and the text qualifier and determine how to handle leading and trailing spaces in string values. Fifth, users can double-check variable names and formats. Finally, before clicking the “Finish” button, users can choose to save their selections as a *.tpf file to be reused or to paste the syntax (to be discussed later in this chapter).

In all cases, once the importation options have been selected and OK or Finish has been clicked, the data is imported. An output window (see Figure 4) may open with various warnings and details about the importation process, and the Data View window (see Figure 5) will show the data, with variable names at the top of each column. At this point, be sure to save the dataset in a location and with a name you will be able to locate later.

Before users are done setting up their dataset, they must be sure that appropriate variable information is included. When datasets are imported from other statistical programs, they will typically come with variable information. But when they are imported from Excel or CSV files, the variable information must be manually entered, typically from a codebook or related document. Variable information is entered using Variable View. Users can switch between Data View and Variable View by clicking the tabs at the bottom of the screen or using the Ctrl+T key combination. As you can see in Figure 6, a screenshot of a completed dataset, Variable View shows each variable in a row, with a variety of information about that variable. When a dataset is imported, each of these pieces of information needs to be entered by hand for each variable. To move between columns by key commands, use the tab key; to open variable information that requires a menu for entry, press the space bar twice.

[Figure 6: Variable View in SPSS. Details are provided in the text.]

  • Name requires that each variable be given a short name, without any spaces. There are additional rules about names, but in short, names should be primarily alphanumeric in nature and cannot be words or use symbols that have meaning for the underlying computer processing. Names can be entered directly.
  • Type specifies the variable type. To open up the menu allowing the selection of variable types, click on the cell, then click on the three dots [...] that appear on the right side of the cell. Users can then choose from among numeric, dollar, date, numeric with leading zeros, string, and other variable types.
  • Width specifies the number of characters of width for the variable itself in data storage, while decimals specifies how many decimal places the variable will have. These can both be entered or edited directly or in the dialog box for Type.

[Figure: the Value Labels popup window, showing values 1 through 7 and their labels (working full time, working part time, and so on). Tab moves users through the popup window.]

  • Label provides space for a longer description that explains more completely what the variable is measuring. It can be entered directly.
  • Values provides for attaching value labels to each numerical value of the variable, as shown in the popup window above; clicking the three dots [...] opens the dialog box for entering them.

[Figure: the Missing Values popup in SPSS. Keyboard navigation: Alt+N selects no missing values; Alt+D selects discrete missing values, with three blanks for specific values; Alt+R selects range plus one optional discrete missing value, within which Alt+L moves the cursor to the low end of the range, Alt+H to the high end, and Alt+S to the single discrete missing value.]

  • Missing provides for the indication that particular values—like “refused to answer”—should be treated by the SPSS software as missing data rather than as analytically useful categories. Clicking the three dots [...] opens a dialog box for specifying missing values. When there are no missing values, “no missing values” should be selected. Otherwise, users can select “discrete missing values” and then enter three specific missing values—the numerical values, not the value labels—or they can select “range plus one optional discrete missing value” to specify a range from low to high of missing values, optionally adding an additional single discrete value.
  • Columns specifies the width of the display column for the variable. It can be entered directly.
  • Align specifies whether the variable data will be aligned right, center, or left. Users can click in the cell to make a menu appear or can press spacebar twice and then use arrows to select the desired alignment.
  • Measure permits the indication of level of measurement from among nominal, ordinal, and scale variables. Users can click in the cell to make a menu appear or can press spacebar twice and then use arrows to select the desired level of measurement. Note that measure is often wrong in datasets and analysts should not rely on it in determining the level of measurement for selection of statistical tests; SPSS does not use this characteristic when running tests.
  • Some datasets will have additional criteria. For example, the dataset shown in Figure 6 has a column called origsort which displays the original sort order of the dataset, so that if an analyst sorts the variables they can be returned to their original order.

When entering variable information, it is especially important to include Name, Label, and Values and be sure Type is correct and any Missing values are specified. Other variable information is less crucial, though clearly it is better to fully specify all variable information. Once all variable information is entered and double-checked and the dataset has been saved, it is ready for use.

When a user first opens SPSS, they are greeted with the “Welcome Dialog” (see figure 9). This dialog provides tips, links to help resources, and options for creating a new file (by selecting “new dataset”) or opening recently used files. There is a checkbox for turning off the Welcome Dialog so that it will not be shown in the future.

[Figure 9: the Welcome Dialog. Alt+D toggles the "don't show this dialog in the future" option; users relying on keyboard shortcuts will find it easier to disable the dialog and then navigate the menus to open or create files.]

When the Welcome Dialog is turned off, SPSS opens with a blank file. Going to File → Open → Data (Alt+F, O, D) brings up the dialog for opening a data file; the Open menu also provides for opening other types of files, which will be discussed below. Earlier in this chapter, the differences between Data View and Variable view were discussed; when you open a data file, be sure to observe which view you are using.

[Figure 10: the Find and Replace dialog. Alt+N moves the cursor to the Find box, where you can type the text you are searching for; Tab switches between Find and Replace. Clicking in Variable View behind the dialog box and then using Tab moves the focus from column to column: you will typically want to search either Name or Label. Alt+C toggles Match Case. Alt+H opens additional options, including match must be contained in the cell (Alt+O), match must be to the entire cell (Alt+L), cell begins with match (Alt+B), cell ends with match (Alt+W), search down (Alt+D), and search up (Alt+U). Alt+F clicks the Find Next button.]

It can be useful to be able to search for a variable or case in the datafile. There are two main ways to do this, both under the Edit menu (Alt+E). The Edit menu offers Find and Go To. Find, which can also be accessed by pressing Ctrl+F, allows users to search for all or part of a variable name. Figure 10 displays the Search dialog, with options shown after clicking on the “show options” button. (Users can also use the Replace function, but this carries the risk of writing over data and so should be avoided in almost all cases.) Be sure to select the column you wish to search—the Find function can only examine one column in Variable View at a time. Most typically, users will want to search variable names or labels. The checkbox for Match Case toggles whether or not case (in other words, capitalization) matters to the search. Expanding the options permits users to specify how much and which part of a cell must be matched as well as search order.

Users can also navigate to specific variables by using the Edit → Go to Case (to navigate to a specific case—or row in data view) and Edit → Go to Variable (to navigate to a specific variable—a row in variable view or a column in data view). Users can also access detailed variable information via the tool Utilities → Variables.

Another useful feature is the ability to sort variables and cases. Both types of sorting can be found in the Data menu. Variables can be sorted by any of the characteristics in Variable View; when sorting, the original sort order can be saved as a new characteristic. Cases can be sorted on any variable.
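
In syntax terms, these operations correspond to the SORT CASES and SORT VARIABLES commands. A minimal sketch, with a hypothetical variable name:

* Sort cases by a variable, descending.
SORT CASES BY age (D).
* Sort variables alphabetically by name.
SORT VARIABLES BY NAME.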

SPSS Options

The Options dialog can be reached by going to Edit → Options (or Alt+E, Alt+N). There are a wide variety of options available to help users customize their SPSS experience, a few of which are particularly important. First, using the various dialogs and menus in the program is much easier if, under General, the Variable List options Display Names (Alt+N) and Alphabetical (Alt+H) are selected. You can also change the display language for both the user interface and the output under Language; change fonts and colors for output under Viewer; set number options under Data; change currency options under Currency; set default output for graphs and charts under Charts; and set default file locations for saving files under File Locations. While most of these options can be left at their default settings, most users should set variables to display as names in alphabetical order before use. Options will be preserved if you use the same computer and user account, but if you are working on a public computer you should get in the habit of checking them every time you start the program.

Getting More Out of SPSS

So far, we have been working only with Data View and Variable View in the main dataset window. But when researchers produce the results of an analysis, these results appear in a new window called Output—IBM SPSS Statistics Viewer. New Output windows can be opened from the File menu by going to Open → Output or from the Window menu by selecting "Go to Designated Viewer Window" (the latter command also brings the output window to the foreground if one is already open). Output will be discussed in more detail when the results of different tests are discussed. For now, note that output can be saved in *.spv format, but this format can only be viewed in SPSS. To save output in a format viewable in other applications, go to File → Export, where you can choose a file location and a file format (like Word, PowerPoint, HTML, or PDF). Individual output items can also be copied and pasted.
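
Exporting output can likewise be scripted with the OUTPUT EXPORT command (more on syntax just below). A minimal sketch, assuming a hypothetical file path:

* Export everything in the Output window to a PDF file.
OUTPUT EXPORT
  /CONTENTS EXPORT=ALL
  /PDF DOCUMENTFILE='C:\output\results.pdf'.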

SPSS also offers a Syntax viewer and editor, which can also be accessed from both the File and Window menus. While syntax is beyond the scope of this text, it provides the option for writing code (kind of like a computer program) to control SPSS rather than using menus and buttons in a graphical user interface. Experienced users, or those doing many similar repetitive tasks, often find working via syntax to be faster and more efficient, but the learning curve is quite steep. If you are interested in learning more about how to write syntax in SPSS, Help → Command Syntax Reference brings up a very long document detailing the commands available.
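
To give a flavor of what syntax looks like, here is a minimal sketch that produces a frequency table and bar chart for a single variable; the variable name is a placeholder, and the same output could be produced through the Analyze → Descriptive Statistics → Frequencies dialog:

* Frequency table and bar chart for one variable.
FREQUENCIES VARIABLES=region
  /BARCHART FREQ
  /ORDER=ANALYSIS.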

Finally, the Help menu in SPSS offers a variety of options for getting help in using the program, including links to web resource guides, PDF documentation, and help forums. These tools can also be reached directly via the SPSS website. In addition, many dialog boxes contain a “Help” button that takes users to webpages with more detail on the tool in question.

Go to https://www.baseball-reference.com/ and select 10 baseball players of your choice. In an Excel or other spreadsheet, enter the name, position, batting arm, throwing arm, weight in pounds, and height in inches, as well as, from the Summary: Career section, HR (home runs) and WAR (wins above replacement). Each player should get one row of the Excel spreadsheet. Once you have entered the data, import it into SPSS. Then use Variable View to enter the relevant information about each variable—including value labels for position, batting arm, and throwing arm. Sort your cases by home runs. Finally, save your file.
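
If you would like to check your work against syntax, the main steps of the exercise look roughly like the sketch below. The file paths, sheet name, and variable names are placeholders; yours will depend on how you set up your spreadsheet.

* Import the Excel spreadsheet (path and sheet name are hypothetical).
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\baseball.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
* Sort players by career home runs, highest first.
SORT CASES BY hr (D).
* Save the dataset in SPSS format.
SAVE OUTFILE='C:\data\baseball.sav'.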

Media Attributions

  • import menu
  • import excel © IBM SPSS is licensed under a All Rights Reserved license
  • import csv © IBM SPSS is licensed under a All Rights Reserved license
  • output window © IBM SPSS is licensed under a All Rights Reserved license
  • spss data view © IBM SPSS is licensed under a All Rights Reserved license
  • variable-view © IBM SPSS is licensed under a All Rights Reserved license
  • value labels © IBM SPSS is licensed under a All Rights Reserved license
  • missing values © IBM SPSS is licensed under a All Rights Reserved license
  • welcome dialog © IBM SPSS is licensed under a All Rights Reserved license
  • find and replace © IBM SPSS is licensed under a All Rights Reserved license
  • Note that "Search," another option under the Edit menu, does not search variables or cases but instead launches a search of SPSS web resources and help files. ↵

string: A data type that represents non-numerical data; string values can include any sequence of letters, numbers, and spaces.

values: The possible levels or response choices of a given variable.

Social Data Analysis Copyright © 2021 by Mikaila Mariel Lemonik Arthur is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


SPSS Correlation Analysis Tutorial

Also see Pearson Correlations - Quick Introduction.

Correlation Test - What Is It?

  • Null Hypothesis
  • Assumptions
  • Correlation Test in SPSS

A (Pearson) correlation is a number between -1 and +1 that indicates to what extent 2 quantitative variables are linearly related. It's best understood by looking at some scatterplots.

SPSS Correlation Analysis Nice Scatterplot

  • a correlation of -1 indicates a perfect linear descending relation: higher scores on one variable imply lower scores on the other variable.
  • a correlation of 0 means there's no linear relation between 2 variables whatsoever. However, there may be a (strong) non-linear relation nevertheless.
  • a correlation of 1 indicates a perfect ascending linear relation: higher scores on one variable are associated with higher scores on the other variable.

A correlation test (usually) tests the null hypothesis that the population correlation is zero. Data often contain just a sample from a (much) larger population: I surveyed 100 customers (sample) but I'm really interested in all my 100,000 customers (population). Sample outcomes typically differ somewhat from population outcomes. So finding a nonzero correlation in my sample does not prove that 2 variables are correlated in my entire population; if the population correlation is really zero, I may easily find a small correlation in my sample. However, finding a strong correlation in this case is very unlikely and suggests that my population correlation wasn't zero after all.

Correlation Test - Assumptions

Computing and interpreting correlation coefficients themselves does not require any assumptions. However, the statistical significance test for correlations assumes

  • independent observations;
  • normality: our 2 variables must follow a bivariate normal distribution in our population. This assumption is not needed for sample sizes of N = 25 or more. For reasonable sample sizes, the central limit theorem ensures that the sampling distribution will be normal.

SPSS - Quick Data Check

Let's run some correlation tests in SPSS now. We'll use adolescents.sav, a data file which holds psychological test data on 128 children between 12 and 14 years old. Part of its variable view is shown below.

SPSS Adolescents Data Variable View

Now, before running any correlations, let's first make sure our data are plausible in the first place. Since all 5 variables are metric, we'll quickly inspect their histograms by running the syntax below.
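
The original tutorial's syntax image isn't reproduced in this copy, but a minimal sketch along the following lines produces histograms without frequency tables; the variable names are assumptions about how adolescents.sav is set up:

* Histograms only, no frequency tables (variable names are assumed).
FREQUENCIES VARIABLES=iq depr anxi soc wellb
  /FORMAT=NOTABLE
  /HISTOGRAM.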

Histogram Output

Our histograms tell us a lot: our variables have between 5 and 10 missing values. Their means are close to 100 with standard deviations around 15, which is good because that's how these tests have been calibrated. One thing bothers me, though, and it's shown below.

SPSS Correlation Analysis Histogram With Outlier

It seems like somebody scored zero on some tests, which is not plausible at all. If we ignore this, our correlations will be severely biased. Let's sort our cases, see what's going on, and set some missing values before proceeding.

SPSS Correlation Outlier In Data View
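
In syntax, the cleanup step might look like the sketch below; the variable names and the offending value of 0 are assumptions based on the screenshots:

* Sort ascending so the implausible zero scores appear first.
SORT CASES BY iq (A).
* Declare 0 a user-missing value for the test variables (assumed names).
MISSING VALUES iq depr anxi soc wellb (0).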

If we now rerun our histograms, we'll see that all distributions look plausible. Only now should we proceed to running the actual correlations.

Running a Correlation Test in SPSS

From the menu, choose Analyze → Correlate → Bivariate.

Move all relevant variables into the variables box. You probably don't want to change anything else here.

SPSS Correlations Dialog

Clicking Paste results in the syntax below. Let's run it.

SPSS CORRELATIONS Syntax
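
The pasted syntax isn't shown in this copy of the tutorial, but it follows the standard pattern below; the variable names are again assumptions:

CORRELATIONS
  /VARIABLES=iq depr anxi soc wellb
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.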

Correlation Output

SPSS Correlation Test Output

By default, SPSS always creates a full correlation matrix. Each correlation appears twice: above and below the main diagonal. The correlations on the main diagonal are the correlations between each variable and itself, which is why they are all 1 and not interesting at all. The 10 correlations below the diagonal are what we need. As a rule of thumb, a correlation is statistically significant if its "Sig. (2-tailed)" < 0.05.

Now let's take a close look at our results: the strongest correlation is between depression and overall well-being: r = -0.801. It's based on N = 117 children and its 2-tailed significance, p = 0.000. This means there's a 0.000 probability of finding this sample correlation, or a larger one, if the actual population correlation is zero.

Note that IQ does not correlate with anything. Its strongest correlation is 0.152 with anxiety, but p = 0.11, so it's not statistically significantly different from zero. That is, there's a 0.11 chance of finding it if the population correlation is zero. This correlation is too small to reject the null hypothesis.

Like so, our 10 correlations indicate to what extent each pair of variables is linearly related. Finally, note that each correlation is computed on a slightly different N, ranging from 111 to 117. This is because SPSS uses pairwise deletion of missing values by default for correlations.

Scatterplots

Strictly, we should inspect all scatterplots among our variables as well. After all, variables that don't correlate could still be related in some non-linear fashion. But for more than 5 or 6 variables, the number of possible scatterplots explodes, so we often skip inspecting them. However, see SPSS - Create All Scatterplots Tool. The syntax below creates just one scatterplot, just to get an idea of what our relation looks like. The result doesn't show anything unexpected, though.
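
The scatterplot syntax isn't reproduced in this copy either; a one-scatterplot command would look like this sketch, with placeholder variable names:

GRAPH
  /SCATTERPLOT(BIVAR)=depr WITH wellb
  /MISSING=LISTWISE.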

SPSS Correlation Analysis Nice Scatterplot

Reporting a Correlation Test

The figure below shows the most basic format recommended by the APA for reporting correlations. Importantly, make sure the table indicates which correlations are statistically significant at p < 0.05 and perhaps p < 0.01. Also see SPSS Correlations in APA Format.

Correlation Table in APA format

If possible, report the confidence intervals for your correlations as well. Oddly, SPSS doesn't include those. However, see SPSS Confidence Intervals for Correlations Tool.

Thanks for reading!


SSRIC

STAT7S: Exercise Using SPSS to Explore Hypothesis Testing – Paired-Samples t Test

Author:   Ed Nelson Department of Sociology M/S SS97 California State University, Fresno Fresno, CA 93740 Email:  [email protected]

Note to the Instructor: The data set used in this exercise is gss14_subset_for_classes_STATISTICS.sav which is a subset of the 2014 General Social Survey. Some of the variables in the GSS have been recoded to make them easier to use and some new variables have been created.  The data have been weighted according to the instructions from the National Opinion Research Center.  This exercise uses COMPARE MEANS (paired-samples t test) to explore hypothesis testing.  A good reference on using SPSS is SPSS for Windows Version 23.0 A Basic Tutorial by Linda Fiddler, John Korey, Edward Nelson (Editor), and Elizabeth Nelson.  The online version of the book is on the Social Science Research and Instructional Council's Website .  You have permission to use this exercise and to revise it to fit your needs.  Please send a copy of any revision to the author. Included with this exercise (as separate files) are more detailed notes to the instructors, the SPSS syntax necessary to carry out the exercise (SPSS syntax file), and the SPSS output for the exercise (SPSS output file). Please contact the author for additional information.

I’m attaching the following files.

  • Data subset  (.sav format)
  • Extended notes for instructors  (MS Word; docx format).
  • Syntax file  (.sps format)
  • Output file  (.spv format)
  • This page  (MS Word; docx format).

Goals of Exercise

The goal of this exercise is to explore hypothesis testing and the paired-samples t test. The exercise also gives you practice in using COMPARE MEANS.

Part I – Populations and Samples

Populations are the complete set of objects that we want to study.  For example, a population might be all the individuals that live in the United States at a particular point in time.  The U.S. does a complete enumeration of all individuals living in the United States every ten years (i.e., each year ending in a zero).  We call this a census.  Another example of a population is all the students in a particular school or all college students in your state.  Populations are often large and it’s too costly and time consuming to carry out a complete enumeration.  So what we do is to select a sample from the population where a sample is a subset of the population and then use the sample data to make an inference about the population.

A statistic describes a characteristic of a sample while a parameter describes a characteristic of a population.  The mean age of a sample is a statistic while the mean age of the population is a parameter.   We use statistics to make inferences about parameters.  In other words, we use the mean age of the sample to make an inference about the mean age of the population.  Notice that the mean age of the sample (our statistic) is known while the mean age of the population (our parameter) is usually unknown.

There are many different ways to select samples.  Probability samples are samples in which every object in the population has a known, non-zero, chance of being in the sample (i.e., the probability of selection).  This isn’t the case for non-probability samples.  An example of a non-probability sample is an instant poll which you hear about on radio and television shows.  A show might invite you to go to a website and answer a question such as whether you favor or oppose same-sex marriage.  This is a purely volunteer sample and we have no idea of the probability of selection.

We’re going to use the General Social Survey (GSS) for this exercise.  The GSS is a national probability sample of adults in the United States conducted by the National Opinion Research Center (NORC).  The GSS started in 1972 and has been an annual or biannual survey ever since. For this exercise we’re going to use a subset of the 2014 GSS. Your instructor will tell you how to access this data set which is called gss14_subset_for_classes_STATISTICS.sav.

In STAT6S we compared means from two independent samples.  Independent samples are samples in which the composition of one sample does not influence the composition of the other sample.  In this exercise we’re using the 2014 GSS which is a sample of adults in the United States.  If we divide this sample into men and women we would have a sample of men and a sample of women and they would be independent samples.  The individuals in one of the samples would not influence who is in the other sample.

In this exercise we’re going to compare means from two dependent samples.  Dependent samples are samples in which the composition of one sample influences the composition of the other sample.  The 2014 GSS includes questions about the years of school completed by the respondent’s parents – d22_maeduc and d24_paeduc.  Let’s assume that we think that respondent’s fathers have more education than respondent’s mothers.  We would compare the mean years of school completed by mothers with the mean years of school completed by fathers.  If the respondent’s mother is in one sample, then the respondent’s father must be in the other sample.  The composition of the samples is therefore dependent on each other.  SPSS calls these paired-samples so we’ll use that term from now on.

Let’s start by asking whether fathers or mothers have more years of school?  Click on “Analyze” in the menu bar and then on “Compare Means” and finally on “Means.”  (See Chapter 6, introduction in the online SPSS book mentioned on page 1.)  Select the variables d22_maeduc and d24_paeduc and move them to the “Dependent List” box.  These are the variables for which you are going to compute means.  The output from SPSS will show you the mean, number of cases, and standard deviation for fathers and mothers.

Fathers have about two-tenths of a year more education than mothers. Why can't we just conclude that fathers have more education than mothers? If we were just describing the sample, we could. But what we want to do is to make inferences about differences between fathers and mothers in the population. We have a sample of fathers and a sample of mothers and some amount of sampling error will always be present in both samples. The larger the sample, the less the sampling error; the smaller the sample, the more the sampling error. Because of this sampling error we need to make use of hypothesis testing as we did in the two previous exercises (STAT5S and STAT6S).

Part II – Now it’s Your Turn

In this part of the exercise you want to compare the years of school completed by respondents and their spouses to determine whether men have more education than their spouses or whether women have more education than their spouses.

Use SPSS to get the sample means as we did in Part I and then compare them to begin answering this question.  But we need to be careful here.  Respondents could be either male or female.  We need to separate respondents into two groups – men and women – and then separately compare male respondents with their spouses and female respondents with their spouses.  We can do this by putting the variables d4_educ and d29_speduc into the “Dependent List” box and d5_sex into the “Independent List” box.
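
As a syntax sketch, the layered comparison would look something like:

MEANS TABLES=d4_educ d29_speduc BY d5_sex
  /CELLS=MEAN COUNT STDDEV.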

Part III – Hypothesis Testing – Paired-Samples t Test

In Part I we compared the mean years of school completed by fathers and mothers.  Now we want to determine if this difference is statistically significant by carrying out the paired-samples t test.

Click on “Analyze” and then on “Compare Means” and finally on “Paired-Samples T Test.”  (See Chapter 6, paired-samples t test in the online SPSS book.)  Move the two variables listed above into the “Paired Variables” box.  Do this by selecting d22_maeduc and click on the arrow to move it into the “Variable 1” box.  Then select the other variable, d24_paeduc, and click on the arrow to move it into the “Variable 2” box.  Now click on “OK” and SPSS will carry out the paired-samples t test.  It doesn’t matter which variable you put in the “Variable 1” and “Variable 2” boxes.
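
The equivalent syntax, sketched with the dataset's variable names, is:

T-TEST PAIRS=d22_maeduc WITH d24_paeduc (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.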

You should see three boxes in the output screen. The first box gives you four pieces of information.

  • Means for mothers and fathers.
  • N which is the number of mothers and fathers on which the t test is based.  This includes only those cases with valid information.  In other words, cases with missing information (e.g., don’t know, no answer) are excluded.
  • Standard deviations for mothers and fathers.
  • Standard error of the mean for mothers and fathers which is an estimate of the amount of sampling error for the two samples.

The second box gives you the paired sample correlation which is the correlation between mother’s and father’s years of school completed for the paired samples.  If you haven’t discussed correlation yet don’t worry about what this means.

The third box has more information in it. With paired samples what we do is subtract the years of school completed for one parent in each pair from the years of school completed for the other parent in the same pair. Since we put mother's years of school completed in Variable 1 and father's education in Variable 2, SPSS will subtract father's education from mother's education. So if the father completed 12 years and the mother completed 10 years, we would subtract 12 from 10, which gives -2. For this pair the father completed two more years than the mother.

The third box gives you the following information.

  • The mean difference score for all the pairs in the sample, which is -0.176. This means that fathers had an average of almost two-tenths of a year more education than the mothers. By the way, in Part I when we compared the means for d22_maeduc and d24_paeduc the difference was 0.22. Here the mean difference score is 0.176. Why aren't they the same? See if you can figure this out. (Hint: it has something to do with comparing differences for pairs.)
  • The standard deviation of the difference scores for all these pairs which is 3.206.
  • The standard error of the mean which is an estimate of the amount of sampling error.
  • The 95% confidence interval for the mean difference score.  If you haven’t talked about confidence intervals yet, just ignore this.  We’ll talk about confidence intervals in a later exercise.
  • The value of t for the paired-sample t test which is -2.324.  There is a formula for computing t which your instructor may or may not want to cover in your course.
  • The degrees of freedom for the t test which is 1,795 which is the number of pairs minus one or 1,796 – 1 or 1,795.  In other words, 1,795 of the difference scores are free to vary.  Once these difference scores are fixed, then the final difference score is fixed or determined.
  • The two-tailed significance value which is .020 which we’ll cover next.

Notice how we are going about this. We have a sample of adults in the United States (i.e., the 2014 GSS). We calculate the mean years of school completed by respondents' fathers and mothers in the sample who answered the question. But we want to test the hypothesis that the mean years of school completed by fathers is greater than the mean for mothers in the population. We're going to use our sample data to test a hypothesis about the population.

The hypothesis we want to test is that the mean years of school completed by fathers is greater than the mean years of school completed by mothers in the population.  We’ll call this our research hypothesis.  It’s what we expect to be true.  But there is no way to prove the research hypothesis directly.  So we’re going to use a method of indirect proof.  We’re going to set up another hypothesis that says that the research hypothesis is not true and call this the null hypothesis.  If we can’t reject the null hypothesis then we don’t have any evidence in support of the research hypothesis.  You can see why this is called a method of indirect proof. We can’t prove the research hypothesis directly but if we can reject the null hypothesis then we have indirect evidence that supports the research hypothesis. We haven’t proven the research hypothesis, but we have support for this hypothesis.

Here are our two hypotheses.

  • research hypothesis – the mean difference score in the population is negative. In other words, the mean years of school completed by fathers is greater than the mean years for mothers for all pairs in the population.
  • null hypothesis – the mean difference score for all pairs in the population is equal to 0.

It’s the null hypothesis that we are going to test.

Now all we have to do is figure out how to use the t test to decide whether to reject or not reject the null hypothesis. Look again at the significance value, which is 0.020. That tells you that the probability of being wrong if you rejected the null hypothesis is .02, or 2 times out of one hundred. With odds like that, of course, we're going to reject the null hypothesis. A common rule is to reject the null hypothesis if the significance value is less than .05, or less than five out of one hundred.

But wait a minute. The SPSS output said this was a two-tailed significance value. What does that mean? Look back at the research hypothesis, which was that the mean difference score for all pairs in the population was less than 0. We're predicting that the mean difference score for all pairs in the population will be negative. That's called a one-tailed test and we have to use a one-tailed significance value. It's easy to get the one-tailed significance value if we know the two-tailed significance value. If the two-tailed significance value is .020 then the one-tailed significance value is half that: .020 divided by two, or .010. We still reject the null hypothesis, which means that we have evidence to support our research hypothesis. We haven't proven the research hypothesis to be true but we have evidence to support it.

Part IV – Now it’s Your Turn Again

In this part of the exercise you want to compare the years of school completed by respondents and their spouses to determine if women have more education than their spouses but this time you want to test the appropriate null hypotheses.

Remember from Part II that we have to test this hypothesis first for men and then for women.  We’re going to do this by selecting out all the men and then computing the paired-samples t test.   Do this by clicking on “Data” in the menu bar and then clicking on “Select Cases.”  Select “If condition is satisfied” and then click on “If” in the box below.  Select d5_sex and move it to the box on the right by clicking on the arrow pointing to the right.  Now click on the equals sign and then on 1 so the expression in the box reads “d5_sex = 1”.  Click on “Continue” and then on “OK”.  To make sure you have selected out the males run a frequency distribution for d5_sex.  You should only see the males (i.e., value 1).  Now carry out the paired-samples t test. Repeat this for the females (i.e., value 2) by selecting out the females and then running the paired-samples t test again.
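
A syntax sketch of the select-check-test sequence for men is below; repeating it with d5_sex = 2 handles the women. The filter variable name is arbitrary.

* Flag male respondents and filter the file.
COMPUTE male = (d5_sex = 1).
FILTER BY male.
* Verify the selection worked.
FREQUENCIES VARIABLES=d5_sex.
* Run the paired-samples t test on the selected cases.
T-TEST PAIRS=d4_educ WITH d29_speduc (PAIRED).
* Turn the filter off when finished.
FILTER OFF.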

For each paired-sample t test, state the research and the null hypotheses.  Do you reject or not reject the null hypotheses?  Explain why.



SPSS Tutorials: Pearson Correlation


Sample Data Files

Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:

  • Data definitions (*.pdf)
  • Data - Comma delimited (*.csv)
  • Data - Tab delimited (*.txt)
  • Data - Excel format (*.xlsx)
  • Data - SAS format (*.sas7bdat)
  • Data - SPSS format (*.sav)
  • SPSS Syntax (*.sps) Syntax to add variable labels, value labels, set variable types, and compute several recoded variables used in later tutorials.
  • SAS Syntax (*.sas) Syntax to read the CSV-format sample data and set variable labels and formats/value labels.

The bivariate Pearson Correlation produces a sample correlation coefficient, r , which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure.

This measure is also known as:

  • Pearson’s correlation
  • Pearson product-moment correlation (PPMC)

Common Uses

The bivariate Pearson Correlation is commonly used to measure the following:

  • Correlations among pairs of variables
  • Correlations within and between sets of variables

The bivariate Pearson correlation indicates the following:

  • Whether a statistically significant linear relationship exists between two continuous variables
  • The strength of a linear relationship (i.e., how close the relationship is to being a perfectly straight line)
  • The direction of a linear relationship (increasing or decreasing)

Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among categorical variables. If you wish to understand relationships that involve categorical variables and/or non-linear relationships, you will need to choose another measure of association.

Note: The bivariate Pearson Correlation only reveals associations among continuous variables. The bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the correlation coefficient is.

Data Requirements

To use Pearson correlation, your data must meet the following requirements:

  • Two or more continuous variables (i.e., interval or ratio level)
  • Cases must have non-missing values on both variables
  • Linear relationship between the variables
  • Independent cases (i.e., independence of observations):
      • the values for all variables across cases are unrelated
      • for any case, the value for any variable cannot influence the value of any variable for other cases
      • no case can influence another case on any variable
      • the bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated
  • Bivariate normality:
      • each pair of variables is bivariately normally distributed
      • each pair of variables is bivariately normally distributed at all levels of the other variable(s)
      • this assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist. Linearity can be assessed visually using a scatterplot of the data.
  • Random sample of data from the population
  • No outliers

The null hypothesis (H0) and alternative hypothesis (H1) of the significance test for correlation can be expressed in the following ways, depending on whether a one-tailed or two-tailed test is requested:

Two-tailed significance test:

H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
H1: ρ ≠ 0 ("the population correlation coefficient is not 0; a nonzero correlation could exist")

One-tailed significance test:

H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")
H1: ρ > 0 ("the population correlation coefficient is greater than 0; a positive correlation could exist")
OR
H1: ρ < 0 ("the population correlation coefficient is less than 0; a negative correlation could exist")

where ρ is the population correlation coefficient.

Test Statistic

The sample correlation coefficient between two variables x and y is denoted r or r_{xy}, and can be computed as: $$ r_{xy} = \frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{var}(x)} \cdot \sqrt{\mathrm{var}(y)}} $$

where cov( x , y ) is the sample covariance of x and y ; var( x ) is the sample variance of x ; and var( y ) is the sample variance of y .

Correlation can take on any value in the range [-1, 1]. The sign of the correlation coefficient indicates the direction of the relationship, while the magnitude of the correlation (how close it is to -1 or +1) indicates the strength of the relationship.

  •  -1 : perfectly negative linear relationship
  •   0 : no relationship
  • +1  : perfectly positive linear relationship

The strength can be assessed by these general guidelines [1] (which may vary by discipline):

  • .1 < |r| < .3 … small / weak correlation
  • .3 < |r| < .5 … medium / moderate correlation
  • .5 < |r| … large / strong correlation

Note: The direction and strength of a correlation are two distinct properties. The scatterplots below [2] show correlations that are r = +0.90, r = 0.00, and r = -0.90, respectively. The strength of the nonzero correlations is the same: 0.90. But the direction of the correlations is different: a negative correlation corresponds to a decreasing relationship, while a positive correlation corresponds to an increasing relationship.

Scatterplot of data with correlation r = -0.90

Note that the r = 0.00 correlation has no discernable increasing or decreasing linear pattern in this particular graph. However, keep in mind that Pearson correlation is only capable of detecting linear associations, so it is possible to have a pair of variables with a strong nonlinear relationship and a small Pearson correlation coefficient. It is good practice to create scatterplots of your variables to corroborate your correlation coefficients.

[1]  Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

[2]  Scatterplots created in R using ggplot2 , ggthemes::theme_tufte() , and MASS::mvrnorm() .

Data Set-Up

Your dataset should include two or more continuous numeric variables, each defined as scale, which will be used in the analysis.

Each row in the dataset should represent one unique subject, person, or unit. All of the measurements taken on that person or unit should appear in that row. If measurements for one subject appear on multiple rows -- for example, if you have measurements from different time points on separate rows -- you should reshape your data to "wide" format before you compute the correlations.

Run a Bivariate Pearson Correlation

To run a bivariate Pearson Correlation in SPSS, click Analyze > Correlate > Bivariate.


The Bivariate Correlations window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. To select variables for the analysis, select the variables in the list on the left and click the blue arrow button to move them to the right, in the Variables field.


A Variables: The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pair of variables in this list.

B Correlation Coefficients: There are multiple types of correlation coefficients. By default, Pearson is selected. Selecting Pearson will produce the test statistics for a bivariate Pearson Correlation.

C Test of Significance: Click Two-tailed or One-tailed, depending on your desired significance test. SPSS uses a two-tailed test by default.

D Flag significant correlations: Checking this option will include asterisks (**) next to statistically significant correlations in the output. By default, SPSS marks statistical significance at the alpha = 0.05 and alpha = 0.01 levels, but not at the alpha = 0.001 level (which is treated as alpha = 0.01).

E Options: Clicking Options will open a window where you can specify which Statistics to include (i.e., Means and standard deviations, Cross-product deviations and covariances) and how to address Missing Values (i.e., Exclude cases pairwise or Exclude cases listwise). Note that the pairwise/listwise setting does not affect your computations if you are only entering two variables, but can make a very large difference if you are entering three or more variables into the correlation procedure.


Example: Understanding the linear association between weight and height

Problem Statement

Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population). You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association.

Before the Test

In the sample data, we will use two variables: “Height” and “Weight.” The variable “Height” is a continuous measure of height in inches and exhibits a range of values from 55.00 to 84.41 ( Analyze > Descriptive Statistics > Descriptives ). The variable “Weight” is a continuous measure of weight in pounds and exhibits a range of values from 101.71 to 350.07.

Before we look at the Pearson correlations, we should look at the scatterplots of our variables to get an idea of what to expect. In particular, we need to determine if it's reasonable to assume that our variables have linear relationships. Click Graphs > Legacy Dialogs > Scatter/Dot. In the Scatter/Dot window, click Simple Scatter, then click Define. Move variable Height to the X Axis box, and move variable Weight to the Y Axis box. When finished, click OK.
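
(Pasting rather than clicking OK would yield scatterplot syntax roughly like the sketch below.)

GRAPH
  /SCATTERPLOT(BIVAR)=Height WITH Weight
  /MISSING=LISTWISE.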

Scatterplot of height and weight with a linear fit line added. Height and weight appear to be reasonably linearly related, albeit with some unusually outlying points.

To add a linear fit like the one depicted, double-click on the plot in the Output Viewer to open the Chart Editor. Click Elements > Fit Line at Total. In the Properties window, make sure the Fit Method is set to Linear, then click Apply. (Notice that adding the linear regression trend line will also add the R-squared value in the margin of the plot. If we take the square root of this number, it should match the value of the Pearson correlation we obtain.)

From the scatterplot, we can see that as height increases, weight also tends to increase. There does appear to be some linear relationship.

Running the Test

To run the bivariate Pearson Correlation, click Analyze > Correlate > Bivariate. Select the variables Height and Weight and move them to the Variables box. In the Correlation Coefficients area, select Pearson. In the Test of Significance area, select your desired significance test, two-tailed or one-tailed. We will select a two-tailed significance test in this example. Check the box next to Flag significant correlations.

Click OK to run the bivariate Pearson Correlation. Output for the analysis will display in the Output Viewer.
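
For reference, clicking Paste instead of OK would produce syntax along these lines:

CORRELATIONS
  /VARIABLES=Height Weight
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.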

The results will display the correlations in a table, labeled Correlations .

Table of Pearson Correlation output. Height and weight have a significant positive correlation (r=0.513, p < 0.001).

A Correlation of Height with itself (r=1), and the number of nonmissing observations for height (n=408).

B Correlation of height and weight (r=0.513), based on n=354 observations with pairwise nonmissing values.

C Correlation of height and weight (r=0.513), based on n=354 observations with pairwise nonmissing values.

D Correlation of weight with itself (r=1), and the number of nonmissing observations for weight (n=376).

The important cells we want to look at are either B or C. (Cells B and C are identical, because they include information about the same pair of variables.) Cells B and C contain the correlation coefficient for the correlation between height and weight, its p-value, and the number of complete pairwise observations that the calculation was based on.

The correlations in the main diagonal (cells A and D) are all equal to 1. This is because a variable is always perfectly correlated with itself. Notice, however, that the sample sizes are different in cell A ( n =408) versus cell D ( n =376). This is because of missing data -- there are more missing observations for variable Weight than there are for variable Height.

If you have opted to flag significant correlations, SPSS will mark a 0.05 significance level with one asterisk (*) and a 0.01 significance level with two asterisks (**). In cell B (repeated in cell C), we can see that the Pearson correlation coefficient for height and weight is .513, which is significant (p < .001 for a two-tailed test), based on 354 complete observations (i.e., cases with nonmissing values for both height and weight).

Decision and Conclusions

Based on the results, we can state the following:

  • Weight and height have a statistically significant linear relationship ( r =.513, p < .001).
  • The direction of the relationship is positive (i.e., height and weight are positively correlated), meaning that these variables tend to increase together (i.e., greater height is associated with greater weight).
  • The magnitude, or strength, of the association is approximately moderate (.3 < | r | < .5).

