Correlation Calculator


Correlation Coefficient Significance Calculator using p-value

Instructions: Use this Correlation Coefficient Significance Calculator to enter the sample correlation \(r\), sample size \(n\) and the significance level \(\alpha\), and the solver will test whether or not the correlation coefficient is significantly different from zero using the critical correlation approach.


More About the Significance of the Correlation Coefficient

The sample correlation \(r\) is a statistic that estimates the population correlation, \(\rho\). One typical statistical test consists of assessing whether or not the correlation coefficient is significantly different from zero.

There are at least two methods to assess the significance of the sample correlation coefficient. One of them is based on the critical correlation. This approach rests on the idea that if the sample correlation \(r\) is large enough in absolute value, then there is evidence that the population correlation \(\rho\) is different from zero.

In order to assess whether or not the sample correlation is significantly different from zero, the following t-statistic is obtained:

\[t = r \sqrt{\frac{n-2}{1-r^2}}\]

Under the null hypothesis, this statistic follows a t-distribution with \(n - 2\) degrees of freedom. This is the formula for the t test for the correlation coefficient, which the calculator will provide for you, showing all the steps of the calculation.

If the above t-statistic is significant, then we would reject the null hypothesis \(H_0\) (that the population correlation is zero). You can also use the critical correlation approach, with the same purpose of assessing whether or not the sample correlation is significantly different from zero, but in that case by comparing the sample correlation with a critical correlation value.


Testing the Significance of Correlations


  • Comparison of correlations from independent samples
  • Comparison of correlations from dependent samples
  • Testing linear independence (Testing against 0)
  • Testing correlations against a fixed value
  • Calculation of confidence intervals of correlations
  • Fisher-Z-Transformation
  • Calculation of the Phi correlation coefficient r Phi for categorial data
  • Calculation of the weighted mean of a list of correlations
  • Transformation of the effect sizes r , d , f , Odds Ratio and eta square
  • Calculation of Linear Correlations

1. Comparison of correlations from independent samples

Correlations that have been obtained from different samples can be tested against each other. Example: Imagine you want to test whether men increase their income considerably faster than women. You could, for example, collect data on age and income from 1,200 men and 980 women. The correlation could amount to r = .38 in the male cohort and r = .31 in the female cohort. Is there a significant difference between the correlations of the two cohorts?

(Calculation according to Eid, Gollwitzer & Schmitt, 2011, p. 547; one-sided test)
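The income example can be checked with a short script. This is a minimal sketch that assumes the common Fisher-Z approach (difference of the transformed correlations divided by its standard error); it is not necessarily the exact algorithm behind the calculator, and the function name is only illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """One-sided z-test of H0: rho1 <= rho2 using Fisher's Z (an assumed method)."""
    z1, z2 = atanh(r1), atanh(r2)            # Fisher-Z transform of each correlation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of the difference
    z = (z1 - z2) / se
    p_one_sided = 1 - NormalDist().cdf(z)    # upper-tail probability
    return z, p_one_sided


# Values from the example: men r = .38 (n = 1200), women r = .31 (n = 980)
z, p = compare_independent_correlations(0.38, 1200, 0.31, 980)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

With these inputs the one-sided p-value falls below .05, so under the stated assumptions the difference would be judged significant at the 5% level.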

2. Comparison of correlations from dependent samples

  • 85 children from grade 3 have been tested with tests on intelligence (1), arithmetic abilities (2) and reading comprehension (3). The correlation between intelligence and arithmetic abilities amounts to \(r_{12}\) = .53, intelligence and reading comprehension correlate with \(r_{13}\) = .41, and arithmetic abilities and reading comprehension with \(r_{23}\) = .59. Is the correlation between intelligence and arithmetic abilities higher than the correlation between intelligence and reading comprehension?

(Calculation according to Eid et al., 2011, pp. 548 f.; one-sided test)
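For two correlations that share one variable, one widely used statistic is Williams's t with n - 3 degrees of freedom. The sketch below applies it to the grade 3 example. It is an assumption that this test is a reasonable stand-in here; the calculator may use a different formula, so its result can differ.

```python
from math import sqrt


def williams_t(r12, r13, r23, n):
    """Williams's t for H0: rho12 = rho13, where variable 1 is shared.

    Returns the t statistic and its degrees of freedom (n - 3).
    """
    # Determinant of the 3x3 correlation matrix
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3


t, df = williams_t(0.53, 0.41, 0.59, 85)
print(f"t = {t:.3f} with {df} degrees of freedom")
```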

3. Testing linear independence (Testing against 0)

With the following calculator, you can test if correlations are different from zero. The test is based on the Student's t distribution with n - 2 degrees of freedom. An example: The length of the left foot and the nose of 18 men is quantified. The length correlates with r = .69. Is the correlation significantly different from 0?

(Calculation according to Eid et al., 2011, p. 542; two-sided test)
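The foot and nose example can be reproduced with the standard library alone. This sketch uses the usual statistic t = r·sqrt((n - 2)/(1 - r²)) and obtains the two-sided p-value by numerically integrating the Student's t density with Simpson's rule. The helper names are hypothetical, and the numerical integration is a convenience choice, not what the calculator does internally (the page says it uses jStat).

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution (fine for moderate df, roughly df < 340)."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """2 * P(T > |t|), with P(0 < T < |t|) found by Simpson's rule (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return max(0.0, 1 - 2 * central_area)


def correlation_t_test(r, n):
    """Test H0: rho = 0. Returns (t, degrees of freedom, two-sided p-value)."""
    df = n - 2
    t = r * sqrt(df / (1 - r * r))
    return t, df, two_sided_p(t, df)


t, df, p = correlation_t_test(0.69, 18)   # r = .69 for 18 men
print(f"t({df}) = {t:.3f}, two-sided p = {p:.4f}")
```

For r = .69 and n = 18 the p-value is well below .05, so the correlation differs significantly from zero.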

4. Testing correlations against a fixed value

With the following calculator, you can test if correlations are different from a fixed value. The test uses the Fisher-Z-transformation.

(Calculation according to Eid et al., 2011, pp. 543 f.; two-sided test)
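A minimal sketch of such a test, assuming the textbook form z = (Z(r) - Z(ρ₀))·sqrt(n - 3) with a standard normal reference distribution. The input values below are hypothetical and only illustrate the call.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_test(r, n, rho0):
    """Two-sided test of H0: rho = rho0 via the Fisher-Z transformation."""
    z = (atanh(r) - atanh(rho0)) * sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical input: is r = .69 (n = 18) different from a fixed value of .40?
z, p = fisher_z_test(0.69, 18, 0.40)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```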

5. Calculation of confidence intervals of correlations

The confidence interval specifies the range of values that includes a correlation with a given probability (confidence coefficient). The higher the confidence coefficient, the larger the confidence interval. Commonly, values around .9 are used.

based on Bonett & Wright (2000); cf. simulation of Gnambs (2022)
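As an illustration, the classical Fisher-Z interval can be computed as follows. This is a sketch under the assumption that the standard error of Z is 1/sqrt(n - 3); the calculator's implementation may differ in detail.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r, n, confidence=0.95):
    """Confidence interval for a Pearson correlation via Fisher's Z."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit / sqrt(n - 3)     # assumed standard error: 1 / sqrt(n - 3)
    center = atanh(r)
    return tanh(center - half_width), tanh(center + half_width)


low, high = correlation_ci(0.69, 18, 0.95)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Raising the confidence coefficient widens the interval, which matches the statement above.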

6. Fisher-Z-Transformation

The Fisher-Z-Transformation converts correlations into an almost normally distributed measure. It is necessary for many operations with correlations, for example when averaging a list of correlations. The following converter transforms the correlations and also computes the inverse operation. Please note that the Fisher-Z is written in uppercase.
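For reference, the transformation and its inverse are the inverse hyperbolic tangent and the hyperbolic tangent. The worked value is only an illustration:

```latex
Z = \operatorname{artanh}(r) = \frac{1}{2}\ln\frac{1+r}{1-r},
\qquad
r = \tanh(Z) = \frac{e^{2Z}-1}{e^{2Z}+1}

% Example: r = 0.5 gives Z = \tfrac{1}{2}\ln 3 \approx 0.549
```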

7. Calculation of the Phi correlation coefficient r Phi for categorical data

r Phi is a measure for binary data such as counts in different categories, e.g. pass/fail in an exam of males and females. It is also called the contingency coefficient or Yule's Phi. Transformation to d Cohen is done via the effect size calculator.
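For a 2x2 table with cells a, b (first row) and c, d (second row), the usual formula is φ = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The sketch below applies it to made-up counts; the numbers are hypothetical and not taken from the page.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts: rows = males / females, columns = pass / fail
phi = phi_coefficient(30, 20, 40, 10)
print(f"phi = {phi:.3f}")
```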

8. Calculation of the weighted mean of a list of correlations

Due to the skewed distribution of correlations (see Fisher-Z-Transformation), the mean of a list of correlations cannot simply be calculated as the arithmetic mean. Usually, correlations are transformed into Fisher-Z values and weighted by the number of cases before averaging and retransforming with an inverse Fisher-Z. While this is the usual approach, Eid et al. (2011, p. 544) suggest using the correction of Olkin & Pratt (1958) instead, as simulations showed it to estimate the mean correlation more precisely. The following calculator computes both for you: the "traditional Fisher-Z approach" and the algorithm of Olkin and Pratt.

Please fill in the correlations in column A and the number of cases in column B. You can also copy the values from tables of your spreadsheet program. Finally, click on "OK" to start the calculation. Some values are already filled in for demonstration purposes.
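The "traditional Fisher-Z approach" described above can be sketched in a few lines. The weights here are the raw case numbers, as the text states; weighting by n - 3 is a common alternative. The Olkin & Pratt correction is not shown, and the sample values are hypothetical.

```python
from math import atanh, tanh


def weighted_mean_correlation(rs, ns):
    """Average correlations via Fisher's Z, weighting each by its number of cases."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")
    mean_z = sum(n * atanh(r) for r, n in zip(rs, ns)) / sum(ns)
    return tanh(mean_z)   # transform the mean Z back to a correlation


# Hypothetical list of correlations (column A) and case numbers (column B)
print(round(weighted_mean_correlation([0.2, 0.6], [50, 50]), 3))
```

Note that the result for .2 and .6 with equal weights is slightly above the arithmetic mean of .4, which is the effect of the skewed distribution mentioned above.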

9. Transformation of the effect sizes r , d , f , Odds Ratio and eta square

Correlations are an effect size measure. They quantify the magnitude of an empirical effect. There are a number of other effect size measures as well, with d Cohen probably being the most prominent one. The different effect size measures can be converted into one another. Please have a look at the online calculators on the page Computation of Effect Sizes.

10. Calculation of Linear Correlations

The online calculator computes linear Pearson (product-moment) correlations of two variables. Please fill in the values of variable 1 in column A and the values of variable 2 in column B and press 'OK'. As a demonstration, values for a high positive correlation are already filled in by default.

Many hypothesis tests on this page are based on Eid et al. (2011). jStat is used to generate the Student's t-distribution for testing correlations against each other. The spreadsheet element is based on Handsontable.

  • Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall, and Spearman correlations. Psychometrika, 65(1), 23-28. doi: 10.1007/BF0229418
  • Eid, M., Gollwitzer, M., & Schmitt, M. (2011). Statistik und Forschungsmethoden Lehrbuch . Weinheim: Beltz.
  • Gnambs, T. (2022, April 6). A brief note on the standard error of the Pearson correlation. https://doi.org/10.31234/osf.io/uts98
Please use the following citation: Lenhard, W. & Lenhard, A. (2014). Hypothesis Tests for Comparing Correlations . available: https://www.psychometrica.de/correlation.html. Psychometrica. DOI: 10.13140/RG.2.1.2954.1367

Copyright © 2017-2022; Drs. Wolfgang & Alexandra Lenhard

Hypothesis Testing Calculator


The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.
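A minimal sketch of this first step for a single population mean, using the standard formulas z = (x̄ - μ₀)/(σ/√n) and t = (x̄ - μ₀)/(s/√n). The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_test_statistic(sample, mu0, sigma=None):
    """Return (kind, statistic, df): a z statistic if sigma is known, else a t statistic."""
    n = len(sample)
    x_bar = mean(sample)
    if sigma is not None:
        return "z", (x_bar - mu0) / (sigma / sqrt(n)), None
    s = stdev(sample)                      # sample standard deviation replaces sigma
    return "t", (x_bar - mu0) / (s / sqrt(n)), n - 1


data = [12, 15, 11, 14, 13]                # hypothetical sample
print(mean_test_statistic(data, mu0=12, sigma=2))
print(mean_test_statistic(data, mu0=12))
```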

Next, the test statistic is used to conduct the test using either the p-value approach or critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis (H a ). If there is a less than sign in the alternative hypothesis then it is a lower tail test, greater than sign is an upper tail test and inequality is a two-tailed test. To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.
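The p-value approach for a z test can be sketched as follows. The tail labels "lower", "upper" and "two" are names chosen for this example, not calculator options.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value of a z statistic for a lower tail, upper tail or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)                      # at least as small as the observed value
    if tail == "upper":
        return 1 - cdf(z)                  # at least as large as the observed value
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))       # at least as extreme in either direction
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def decide(p_value, alpha=0.05):
    """Same decision rule for every form of the test."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


p = z_p_value(2.1, "two")
print(f"p = {p:.4f}: {decide(p)}")
```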

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.
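The critical value rules for a z test can be sketched like this. Again, the tail labels are names invented for the example.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail):
    """Critical value(s) of the standard normal distribution for the given test form."""
    inv = NormalDist().inv_cdf
    if tail == "lower":
        return (inv(alpha),)               # area alpha in the lower tail
    if tail == "upper":
        return (inv(1 - alpha),)           # area alpha in the upper tail
    if tail == "two":
        upper = inv(1 - alpha / 2)         # area alpha / 2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def reject_h0(z, alpha, tail):
    """Apply the decision rule that belongs to the form of the test."""
    crit = z_critical_values(alpha, tail)
    if tail == "lower":
        return z <= crit[0]
    if tail == "upper":
        return z >= crit[0]
    return z <= crit[0] or z >= crit[1]


print(z_critical_values(0.05, "two"))
print(reject_h0(2.1, 0.05, "two"))
```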

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to accept the null hypothesis when the null hypothesis is true. A Type II Error is committed if you accept the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator. The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator. The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.

Correlation Coefficient Calculator


Welcome to Omni's correlation coefficient calculator! Here you can learn all there is about this important statistical concept. Apart from discussing the general definition of correlation and the intuition behind it, we will also cover in detail the formulas for the four most popular correlation coefficients :

  • Pearson correlation;
  • Spearman correlation;
  • Kendall tau correlation (including the variants); and
  • Matthews correlation (MCC, a.k.a. Pearson phi).

As a bonus, we will also explain how Pearson correlation is linked to simple linear regression . We will start, however, by explaining what the correlation coefficient is all about. Let's go!

Correlation coefficients are measures of the strength and direction of relation between two random variables. The type of relationship that is being measured varies depending on the coefficient. In general, however, they all describe the co-changeability between the variables in question – how increasing (or decreasing) the value of one variable affects the value of the other variable – does it tend to increase or decrease?

Importantly, correlation coefficients are all normalized , i.e., they assume values between -1 and +1. Values of ±1 indicate the strongest possible relationship between variables, and a value of 0 means there's no relationship at all.

And that's it when it comes to the general definition of correlation! If you wonder how to calculate correlation , the best answer is to... use Omni's correlation coefficient calculator 😊! It allows you to easily compute all of the different coefficients in no time. In the next section, we explain how to use this tool in the most effective way.

If you wonder how to calculate correlation by hand, you will find all the necessary formulas and definitions for several correlation coefficients in the following sections.

To use our correlation coefficient calculator:

  • Choose the correlation coefficient you want to compute: Pearson correlation; Spearman correlation; Kendall rank correlation; or Matthews correlation.
  • Input your data into the rows. When at least three points (both an x and y coordinate) are in place, it will give you your result.
  • Be aware that this is a correlation calculator with steps! If you turn on the option Show calculation details?, our tool will show you the intermediate stages of calculations. This is very useful when you need to verify the correctness of your calculations.

The strength of a correlation can be read from its absolute value:

  • 0.8 ≤ |corr| ≤ 1.0 very strong;
  • 0.6 ≤ |corr| < 0.8 strong;
  • 0.4 ≤ |corr| < 0.6 moderate;
  • 0.2 ≤ |corr| < 0.4 weak; and
  • 0.0 ≤ |corr| < 0.2 very weak.
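The thresholds above translate directly into a small lookup function. This is just a sketch; the function name is made up for illustration.

```python
def correlation_strength(corr):
    """Map a correlation coefficient to the verbal label from the scale above."""
    a = abs(corr)
    if a > 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak"


print(correlation_strength(-0.65))   # the sign is ignored, only the magnitude counts
```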

The Pearson correlation between two variables X and Y is defined as the covariance between these variables divided by the product of their respective standard deviations:

\[\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}\]

This translates into the following explicit formula:

\[\rho(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \overline{y})^2}}\]

where \(\overline{x}\) and \(\overline{y}\) stand for the averages of the samples \(x_1, ..., x_n\) and \(y_1, ..., y_n\), respectively.

Remember that the Pearson correlation detects only a linear relationship – a low value of Pearson correlation doesn't mean that there is no relationship at all! The two variables may be strongly related, yet their relationship may not be linear but of some other type.
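To make both the computation and the caveat concrete, here is a sketch of the sample Pearson correlation (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The data are invented for illustration; the second data set is perfectly related through y = x², yet its Pearson correlation is zero.

```python
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))      # roughly linear data
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # y = x**2: related, but not linearly
```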

In least squares regression \(Y = aX + b\), the square of the Pearson correlation between \(X\) and \(Y\) is equal to the coefficient of determination, R², which expresses the fraction of the variance in \(Y\) that is explained by \(X\):

\[R^2 = \rho(X, Y)^2\]

If you want to discover more about the Pearson correlation, visit our dedicated Pearson correlation calculator website .

The Spearman coefficient is closely related to the Pearson coefficient. Namely, the Spearman rank correlation between \(X\) and \(Y\) is defined as the Pearson correlation between the rank variables \(r(X)\) and \(r(Y)\). That is, the formula for Spearman's rank correlation \(\rho\) (rho) reads:

\[\rho = \frac{\mathrm{cov}(r(X), r(Y))}{\sigma_{r(X)}\,\sigma_{r(Y)}}\]

To obtain the rank variables, you just need to order the observations (in each sample separately) from lowest to highest. The smallest observation then gets rank 1, the second-smallest rank 2, and so on. The highest observation will have rank n. You only need to be careful when the same value appears in the data set more than once (we say there are ties). If this happens, assign to all these identical observations the rank equal to the arithmetic mean of the ranks you would assign to these observations if they all had different values.

The Spearman correlation is sensitive to any monotonic relationship between the variables, so it is more general than the Pearson correlation. It can capture, e.g., exponential relationships, or quadratic ones over a range where they are monotonic.

There is also a simpler and more explicit formula for Spearman correlation , but it holds only if there are no ties in either of our samples . More details await you in the Spearman's rank correlation calculator .
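The general definition (Pearson correlation of the ranks, with tied values sharing the mean of their ranks) can be sketched as follows. The helper names and the data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank from lowest (1) to highest (n); tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1           # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


print(average_ranks([10, 20, 20, 30]))                # the two 20s share rank 2.5
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))   # monotonic but not linear
```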

We most often denote Kendall's rank correlation by the Greek letter τ (tau), and that's why it's often referred to as Kendall tau.

Consider two samples, x and y, each of size n: x 1, ..., x n and y 1, ..., y n. Clearly, there are n(n-1)/2 possible pairs of observations (x i, y i) and (x j, y j) with i < j.

We have to go through all these pairs one by one and count the number of concordant and discordant pairs . Namely, for two pairs (x i , y i ) and (x j , y j ) we have the following rules:

  • If x i < x j and y i < y j then this pair is concordant.
  • If x i > x j and y i > y j then this pair is concordant.
  • If x i < x j and y i > y j then this pair is discordant.
  • If x i > x j and y i < y j then this pair is discordant.

The Kendall rank correlation coefficient formula reads:

\[\tau = \frac{C - D}{n(n-1)/2}\]

where:

  • \(C\) – Number of concordant pairs; and
  • \(D\) – Number of discordant pairs.

That is, \(\tau\) is the difference between the number of concordant and discordant pairs divided by the total number of all pairs.

Easy, don't you think? However, it is so only if there are no ties. That is, there are no repeating values in either sample x or sample y. If there are ties, there are two additional variants of Kendall tau. (Fortunately, our correlation coefficient calculator can calculate them all!) To define them, we need to distinguish different kinds of ties:

  • If x i = x j and y i ≠ y j then we have a tie in x .
  • If x i ≠ x j and y i = y j then we have a tie in y .
  • If x i = x j and y i = y j then we have a double tie .

The Kendall rank tau-b correlation coefficient formula reads:

\[\tau_b = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}}\]

where:

  • \(T_x\) – Number of ties in \(x\); and
  • \(T_y\) – Number of ties in \(y\).

Use tau-b if the two variables have the same number of possible values (before ranking). In other words, if you can summarize the data in a square contingency table . An example of such a situation is when both variables use a 5-point Likert scale: strongly disagree, disagree, neither agree nor disagree, agree, or strongly agree .

If your data is assembled in a rectangular non-square contingency table, or, in other words, if the two variables have a different number of possible values, then use tau-c (sometimes called Stuart-Kendall tau-c):

\[\tau_c = \frac{2(C - D)}{n^2 \,\frac{m-1}{m}}\]

where:

  • \(m\) – \(\min(r, c)\);
  • \(r\) – The number of rows in the contingency table; and
  • \(c\) – The number of columns in the contingency table.

But where is tau-a, you may think? Fortunately, tau-a is defined in the same simple way as before (when we had no ties):

\[\tau_a = \frac{C - D}{n(n-1)/2}\]

The Kendall tau correlation coefficient is sensitive to monotonic relationships between the variables.
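The pair-counting rules can be sketched as a brute-force loop over all n(n-1)/2 pairs. The code computes tau-a as (C - D) divided by the number of pairs and tau-b as (C - D)/sqrt((C + D + Tx)(C + D + Ty)), where Tx and Ty count pairs tied only in x or only in y. Tau-c is left out, and the function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_counts(xs, ys):
    """Return (C, D, Tx, Ty): concordant, discordant, ties in x only, ties in y only."""
    c = d = tx = ty = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                       # double tie: counted in none of the four
        elif dx == 0:
            tx += 1
        elif dy == 0:
            ty += 1
        elif dx * dy > 0:
            c += 1                         # both differences have the same sign
        else:
            d += 1
    return c, d, tx, ty


def kendall_tau_a(xs, ys):
    n = len(xs)
    c, d, _, _ = kendall_counts(xs, ys)
    return (c - d) / (n * (n - 1) / 2)


def kendall_tau_b(xs, ys):
    c, d, tx, ty = kendall_counts(xs, ys)
    return (c - d) / sqrt((c + d + tx) * (c + d + ty))


print(kendall_tau_a([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))   # no ties
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))         # one tie in x, one in y
```

The double loop is quadratic in n, which is fine for small samples like those typed into a calculator.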

The Matthews correlation (abbreviated as MCC, also known as Pearson phi) measures the quality of binary classifications . Most often, we can encounter it in machine learning and biology/medicine-related data.

To write down the formula for the Matthews correlation coefficient, we need to assemble our data in a 2x2 contingency table, which in this context is also called the confusion matrix. Its four cells are denoted by the following quite standard abbreviations:

  • TP – True positive;
  • FP – False positive;
  • TN – True negative; and
  • FN – False negative.

Matthews correlation is given by the following formula:

\[\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\]

The interpretation of this coefficient is a bit different now:

  • +1 means we have a perfect prediction;
  • 0 means we don't have any valid information; and
  • -1 means we have a complete inconsistency between prediction and the actual outcome.
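A short sketch of the computation from raw labels, using MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Returning 0 when the denominator is zero is a common convention adopted here as an assumption; the labels are invented.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (truthy = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion-matrix cells."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0                         # assumed convention for an empty row or column
    return (tp * tn - fp * fn) / denom


actual    = [1, 1, 1, 0, 0, 0, 1, 0]       # hypothetical true classes
predicted = [1, 1, 0, 0, 0, 1, 1, 0]       # hypothetical classifier output
print(mcc(*confusion_counts(actual, predicted)))
```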

If you're interested, don't hesitate to visit our Matthews correlation coefficient calculator .

What does a positive correlation mean?

If the value of correlation is positive, then the two variables under consideration tend to change in the same direction : when the first one increases, the other tends to increase, and when the first one decreases, then the other one tends to decrease as well.

What does a negative correlation mean?

If the value of correlation is negative, then the two variables under consideration tend to change in opposite directions: when the first one increases, the other tends to decrease, and when the first one decreases, then the other one tends to increase.

How to read a correlation matrix?

A correlation matrix is a table that shows the values of a correlation coefficient between all possible pairs of several variables . It always has ones at the main diagonal (this is the correlation of a variable with itself) and is symmetric (because the correlation between X and Y is the same as between Y and X). For these reasons, the redundant cells sometimes get trimmed . If there is some color-coding , make sure to check what it means: it may either illustrate the strength and direction of correlation or its statistical significance.


12.4 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter "rho."
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05

Using the p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, 84+ Calculator

To calculate the p-value using LinRegTTest: on the LinRegTTest input screen, on the line prompt for β or ρ, highlight "≠ 0". The output screen shows the p-value on the line that reads "p =". (Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level (α = 0.05):

  • Decision: Reject the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero."

If the p-value is NOT less than the significance level (α = 0.05):

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero."

Calculation notes:

  • You will use technology to calculate the p-value. The following describes the calculations to compute the test statistic and the p-value:
  • The p-value is calculated using a t-distribution with n - 2 degrees of freedom.
  • The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\). The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
  • The p-value is the combined area in both tails.

An alternative way to calculate the p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

  • Consider the third exam/final exam example .
  • The line of best fit is: ŷ = -173.51 + 4.83 x with r = 0.6631 and there are n = 11 data points.
  • Can the regression line be used for prediction? Given a third exam score ( x value), can we use the line to predict the final exam score (predicted y value)?
  • H 0 : ρ = 0
  • H a : ρ ≠ 0
  • The p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The p -value, 0.026, is less than the significance level of α = 0.05.
  • Decision: Reject the Null Hypothesis H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Example 12.7

Suppose you computed r = 0.801 using n = 10 data points. df = n - 2 = 10 - 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Try It 12.7

For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

Example 12.8

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Try It 12.8

For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

Example 12.9

Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Try It 12.9

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is ŷ = –173.51 + 4.83x with r = 0.6631, and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score (x value), can we use the line to predict the final exam score (predicted y value)?

  • Use the "95% Critical Value" table for r with df = n – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602, r is significant.
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.
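The critical value method and the p-value method agree because a critical value of r is just a critical value of t rewritten on the r scale. The derivation below is a sketch under two assumptions not stated in the table itself: the test statistic is t = r√(n − 2)/√(1 − r²), and the two-tailed 5% critical t for df = 9 is 2.262 (a standard t-table value).

```latex
\begin{equation*}
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\quad\Longleftrightarrow\quad
r = \frac{t}{\sqrt{t^2 + (n-2)}},
\qquad
r_{\text{crit}} = \frac{2.262}{\sqrt{2.262^2 + 9}} \approx 0.602
\end{equation*}
```

This reproduces the ±0.602 read from the table for df = 9.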

Example 12.10

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.
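The comparisons in this example can be sketched as a small lookup. The dictionary below holds only the 95% critical values quoted in the examples of this section, keyed by df = n – 2; it is not the full table, and the names `CRITICAL_R_95` and `is_significant` are illustrative.

```python
# Partial 95% critical-value table for r, keyed by df = n - 2.
# Only the entries quoted in the examples above are included.
CRITICAL_R_95 = {4: 0.811, 7: 0.666, 8: 0.632, 9: 0.602, 12: 0.532, 17: 0.456}

def is_significant(r, n, table=CRITICAL_R_95):
    """Return True if r lies outside the interval (-critical, +critical)."""
    if r == 0:
        # r = 0 is between the critical values for every df.
        return False
    critical = table[n - 2]  # raises KeyError if df is not in this partial table
    return abs(r) > critical

cases = [(-0.567, 19), (0.708, 9), (0.134, 14), (0, 5)]
for r, n in cases:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}: {verdict}")
```

Comparing |r| with the positive critical value is equivalent to checking both tails, since the two critical values differ only in sign.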

Try It 12.10

For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.


This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Statistics
  • Publication date: Sep 19, 2013
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-statistics/pages/12-4-testing-the-significance-of-the-correlation-coefficient

© Jun 23, 2022 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Pearson Correlation Coefficient (r) | Guide & Examples

Published on May 13, 2022 by Shaun Turney . Revised on February 10, 2024.

The Pearson correlation coefficient ( r ) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

Table of contents

  • What is the Pearson correlation coefficient?
  • Visualizing the Pearson correlation coefficient
  • When to use the Pearson correlation coefficient
  • Calculating the Pearson correlation coefficient
  • Testing for the significance of the Pearson correlation coefficient
  • Reporting the Pearson correlation coefficient
  • Other interesting articles
  • Frequently asked questions about the Pearson correlation coefficient

The Pearson correlation coefficient ( r ) is the most widely used correlation coefficient and is known by many names:

  • Pearson’s r
  • Bivariate correlation
  • Pearson product-moment correlation coefficient (PPMCC)
  • The correlation coefficient

The Pearson correlation coefficient is a descriptive statistic , meaning that it summarizes the characteristics of a dataset. Specifically, it describes the strength and direction of the linear relationship between two quantitative variables.

Although interpretations of the relationship strength (also known as effect size ) vary between disciplines, the table below gives general rules of thumb:

The Pearson correlation coefficient is also an inferential statistic , meaning that it can be used to test statistical hypotheses . Specifically, we can test whether there is a significant relationship between two variables.


Another way to think of the Pearson correlation coefficient ( r ) is as a measure of how close the observations are to a line of best fit .

The Pearson correlation coefficient also tells you whether the slope of the line of best fit is negative or positive. When the slope is negative, r is negative. When the slope is positive, r is positive.

When r is 1 or –1, all the points fall exactly on the line of best fit:

Perfect positive correlation and perfect negative correlation

When r is greater than .5 or less than –.5, the points are close to the line of best fit:

Strong positive correlation and strong negative correlation

When r is between 0 and .3 or between 0 and –.3, the points are far from the line of best fit:

Low positive correlation and low negative correlation

When r is 0, a line of best fit is not helpful in describing the relationship between the variables:

Zero correlation

The Pearson correlation coefficient ( r ) is one of several correlation coefficients that you need to choose between when you want to measure a correlation. The Pearson correlation coefficient is a good choice when all of the following are true:

  • Both variables are quantitative : You will need to use a different method if either of the variables is qualitative .
  • The variables are normally distributed : You can create a histogram of each variable to verify whether the distributions are approximately normal. It’s not a problem if the variables are a little non-normal.
  • The data have no outliers : Outliers are observations that don’t follow the same patterns as the rest of the data. A scatterplot is one way to check for outliers—look for points that are far away from the others.
  • The relationship is linear: “Linear” means that the relationship between the two variables can be described reasonably well by a straight line. You can use a scatterplot to check whether the relationship between two variables is linear.

Pearson vs. Spearman’s rank correlation coefficients

Spearman’s rank correlation coefficient is another widely used correlation coefficient. It’s a better choice than the Pearson correlation coefficient when one or more of the following is true:

  • The variables are ordinal .
  • The variables aren’t normally distributed .
  • The data includes outliers.
  • The relationship between the variables is non-linear and monotonic.

Below is a formula for calculating the Pearson correlation coefficient ( r ):

\begin{equation*} r = \frac{ n\sum{xy}-(\sum{x})(\sum{y})}{% \sqrt{[n\sum{x^2}-(\sum{x})^2][n\sum{y^2}-(\sum{y})^2]}} \end{equation*}

The formula is easy to use when you follow the step-by-step guide below. You can also use software such as R or Excel to calculate the Pearson correlation coefficient for you.

Step 1: Calculate the sums of x and y

Start by renaming the variables to “ x ” and “ y .” It doesn’t matter which variable is called x and which is called y —the formula will give the same answer either way.

Next, add up the values of x and y . (In the formula, this step is indicated by the Σ symbol, which means “take the sum of”.)

Σx = 3.63 + 3.02 + 3.82 + 3.42 + 3.59 + 2.87 + 3.03 + 3.46 + 3.36 + 3.30

Σx = 33.50

Σy = 53.1 + 49.7 + 48.4 + 54.2 + 54.9 + 43.7 + 47.2 + 45.2 + 54.4 + 50.4

Σy = 501.2

Step 2: Calculate x² and y² and their sums

Create two new columns that contain the squares of x and y . Take the sums of the new columns.

Σx² = 13.18 + 9.12 + 14.59 + 11.70 + 12.89 + 8.24 + 9.18 + 11.97 + 11.29 + 10.89

Σx² = 113.05

Σy² = 2 819.6 + 2 470.1 + 2 342.6 + 2 937.6 + 3 014.0 + 1 909.7 + 2 227.8 + 2 043.0 + 2 959.4 + 2 540.2

Σy² = 25 264.0

Step 3: Calculate the cross product and its sum

In a final column, multiply together x and y (this is called the cross product). Take the sum of the new column.

Σxy = 192.8 + 150.1 + 184.9 + 185.4 + 197.1 + 125.4 + 143.0 + 156.4 + 182.8 + 166.3

Σxy = 1 684.2

Step 4: Calculate r

Use the formula and the numbers you calculated in the previous steps to find r .

n = 10
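The four steps can be sketched in a few lines of Python. The x and y values below are read off the addends listed in Step 1, which is an assumption about the underlying data table; the variable names are illustrative.

```python
import math

# Values taken from the addends of the sums in Step 1.
x = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]
y = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

n = len(x)
sum_x = sum(x)                                # Step 1
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)                # Step 2
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))     # Step 3

# Step 4: plug the sums into the formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(f"r = {r:.2f}")
```

The code keeps full precision in the squares and cross products, so its sums differ slightly from the rounded column totals shown above, and r can shift in the second or third decimal place if the rounded totals are used instead.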


The Pearson correlation coefficient can also be used to test whether the relationship between two variables is significant .

The Pearson correlation of the sample is r . It is an estimate of rho ( ρ ), the Pearson correlation of the population . Knowing r and n (the sample size), we can infer whether ρ is significantly different from 0.

  • Null hypothesis ( H 0 ): ρ = 0
  • Alternative hypothesis ( H a ): ρ ≠ 0

To test the hypotheses, you can either use software like R or Stata or you can follow the four steps below.

Step 1: Calculate the t value

Calculate the t value (a test statistic ) using this formula:

\begin{equation*} t = \frac{r} {\sqrt{\dfrac{1-r^2}{n-2}}} \end{equation*}

Step 2: Find the critical value of t

You can find the critical value of t ( t* ) in a t table. To use the table, you need to know three things:

  • The degrees of freedom ( df ): For Pearson correlation tests, the formula is df = n – 2.
  • Significance level (α): By convention, the significance level is usually .05.
  • One-tailed or two-tailed: Most often, two-tailed is an appropriate choice for correlations.

Step 3: Compare the t value to the critical value

Determine if the absolute t value is greater than the critical value of t . “Absolute” means that if the t value is negative you should ignore the minus sign.

Step 4: Decide whether to reject the null hypothesis

  • If the absolute t value is greater than the critical value, then the relationship is statistically significant ( p < α ). The data allow you to reject the null hypothesis and provide support for the alternative hypothesis.
  • If the absolute t value is less than the critical value, then the relationship is not statistically significant ( p > α ). The data don't allow you to reject the null hypothesis and don't provide support for the alternative hypothesis.
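The four steps above can be sketched as follows. The inputs r = 0.801 and n = 10 are illustrative, and the critical value 2.306 is the two-tailed t value for α = .05 and df = 8 from a standard t table; both are assumptions added for the example.

```python
import math

def t_value(r, n):
    """Step 1: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = 0.801, 10          # illustrative inputs
df = n - 2                # degrees of freedom for a Pearson correlation test
t_critical = 2.306        # Step 2: two-tailed, alpha = .05, df = 8 (t table)

t = t_value(r, n)
significant = abs(t) > t_critical   # Step 3: compare |t| with t*

# Step 4: decide whether to reject the null hypothesis.
print(f"t = {t:.2f}, df = {df}, t* = {t_critical}")
print("Reject H0" if significant else "Fail to reject H0")
```

Taking the absolute value of t before the comparison is what makes the test two-tailed.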

If you decide to include a Pearson correlation ( r ) in your paper or thesis, you should report it in your results section . You can follow these rules if you want to report statistics in APA Style :

  • You don’t need to provide a reference or formula since the Pearson correlation coefficient is a commonly used statistic.
  • You should italicize r when reporting its value.
  • You shouldn’t include a leading zero (a zero before the decimal point) since the Pearson correlation coefficient can’t be greater than one or less than negative one.
  • You should provide two significant digits after the decimal point.

When Pearson’s correlation coefficient is used as an inferential statistic (to test whether the relationship is significant), r is reported alongside its degrees of freedom and p value. The degrees of freedom are reported in parentheses beside r .
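As a rough illustration of these reporting rules, the helper below formats r with its degrees of freedom and p value. The function names and the sample inputs are hypothetical, and printing p to three decimals is an assumption about the preferred style rather than something stated above. Plain strings cannot carry italics, so r and p still need to be italicized in the final document.

```python
def apa_number(value, decimals=2):
    """Format a statistic bounded by 1 in absolute value without a leading zero."""
    text = f"{abs(value):.{decimals}f}"
    if text.startswith("0"):
        text = text[1:]            # drop the leading zero: 0.47 -> .47
    return ("-" if value < 0 else "") + text

def report_r(r, df, p):
    """Return a string such as 'r(8) = .47, p = .170'."""
    return f"r({df}) = {apa_number(r)}, p = {apa_number(p, 3)}"

# Hypothetical inputs, for illustration only.
print(report_r(0.47, 8, 0.17))
print(report_r(-0.62, 12, 0.018))
```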


You should use the Pearson correlation coefficient when (1) the relationship is linear and (2) both variables are quantitative and (3) normally distributed and (4) have no outliers.

You can use the cor() function to calculate the Pearson correlation coefficient in R. To test the significance of the correlation, you can use the cor.test() function.

You can use the PEARSON() function to calculate the Pearson correlation coefficient in Excel. If your variables are in columns A and B, then click any blank cell and type “=PEARSON(A:A,B:B)”.

There is no function in Excel to directly test the significance of the correlation.


Turney, S. (2024, February 10). Pearson Correlation Coefficient (r) | Guide & Examples. Scribbr. Retrieved April 11, 2024, from https://www.scribbr.com/statistics/pearson-correlation-coefficient/



Cite DATAtab: DATAtab Team (2024). DATAtab: Online Statistics Calculator. DATAtab e.U. Graz, Austria. URL https://datatab.net


1.9 - Hypothesis Test for the Population Correlation Coefficient

There is one more point we haven't stressed yet in our discussion about the correlation coefficient r and the coefficient of determination \(R^{2}\): the two measures summarize the strength of a linear relationship in samples only. If we obtained a different sample, we would obtain different correlations, different \(R^{2}\) values, and therefore potentially different conclusions. As always, we want to draw conclusions about populations, not just samples. To do so, we either have to conduct a hypothesis test or calculate a confidence interval. In this section, we learn how to conduct a hypothesis test for the population correlation coefficient \(\rho\) (the Greek letter "rho").

In general, a researcher should use the hypothesis test for the population correlation \(\rho\) to learn of a linear association between two variables, when it isn't obvious which variable should be regarded as the response. Let's clarify this point with examples of two different research questions.

Consider evaluating whether or not a linear relationship exists between skin cancer mortality and latitude. We will see in Lesson 2 that we can perform either of the following tests:

  • t -test for testing \(H_{0} \colon \beta_{1}= 0\)
  • ANOVA F -test for testing \(H_{0} \colon \beta_{1}= 0\)

For this example, it is fairly obvious that latitude should be treated as the predictor variable and skin cancer mortality as the response.

By contrast, suppose we want to evaluate whether or not a linear relationship exists between a husband's age and his wife's age ( Husband and Wife data ). In this case, one could treat the husband's age as the response:

husband's age vs wife's age plot

...or one could treat the wife's age as the response:

wife's age vs husband's age plot

In cases such as these, we answer our research question concerning the existence of a linear relationship by using the t -test for testing the population correlation coefficient \(H_{0}\colon \rho = 0\).

Let's jump right to it! We follow standard hypothesis test procedures in conducting a hypothesis test for the population correlation coefficient \(\rho\).

Steps for Hypothesis Testing for \(\boldsymbol{\rho}\)

Step 1: Hypotheses

First, we specify the null and alternative hypotheses:

  • Null hypothesis \(H_{0} \colon \rho = 0\)
  • Alternative hypothesis \(H_{A} \colon \rho ≠ 0\) or \(H_{A} \colon \rho < 0\) or \(H_{A} \colon \rho > 0\)

Step 2: Test Statistic

Second, we calculate the value of the test statistic using the following formula:

Test statistic: \(t^*=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)

Step 3: P-Value

Third, we use the resulting test statistic to calculate the P -value. As always, the P -value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P -value is determined by referring to a t- distribution with n -2 degrees of freedom.

Step 4: Decision

Finally, we make a decision:

  • If the P -value is smaller than the significance level \(\alpha\), we reject the null hypothesis in favor of the alternative. We conclude that "there is sufficient evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y ."
  • If the P -value is larger than the significance level \(\alpha\), we fail to reject the null hypothesis. We conclude "there is not enough evidence at the  \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y ."

Example 1-5: Husband and Wife Data

Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on n = 170 couples is r = 0.939. To test \(H_{0} \colon \rho = 0\) against the alternative \(H_{A} \colon \rho ≠ 0\), we obtain the following test statistic:

\begin{align} t^*&=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\\ &=\dfrac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}}\\ &=35.39\end{align}
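The arithmetic can be checked with a short sketch. It also evaluates the algebraically equivalent form t = r/√((1 − r²)/(n − 2)), which some texts use instead, to show that the two forms give the same number; the function names are illustrative.

```python
import math

def t_star(r, n):
    """Form used in this lesson: r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def t_star_alt(r, n):
    """Equivalent form: r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = 0.939, 170
t1 = t_star(r, n)
t2 = t_star_alt(r, n)
print(f"t* = {t1:.2f} with {n - 2} degrees of freedom")
print(f"equivalent form gives {t2:.2f}")
```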

To obtain the P -value, we need to compare the test statistic to a t -distribution with 168 degrees of freedom (since 170 - 2 = 168). In particular, we need to find the probability that we'd observe a test statistic more extreme than 35.39, and then, since we're conducting a two-sided test, multiply the probability by 2. Minitab helps us out here:

Student's t distribution with 168 DF

The output tells us that the probability of getting a test statistic smaller than 35.39 is greater than 0.999. Therefore, the probability of getting a test statistic greater than 35.39 is less than 0.001. Because the test is two-sided, we multiply by 2 and determine that the P -value is less than 0.002.

Since the P -value is small — smaller than 0.05, say — we can reject the null hypothesis. There is sufficient statistical evidence at the \(\alpha = 0.05\) level to conclude that there is a significant linear relationship between a husband's age and his wife's age.

Incidentally, we can let statistical software like Minitab do all of the dirty work for us. In doing so, Minitab reports:

Correlation: WAge, HAge

Pearson correlation of WAge and HAge = 0.939

P-Value = 0.000

Final Note

One final note: as always, we should clarify when it is okay to use the t -test for testing \(H_{0} \colon \rho = 0\). The guidelines are a straightforward extension of the "LINE" assumptions made for the simple linear regression model. It's okay:

  • When it is not obvious which variable is the response.
  • For each x , the y 's are normal with equal variances.
  • For each y , the x 's are normal with equal variances.
  • Either, y can be considered a linear function of x .
  • Or, x can be considered a linear function of y .
  • The ( x , y ) pairs are independent

Module 12: Linear Regression and Correlation

Hypothesis Test for Correlation

Learning Outcomes

  • Conduct a linear regression t-test using p-values and critical values and interpret the conclusion in context

The correlation coefficient,  r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the “ significance of the correlation coefficient ” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute  r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we only have sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter “rho.”
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient  ρ is “close to zero” or “significantly different from zero.” We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”

  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.”
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.
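The rule about staying inside the observed range of x can be sketched as a guarded prediction function. The slope and intercept are those of the third-exam/final-exam line ŷ = –173.51 + 4.83x; the observed range of 65 to 75 used in the usage lines is a hypothetical placeholder, since the data themselves are not listed here.

```python
def predict_final_exam(x, x_min, x_max, intercept=-173.51, slope=4.83):
    """Predict y from the fitted line, refusing to extrapolate beyond [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "the line may not be reliable there"
        )
    return intercept + slope * x

# Hypothetical observed range of third-exam scores, for illustration only.
X_MIN, X_MAX = 65, 75

print(round(predict_final_exam(73, X_MIN, X_MAX), 2))
try:
    predict_final_exam(90, X_MIN, X_MAX)
except ValueError as err:
    print(err)
```

Raising an error instead of returning a value is one design choice; a caller could equally return the prediction together with a warning flag.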

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

What the Hypotheses Mean in Words

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the  p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook).

Method 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, 84+ calculator.

To calculate the p -value using LinRegTTest:

  • On the LinRegTTest input screen, on the line prompt for β or ρ , highlight “≠ 0”
  • The output screen shows the p-value on the line that reads “p =”.
  • (Most computer statistical software can calculate the  p -value).

If the p -value is less than the significance level ( α = 0.05)

  • Decision: Reject the null hypothesis.
  • Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p -value is NOT less than the significance level ( α = 0.05)

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

Calculation Notes:

  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistics and the p -value:
  • The p -value is calculated using a t -distribution with n – 2 degrees of freedom.
  • The formula for the test statistic is [latex]t=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}[/latex]. The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.

Recall: ORDER OF OPERATIONS

1st find the numerator:

Step 1: Find [latex]n-2[/latex], and then take the square root.

Step 2: Multiply the value in Step 1 by [latex]r[/latex].

2nd find the denominator: 

Step 3: Find the square of [latex]r[/latex], which is [latex]r[/latex] multiplied by [latex]r[/latex].

Step 4: Subtract this value from 1, [latex]1 -r^2[/latex].

Step 5: Find the square root of Step 4.

3rd take the numerator and divide by the denominator.
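
The order of operations above can be traced in a few lines of Python. This is a minimal sketch rather than part of the original text: the function name `t_statistic` is illustrative, and the inputs are the third-exam/final-exam values r = 0.6631 and n = 11 used in this section.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r*sqrt(n-2) / sqrt(1-r^2) for a sample correlation r."""
    if n < 3:
        raise ValueError("need at least 3 data points so that n - 2 > 0")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    numerator = r * math.sqrt(n - 2)      # Steps 1-2: sqrt(n - 2), then times r
    denominator = math.sqrt(1 - r ** 2)   # Steps 3-5: r^2, 1 - r^2, square root
    return numerator / denominator        # divide numerator by denominator

t = t_statistic(0.6631, 11)
print(round(t, 3))  # about 2.658
```

The sign of the result follows the sign of r, as noted in the calculation notes.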

An alternative way to calculate the  p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
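
For readers without a TI calculator, the same two-tailed area can be approximated in plain Python. The sketch below is an assumption-laden illustration, not the calculator's algorithm: it integrates the t density numerically with Simpson's rule, and the names `t_pdf` and `two_tailed_p` are made up for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Combined area in both tails beyond |t| (Simpson's rule on [0, |t|])."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)

# Third-exam/final-exam values: r = 0.6631, n = 11, so df = 9
r, n = 0.6631, 11
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p = two_tailed_p(t, n - 2)
print(round(p, 3))  # the text reports a p-value of 0.026
```

The default of 2000 Simpson intervals is an arbitrary choice that is adequate for moderate values of t; very large test statistics would need more intervals or a library routine.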

THIRD-EXAM vs FINAL-EXAM EXAMPLE: p -value method

  • Consider the  third exam/final exam example (example 2).
  • The line of best fit is: [latex]\hat{y}[/latex] = -173.51 + 4.83 x  with  r  = 0.6631 and there are  n  = 11 data points.
  • Can the regression line be used for prediction?  Given a third exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?
  • H 0 :  ρ  = 0
  • H a :  ρ  ≠ 0
  • The  p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The  p -value, 0.026, is less than the significance level of  α  = 0.05.
  • Decision: Reject the Null Hypothesis  H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because  r  is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Suppose you computed  r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

r is not significant between -0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.

For a given line of best fit, you computed that  r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because  r > the positive critical value.

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

r = –0.624 < –0.532. Therefore, r is significant.

For a given line of best fit, you compute that  r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because  r < the positive critical value.

Suppose you computed  r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Horizontal number line with values of -0.811, 0.776, and 0.811.

–0.811 <  r = 0.776 < 0.811. Therefore, r is not significant.

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because  r < the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the  third exam/final exam example  again. The line of best fit is: [latex]\hat{y}[/latex] = –173.51+4.83 x  with  r  = 0.6631 and there are  n  = 11 data points. Can the regression line be used for prediction?  Given a third-exam score ( x  value), can we use the line to predict the final exam score (predicted  y  value)?

  • Use the “95% Critical Value” table for  r  with  df  =  n  – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602,  r  is significant.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if  r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.
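
The comparisons in these examples all follow one rule: r is significant when it falls outside the interval between the negative and positive critical values. A minimal Python sketch of that rule follows. The dictionary holds only the 95% critical values quoted in this section, and the names `CRITICAL_VALUES_95` and `is_significant` are illustrative.

```python
# 95% critical values quoted in the examples of this section, keyed by df = n - 2.
CRITICAL_VALUES_95 = {4: 0.811, 6: 0.707, 7: 0.666, 8: 0.632,
                      9: 0.602, 10: 0.576, 12: 0.532, 17: 0.456}

def is_significant(r, n, table=CRITICAL_VALUES_95):
    """Return True when r lies outside the interval (-critical, +critical)."""
    df = n - 2
    if df not in table:
        raise KeyError(f"no critical value stored for df = {df}")
    return abs(r) > table[df]

# (r, n) pairs taken from the worked examples above
for r, n in [(0.801, 10), (-0.624, 14), (0.776, 6), (0.134, 14)]:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}: {verdict}")
```

Comparing the absolute value of r with the positive critical value is equivalent to checking both tails separately, because the two critical values are symmetric about zero.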

For a given line of best fit, you compute that  r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between  x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular x value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population).
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. The first assumption implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y  values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

The  y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.

  • Provided by : Lumen Learning. License : CC BY: Attribution
  • Testing the Significance of the Correlation Coefficient. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/12-4-testing-the-significance-of-the-correlation-coefficient . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
  • Introductory Statistics. Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/1-introduction . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction

t-Value Calculator for Correlation Coefficients

This calculator will tell you the t-value and degrees of freedom associated with a Pearson correlation coefficient, given the correlation value r, and the sample size. Please enter the necessary parameter values, and then click 'Calculate'.

P Value from Pearson (R) Calculator

This should be self-explanatory, but just in case it's not: your r score goes in the R Score box, the number of pairs in your sample goes in the N box (you must have at least 3 pairs), then you select your significance level and press the button.

If you need to derive an r score from raw data, use a Pearson (r) calculator.

Correlation Calculators

This site features a number of different correlation calculators which you might find helpful.

  • Pearson Correlation Coefficient Calculator
  • Spearman's Rho (Correlation) Calculator
  • Phi Coefficient Calculator
  • Point-Biserial Correlation Coefficient Calculator

Statistics LibreTexts

12.5: Testing the Significance of the Correlation Coefficient

The correlation coefficient, \(r\), tells us about the strength and direction of the linear relationship between \(x\) and \(y\). However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient \(r\) and the sample size \(n\), together. We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute \(r\), the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, \(r\), is our estimate of the unknown population correlation coefficient.
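
As an illustration of how \(r\) is computed from sample data, the sketch below uses the standard Pearson formula (the sum of cross-deviations divided by the square root of the product of the sums of squared deviations). The data values are hypothetical and the function name `pearson_r` is an assumption made for this example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample of four (x, y) pairs
x_values = [1, 2, 3, 4]
y_values = [2, 4, 5, 8]
print(round(pearson_r(x_values, y_values), 4))
```

With only four points even a large \(r\) may not be significant, which is why the test below looks at \(r\) and \(n\) together.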

  • The symbol for the population correlation coefficient is \(\rho\), the Greek letter "rho."
  • \(\rho =\) population correlation coefficient (unknown)
  • \(r =\) sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient \(\rho\) is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient \(r\) and the sample size \(n\).

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between \(x\) and \(y\). We can use the regression line to model the linear relationship between \(x\) and \(y\) in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between \(x\) and \(y\). Therefore, we CANNOT use the regression line to model a linear relationship between \(x\) and \(y\) in the population.
  • If \(r\) is significant and the scatter plot shows a linear trend, the line can be used to predict the value of \(y\) for values of \(x\) that are within the domain of observed \(x\) values.
  • If \(r\) is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If \(r\) is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed \(x\) values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: \(H_{0}: \rho = 0\)
  • Alternate Hypothesis: \(H_{a}: \rho \neq 0\)

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis \(H_{0}\) : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between \(x\) and \(y\) in the population.
  • Alternate Hypothesis \(H_{a}\) : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between \(x\) and \(y\) in the population.

DRAWING A CONCLUSION:

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the \(p\text{-value}\)
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, \(\alpha = 0.05\).

Using the \(p\text{-value}\) method, you could choose any appropriate significance level you want; you are not limited to using \(\alpha = 0.05\). But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, \(\alpha = 0.05\). (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)
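
Critical values for other significance levels can be generated numerically, because the critical correlation is the value of \(r\) whose two-tailed \(p\text{-value}\) equals \(\alpha\). The Python sketch below is one possible way to do this and is not taken from the textbook: it integrates the \(t\) density with Simpson's rule and finds the critical \(r\) by bisection. The function names and tolerances are assumptions.

```python
import math

def two_tailed_p(r, df, steps=2000):
    """Two-tailed p-value for a sample correlation r with df = n - 2."""
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    if t == 0:
        return 1.0
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        # Student's t density with df degrees of freedom
        return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # Simpson's rule on [0, t]; steps must be even
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

def critical_r(df, alpha=0.05, tol=1e-7):
    """Smallest positive r that is significant at level alpha (bisection)."""
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = (low + high) / 2
        if two_tailed_p(mid, df) > alpha:
            low = mid    # not yet significant: the critical value is larger
        else:
            high = mid   # already significant: the critical value is smaller
    return (low + high) / 2

# Should agree with the 95% table values quoted in this section
# (0.811, 0.632, 0.602 and 0.532 for df = 4, 8, 9 and 12).
for df in (4, 8, 9, 12):
    print(df, round(critical_r(df), 3))
```

Calling `critical_r(8, alpha=0.01)` gives a larger cutoff than the 5% value, reflecting the stricter significance level.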

METHOD 1: Using a \(p\text{-value}\) to make a decision

Using the TI-83, 83+, 84, 84+ calculator.

To calculate the \(p\text{-value}\) using LinRegTTest:

On the LinRegTTest input screen, on the line prompt for \(\beta\) or \(\rho\), highlight "\(\neq 0\)"

The output screen shows the \(p\text{-value}\) on the line that reads "\(p =\)".

(Most computer statistical software can calculate the \(p\text{-value}\).)

If the \(p\text{-value}\) is less than the significance level ( \(\alpha = 0.05\) ):

  • Decision: Reject the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero."

If the \(p\text{-value}\) is NOT less than the significance level ( \(\alpha = 0.05\) )

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is NOT significantly different from zero."

Calculation Notes:

  • You will use technology to calculate the \(p\text{-value}\). The following describes the calculations to compute the test statistics and the \(p\text{-value}\):
  • The \(p\text{-value}\) is calculated using a \(t\)-distribution with \(n - 2\) degrees of freedom.
  • The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\). The value of the test statistic, \(t\), is shown in the computer or calculator output along with the \(p\text{-value}\). The test statistic \(t\) has the same sign as the correlation coefficient \(r\).
  • The \(p\text{-value}\) is the combined area in both tails.

An alternative way to calculate the \(p\text{-value}\) ( \(p\) ) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: \(p\text{-value}\) method

  • Consider the third exam/final exam example.
  • The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points.
  • Can the regression line be used for prediction? Given a third exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?
  • \(H_{0}: \rho = 0\)
  • \(H_{a}: \rho \neq 0\)
  • \(\alpha = 0.05\)
  • The \(p\text{-value}\) is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The \(p\text{-value}\), 0.026, is less than the significance level of \(\alpha = 0.05\).
  • Decision: Reject the Null Hypothesis \(H_{0}\)
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Because \(r\) is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of \(r\) is significant or not. Compare \(r\) to the appropriate critical value in the table. If \(r\) is not between the positive and negative critical values, then the correlation coefficient is significant. If \(r\) is significant, then you may want to use the line for prediction.

Example 1

Suppose you computed \(r = 0.801\) using \(n = 10\) data points. \(df = n - 2 = 10 - 2 = 8\). The critical values associated with \(df = 8\) are \(-0.632\) and \(+0.632\). If \(r <\) negative critical value or \(r >\) positive critical value, then \(r\) is significant. Since \(r = 0.801\) and \(0.801 > 0.632\), \(r\) is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

Exercise 1

For a given line of best fit, you computed that \(r = 0.6501\) using \(n = 12\) data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because \(r >\) the positive critical value.

Example 2

Suppose you computed \(r = -0.624\) with 14 data points. \(df = 14 - 2 = 12\). The critical values are \(-0.532\) and \(0.532\). Since \(-0.624 < -0.532\), \(r\) is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

Exercise 2

For a given line of best fit, you compute that \(r = 0.5204\) using \(n = 9\) data points, and the critical value is \(0.666\). Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because \(r <\) the positive critical value.

Example 3

Suppose you computed \(r = 0.776\) and \(n = 6\). \(df = 6 - 2 = 4\). The critical values are \(-0.811\) and \(0.811\). Since \(-0.811 < 0.776 < 0.811\), \(r\) is not significant, and the line should not be used for prediction.

Horizontal number line with values of -0.811, 0.776, and 0.811.

Exercise 3

For a given line of best fit, you compute that \(r = -0.7204\) using \(n = 8\) data points, and the critical value is \(0.707\). Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because \(r <\) the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points. Can the regression line be used for prediction? Given a third-exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?

  • Use the "95% Critical Value" table for \(r\) with \(df = n - 2 = 11 - 2 = 9\).
  • The critical values are \(-0.602\) and \(+0.602\)
  • Since \(0.6631 > 0.602\), \(r\) is significant.
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Example 4

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if \(r\) is significant and the line of best fit associated with each r can be used to predict a \(y\) value. If it helps, draw a number line.

  • \(r = -0.567\) and the sample size, \(n\), is \(19\). The \(df = n - 2 = 17\). The critical value is \(-0.456\). \(-0.567 < -0.456\) so \(r\) is significant.
  • \(r = 0.708\) and the sample size, \(n\), is \(9\). The \(df = n - 2 = 7\). The critical value is \(0.666\). \(0.708 > 0.666\) so \(r\) is significant.
  • \(r = 0.134\) and the sample size, \(n\), is \(14\). The \(df = 14 - 2 = 12\). The critical value is \(0.532\). \(0.134\) is between \(-0.532\) and \(0.532\) so \(r\) is not significant.
  • \(r = 0\) and the sample size, \(n\), is \(5\). No matter what the \(df\) is, \(r = 0\) is between the two critical values so \(r\) is not significant.

Exercise 4

For a given line of best fit, you compute that \(r = 0\) using \(n = 100\) data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between \(x\) and \(y\) in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between \(x\) and \(y\) in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatter plot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of \(y\) for varying values of \(x\). In other words, the expected value of \(y\) for each particular \(x\) value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The \(y\) values for any particular \(x\) value are normally distributed about the line. This implies that there are more \(y\) values scattered closer to the line than are scattered farther away. The first assumption implies that these normal distributions are centered on the line: the means of these normal distributions of \(y\) values lie on the line.
  • The standard deviations of the population \(y\) values about the line are equal for each value of \(x\). In other words, each of these normal distributions of \(y\) values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

Linear regression is a procedure for fitting a straight line of the form \(\hat{y} = a + bx\) to data. The conditions for regression are:

  • Linear In the population, there is a linear relationship that models the average value of \(y\) for different values of \(x\).
  • Independent The residuals are assumed to be independent.
  • Normal The \(y\) values are distributed normally for any value of \(x\).
  • Equal variance The standard deviation of the \(y\) values is equal for each \(x\) value.
  • Random The data are produced from a well-designed random sample or randomized experiment.

The slope \(b\) and intercept \(a\) of the least-squares line estimate the slope \(\beta\) and intercept \(\alpha\) of the population (true) regression line. To estimate the population standard deviation of \(y\), \(\sigma\), use the standard deviation of the residuals, \(s\). \(s = \sqrt{\frac{SSE}{n-2}}\). The variable \(\rho\) (rho) is the population correlation coefficient. To test the null hypothesis \(H_{0}: \rho =\) hypothesized value, use a linear regression t-test. The most common null hypothesis is \(H_{0}: \rho = 0\) which indicates there is no linear relationship between \(x\) and \(y\) in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STATS TESTS LinRegTTest).

Formula Review

Least Squares Line or Line of Best Fit:

\[\hat{y} = a + bx\]

\[a = y\text{-intercept}\]

\[b = \text{slope}\]

Standard deviation of the residuals:

\[s = \sqrt{\frac{SSE}{n-2}}\]

\[SSE = \text{sum of squared errors}\]

\[n = \text{the number of data points}\]
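
The formulas in this review can be applied directly to data. The following Python sketch assumes the usual least-squares estimates (slope equal to the sum of cross-deviations over the sum of squared \(x\) deviations, and an intercept that makes the line pass through the point of means). The data and the function name `fit_line` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, s): intercept, slope, and standard deviation of the residuals."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # y-intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared errors
    s = math.sqrt(sse / (n - 2))       # standard deviation of the residuals
    return a, b, s

# Hypothetical sample of four (x, y) pairs
a, b, s = fit_line([1, 2, 3, 4], [2, 4, 5, 8])
print(round(a, 3), round(b, 3), round(s, 3))
```

The divisor \(n - 2\) matches the degrees of freedom used for the \(t\)-distribution in the significance test.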

COMMENTS

  1. Correlation Hypothesis Test Calculator for r

    Discover the power of statistics with our free hypothesis test for Pearson correlation coefficient (r) on two numerical data sets. Our user-friendly calculator provides accurate results to determine the strength and significance of relationships between variables. Uncover valuable insights from your data and make informed decisions with ease. Try our hassle-free statistics calculator now!

  2. Correlation Coefficient Significance Calculator using p-value

    \(t = r\sqrt{\frac{n-2}{1-r^2}}\). So, this is the formula for the t test for the correlation coefficient, which the calculator will provide for you showing all the steps of the calculation. If the above t-statistic is significant, then we would reject the null hypothesis \(H_0\) (that the population correlation is zero). You ...

  3. Correlation coefficient calculator

    The Correlation Calculator computes both Pearson and Spearman's rank correlation coefficients, and tests the significance of the results. Additionally, it calculates the covariance. You may change the X and Y labels. Separate data by Enter or a comma (,) after each value. The tool ignores non-numeric cells.

  4. Online-Calculator for testing correlations: Psychometrica

    The Online-Calculator computes linear Pearson (product-moment) correlations of two variables. Please fill in the values of variable 1 in column A and the values of variable 2 in column B and press 'OK'. As a demonstration, values for a high positive correlation are already filled in by default.

  5. Hypothesis Test Calculator

    Calculation Example: There are six steps you would follow in hypothesis testing: Formulate the null and alternative hypotheses in three different ways: \(H_0: \theta = \theta_0\) versus \(H_1: \theta \neq \theta_0\). \(H_0: \theta \leq \theta_0\) versus \(H_1: \theta > \theta_0\). \(H_0: \theta \geq \theta_0\) versus \(H_1: \theta < \theta_0\).

  6. Hypothesis Testing Calculator with Steps

    Hypothesis Testing Calculator. The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is ...

  7. Correlation Coefficient Calculator

    Our correlation coefficient calculator will also, whenever possible, display the interpretation of the result. It uses Evans' scale (1996) to describe the strength of correlation. This scale is based on the absolute value of the correlation, and the thresholds are the following: 0.8 ≤ |corr| ≤ 1.0 very strong; 0.6 ≤ |corr| < 0.8 strong;

  8. 11.2: Correlation Hypothesis Test

    The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\). The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r. The p-value is the combined area in both tails.

  9. 12.4 Testing the Significance of the Correlation Coefficient

    \(H_0: \rho = 0\). \(H_a: \rho \neq 0\). \(\alpha = 0.05\). The p-value is 0.026 (from LinRegTTest on your calculator or from computer software). The p-value, 0.026, is less than the significance level of \(\alpha = 0.05\). Decision: Reject the null hypothesis \(H_0\). Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam ...

  10. 12.1.2: Hypothesis Test for a Correlation

    The t-test is a statistical test for the correlation coefficient. It can be used when \(x\) and \(y\) are linearly related, the variables are random variables, and when the population of the variable \(y\) is normally distributed. The formula for the t-test statistic is \(t = r\sqrt{\frac{n-2}{1-r^2}}\).

  11. p-Value Calculator for Correlation Coefficients

    p-Value Calculator for Correlation Coefficients. This calculator will tell you the significance (both one-tailed and two-tailed probability values) of a Pearson correlation coefficient, given the correlation value r, and the sample size. Please enter the necessary parameter values, and then click 'Calculate'. Correlation value (r):

  12. Pearson Correlation Coefficient (r)

    Revised on February 10, 2024. The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between -1 and 1 that measures the strength and direction of the relationship between two variables. When one variable changes, the other variable changes in the same direction.

  13. Online Statistics Calculator: Hypothesis testing, t-test, chi-square

    Alternative to statistical software like SPSS and STATA. DATAtab was designed for ease of use and is a compelling alternative to statistical programs such as SPSS and STATA. On datatab.net, data can be statistically evaluated directly online and very easily (e.g. t-test, regression, correlation etc.). DATAtab's goal is to make the world of statistical data analysis as simple as possible, no ...

  14. 1.9

    Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on \(n = 170\) couples is \(r = 0.939\). To test \(H_0: \rho = 0\) against the alternative \(H_A: \rho \neq 0\), we obtain the following test statistic: \(t^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}} = 35.39\). To obtain the P-value, we need ...

  15. Hypothesis Test for Correlation

    The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero." We decide this based on the sample correlation coefficient r and the sample size n. If the test concludes that the correlation coefficient is significantly different from zero, we ...

  16. Pearson Correlation Coefficient Calculator

    Pearson Correlation Coefficient Calculator. The Pearson correlation coefficient is used to measure the strength of a linear association between two variables, where the value r = 1 means a perfect positive correlation and the value r = -1 means a perfect negative correlation. So, for example, you could use this test to find out whether people's height and weight are correlated (they will be ...

  17. t-Value Calculator for Correlation Coefficients

    t-Value Calculator for Correlation Coefficients. This calculator will tell you the t-value and degrees of freedom associated with a Pearson correlation coefficient, given the correlation value r, and the sample size. Please enter the necessary parameter values, and then click 'Calculate'. Correlation value (r): Sample size:

  18. 2.5.2 Hypothesis Testing for Correlation

    You should be familiar with using a hypothesis test to determine bias within probability problems. It is also possible to use a hypothesis test to determine whether a given product moment correlation coefficient calculated from a sample could be representative of the same relationship existing within the whole population. For full information on hypothesis testing, see the revision notes from ...

  19. PDF Lecture 2: Hypothesis testing and correlation

    \(r = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{\mathrm{std}(x)} \cdot \frac{y_i - \bar{y}}{\mathrm{std}(y)}\). In words, we z-score each variable (subtract off the mean, divide by the standard deviation) and then compute the average product of the variables. (Technical note: in the above formula, std should be computed using a version of standard deviation where we normalize by n instead of n - 1.) - Correlation can be given a nice geometric interpretation ...

  20. 41: Full Regression Analysis Calculator

    Full regression analysis Calculator. Create a scatter plot, the regression equation, \(r\) and \(r^2\), and perform the hypothesis test for a nonzero correlation below by entering a point, click Plot Points and then continue until you are done. You can also input all your data at once by putting the first variable's data separated by commas in the ...

  21. P Value from Pearson (R) Calculator

    If you need to derive an r score from raw data, you can find a Pearson (r) calculator here. Significance Level: Enter your values above, then press "Calculate". This site features a number of different correlation calculators which you might find helpful. A simple calculator that generates a P Value from a Pearson (r) score.

  22. Quickly Perform Hypothesis Tests Online for Free

    Hypothesis Test Calculator. Upload your data set below to get started. Upload File. Or input your data as csv. column_one,column_two,column_three 1,2,3 4,5,6 7,8,9. Submit CSV. Sharing helps us build more free tools.

  23. 2.5.2 Hypothesis Testing for Correlation

    How is a hypothesis test for correlation carried out? Most of the time the hypothesis test will be carried out by using a critical value; You won't be expected to calculate p-values but you might be given a p-value; Step 1. Write the null and alternative hypotheses clearly. The hypothesis test could either be a one-tailed test or a two-tailed test

  24. 12.5: Testing the Significance of the Correlation Coefficient

    The p-value is calculated using a t-distribution with \(n - 2\) degrees of freedom. The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\). The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
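Several of the excerpts above describe the same procedure: compute \(r\), convert it to a t statistic with \(n - 2\) degrees of freedom, and compare the two-tailed p-value with \(\alpha\). The Python sketch below puts these steps together using only the standard library. The function names are hypothetical. The p-value routine is one possible approach and not the method used by any of the calculators listed: it assumes integer degrees of freedom and uses Simpson's rule after the substitution \(x = \sqrt{df}\,\tan\theta\). The inputs \(r = 0.939\) and \(n = 170\) come from the husband and wife age example quoted above, where \(t^* = 35.39\).

```python
import math


def pearson_r(xs, ys):
    """Sample correlation as the average product of z-scores (std normalized by n)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    std_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return sum(((x - x_bar) / std_x) * ((y - y_bar) / std_y)
               for x, y in zip(xs, ys)) / n


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def two_tailed_p_value(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with integer df, via Simpson's rule.

    Substituting x = sqrt(df) * tan(theta) turns each tail area into a
    finite integral of cos(theta) ** (df - 1) from theta0 to pi / 2.
    """
    theta0 = math.atan(abs(t) / math.sqrt(df))
    h = (math.pi / 2 - theta0) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        cos_theta = max(math.cos(theta0 + i * h), 0.0)  # guard against rounding
        total += weight * cos_theta ** (df - 1)
    integral = total * h / 3
    # Normalizing constant: Gamma((df + 1) / 2) / (Gamma(df / 2) * sqrt(pi))
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(math.pi))
    return min(1.0, 2.0 * math.exp(log_const) * integral)


# Values from the husband/wife age example quoted above
r, n = 0.939, 170
alpha = 0.05
t = t_statistic(r, n)
p = two_tailed_p_value(t, n - 2)
print(f"t = {t:.2f}, df = {n - 2}, reject H0 at alpha = {alpha}: {p < alpha}")
```

Integrating the tail directly, instead of subtracting a central area from 1, avoids losing precision when the p-value is very small, as it is here. In practice a statistics library's t-distribution functions would normally replace `two_tailed_p_value`.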