
11.2: Correlation Hypothesis Test



The correlation coefficient, \(r\), tells us about the strength and direction of the linear relationship between \(x\) and \(y\). However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient \(r\) and the sample size \(n\), together. We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute \(r\), the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, \(r\), is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is \(\rho\), the Greek letter "rho."
  • \(\rho =\) population correlation coefficient (unknown)
  • \(r =\) sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient \(\rho\) is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient \(r\) and the sample size \(n\).

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between \(x\) and \(y\). We can use the regression line to model the linear relationship between \(x\) and \(y\) in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between \(x\) and \(y\). Therefore, we CANNOT use the regression line to model a linear relationship between \(x\) and \(y\) in the population.
  • If \(r\) is significant and the scatter plot shows a linear trend, the line can be used to predict the value of \(y\) for values of \(x\) that are within the domain of observed \(x\) values.
  • If \(r\) is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If \(r\) is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed \(x\) values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: \(H_{0}: \rho = 0\)
  • Alternate Hypothesis: \(H_{a}: \rho \neq 0\)

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis \(H_{0}\): The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between \(x\) and \(y\) in the population.
  • Alternate Hypothesis \(H_{a}\) : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between \(x\) and \(y\) in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the \(p\text{-value}\)
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, \(\alpha = 0.05\).

Using the \(p\text{-value}\) method, you could choose any appropriate significance level you want; you are not limited to using \(\alpha = 0.05\). But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, \(\alpha = 0.05\). (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a \(p\text{-value}\) to make a decision

Using the TI-83, 83+, 84, 84+ calculator:

To calculate the \(p\text{-value}\) using LinRegTTEST:

On the LinRegTTEST input screen, on the line prompt for \(\beta\) or \(\rho\), highlight "\(\neq 0\)"

The output screen shows the \(p\text{-value}\) on the line that reads "\(p =\)".

(Most computer statistical software can calculate the \(p\text{-value}\).)

If the \(p\text{-value}\) is less than the significance level ( \(\alpha = 0.05\) ):

  • Decision: Reject the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero."

If the \(p\text{-value}\) is NOT less than the significance level ( \(\alpha = 0.05\) )

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is NOT significantly different from zero."

Calculation Notes:

  • You will use technology to calculate the \(p\text{-value}\). The following describes the calculations to compute the test statistic and the \(p\text{-value}\):
  • The \(p\text{-value}\) is calculated using a \(t\)-distribution with \(n - 2\) degrees of freedom.
  • The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\). The value of the test statistic, \(t\), is shown in the computer or calculator output along with the \(p\text{-value}\). The test statistic \(t\) has the same sign as the correlation coefficient \(r\).
  • The \(p\text{-value}\) is the combined area in both tails.

An alternative way to calculate the \(p\text{-value}\) ( \(p\) ) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
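As a sketch of what LinRegTTest is doing under the hood (assuming SciPy is available), the test statistic and the two-tailed \(p\text{-value}\) for the third-exam example can be computed directly from \(r\) and \(n\):

```python
from math import sqrt

from scipy import stats  # Student's t-distribution

# Third exam / final exam example from the text: r = 0.6631, n = 11
r, n = 0.6631, 11
df = n - 2

# Test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)
t = r * sqrt(df) / sqrt(1 - r**2)

# Two-tailed p-value: combined area in both tails of a t-distribution
# with n - 2 degrees of freedom (same as 2*tcdf(abs(t), 10^99, n-2))
p = 2 * stats.t.sf(abs(t), df)

print(f"t = {t:.3f}, p = {p:.3f}")  # p = 0.026, matching LinRegTTest
```

Note that `stats.t.sf` (the survival function, \(1 - \text{cdf}\)) plays the role of the calculator's `tcdf` from \(|t|\) to infinity.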

THIRD-EXAM vs FINAL-EXAM EXAMPLE: \(p\text{-value}\) method

  • Consider the third exam/final exam example.
  • The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points.
  • Can the regression line be used for prediction? Given a third exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?
  • \(H_{0}: \rho = 0\)
  • \(H_{a}: \rho \neq 0\)
  • \(\alpha = 0.05\)
  • The \(p\text{-value}\) is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The \(p\text{-value}\), 0.026, is less than the significance level of \(\alpha = 0.05\).
  • Decision: Reject the Null Hypothesis \(H_{0}\)
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Because \(r\) is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of \(r\) is significant or not. Compare \(r\) to the appropriate critical value in the table. If \(r\) is not between the positive and negative critical values, then the correlation coefficient is significant. If \(r\) is significant, then you may want to use the line for prediction.
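The table's critical values can be reproduced from the \(t\)-distribution by inverting the test-statistic formula at the two-tailed critical \(t\)-value. The following is a sketch assuming SciPy is available; the `critical_r` helper is ours, not a library function:

```python
from math import sqrt

from scipy import stats

def critical_r(n: int, alpha: float = 0.05) -> float:
    """95% critical value of the sample correlation coefficient for n data points.

    Derived by inverting t = r*sqrt(n-2)/sqrt(1-r^2) at the two-tailed
    critical t-value; this reproduces the textbook table.
    """
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit / sqrt(t_crit**2 + df)

# Example 1 from the text: n = 10, so df = 8
r_crit = critical_r(10)
print(round(r_crit, 3))  # 0.632, matching the table

# Decision rule: r is significant when |r| exceeds the critical value
r = 0.801
print("significant" if abs(r) > r_crit else "not significant")  # significant
```

The same helper gives 0.602 for \(n = 11\), the value used in the third-exam example later in this section.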

Example \(\PageIndex{1}\)

Suppose you computed \(r = 0.801\) using \(n = 10\) data points. \(df = n - 2 = 10 - 2 = 8\). The critical values associated with \(df = 8\) are \(-0.632\) and \(+0.632\). If \(r <\) negative critical value or \(r >\) positive critical value, then \(r\) is significant. Since \(r = 0.801\) and \(0.801 > 0.632\), \(r\) is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

Exercise \(\PageIndex{1}\)

For a given line of best fit, you computed that \(r = 0.6501\) using \(n = 12\) data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because \(r >\) the positive critical value.

Example \(\PageIndex{2}\)

Suppose you computed \(r = -0.624\) with 14 data points. \(df = 14 - 2 = 12\). The critical values are \(-0.532\) and \(0.532\). Since \(-0.624 < -0.532\), \(r\) is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

Exercise \(\PageIndex{2}\)

For a given line of best fit, you compute that \(r = 0.5204\) using \(n = 9\) data points, and the critical value is \(0.666\). Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because \(r <\) the positive critical value.

Example \(\PageIndex{3}\)

Suppose you computed \(r = 0.776\) and \(n = 6\). \(df = 6 - 2 = 4\). The critical values are \(-0.811\) and \(0.811\). Since \(-0.811 < 0.776 < 0.811\), \(r\) is not significant, and the line should not be used for prediction.

Horizontal number line with values -0.811, 0.776, and 0.811.

Exercise \(\PageIndex{3}\)

For a given line of best fit, you compute that \(r = -0.7204\) using \(n = 8\) data points, and the critical value is \(0.707\). Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because \(r <\) the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points. Can the regression line be used for prediction? Given a third-exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?

  • Use the "95% Critical Value" table for \(r\) with \(df = n - 2 = 11 - 2 = 9\).
  • The critical values are \(-0.602\) and \(+0.602\)
  • Since \(0.6631 > 0.602\), \(r\) is significant.
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Example \(\PageIndex{4}\)

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if \(r\) is significant and the line of best fit associated with each \(r\) can be used to predict a \(y\) value. If it helps, draw a number line.

  • \(r = –0.567\) and the sample size, \(n\), is \(19\). The \(df = n - 2 = 17\). The critical value is \(-0.456\). \(-0.567 < -0.456\) so \(r\) is significant.
  • \(r = 0.708\) and the sample size, \(n\), is \(9\). The \(df = n - 2 = 7\). The critical value is \(0.666\). \(0.708 > 0.666\) so \(r\) is significant.
  • \(r = 0.134\) and the sample size, \(n\), is \(14\). The \(df = 14 - 2 = 12\). The critical value is \(0.532\). \(0.134\) is between \(-0.532\) and \(0.532\) so \(r\) is not significant.
  • \(r = 0\) and the sample size, \(n\), is five. No matter what the degrees of freedom are, \(r = 0\) is between the two critical values, so \(r\) is not significant.
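The four decisions above can be checked mechanically: compare \(|r|\) against the critical value for each \(df\). A minimal sketch (the \(df = 3\) critical value for the \(n = 5\) case is taken from a standard 95% critical-values table, not from the text; for \(r = 0\) any positive cutoff gives the same answer):

```python
# (r, n, critical value) for the four cases in Example 4
cases = [
    (-0.567, 19, 0.456),
    (0.708, 9, 0.666),
    (0.134, 14, 0.532),
    (0.0, 5, 0.878),  # df = 3 critical value from a standard table (assumed)
]

decisions = []
for r, n, crit in cases:
    significant = abs(r) > crit  # |r| beyond the critical value -> significant
    decisions.append(significant)
    print(f"r = {r:+.3f}, n = {n:3d}: "
          f"{'significant' if significant else 'not significant'}")
```

This prints "significant" for the first two cases and "not significant" for the last two, matching the example.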

Exercise \(\PageIndex{4}\)

For a given line of best fit, you compute that \(r = 0\) using \(n = 100\) data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between \(x\) and \(y\) in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between \(x\) and \(y\) in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatter plot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of \(y\) for varying values of \(x\). In other words, the expected value of \(y\) for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The \(y\) values for any particular \(x\) value are normally distributed about the line. This implies that there are more \(y\) values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of \(y\) values lie on the line.
  • The standard deviations of the population \(y\) values about the line are equal for each value of \(x\). In other words, each of these normal distributions of \(y\) values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

Linear regression is a procedure for fitting a straight line of the form \(\hat{y} = a + bx\) to data. The conditions for regression are:

  • Linear In the population, there is a linear relationship that models the average value of \(y\) for different values of \(x\).
  • Independent The residuals are assumed to be independent.
  • Normal The \(y\) values are distributed normally for any value of \(x\).
  • Equal variance The standard deviation of the \(y\) values is equal for each \(x\) value.
  • Random The data are produced from a well-designed random sample or randomized experiment.

The slope \(b\) and intercept \(a\) of the least-squares line estimate the slope \(\beta\) and intercept \(\alpha\) of the population (true) regression line. To estimate the population standard deviation of \(y\), \(\sigma\), use the standard deviation of the residuals, \(s\), where \(s = \sqrt{\frac{SSE}{n-2}}\). The variable \(\rho\) (rho) is the population correlation coefficient. To test the null hypothesis \(H_{0}: \rho =\) hypothesized value, use a linear regression t-test. The most common null hypothesis is \(H_{0}: \rho = 0\), which indicates there is no linear relationship between \(x\) and \(y\) in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STAT TESTS LinRegTTest).

Formula Review

Least Squares Line or Line of Best Fit:

\[\hat{y} = a + bx\]

\[a = y\text{-intercept}\]

\[b = \text{slope}\]

Standard deviation of the residuals:

\[s = \sqrt{\frac{SSE}{n-2}}\]

\[SSE = \text{sum of squared errors}\]

\[n = \text{the number of data points}\]
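The formula review can be illustrated end to end on a small made-up dataset (the \(x\) and \(y\) values below are hypothetical, chosen only so the arithmetic is easy to follow), assuming NumPy is available:

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)

# Least-squares slope b and intercept a for y-hat = a + b*x
# (np.polyfit returns coefficients highest degree first)
b, a = np.polyfit(x, y, 1)

# Sum of squared errors and standard deviation of the residuals
y_hat = a + b * x
sse = np.sum((y - y_hat) ** 2)
s = np.sqrt(sse / (n - 2))

print(f"y-hat = {a:.3f} + {b:.3f}x, s = {s:.3f}")
```

Here \(s\) measures the typical vertical scatter of the data about the fitted line, with \(n - 2\) in the denominator because two parameters (\(a\) and \(b\)) were estimated from the data.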



Correlation coefficient review

What is a correlation coefficient?

  • It always has a value between −1 and 1.
  • Strong positive linear relationships have values of r closer to 1.
  • Strong negative linear relationships have values of r closer to −1.
  • Weaker relationships have values of r closer to 0.


13.2 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r, tells us about the strength and direction of the linear relationship between X₁ and X₂.

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between X₁ and X₂ because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between X₁ and X₂.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

Performing the Hypothesis Test

  • Null Hypothesis: H₀: ρ = 0
  • Alternate Hypothesis: Hₐ: ρ ≠ 0
  • Null Hypothesis H₀: The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between X₁ and X₂ in the population.
  • Alternate Hypothesis Hₐ: The population correlation coefficient IS significantly different from zero. There IS a significant linear relationship (correlation) between X₁ and X₂ in the population.

Drawing a Conclusion There are two methods of making the decision concerning the hypothesis. The test statistic to test this hypothesis is:

\[t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{r}{\sqrt{\frac{1-r^{2}}{n-2}}}\]

where the second formula is an equivalent form of the test statistic, n is the sample size, and the degrees of freedom are n − 2. This is a t-statistic and operates in the same way as other t-tests. Calculate the t-value and compare it with the critical value from the t-table at the appropriate degrees of freedom and the level of confidence you wish to maintain. If the calculated value is in the tail, then we cannot accept the null hypothesis that there is no linear relationship between these two independent random variables. If the calculated t-value is NOT in the tail, then we cannot reject the null hypothesis that there is no linear relationship between the two variables.

A quick shorthand way to test correlations is the relationship between the sample size and the correlation. If:

\[|r| \geq \frac{2}{\sqrt{n}}\]

then this implies that the correlation between the two variables demonstrates that a linear relationship exists and is statistically significant at approximately the 0.05 level of significance. As the formula indicates, there is an inverse relationship between the sample size and the required correlation for significance of a linear relationship. With only 10 observations, the required correlation for significance is 0.6325; for 30 observations the required correlation for significance decreases to 0.3651; and at 100 observations the required level is only 0.2000.
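The shorthand rule is simple enough to verify directly; the `required_r` helper below is ours, sketching the \(2/\sqrt{n}\) cutoff for the three sample sizes quoted above:

```python
from math import sqrt

def required_r(n: int) -> float:
    """Approximate |r| needed for significance at the 0.05 level: 2/sqrt(n)."""
    return 2 / sqrt(n)

# The three sample sizes quoted in the text
for n in (10, 30, 100):
    print(n, round(required_r(n), 4))  # 0.6325, 0.3651, 0.2 respectively
```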

Correlations may be helpful in visualizing the data, but are not appropriately used to "explain" a relationship between two variables. Perhaps no single statistic is more misused than the correlation coefficient. Citing correlations between health conditions and everything from place of residence to eye color have the effect of implying a cause and effect relationship. This simply cannot be accomplished with a correlation coefficient. The correlation coefficient is, of course, innocent of this misinterpretation. It is the duty of the analyst to use a statistic that is designed to test for cause and effect relationships and report only those results if they are intending to make such a claim. The problem is that passing this more rigorous test is difficult so lazy and/or unscrupulous "researchers" fall back on correlations when they cannot make their case legitimately.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Authors: Alexander Holmes, Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Business Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-business-statistics-2e/pages/13-2-testing-the-significance-of-the-correlation-coefficient

© Dec 6, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


Correlation Coefficient

What is the correlation coefficient?

The correlation coefficient is the specific measure that quantifies the strength of the linear relationship between two variables in a correlation analysis. The coefficient is what we symbolize with the r in a correlation report.

How is the correlation coefficient used?

For two variables, the formula compares the distance of each datapoint from the variable mean and uses this to tell us how closely the relationship between the variables can be fit to an imaginary line drawn through the data. This is what we mean when we say that correlations look at linear relationships.

What are some limitations to consider?

Correlation only looks at the two variables at hand and won’t give insight into relationships beyond the bivariate data. This test won’t detect (and therefore will be skewed by) outliers in the data and can’t properly detect curvilinear relationships.

Correlation coefficient variants

This page focuses on the Pearson product-moment correlation. This is one of the most common types of correlation measures used in practice, but there are others. One closely related variant is the Spearman correlation, which is similar in usage but applicable to ranked data.

What do the values of the correlation coefficient mean?


The correlation coefficient r is a unit-free value between -1 and 1. Statistical significance is indicated with a p-value. Therefore, correlations are typically written with two key numbers: r = and p = .

  • The closer r is to zero, the weaker the linear relationship.
  • Positive r values indicate a positive correlation, where the values of both variables tend to increase together.
  • Negative r values indicate a negative correlation, where the values of one variable tend to increase when the values of the other variable decrease.
  • The values 1 and -1 both represent "perfect" correlations, positive and negative respectively. Two perfectly correlated variables change together at a fixed rate. We say they have a linear relationship; when plotted on a scatterplot, all data points can be connected with a straight line.
  • The p-value helps us determine whether or not we can meaningfully conclude that the population correlation coefficient is different from zero, based on what we observe from the sample.
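Most statistical software reports the two key numbers together. A minimal sketch using SciPy's `pearsonr` on made-up data (the dataset below is hypothetical, generated with a built-in positive linear trend plus noise):

```python
import numpy as np
from scipy import stats

# Hypothetical sample: y has a positive linear relationship with x, plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# pearsonr returns the two numbers a correlation report is written with
r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.4f}")
```

Because the data were generated with a genuine positive trend, \(r\) comes out positive and the small \(p\)-value lets us reject the null hypothesis that the population correlation is zero.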

What is a p-value?

A p-value is a measure of probability used for hypothesis testing. The goal of hypothesis testing is to determine whether there is enough evidence to support a certain hypothesis about your data. Actually, we formulate two hypotheses: the null hypothesis and the alternative hypothesis. In the case of correlation analysis, the null hypothesis is typically that the observed relationship between the variables is the result of pure chance (i.e. the correlation coefficient is really zero — there is no linear relationship). The alternative hypothesis is that the correlation we’ve measured is legitimately present in our data (i.e. the correlation coefficient is different from zero).

The p-value is the probability of observing a non-zero correlation coefficient in our sample data when in fact the null hypothesis is true. A low p-value would lead you to reject the null hypothesis. A typical threshold for rejection of the null hypothesis is a p-value of 0.05. That is, if you have a p-value less than 0.05, you would reject the null hypothesis in favor of the alternative hypothesis—that the correlation coefficient is different from zero.

How do we actually calculate the correlation coefficient?

The sample correlation coefficient can be represented with a formula:

$$ r=\frac{\sum\left[\left(x_i-\overline{x}\right)\left(y_i-\overline{y}\right)\right]}{\sqrt{\sum\left(x_i-\overline{x}\right)^2 \ast \sum\left(y_i-\overline{y}\right)^2}} $$


Let’s step through how to calculate the correlation coefficient using an example with a small set of simple numbers, so that it’s easy to follow the operations.

Let’s imagine that we’re interested in whether we can expect there to be more ice cream sales in our city on hotter days. Ice cream shops start to open in the spring; perhaps people buy more ice cream on days when it’s hot outside. On the other hand, perhaps people simply buy ice cream at a steady rate because they like it so much.

We start to answer this question by gathering data on average daily ice cream sales and the highest daily temperature. Ice Cream Sales and Temperature are therefore the two variables which we’ll use to calculate the correlation coefficient. Sometimes data like these are called bivariate data , because each observation (or point in time at which we’ve measured both sales and temperature) has two pieces of information that we can use to describe it. In other words, we’re asking whether Ice Cream Sales and Temperature seem to move together.

As before, a useful way to take a first look is with a scatterplot:

[Scatterplot of Ice Cream Sales versus Temperature.]

We can also look at these data in a table, which is handy for helping us follow the coefficient calculation for each datapoint. When talking about bivariate data, it's typical to call one variable X and the other Y (these also help us orient ourselves on a visual plane, such as the axes of a plot). Let's call Ice Cream Sales X, and Temperature Y.

Ice Cream Sales (X)   Temperature (Y)
        3                   70
        6                   75
        9                   80

Notice that each datapoint is paired. Remember, we are really looking at individual points in time, and each time has a value for both sales and temperature.

1. Start by finding the sample means

Now that we’re oriented to our data, we can start with two important subcalculations from the formula above: the sample mean , and the difference between each datapoint and this mean (in these steps, you can also see the initial building blocks of standard deviation ).

The sample means are represented with the symbols  x̅ and y̅ , sometimes called “x bar” and “y bar.” The means for Ice Cream Sales ( x̅ ) and Temperature ( y̅ ) are easily calculated as follows:

$$ \overline{x} =\ [3\ +\ 6\ +\ 9] ÷ 3 = 6 $$

$$ \overline{y} =\ [70\ +\ 75\ +\ 80] ÷ 3 = 75 $$

2. Calculate the distance of each datapoint from its mean

With the mean in hand for each of our two variables, the next step is to subtract the mean of Ice Cream Sales (6) from each of our Sales data points ( x i in the formula), and the mean of Temperature (75) from each of our Temperature data points ( y i in the formula). Note that this operation sometimes results in a negative number or zero!

3. Complete the top of the coefficient equation

This piece of the equation is called the Sum of Products. A product is a number you get after multiplying, so this formula is just what it sounds like: the sum of numbers you multiply.

$$ \sum[(x_i-\overline{x})(y_i-\overline{y})] $$

We take the paired values from each row in the last two columns in the table above, multiply them (remember that multiplying two negative numbers makes a positive!), and sum those results:

$$ [(-3)(-5)] + [(0)(0)] + [(3)(5)] = 30 $$

How does the Sum of Products relate to the scatterplot?

[Scatterplot divided into regions by the means of X and Y]

The Sum of Products calculation and the location of the data points in our scatterplot are intrinsically related.

Notice that the Sum of Products is positive for our data. When the Sum of Products (the numerator of our correlation coefficient equation) is positive, the correlation coefficient r will be positive, since the denominator—a square root—will always be positive. We know that a positive correlation means that increases in one variable are associated with increases in the other (like our Ice Cream Sales and Temperature example), and on a scatterplot, the data points angle upwards from left to right. But how does the Sum of Products capture this?

  • The only way we will get a positive value for the Sum of Products is if the products we are summing tend to be positive.
  • The only way to get a positive value for each of the products is if both values are negative or both values are positive.
  • The only way to get a pair of two negative numbers is if both values are below their means (on the bottom left side of the scatter plot), and the only way to get a pair of two positive numbers is if both values are above their means (on the top right side of the scatter plot).

So, the Sum of Products tells us whether data tend to appear in the bottom left and top right of the scatter plot (a positive correlation), or alternatively, if the data tend to appear in the top left and bottom right of the scatter plot (a negative correlation).
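This sign behavior is easy to verify directly. The sketch below uses two small hypothetical datasets (values chosen only so the arithmetic is easy to check by hand):

```python
# Illustration of the point above: the Sum of Products is positive for data
# that trend upward and negative for data that trend downward. Both datasets
# are hypothetical, chosen only to make the signs easy to verify by hand.

def sum_of_products(x, y):
    """Sum of (x_i - x_bar) * (y_i - y_bar) over the paired points."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

upward = sum_of_products([1, 2, 3, 4], [10, 20, 30, 40])    # lower-left to upper-right
downward = sum_of_products([1, 2, 3, 4], [40, 30, 20, 10])  # upper-left to lower-right

print(upward, downward)  # 50.0 -50.0
```

Flipping the direction of the trend flips only the sign of the Sum of Products, not its magnitude.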

4. Complete the bottom of the coefficient equation

The denominator of our correlation coefficient equation looks like this:

$$ \sqrt{\mathrm{\Sigma}{(x_i\ -\ \overline{x})}^2\ \ast\ \mathrm{\Sigma}(y_i\ -\overline{y})^2} $$

Let's tackle the expressions in this equation separately and drop in the numbers from our Ice Cream Sales example:

$$ \mathrm{\Sigma}{(x_i\ -\ \overline{x})}^2=(-3)^2+0^2+3^2=9+0+9=18 $$

$$ \mathrm{\Sigma}{(y_i\ -\ \overline{y})}^2=(-5)^2+0^2+5^2=25+0+25=50 $$

When we multiply the result of the two expressions together, we get:

$$ 18\times50\ =\ 900 $$

This brings the bottom of the equation to:

$$ \sqrt{900}=30 $$

5. Finish the calculation, and compare our result with the scatterplot

Here's our full correlation coefficient equation once again:

$$ r=\frac{\sum[(x_i-\overline{x})(y_i-\overline{y})]}{\sqrt{\mathrm{\Sigma}{(x_i\ -\ \overline{x})}^2\ \ast\ \mathrm{\Sigma}(y_i\ -\overline{y})^2}} $$

Let's pull in the numbers for the numerator and denominator that we calculated above:

$$ r=\frac{30}{30}=1 $$

A perfect correlation between ice cream sales and hot summer days! Of course, finding a perfect correlation is so unlikely in the real world that had we been working with real data, we’d assume we had done something wrong to obtain such a result.
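The whole calculation can also be checked in a few lines of Python. This is a sketch that reuses only the numbers from this example:

```python
# Recompute Pearson's r for the simplified Ice Cream Sales (x) and
# Temperature (y) data used throughout this example.
x = [3, 6, 9]
y = [70, 75, 80]

n = len(x)
x_bar = sum(x) / n  # 6.0
y_bar = sum(y) / n  # 75.0

# Numerator: the Sum of Products
sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 30.0

# Denominator: square root of the product of the sums of squares
ss_x = sum((xi - x_bar) ** 2 for xi in x)  # 18.0
ss_y = sum((yi - y_bar) ** 2 for yi in y)  # 50.0

r = sp / (ss_x * ss_y) ** 0.5
print(r)  # 1.0
```

Each intermediate value matches the hand calculation: a Sum of Products of 30 divided by √(18 × 50) = 30 gives r = 1.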

But this result from the simplified data in our example should make intuitive sense based on simply looking at the data points. Let's look again at our scatterplot:

[Scatterplot: Ice Cream Sales vs. Temperature]

Now imagine drawing a line through that scatterplot. Would it look like a perfect linear fit?

[Scatterplot with a line drawn through the data points, showing a perfect linear fit]

A picture can be worth 1,000 correlation coefficients!

Scatterplots, and other data visualizations, are useful tools throughout the whole statistical process, not just before we perform our hypothesis tests.

In fact, it’s important to remember that relying exclusively on the correlation coefficient can be misleading—particularly in situations involving curvilinear relationships or extreme outliers. In the scatterplots below, we are reminded that a correlation coefficient of zero or near zero does not necessarily mean that there is no relationship between the variables; it simply means that there is no linear relationship.

Similarly, looking at a scatterplot can provide insights on how outliers—unusual observations in our data—can skew the correlation coefficient. Let’s look at an example with one extreme outlier. The correlation coefficient indicates that there is a relatively strong positive relationship between X and Y. But when the outlier is removed, the correlation coefficient is near zero.

Correlation in Psychology: Meaning, Types, Examples & Coefficient

By Saul Mcleod, PhD, and Olivia Guy-Evans, MSc (Simply Psychology)

Correlation means association – more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.
  • A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.

positive correlation

  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of a negative correlation would be the height above sea level and temperature. As you climb the mountain (increase in height), it gets colder (decrease in temperature).

negative correlation

  • A zero correlation exists when there is no relationship between two variables. For example, there is no relationship between the amount of tea drunk and the level of intelligence.

zero correlation

Scatter Plots

A correlation can be expressed visually. This is done by drawing a scatter plot (also known as a scattergram, scatter graph, scatter chart, or scatter diagram).

A scatter plot is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots) for each pair of scores.

A scatter plot indicates the strength and direction of the correlation between the co-variables.

Types of Correlations: Positive, Negative, and Zero

When you draw a scatter plot, it doesn’t matter which variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations, we always deal with paired scores, so the values of the two variables taken together will be used to make the diagram.

Decide which variable goes on each axis and then simply put a cross at the point where the two values coincide.

Uses of Correlations

  • If there is a relationship between two variables, we can make predictions about one from another.
  • Concurrent validity (correlation between a new measure and an established measure).

Reliability

  • Test-retest reliability (are measures consistent?).
  • Inter-rater reliability (are observers consistent?).

Theory verification

  • Predictive validity.

Correlation Coefficients

Instead of drawing a scatter plot, a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

Correlation Coefficient Interpretation

The correlation coefficient ( r ) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.

There is no rule for determining what correlation size is considered strong, moderate, or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, such as psychological constructs, we should expect lower correlation coefficients; in these kinds of studies, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong, correlations between 0.2 and 0.4 to be moderate, and those below 0.2 to be weak.

When we are studying things that are more easily countable, we expect higher correlations. For example, with demographic data such as socioeconomic status, we generally consider correlations above 0.75 to be relatively strong, correlations between 0.45 and 0.75 to be moderate, and those below 0.45 to be weak.
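As a sketch, the field-dependent guidelines above can be encoded in a small helper. The cutoff values are the illustrative numbers from this passage, not universal rules:

```python
# A sketch of the field-dependent interpretation guidelines above. The cutoffs
# are the illustrative values from the text, not universal rules; the defaults
# correspond to easily countable data such as demographics.

def describe_strength(r, strong=0.75, moderate=0.45):
    """Label the magnitude of a correlation using field-specific cutoffs."""
    magnitude = abs(r)
    if magnitude >= strong:
        return "relatively strong"
    if magnitude >= moderate:
        return "moderate"
    return "weak"

print(describe_strength(0.80))                            # relatively strong
print(describe_strength(0.30, strong=0.4, moderate=0.2))  # moderate, with cutoffs
                                                          # for hard-to-measure constructs
```

The same |r| can earn a different label depending on which cutoffs the field of study warrants.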

Correlation vs. Causation

Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable and controls the environment in order that extraneous variables may be eliminated.

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

[Graph: correlation vs. causation]

While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable , is actually causing the systematic movement in our variables of interest.

Correlation does not always prove causation, as a third variable may be involved. For example, being a patient in a hospital is correlated with dying, but this does not mean that one event causes the other, as another third variable might be involved (such as diet and level of exercise).

“Correlation is not causation” means that just because two variables are related it does not necessarily mean that one causes the other.

A correlation identifies variables and looks for a relationship between them. An experiment tests the effect that an independent variable has upon a dependent variable but a correlation looks for a relationship between two variables.

This means that the experiment can predict cause and effect (causation) but a correlation can only predict a relationship, as another extraneous variable may be involved that is not known about.

Strengths

1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

2. Correlation allows the researcher to clearly and easily see if there is a relationship between variables. This can then be displayed in a graphical form.

Limitations

1. Correlation is not and cannot be taken to imply causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

For example, suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence.

It could be that the cause of both these is a third (extraneous) variable – for example, growing up in a violent home – and that both the watching of T.V. and the violent behavior is the outcome of this.

2. Correlation does not allow us to go beyond the given data. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6).

It would not be legitimate to infer from this that spending 6 hours on homework would likely generate 12 G.C.S.E. passes.

How do you know if a study is correlational?

A study is considered correlational if it examines the relationship between two or more variables without manipulating them. In other words, the study does not involve the manipulation of an independent variable to see how it affects a dependent variable.

One way to identify a correlational study is to look for language that suggests a relationship between variables rather than cause and effect.

For example, the study may use phrases like “associated with,” “related to,” or “predicts” when describing the variables being studied.

Another way to identify a correlational study is to look for information about how the variables were measured. Correlational studies typically involve measuring variables using self-report surveys, questionnaires, or other measures of naturally occurring behavior.

Finally, a correlational study may include statistical analyses such as correlation coefficients or regression analyses to examine the strength and direction of the relationship between variables.

Why is a correlational study used?

Correlational studies are particularly useful when it is not possible or ethical to manipulate one of the variables.

For example, it would not be ethical to manipulate someone’s age or gender. However, researchers may still want to understand how these variables relate to outcomes such as health or behavior.

Additionally, correlational studies can be used to generate hypotheses and guide further research.

If a correlational study finds a significant relationship between two variables, this can suggest a possible causal relationship that can be further explored in future research.

What is the goal of correlational research?

The ultimate goal of correlational research is to increase our understanding of how different variables are related and to identify patterns in those relationships.

This information can then be used to generate hypotheses and guide further research aimed at establishing causality.


Module 12: Linear Regression and Correlation

Testing the Significance of the Correlation Coefficient

Learning Outcomes

  • Calculate and interpret the correlation coefficient

The correlation coefficient,  r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the “ significance of the correlation coefficient ” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute  r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we only have sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter “rho.”
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is “close to zero” or “significantly different from zero”. We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”

Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.” What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.

  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

What the Hypotheses Mean in Words

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%,  α = 0.05.

Using the  p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

Method 1: Using a p -value to make a decision

To calculate the  p -value using LinRegTTEST:

  • On the LinRegTTEST input screen, on the line prompt for β or ρ , highlight “≠ 0”
  • The output screen shows the p-value on the line that reads “p =”.
  • (Most computer statistical software can calculate the p -value.)

If the p -value is less than the significance level ( α = 0.05)

  • Decision: Reject the null hypothesis.
  • Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p -value is NOT less than the significance level ( α = 0.05)

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

Calculation Notes:

  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistics and the p -value:
  • The p -value is calculated using a t -distribution with n – 2 degrees of freedom.
  • The formula for the test statistic is [latex]\displaystyle{t}=\frac{{{r}\sqrt{{{n}-{2}}}}}{\sqrt{{{1}-{r}^{{2}}}}}[/latex]. The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.

An alternative way to calculate the  p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
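The test-statistic formula above is simple enough to compute by hand or in a few lines of code. This sketch computes only t; the p-value itself would come from a t-distribution with n – 2 degrees of freedom (via LinRegTTest or statistical software):

```python
import math

# Sketch of the test-statistic formula described above:
#   t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
# Only t is computed here; the p-value requires a t-distribution CDF.

def correlation_t_statistic(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# With r = 0.801 and n = 10 (a case examined later in this section):
t = correlation_t_statistic(0.801, 10)
print(round(t, 2))  # 3.78
```

Note that t carries the same sign as r, so a negative correlation produces a negative test statistic.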

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare  r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Suppose you computed  r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is  significant . Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

For a given line of best fit, you computed that  r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because  r > the positive critical value.

Suppose you computed  r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

For a given line of best fit, you compute that  r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because  r < the positive critical value.

Suppose you computed  r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Horizontal number line with values -0.811, 0.776, and 0.811.

–0.811 <  r = 0.776 < 0.811. Therefore, r is not significant.

For a given line of best fit, you compute that  r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because  r < the negative critical value.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if  r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.
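The decision rule behind all of these cases reduces to a single comparison, sketched below and checked against the four worked examples above:

```python
# Sketch of the critical-value decision rule used in the examples above:
# r is significant when it lies outside the interval (-critical, +critical).

def r_is_significant(r, critical_value):
    return abs(r) > critical_value

# The four cases from the list above:
print(r_is_significant(-0.567, 0.456))  # True  -> significant
print(r_is_significant(0.708, 0.666))   # True  -> significant
print(r_is_significant(0.134, 0.532))   # False -> not significant
print(r_is_significant(0.0, 0.532))     # False -> not significant
```

Since r = 0 can never exceed any positive critical value, it is never significant, regardless of sample size.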

For a given line of best fit, you compute that  r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value of x lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

The  y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.

Concept Review

Linear regression is a procedure for fitting a straight line of the form [latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex] to data. The conditions for regression are:

  • Linear: In the population, there is a linear relationship that models the average value of y for different values of x .
  • Independent: The residuals are assumed to be independent.
  • Normal: The y values are distributed normally for any value of x .
  • Equal variance: The standard deviation of the y values is equal for each x value.
  • Random: The data are produced from a well-designed random sample or randomized experiment.

The slope  b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y , σ , use the standard deviation of the residuals, s .

[latex]\displaystyle{s}=\sqrt{{\frac{{{S}{S}{E}}}{{{n}-{2}}}}}[/latex]

The variable ρ (rho) is the population correlation coefficient.

To test the null hypothesis  H 0 : ρ = hypothesized value , use a linear regression t-test. The most common null hypothesis is H 0 : ρ = 0 which indicates there is no linear relationship between x and y in the population.

The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STAT > TESTS > LinRegTTest).

Formula Review

Least Squares Line or Line of Best Fit: [latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex]

where  a = y -intercept,  b = slope

Standard deviation of the residuals:

[latex]\displaystyle{s}=\sqrt{{\frac{{{S}{S}{E}}}{{{n}-{2}}}}}[/latex]

SSE = sum of squared errors

n = the number of data points
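The formulas above can be put together in a short sketch: fit a least-squares line, compute the SSE from the residuals, and then s. The data values here are made up purely for illustration:

```python
import math

# Sketch: fit a least-squares line and compute the standard deviation of the
# residuals, s = sqrt(SSE / (n - 2)). The data values are hypothetical,
# chosen only to keep the arithmetic simple.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Slope b and intercept a of the least-squares line y-hat = a + b*x
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# SSE = sum of squared residuals, then s with n - 2 degrees of freedom
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
print(round(b, 2), round(s, 3))
```

Dividing by n – 2 rather than n reflects the two parameters (slope and intercept) estimated from the data.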

  • OpenStax, Statistics, Testing the Significance of the Correlation Coefficient. Provided by : OpenStax. Located at : http://cnx.org/contents/[email protected]:83/Introductory_Statistics . License : CC BY: Attribution
  • Introductory Statistics . Authored by : Barbara Illowsky, Susan Dean. Provided by : Open Stax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

Minitab quick guide.

Minitab 18

Minitab ®

Access Minitab Web , using Google Chrome  .

Click on the section to view the Minitab procedures.

After saving the Minitab File to your computer or cloud location, you must first open Minitab .

  • To open a Minitab project (.mpx file): File > Open > Project
  • To open a data file (.mtw, .csv or .xlsx): File > Open > Worksheet

Descriptive, graphical

  • Bar Chart : Graph > Bar Chart > Counts of unique values > One Variable
  • Pie Chart : Graph > Pie Chart > Counts of unique values > Select Options > Under Label Slices With choose Percent

Descriptive, numerical

  • Frequency Tables : Stat > Tables > Tally Individual Variables

Inference (one proportion)

Hypothesis Test

  • With raw data : Stat > Basic Statistics > 1 Proportion > Select variable > Check Perform hypothesis test and enter null value > Select Options tab > Choose correct alternative > Under method , choose Normal approximation
  • With summarized data : Stat > Basic Statistics > 1 Proportion > Choose Summarized data from the dropdown menu > Enter data > Check Perform hypothesis test and enter null value > Select Options tab > Choose correct alternative > Under method, choose Normal approximation

Confidence Interval

  • With raw data : Stat > Basic Statistics > 1 Proportion > Select variable > Select Options tab > Enter correct confidence level, make sure the alternative is set as not-equal, and choose Normal approximation method
  • Histogram : Graph > Histogram > Simple
  • Dotplot : Graph > Dotplot > One Y, Simple
  • Boxplot : Graph > Boxplot > One Y, Simple
  • Mean, Std. Dev., 5-number Summary, etc .: Stat > Basic Statistics > Display Descriptive Statistics > Select Statistics tab to choose exactly what you want to display

Inference (one mean)

Hypothesis Test

  • With raw data : Stat > Basic Statistics > 1-Sample t > Select variable > Check Perform hypothesis test and enter null value > Select Options tab > Choose the correct alternative
  • With summarized data : Stat > Basic Statistics > 1-Sample t > Select Summarized data from the dropdown menu > Enter data (n, x-bar, s) > Check Perform hypothesis test and enter null value > Select Options tab > Choose correct alternative

Confidence Interval

  • With raw data : Stat > Basic Statistics > 1-Sample t > Select variable > Select Options tab > Enter correct confidence level and make sure the alternative is set as not-equal
  • With summarized data : Stat > Basic Statistics > 1-Sample t > Select Summarized data from the dropdown menu > Enter data (n, x-bar, s) > Select Options tab > Enter correct confidence level and make sure the alternative is set as not-equal
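The 1-Sample t output is built from t = (x-bar − μ0)/(s/√n) with n − 1 degrees of freedom. Below is a stdlib-only sketch that also approximates the two-sided p-value by numerically integrating the t density, so no SciPy is needed (the summary numbers are hypothetical):

```python
from math import sqrt, lgamma, log, exp, pi

def t_two_sided_p(t_stat, df, steps=100_000, span=60.0):
    """Approximate two-sided p-value for a t statistic by numerically
    integrating the t density over the upper tail (trapezoidal rule;
    the tail beyond `span` is negligible in practice)."""
    a = abs(t_stat)
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    def pdf(t):
        return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))
    h = span / steps
    tail = (pdf(a) + pdf(a + span)) / 2
    tail += sum(pdf(a + i * h) for i in range(1, steps))
    return 2 * tail * h

def one_sample_t(n, xbar, s, mu0):
    """t statistic and two-sided p-value from summarized data
    (n, x-bar, s), the same inputs as Minitab's 1-Sample t dialog."""
    t = (xbar - mu0) / (s / sqrt(n))
    return t, t_two_sided_p(t, n - 1)

# Hypothetical summarized data: n = 25, x-bar = 52, s = 5, H0: mu = 50
t, p = one_sample_t(25, 52, 5, 50)
```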
Descriptive, graphical

  • Side-by-side Histograms : Graph > Histogram > Under One Y Variable , select Groups Displayed Separately > Enter the categorical variable under Group Variables > Choose In separate panels of one graph under Display Groups
  • Side-by-side Dotplots : Graph > Dotplot > One Y Variable , Groups Displayed on the Scale
  • Side-by-side Boxplots : Graph > Boxplot > One Y, With Categorical Variables

Descriptive, numerical

  • Mean, Std. Dev., 5-number Summary, etc. : Stat > Basic Statistics > Display Descriptive Statistics > Select variables (enter the categorical variable under By variables ) > Select Statistics tab to choose exactly what you want to display

Inference (independent samples)

Hypothesis Test

  • With raw data : Stat > Basic Statistics > 2-Sample t > Select variables (response/quantitative as Samples and explanatory/categorical as Sample IDs ) > Select Options tab > Choose correct alternative
  • With summarized data : Stat > Basic Statistics > 2-Sample t > Select Summarized data from the dropdown menu > Enter data > Select Options tab > Choose correct alternative

Confidence Interval

  • Same as above > Select Options tab > Enter correct confidence level and make sure the alternative is set as not-equal
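Unless you check Assume equal variances, Minitab's 2-Sample t does not pool the variances, which corresponds to the Welch statistic and approximate degrees of freedom sketched below (stdlib only; the summary data are hypothetical, and Minitab may truncate rather than round the df it reports):

```python
from math import sqrt

def two_sample_t(n1, xbar1, s1, n2, xbar2, s2):
    """Welch (unpooled) two-sample t statistic and the
    Welch-Satterthwaite approximate degrees of freedom."""
    v1, v2 = s1 * s1 / n1, s2 * s2 / n2
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summarized data for two groups
t, df = two_sample_t(30, 10.0, 2.0, 30, 8.0, 3.0)
```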

Inference (paired difference)

  • Stat > Basic Statistics > Paired t > Enter correct columns in Sample 1 and Sample 2 boxes > Select Options tab > Choose correct alternative
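The paired t procedure is simply a one-sample t applied to the within-pair differences, as this stdlib sketch shows (the before/after measurements are hypothetical):

```python
import statistics
from math import sqrt

def paired_t(sample1, sample2):
    """Paired t statistic and degrees of freedom: a one-sample t
    on the within-pair differences."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n = len(diffs)
    dbar = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    t = dbar / (sd / sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements on the same three subjects
t, df = paired_t([6, 8, 7], [5, 6, 7])
```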
Descriptive, graphical

  • Scatterplot : Graph > Scatterplot > Simple > Enter the response variable under Y variables and the explanatory variable under X variables
  • Fitted Line Plot : Stat > Regression > Fitted Line Plot > Enter the response variable under Response (y) and the explanatory variable under Predictor (x)

Descriptive, numerical

  • Correlation : Stat > Basic Statistics > Correlation > Select Graphs tab > Click Statistics to display on plot and select Correlations

Inference (correlation, regression)

  • Correlation with p-value : Stat > Basic Statistics > Correlation > Select Graphs tab > Click Statistics to display on plot and select Correlations and p-values
  • Regression Line : Stat > Regression > Regression > Fit Regression Model > Enter the response variable under Responses and the explanatory variable under Continuous predictors > Select Results tab > Click Display of results and select Basic tables ( Note : if you want the confidence interval for the population slope, change Display of results to Expanded tables . With the expanded table, you will get a lot of information on the output that you will not need.)
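The p-value Minitab attaches to a correlation comes from the test statistic t = r√(n−2)/√(1−r²) with n − 2 degrees of freedom. A stdlib sketch that computes r, the statistic, and an approximate two-sided p-value (the five observations are hypothetical):

```python
from math import sqrt, lgamma, log, exp, pi

def pearson_r(x, y):
    """Pearson correlation coefficient from the computational sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy)
    return num / den

def correlation_t(r, n):
    """Test statistic for H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def t_two_sided_p(t_stat, df, steps=100_000, span=60.0):
    """Approximate two-sided p-value: numerically integrate the t
    density over the upper tail (trapezoidal rule)."""
    a = abs(t_stat)
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    pdf = lambda t: exp(log_c - (df + 1) / 2 * log(1 + t * t / df))
    h = span / steps
    tail = (pdf(a) + pdf(a + span)) / 2
    tail += sum(pdf(a + i * h) for i in range(1, steps))
    return 2 * tail * h

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = correlation_t(r, len(x))        # df = n - 2 = 3
p = t_two_sided_p(t, len(x) - 2)
```

As a sanity check against a common textbook example, r = 0.939 with n = 170 gives t ≈ 35.39.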
Descriptive, graphical

  • Side-by-side Bar Charts with raw data : Graph > Bar Chart > Counts of unique values > Multiple Variables
  • Side-by-side Bar Charts with a two-way table : Graph > Bar Chart > Summarized Data in a Table > Under Two-Way Table choose Clustered or Stacked > Enter the columns that contain the data under Y-variables and enter the column that contains your row labels under Row labels

Descriptive, numerical

  • Two-way Table : Stat > Tables > Cross Tabulation and Chi-square
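A two-way table of counts is easy to reproduce from raw data, e.g. with `collections.Counter` (the two categorical variables here are hypothetical):

```python
from collections import Counter

def two_way_table(rows, cols):
    """Cross-tabulate two categorical variables into a nested dict
    of counts, one inner dict per row category."""
    counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

# Hypothetical raw data: one row per respondent
sex = ["F", "F", "M", "M", "F"]
answer = ["yes", "no", "yes", "yes", "yes"]
table = two_way_table(sex, answer)
```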

Inference (difference in proportions)

Hypothesis Test

  • Using a dataset : Stat > Basic Statistics > 2 Proportions > Select variables (enter response variable as Samples and explanatory variable as Sample IDs ) > Select Options tab > Choose correct alternative
  • Using a summary table : Stat > Basic Statistics > 2 Proportions > Select Summarized data from the dropdown menu > Enter data > Select Options tab > Choose correct alternative

Confidence Interval

  • Same as above > Select Options tab > Enter correct confidence level and make sure the alternative is set as not-equal
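The two-sided test behind these steps is the two-proportion z test with a pooled proportion in the standard error. A stdlib sketch with hypothetical counts:

```python
from statistics import NormalDist
from math import sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2, using the pooled sample
    proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summarized data: 60/100 successes vs 45/100
z, p_value = two_proportion_z_test(60, 100, 45, 100)
```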

Inference (Chi-squared test of association)

  • Stat > Tables > Chi-Square Test for Association > Choose correct data option (raw or summarized) > Select variables > Select Statistics tab to choose the statistics you want to display
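The chi-squared statistic compares each observed count with its expected count (row total × column total / grand total). The sketch below computes the statistic and degrees of freedom from a table of counts; the p-value itself would come from Minitab or a chi-squared table (the counts are hypothetical):

```python
def chi_square_stat(observed):
    """Chi-squared test statistic and df for a two-way table of
    observed counts, given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2x2 table of counts
chi2, df = chi_square_stat([[10, 20], [20, 10]])
```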
Multiple Regression

  • Fit multiple regression model : Stat > Regression > Regression > Fit Regression Model > Enter the response variable under Responses , the quantitative explanatory variables under Continuous predictors , and any categorical explanatory variables under Categorical predictors > Select Results tab > Click Display of results and select Basic tables ( Note : if you want the confidence intervals for the coefficients, change Display of results to Expanded tables . You will get a lot of information on the output that you will not need.)
  • Make a prediction or prediction interval using a fitted model : Stat > Regression > Regression > Predict > Enter values for each explanatory variable
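Fitting a regression model amounts to solving the least-squares normal equations. The sketch below does this from scratch for a small hypothetical data set and then makes a point prediction, mirroring Stat > Regression > Predict. It illustrates the arithmetic only, not Minitab's actual algorithm:

```python
def fit_least_squares(X, y):
    """Fit a multiple regression by solving the normal equations
    (X'X)b = X'y with Gaussian elimination. X is a list of rows of
    predictor values; an intercept column is added automatically.
    Returns [intercept, slope1, slope2, ...]."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept
    k = len(rows[0])
    # Build X'X and X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coefs = [0.0] * k
    for i in reversed(range(k)):
        coefs[i] = (b[i] - sum(A[i][j] * coefs[j] for j in range(i + 1, k))) / A[i][i]
    return coefs

# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [2, 1], [0, 1], [1, 2]]
y = [1, 3, 8, 4, 9]
coefs = fit_least_squares(X, y)

# Point prediction at x1 = 2, x2 = 1 (like Stat > Regression > Predict)
pred = coefs[0] + coefs[1] * 2 + coefs[2] * 1
```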
