
Hypothesis Test for Regression Slope

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y.

The test focuses on the slope of the regression line

Y = β0 + β1X

where β0 is a constant, β1 is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable.

If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant relationship between the independent and dependent variables.

Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

  • The dependent variable Y has a linear relationship to the independent variable X.
  • For each value of X, the probability distribution of Y has the same standard deviation σ.
  • The Y values are independent.
  • The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is OK if the sample size is large.

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

If there is a significant linear relationship between the independent variable X and the dependent variable Y, the slope will not equal zero.

H0: β1 = 0

Ha: β1 ≠ 0

The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null hypothesis. The plan should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use a linear regression t-test (described in the next section) to determine whether the slope of the regression line differs significantly from zero.

Analyze Sample Data

Using sample data, find the standard error of the slope, the slope of the regression line, the degrees of freedom, the test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

  • Standard error. Most statistics software packages report the standard error of the slope as part of the regression output. It can also be computed directly as

SE = sb1 = sqrt [ Σ(yi − ŷi)² / (n − 2) ] / sqrt [ Σ(xi − x̄)² ]

  • Slope. Like the standard error, the slope of the regression line will be provided by most statistics software packages.
  • Degrees of freedom. For simple linear regression (one independent variable and one dependent variable), the degrees of freedom are DF = n − 2, where n is the number of observations in the sample.
  • Test statistic. The test statistic is a t statistic, given by

t = b1 / SE

where b1 is the slope of the sample regression line and SE is its standard error.

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
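To make these computations concrete, here is a minimal sketch of the whole procedure in Python (NumPy and SciPy). The x and y arrays are made-up illustration data, not data from this lesson.

```python
# Linear regression t-test for the slope: a minimal sketch.
import numpy as np
from scipy import stats

# Made-up illustration data (any paired observations work).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8])
n = len(x)

# Least squares slope and intercept.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Standard error of the slope:
# sqrt[ sum of squared residuals / (n - 2) ] / sqrt[ sum (x - xbar)^2 ].
residuals = y - (b0 + b1 * x)
se = np.sqrt(np.sum(residuals ** 2) / (n - 2)) / np.sqrt(np.sum((x - x.mean()) ** 2))

df = n - 2                                 # degrees of freedom
t_stat = b1 / se                           # test statistic
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-tailed P-value

print(f"b1 = {b1:.4f}, SE = {se:.4f}, DF = {df}, t = {t_stat:.2f}, P = {p_value:.4f}")
```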

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance.

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses.

H0: The slope of the regression line is equal to zero.

Ha: The slope of the regression line is not equal to zero.

  • Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a linear regression t-test to determine whether the slope of the regression line differs significantly from zero.

  • Analyze sample data. We get the slope (b1) and the standard error (SE) from the regression output.

b1 = 0.55       SE = 0.24

We compute the degrees of freedom and the t statistic, using the following equations.

DF = n - 2 = 101 - 2 = 99

t = b1/SE = 0.55/0.24 = 2.29

where DF is the degrees of freedom, n is the number of observations in the sample, b1 is the slope of the regression line, and SE is the standard error of the slope.

  • Interpret results. Since the P-value (0.0242) is less than the significance level (0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between annual electric bill and home size.
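The arithmetic above is easy to verify in Python (SciPy), plugging in the slope, standard error, and sample size from the problem:

```python
# Reproducing the worked example: b1 = 0.55, SE = 0.24, n = 101.
from scipy import stats

b1, se, n = 0.55, 0.24, 101
df = n - 2                      # 99
t = b1 / se                     # about 2.29
p = 2 * stats.t.sf(abs(t), df)  # about 0.024, matching the quoted 0.0242
print(df, round(t, 2), round(p, 4))
```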


13.6 Testing the Regression Coefficients

Learning Objectives

  • Conduct and interpret a hypothesis test on individual regression coefficients.

Previously, we learned that the population model for the multiple regression equation is

[latex]\begin{eqnarray*} y & = & \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_kx_k +\epsilon \end{eqnarray*}[/latex]

where [latex]x_1,x_2,\ldots,x_k[/latex] are the independent variables, [latex]\beta_0,\beta_1,\ldots,\beta_k[/latex] are the population parameters of the regression coefficients, and [latex]\epsilon[/latex] is the error variable.  In multiple regression, we estimate each population regression coefficient [latex]\beta_i[/latex] with the sample regression coefficient [latex]b_i[/latex].

In the previous section, we learned how to conduct an overall model test to determine if the regression model is valid.  If the outcome of the overall model test is that the model is valid, then at least one of the independent variables is related to the dependent variable—in other words, at least one of the regression coefficients [latex]\beta_i[/latex] is not zero.  However, the overall model test does not tell us which independent variables are related to the dependent variable.  To determine which independent variables are related to the dependent variable, we must test each of the regression coefficients.

Testing the Regression Coefficients

For an individual regression coefficient, we want to test if there is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].

  • No Relationship.  There is no relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].  In this case, the regression coefficient [latex]\beta_i[/latex] is zero.  This is the claim for the null hypothesis in an individual regression coefficient test:  [latex]H_0: \beta_i=0[/latex].
  • Relationship.  There is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].  In this case, the regression coefficient [latex]\beta_i[/latex] is not zero.  This is the claim for the alternative hypothesis in an individual regression coefficient test:  [latex]H_a: \beta_i \neq 0[/latex].  We are not interested in whether the regression coefficient [latex]\beta_i[/latex] is positive or negative, only in whether it is not zero.  We only need to find out if the regression coefficient is not zero to demonstrate that there is a relationship between the dependent variable and the independent variable.  This makes the test on a regression coefficient a two-tailed test.

In order to conduct a hypothesis test on an individual regression coefficient [latex]\beta_i[/latex], we need to use the distribution of the sample regression coefficient [latex]b_i[/latex]:

  • The mean of the distribution of the sample regression coefficient is the population regression coefficient [latex]\beta_i[/latex].
  • The standard deviation of the distribution of the sample regression coefficient is [latex]\sigma_{b_i}[/latex].  Because we do not know the population standard deviation, we must estimate [latex]\sigma_{b_i}[/latex] with the sample standard deviation [latex]s_{b_i}[/latex].
  • The distribution of the sample regression coefficient follows a normal distribution.

Steps to Conduct a Hypothesis Test on a Regression Coefficient

  1. Write down the null and alternative hypotheses in terms of the regression coefficient being tested:

[latex]\begin{eqnarray*} H_0: &  &  \beta_i=0 \\ \\ \end{eqnarray*}[/latex]

[latex]\begin{eqnarray*} H_a: &  & \beta_i \neq 0 \\ \\ \end{eqnarray*}[/latex]

  2. Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].

  3. Find the p-value for the test using the [latex]t[/latex]-distribution, where

[latex]\begin{eqnarray*}t & = & \frac{b_i-\beta_i}{s_{b_i}} \\ \\ df &  = & n-k-1 \\  \\ \end{eqnarray*}[/latex]

  4. Compare the p-value to the significance level and state the outcome of the test:

  • If the p-value [latex]\leq \alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex].  The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • If the p-value [latex]\gt \alpha[/latex], do not reject [latex]H_0[/latex].  The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.

  5. Write down a concluding sentence specific to the context of the question.

The required [latex]t[/latex]-score and p-value for the test can be found on the regression summary table, which we learned how to generate in Excel in a previous section.
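For readers who prefer a programmatic route, the same kind of summary table can be produced with Python's statsmodels. The sketch below uses hypothetical stand-in values for the employee variables, not the company's actual sample:

```python
# Multiple regression with a coefficient table (coefficient, standard
# error, t-score, two-tailed p-value with df = n - k - 1), analogous to
# Excel's regression summary table. Data values are hypothetical.
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "satisfaction": [6.2, 7.1, 5.4, 8.0, 6.8, 5.9, 7.4, 6.1],
    "unpaid_hours": [3, 1, 5, 0, 2, 4, 1, 3],
    "age":          [34, 41, 29, 50, 38, 45, 33, 27],
    "income":       [55, 62, 48, 75, 60, 52, 68, 50],   # in $1000s
})

X = sm.add_constant(data[["unpaid_hours", "age", "income"]])
model = sm.OLS(data["satisfaction"], X).fit()
print(model.summary())   # one row per coefficient, as in the Excel table
```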

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of 25 employees at the company is taken and the data is recorded in the table below.  The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week”.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \beta_1=0 \\   H_a: & & \beta_1 \neq 0 \end{eqnarray*}[/latex]

The regression summary table generated by Excel is shown below:

The p-value for the test on the hours of unpaid work per week regression coefficient is in the bottom part of the table under the P-value column of the Hours of Unpaid Work per Week row.  So the p-value[latex]=0.0082[/latex].

Conclusion:  

Because p-value[latex]=0.0082 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”

  • The null hypothesis [latex]\beta_1=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 1 because the “hours of unpaid work per week” is defined as [latex]x_1[/latex] in the regression model.
  • The p-values for the tests on the regression coefficients are located in the bottom part of the table under the P-value column heading in the corresponding independent variable row.
  • Because the alternative hypothesis is a [latex]\neq[/latex], the p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution.  This is the value calculated by Excel in the regression summary table.
  • The p-value of 0.0082 is small compared to the significance level, so a sample result like this one would be unlikely if the null hypothesis were true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the regression coefficient [latex]\beta_1[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”  This means that the independent variable “hours of unpaid work per week” is useful in predicting the dependent variable.

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “age”.

[latex]\begin{eqnarray*} H_0: & & \beta_2=0 \\   H_a: & & \beta_2 \neq 0 \end{eqnarray*}[/latex]

The p-value for the test on the age regression coefficient is in the bottom part of the table under the P-value column of the Age row.  So the p-value[latex]=0.8439[/latex].

Because p-value[latex]=0.8439 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “age.”

  • The null hypothesis [latex]\beta_2=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “age.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “age.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 2 because “age” is defined as [latex]x_2[/latex] in the regression model.
  • The p-value of 0.8439 is large compared to the significance level, so a sample result like this one would be quite likely if the null hypothesis were true.  The sample data give us no reason to doubt the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis.  In other words, there is not enough evidence to conclude that the regression coefficient [latex]\beta_2[/latex] differs from zero, so no relationship between the dependent variable “job satisfaction” and the independent variable “age” has been demonstrated.  This means that the independent variable “age” is not particularly useful in predicting the dependent variable.

At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “income”.

[latex]\begin{eqnarray*} H_0: & & \beta_3=0 \\   H_a: & & \beta_3 \neq 0 \end{eqnarray*}[/latex]

The p-value for the test on the income regression coefficient is in the bottom part of the table under the P-value column of the Income row.  So the p-value[latex]=0.0060[/latex].

Because p-value[latex]=0.0060 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”

  • The null hypothesis [latex]\beta_3=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is zero.  That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “income.”
  • The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is not zero.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “income.”
  • When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested.  Here the subscript on [latex]\beta[/latex] is 3 because “income” is defined as [latex]x_3[/latex] in the regression model.
  • The p-value of 0.0060 is small compared to the significance level, so a sample result like this one would be unlikely if the null hypothesis were true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the regression coefficient [latex]\beta_3[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”  This means that the independent variable “income” is useful in predicting the dependent variable.
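Assuming a fitted statsmodels model like the hypothetical sketch shown earlier, all three coefficient tests can also be read off programmatically rather than from the Excel table:

```python
# Pull each coefficient's two-tailed p-value from the fitted model
# ("model" as in the earlier hypothetical statsmodels sketch).
alpha = 0.05
for name, p in model.pvalues.items():
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{name}: p-value = {p:.4f} -> {decision}")
```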

Concept Review

The test on a regression coefficient determines if there is a relationship between the dependent variable and the corresponding independent variable.  The p-value for the test is the sum of the area in the tails of the [latex]t[/latex]-distribution.  The p-value can be found on the regression summary table generated by Excel.

The hypothesis test for a regression coefficient is a well established process:

  • Write down the null and alternative hypotheses in terms of the regression coefficient being tested.  The null hypothesis is the claim that there is no relationship between the dependent variable and independent variable.  The alternative hypothesis is the claim that there is a relationship between the dependent variable and independent variable.
  • Collect the sample information for the test and identify the significance level.
  • The p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution.  Use the regression summary table generated by Excel to find the p-value.
  • Compare the p-value to the significance level and state the outcome of the test.


3.6 - Further SLR Evaluation Examples

Example 1: Are Sprinters Getting Faster?

The following data set (mens200m.txt) contains the winning times (in seconds) of the 22 men's 200-meter Olympic sprints held between 1900 and 1996. (Notice that the Olympics were not held during the World War I and II years.) Is there a linear relationship between year and the winning times? The plot of the estimated regression line sure makes it look so!

To answer the research question, let's conduct the formal F-test of the null hypothesis H0: β1 = 0 against the alternative hypothesis HA: β1 ≠ 0.


From a scientific point of view, what we ultimately care about is the P-value, which is 0.000 (to three decimal places). That is, the P-value is less than 0.001. The P-value is very small. It is unlikely that we would have obtained such a large F* statistic if the null hypothesis were true. Therefore, we reject the null hypothesis H0: β1 = 0 in favor of the alternative hypothesis HA: β1 ≠ 0. There is sufficient evidence at the α = 0.05 level to conclude that there is a linear relationship between year and winning time.

Equivalence of the analysis of variance F-test and the t-test

As we noted in the first two examples, the P-value associated with the t-test is the same as the P-value associated with the analysis of variance F-test. This will always be true for the simple linear regression model. It is illustrated in the year and winning time example also. Both P-values are 0.000 (to three decimal places):

The P-values are the same because of a well-known relationship between a t random variable and an F random variable that has 1 numerator degree of freedom. Namely:

\[(t^{*}_{(n-2)})^2=F^{*}_{(1,n-2)}\]

This will always hold for the simple linear regression model. This relationship is demonstrated in this example as:

(−13.33)² = 177.7

  • For a given significance level α, the F-test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-tailed t-test.
  • If one test rejects H0, then so will the other.
  • If one test does not reject H0, then neither will the other.

The natural question then is ... when should we use the F-test and when should we use the t-test?

  • The F-test is only appropriate for testing that the slope differs from 0 (β1 ≠ 0).
  • Use the t-test to test whether the slope is positive (β1 > 0) or negative (β1 < 0). Remember, though, that you will have to divide the reported two-tailed P-value by 2 to get the appropriate one-tailed P-value.

The F-test is more useful for the multiple regression model, when we want to test that more than one slope parameter is 0. We'll learn more about this later in the course!
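The equivalence is easy to check numerically. Here is a minimal Python (statsmodels) sketch on arbitrary illustration data, showing that the squared slope t statistic equals the ANOVA F statistic:

```python
# Verify (t*)^2 = F* for simple linear regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(scale=2.0, size=30)

fit = sm.OLS(y, sm.add_constant(x)).fit()
t_slope = fit.tvalues[1]            # t statistic for the slope
print(t_slope ** 2, fit.fvalue)     # the two values agree
```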

Example 2: Highway Sign Reading Distance and Driver Age

The data are n = 30 observations on driver age and the maximum distance (feet) at which individuals can read a highway sign (signdist.txt). (Data source: Mind On Statistics, 3rd edition, Utts and Heckard).

The plot below gives a scatterplot of the highway sign data along with the least squares regression line. 

[Figure: scatterplot of the highway sign data with the least squares regression line]

Here is the accompanying regression output:

[Minitab regression output]

Hypothesis Test for the Intercept (β0)

This test is rarely a test of interest, but does show up when one is interested in performing a regression through the origin (which we touched on earlier in this lesson). In the software output above, the row labeled Constant gives the information used to make inferences about the intercept. The null and alternative hypotheses for a hypothesis test about the intercept are written as:

H0: β0 = 0
HA: β0 ≠ 0

In other words, the null hypothesis is testing if the population intercept is equal to 0 versus the alternative hypothesis that the population intercept is not equal to 0. In most problems, we are not particularly interested in hypotheses about the intercept. For instance, in our example, the intercept is the mean distance when the age is 0, a meaningless age. Also, the intercept does not give information about how the value of y changes when the value of x changes. Nevertheless, to test whether the population intercept is 0, the information from the software output would be used as follows:

  • The sample intercept is b0 = 576.68, the value under Coef.
  • The standard error (SE) of the sample intercept, written as se(b0), is se(b0) = 23.47, the value under SE Coef. The SE of any statistic is a measure of its accuracy. In this case, the SE of b0 gives, very roughly, the average difference between the sample b0 and the true population intercept β0, for random samples of this size (and with these x-values).
  • The test statistic is t = b0/se(b0) = 576.68/23.47 = 24.57, the value under T.
  • The p-value for the test is p = 0.000 and is given under P. The p-value is actually very small and not exactly 0.
  • The decision rule at the 0.05 significance level is to reject the null hypothesis since our p < 0.05. Thus, we conclude that there is statistically significant evidence that the population intercept is not equal to 0.

So how exactly is the p-value found? For simple regression, the p-value is determined using a t distribution with n − 2 degrees of freedom (df), written as tn−2, and is calculated as 2 × the area past |t| under a tn−2 curve. In this example, df = 30 − 2 = 28. The negative and positive versions of the calculated t bound the two tail regions whose combined area is the p-value; as the value of |t| increases, the p-value (the area in those tails) decreases.
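In code, the calculation is one line; the sketch below (SciPy) plugs in the t and n from this example:

```python
# Two-tailed p-value: 2 x (area past |t| under a t-curve with n - 2 df).
from scipy import stats

t_stat, n = 24.57, 30
p = 2 * stats.t.sf(abs(t_stat), n - 2)
print(p)   # effectively 0 at this t, which is why the output shows 0.000
```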

Hypothesis Test for the Slope (β1)

This test can be used to test whether or not x and y are linearly related. The row pertaining to the variable Age in the software output from earlier gives information used to make inferences about the slope. The slope directly tells us about the link between the mean y and x. When the true population slope does not equal 0, the variables y and x are linearly related. When the slope is 0, there is not a linear relationship, because the mean y does not change when the value of x changes. The null and alternative hypotheses for a hypothesis test about the slope are written as:

H0: β1 = 0
HA: β1 ≠ 0

In other words, the null hypothesis is testing if the population slope is equal to 0 versus the alternative hypothesis that the population slope is not equal to 0. To test whether the population slope is 0, the information from the software output is used as follows:

  • The sample slope is b1 = −3.0068, the value under Coef in the Age row of the output.
  • The SE of the sample slope, written as se(b1), is se(b1) = 0.4243, the value under SE Coef. Again, the SE of any statistic is a measure of its accuracy. In this case, the SE of b1 gives, very roughly, the average difference between the sample b1 and the true population slope β1, for random samples of this size (and with these x-values).
  • The test statistic is t = b1/se(b1) = −3.0068/0.4243 = −7.09, the value under T.
  • The p-value for the test is p = 0.000 and is given under P.
  • The decision rule at the 0.05 significance level is to reject the null hypothesis since our p < 0.05. Thus, we conclude that there is statistically significant evidence that the variables of Distance and Age are linearly related.

As before, the p-value is 2 × the area past |t| under the tn−2 curve.

Confidence Interval for the Slope (β1)

A confidence interval for the unknown value of the population slope β1 can be computed as

sample statistic ± multiplier × standard error of statistic

→ b1 ± t* × se(b1).

In simple regression, the t* multiplier is determined using a tn−2 distribution. The value of t* is such that the confidence level is the area (probability) between −t* and +t* under the t-curve. To find the t* multiplier, you can do one of the following:

  • A table such as the one in the textbook can be used to look up the multiplier.
  • Alternatively, software like Minitab can be used.

95% Confidence Interval

In our example, n = 30 and df = n − 2 = 28. For 95% confidence, t* = 2.05. A 95% confidence interval for β1, the true population slope, is:

−3.0068 ± (2.05 × 0.4243) = −3.0068 ± 0.870, or about −3.88 to −2.14.

Interpretation: With 95% confidence, we can say the mean sign reading distance decreases somewhere between 2.14 and 3.88 feet per each one-year increase in age. It is incorrect to say that with 95% probability the mean sign reading distance decreases somewhere between 2.14 and 3.88 feet per each one-year increase in age. Make sure you understand why!!!

99% Confidence Interval

For 99% confidence, t* = 2.76. A 99% confidence interval for β1, the true population slope, is:

−3.0068 ± (2.76 × 0.4243) = −3.0068 ± 1.1711, or about −4.18 to −1.84.

Interpretation: With 99% confidence, we can say the mean sign reading distance decreases somewhere between 1.84 and 4.18 feet per each one-year increase in age. Notice that as we increase our confidence, the interval becomes wider. So as we approach 100% confidence, our interval grows to become the whole real line.

As a final note, the above procedures can be used to calculate a confidence interval for the population intercept. Just use b0 (and its standard error) rather than b1.
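Both intervals above can be reproduced in a few lines of Python (SciPy), computing the t* multipliers from the t28 distribution and applying them to the quoted slope and standard error:

```python
# Confidence intervals b1 +/- t* x se(b1) for the highway sign example.
from scipy import stats

b1, se_b1, n = -3.0068, 0.4243, 30
for conf in (0.95, 0.99):
    t_star = stats.t.ppf(1 - (1 - conf) / 2, n - 2)  # 2.05 for 95%, 2.76 for 99%
    lo, hi = b1 - t_star * se_b1, b1 + t_star * se_b1
    print(f"{conf:.0%} CI for the slope: ({lo:.2f}, {hi:.2f})")
```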

Example 3: Handspans Data

Stretched handspans and heights are measured in centimeters for n = 167 college students (handheight.txt). We’ll use y = height and x = stretched handspan. A scatterplot with a regression line superimposed is given below, together with results of a simple linear regression model fit to the data.

[Figure: fitted line plot of height versus handspan]

Some things to note are:

  • The residual standard deviation S is 2.744 and this estimates the standard deviation of the errors.
  • r² = (SSTO − SSE) / SSTO = SSR / (SSR + SSE) = 1500.1 / (1500.1 + 1242.7) = 1500.1 / 2742.8 = 0.547, or 54.7%. The interpretation is that handspan differences explain 54.7% of the variation in heights.
  • The value of the F statistic is F = 199.2 with 1 and 165 degrees of freedom, and the p-value for this F statistic is 0.000. Thus we reject the null hypothesis H0: β1 = 0 because the p-value is so small. In other words, the observed relationship is statistically significant.
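As a quick check of the r² arithmetic from the sums of squares quoted above:

```python
# r^2 = SSR / (SSR + SSE) from the quoted sums of squares.
ssr, sse = 1500.1, 1242.7
print(round(ssr / (ssr + sse), 3))   # 0.547, i.e. 54.7%
```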


Simple linear regression

Fig. 9: Simple linear regression

Errors: \(\varepsilon_i \sim N(0,\sigma^2)\quad \text{i.i.d.}\)

Fit: the estimates \(\hat\beta_0\) and \(\hat\beta_1\) are chosen to minimize the (training) residual sum of squares (RSS):

\[\mathrm{RSS} = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.\]

Sample code: advertising data
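The code itself is not reproduced on this slide; below is a minimal sketch of what it presumably does, fitting a simple linear regression of sales on TV advertising with statsmodels. The file name Advertising.csv and the column names sales and TV follow the ISLR Advertising dataset and are assumptions here:

```python
# Simple linear regression on the advertising data (assumed file/columns).
import pandas as pd
import statsmodels.formula.api as smf

adv = pd.read_csv("Advertising.csv")         # assumed local copy of the data
fit = smf.ols("sales ~ TV", data=adv).fit()  # sales = beta0 + beta1 * TV + error
print(fit.params)    # the estimates beta0-hat and beta1-hat
print(fit.summary()) # standard errors, t statistic, p-value
```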

Estimates \(\hat\beta_0\) and \(\hat\beta_1\)

A little calculus shows that the minimizers of the RSS are:

\[\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.\]
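The closed-form estimates translate directly into NumPy (illustration data):

```python
# Closed-form least squares estimates of beta0 and beta1.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)
```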

Assessing the accuracy of \(\hat\beta_0\) and \(\hat\beta_1\)

Fig. 10: How variable is the regression line?

Based on our model

The standard errors for the parameters are:

\[\mathrm{SE}(\hat\beta_1)^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \mathrm{SE}(\hat\beta_0)^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^n (x_i - \bar x)^2}\right],\]

where \(\sigma^2 = \mathrm{Var}(\varepsilon)\) is estimated by \(\hat\sigma^2 = \mathrm{RSS}/(n-2)\).

95% confidence intervals:

\[\hat\beta_0 \pm 2\,\mathrm{SE}(\hat\beta_0), \qquad \hat\beta_1 \pm 2\,\mathrm{SE}(\hat\beta_1),\]

where 2 approximates the appropriate quantile of the \(t_{n-2}\) distribution.

Hypothesis test

Null hypothesis \(H_0\): There is no relationship between \(X\) and \(Y\).

Alternative hypothesis \(H_a\): There is some relationship between \(X\) and \(Y\).

Based on our model: this translates to

\(H_0\): \(\beta_1=0\).

\(H_a\): \(\beta_1\neq 0\).

Test statistic:

\[t = \frac{\hat\beta_1 - 0}{\mathrm{SE}(\hat\beta_1)}.\]

Under the null hypothesis, this has a \(t\)-distribution with \(n-2\) degrees of freedom.

Sample output: advertising data

Interpreting the hypothesis test

If we reject the null hypothesis, can we assume there is an exact linear relationship?

No. A quadratic relationship may be a better fit, for example. The test assumes the simple linear regression model is correct, so rejecting the null hypothesis indicates a nonzero slope within that model, not that the relationship is exactly linear.

If we don’t reject the null hypothesis, can we assume there is no relationship between \(X\) and \(Y\)?

No. This test is based on the model we posited above and is only powerful against certain monotone alternatives. There could be more complex non-linear relationships.
