Linear regression - Hypothesis testing

by Marco Taboga, PhD

This lecture discusses how to perform tests of hypotheses about the coefficients of a linear regression model estimated by ordinary least squares (OLS).

Table of contents

  • The linear regression model
  • Matrix notation
  • Tests of hypothesis in the normal linear regression model
  • Test of a restriction on a single coefficient (t test)
  • Test of a set of linear restrictions (F test)
  • Tests based on maximum likelihood procedures (Wald, Lagrange multiplier, likelihood ratio)
  • Tests of hypothesis when the OLS estimator is asymptotically normal
  • Test of a restriction on a single coefficient (z test)
  • Test of a set of linear restrictions (Chi-square test)
  • Learn more about regression analysis

Normal vs non-normal model

The lecture is divided into two parts:

in the first part, we discuss hypothesis testing in the normal linear regression model, in which the OLS estimator of the coefficients has a normal distribution conditional on the matrix of regressors;

in the second part, we show how to carry out hypothesis tests in linear regression analyses where the hypothesis of normality holds only in large samples (i.e., the OLS estimator can be proved to be asymptotically normal).

How to choose which test to carry out after estimating a linear regression model.

We also denote:

We now explain how to derive tests about the coefficients of the normal linear regression model.

It can be proved (see the lecture about the normal linear regression model) that the assumption of conditional normality implies that:

How the acceptance region is determined depends not only on the desired size of the test, but also on whether the test is:

two-tailed (both of the two things, i.e., smaller and larger, are possible);

one-tailed (only one of the two things, i.e., either smaller or larger, is possible).

For more details on how to determine the acceptance region, see the glossary entry on critical values.


The F test is one-tailed.

A critical value in the right tail of the F distribution is chosen so as to achieve the desired size of the test.

Then, the null hypothesis is rejected if the F statistic is larger than the critical value.

In this section we explain how to perform hypothesis tests about the coefficients of a linear regression model when the OLS estimator is asymptotically normal.

As we have shown in the lecture on the properties of the OLS estimator, in several cases (i.e., under different sets of assumptions) it can be proved that:

These two properties are used to derive the asymptotic distribution of the test statistics used in hypothesis testing.

The test can be either one-tailed or two-tailed. The same comments made for the t-test apply here.


Like the F test, the Chi-square test is usually one-tailed.

The desired size of the test is achieved by appropriately choosing a critical value in the right tail of the Chi-square distribution.

The null is rejected if the Chi-square statistic is larger than the critical value.

Want to learn more about regression analysis? Here are some suggestions:

R squared of a linear regression;

Gauss-Markov theorem;

Generalized Least Squares;

Multicollinearity;

Dummy variables;

Selection of linear regression models;

Partitioned regression;

Ridge regression.

How to cite

Please cite as:

Taboga, Marco (2021). "Linear regression - Hypothesis testing", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/linear-regression-hypothesis-testing.

Most of the learning materials found on this website are now available in a traditional textbook format.


Linear regression hypothesis testing: Concepts, Examples

Simple linear regression model

In machine learning, linear regression is a predictive modeling technique used to build models that predict a continuous response variable as a linear combination of explanatory or predictor variables. When training linear regression models, we rely on hypothesis testing to determine whether a relationship exists between the response and the predictor variables. Two types of hypothesis tests are performed for a linear regression model: t-tests and F-tests. In other words, two types of statistics are used to assess whether a linear regression model relating the response and predictor variables exists: t-statistics and f-statistics. As data scientists, it is of utmost importance to determine whether linear regression is the correct choice of model for a particular problem, and this can be done by performing hypothesis tests on the regression coefficients. These concepts are often unclear even to experienced practitioners. In this blog post, we discuss linear regression and hypothesis testing based on t-statistics and f-statistics, and we provide an example to help illustrate how these concepts work.

Table of Contents

What are linear regression models?

A linear regression model can be defined as the function approximation that represents a continuous response variable as a function of one or more predictor variables. While building a linear regression model, the goal is to identify a linear equation that best predicts or models the relationship between the response or dependent variable and one or more predictor or independent variables.

There are two different kinds of linear regression models. They are as follows:

  • Simple or Univariate linear regression models : These are linear regression models that are used to build a linear relationship between one response or dependent variable and one predictor or independent variable. The form of the equation that represents a simple linear regression model is Y=mX+b, where m is the coefficient of the predictor variable and b is the bias. When considering the linear regression line, m represents the slope and b represents the intercept.
  • Multiple or Multi-variate linear regression models : These are linear regression models that are used to build a linear relationship between one response or dependent variable and more than one predictor or independent variable. The form of the equation that represents a multiple linear regression model is Y=b0+b1X1+ b2X2 + … + bnXn, where bi represents the coefficient of the ith predictor variable. In this type of linear regression model, each predictor variable has its own coefficient that is used to calculate the predicted value of the response variable.

While training linear regression models, the requirement is to determine the coefficients which can result in the best-fitted linear regression line. The learning algorithm used to find the most appropriate coefficients is known as least squares regression . In the least-squares regression method, the coefficients are calculated using the least-squares error function. The main objective of this method is to minimize or reduce the sum of squared residuals between actual and predicted response values. The sum of squared residuals is also called the residual sum of squares (RSS). The outcome of executing the least-squares regression method is coefficients that minimize the linear regression cost function .

The residual e of the ith observation is defined as follows, where [latex]Y_i[/latex] is the observed value of the response for the ith observation and [latex]\hat{Y_i}[/latex] is the predicted value for the ith observation.

[latex]e_i = Y_i – \hat{Y_i}[/latex]

The residual sum of squares can be represented as the following:

[latex]RSS = e_1^2 + e_2^2 + e_3^2 + … + e_n^2[/latex]

The least-squares method represents the algorithm that minimizes the above term, RSS.

Once the coefficients are determined, can it be claimed that these coefficients are the most appropriate ones for linear regression? The answer is no. The coefficients are only estimates, and thus there are standard errors associated with each of them. Recall that the standard error is used to calculate a confidence interval for a population parameter; in other words, it represents the error of estimating a population parameter based on the sample data. The standard error of a mean is the standard deviation of the sample divided by the square root of the sample size, as shown in the formula below.

[latex]SE(\mu) = \frac{\sigma}{\sqrt{N}}[/latex]

Thus, without analyzing aspects such as the standard errors associated with the coefficients, it cannot be claimed that the estimated coefficients are the most suitable ones. This is where hypothesis testing is needed. Before we get into why we need hypothesis testing with the linear regression model, let's briefly review what hypothesis testing is.

Train a Multiple Linear Regression Model using R

Before getting into the hypothesis testing concepts in relation to the linear regression model, let's train a multi-variate or multiple linear regression model and print the summary output of the model, which will be referred to in the next section.

The data used for creating the multiple linear regression model is BostonHousing, which can be loaded in RStudio by installing the mlbench package. The code is shown below:

install.packages("mlbench")
library(mlbench)
data("BostonHousing")

Once the data is loaded, the code shown below can be used to create the linear regression model.

attach(BostonHousing)
BostonHousing.lm <- lm(log(medv) ~ crim + chas + rad + lstat)
summary(BostonHousing.lm)

Executing the above commands will result in the creation of a linear regression model with the response variable log(medv) and predictor variables crim, chas, rad, and lstat. The following represents the details related to the response and predictor variables:

  • log(medv) : Log of the median value of owner-occupied homes in USD 1000’s
  • crim : Per capita crime rate by town
  • chas : Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • rad : Index of accessibility to radial highways
  • lstat : Percentage of the lower status of the population

The following is the output of the summary command, which prints the details of the model including the hypothesis testing results for the coefficients (t-statistics) and for the model as a whole (f-statistics).

[Figure: summary() output of the BostonHousing linear regression model]

Hypothesis tests & Linear Regression Models

Hypothesis tests are statistical procedures used to test a claim or assumption about the underlying distribution of a population based on sample data. Here are the key steps of doing hypothesis tests with linear regression models:

  • Hypothesis formulation for t-tests: In the case of linear regression, the claim is that there exists a relationship between the response and the predictor variables, and this claim is represented by non-zero values of the coefficients of the predictor variables in the regression equation. This is formulated as the alternate hypothesis. Thus, the null hypothesis states that there is no relationship between the response and a given predictor variable, i.e., that the coefficient of that predictor variable is equal to zero (0). So, if the linear regression model is Y = a0 + a1x1 + a2x2 + a3x3, then the null hypotheses for the individual tests state that a1 = 0, a2 = 0, a3 = 0, etc. For each predictor variable, an individual hypothesis test is done to determine whether the relationship between the response and that particular predictor variable is statistically significant based on the sample data used for training the model. Thus, if there are, say, 5 features, there will be five hypothesis tests, each with its own null and alternate hypothesis.
  • Hypothesis formulation for the F-test: In addition, there is a hypothesis test for the claim that a linear regression model relating the response variable to all the predictor variables exists. The null hypothesis is that such a linear regression model does not exist, which essentially means that the values of all the coefficients are equal to zero. So, if the linear regression model is Y = a0 + a1x1 + a2x2 + a3x3, then the null hypothesis states that a1 = a2 = a3 = 0.
  • F-statistic for testing the hypothesis about the linear regression model: The F-test is used to test the null hypothesis that a linear regression model relating the response variable y to the predictor variables x1, x2, x3, x4 and x5 does not exist. The null hypothesis can also be written as b1 = b2 = b3 = b4 = b5 = 0, where bi is the coefficient of xi. The F-statistic is calculated from the residual sum of squares of the restricted regression (the model with only an intercept or bias, i.e., all coefficients set to zero) and the residual sum of squares of the unrestricted regression (the full linear regression model). In the figure above, note the f-statistic value of 15.66 with 5 and 194 degrees of freedom.
  • Evaluate the t-statistics against the critical value/region: After calculating the t-statistic for each coefficient, it is time to decide whether to accept or reject the null hypothesis. For this decision, one needs to set a significance level, also known as the alpha level; a significance level of 0.05 is usually used. If the t-statistic falls in the critical region, the null hypothesis is rejected. Equivalently, if the p-value comes out to be less than 0.05, the null hypothesis is rejected.
  • Evaluate the f-statistic against the critical value/region: The F-statistic and its p-value are evaluated for testing the null hypothesis that no linear regression model relating the response and predictor variables exists. If the F-statistic exceeds the critical value at the 0.05 level of significance, the null hypothesis is rejected. This means that a linear model with at least one nonzero coefficient exists.
  • Draw conclusions: The final step of hypothesis testing is to draw a conclusion by interpreting the results in terms of the original claim. If the null hypothesis for a predictor variable is rejected, the relationship between the response and that predictor variable is statistically significant based on the evidence, i.e., the sample data used for training the model; if it is not rejected, there is insufficient evidence for such a relationship. Similarly, if the F-statistic lies in the critical region and its p-value is less than the alpha level (usually set to 0.05), one can say that a linear regression model exists. (A short R sketch showing where these statistics appear in the summary output follows this list.)
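A minimal R sketch of where these statistics appear in the summary output (assuming the BostonHousing.lm model fitted in the earlier section is still in the workspace):

s <- summary(BostonHousing.lm)
s$coefficients      # estimate, standard error, t value and Pr(>|t|) for each coefficient (the t-tests)
s$fstatistic        # overall F-statistic with its numerator and denominator degrees of freedom
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)   # p-value of the overall F-test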

Why hypothesis tests for linear regression models?

The reasons why we need hypothesis tests for a linear regression model are the following:

  • By creating the model, we are making a new claim about the relationship between the response or dependent variable and one or more predictor or independent variables. To justify this claim, one or more tests are needed. These tests of the claim are, in other words, hypothesis tests.
  • One kind of test is required to test the relationship between the response and each of the predictor variables (hence, t-tests).
  • Another kind of test is required to test the linear regression model representation as a whole. This is the F-test.

While training linear regression models, hypothesis testing is done to determine whether the relationship between the response and each of the predictor variables is statistically significant. The coefficient of each predictor variable is estimated, and then an individual hypothesis test is done to determine whether the relationship between the response and that particular predictor variable is statistically significant based on the sample data used for training the model. If the null hypothesis for a predictor is rejected, this indicates that there is evidence of a relationship between the response and that particular predictor variable. The t-statistic is used for this hypothesis test because the standard deviation of the sampling distribution is unknown. The value of the t-statistic is compared with the critical value from the t-distribution in order to decide whether to reject the null hypothesis regarding the relationship between the response and the predictor variable. If the value falls in the critical region, the null hypothesis is rejected, which means that there is evidence of a relationship between the response and that predictor variable. In addition to the t-tests, an F-test is performed to test the null hypothesis that the linear regression model does not exist and that the values of all the coefficients are zero (0). Learn more about linear regression and the t-test in this blog – Linear regression t-test: formula, example.


Linear Hypothesis Tests

Most regression output will include the results of frequentist hypothesis tests comparing each coefficient to 0. However, in many cases, you may be interested in whether a linear sum of the coefficients is 0. For example, in the regression

\(Y = \beta_0 + \beta_1 GoodThing + \beta_2 BadThing\)

You may be interested to see if \(GoodThing\) and \(BadThing\) (both binary variables) cancel each other out. So you would want to do a test of \(\beta_1 - \beta_2 = 0\).

Alternately, you may want to do a joint significance test of multiple linear hypotheses. For example, you may be interested in whether \(\beta_1\) or \(\beta_2\) are nonzero and so would want to jointly test the hypotheses \(\beta_1 = 0\) and \(\beta_2=0\) rather than doing them one at a time. Note the and here, since if either one or the other is rejected, we reject the null.

Keep in Mind

  • Be sure to carefully interpret the result. If you are doing a joint test, rejection means that at least one of your hypotheses can be rejected, not each of them. And you don’t necessarily know which ones can be rejected!
  • Generally, linear hypothesis tests are performed using F-statistics. However, there are alternate approaches such as likelihood tests or chi-squared tests. Be sure you know which one you’re getting.
  • Conceptually, what is going on with linear hypothesis tests is that they compare the model you’ve estimated against a more restrictive one that requires your restrictions (hypotheses) to be true. If the test you have in mind is too complex for the software to figure out on its own, you might be able to do it yourself by taking the sum of squared residuals in your original unrestricted model (\(SSR_{UR}\)), estimating the alternate model with the restrictions in place to get \(SSR_R\), and then calculating the F-statistic for the joint test using \(F_{q,n-k-1} = ((SSR_R - SSR_{UR})/q)/(SSR_{UR}/(n-k-1))\). (A worked sketch of this calculation follows this list.)
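A self-contained sketch of that calculation (simulated data and placeholder variable names; here q = 2 restrictions, beta2 = beta3 = 0, are tested):

set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)
unrestricted <- lm(y ~ x1 + x2 + x3)
restricted   <- lm(y ~ x1)                      # imposes beta2 = beta3 = 0
SSR_UR <- sum(resid(unrestricted)^2)
SSR_R  <- sum(resid(restricted)^2)
q <- 2; k <- 3                                  # number of restrictions; regressors in the unrestricted model
F_stat <- ((SSR_R - SSR_UR) / q) / (SSR_UR / (n - k - 1))
pf(F_stat, q, n - k - 1, lower.tail = FALSE)    # p-value of the joint test
anova(restricted, unrestricted)                 # reports the same F and p-value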

Also Consider

  • The process for testing a nonlinear combination of your coefficients, for example testing if \(\beta_1\times\beta_2 = 1\) or \(\sqrt{\beta_1} = .5\), is generally different. See Nonlinear hypothesis tests .

Implementations

Linear hypothesis tests in R can be performed for most regression models using the linearHypothesis() function in the car package. See this guide for more information.
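A runnable sketch of both kinds of test with linearHypothesis() (simulated data; only the variable names GoodThing and BadThing come from the example above):

library(car)                                   # provides linearHypothesis()
set.seed(1)
n <- 200
GoodThing <- rbinom(n, 1, 0.5)
BadThing  <- rbinom(n, 1, 0.5)
y <- 2 + 0.4 * GoodThing - 0.3 * BadThing + rnorm(n)
m <- lm(y ~ GoodThing + BadThing)
linearHypothesis(m, "GoodThing - BadThing = 0")            # single restriction: beta1 - beta2 = 0
linearHypothesis(m, c("GoodThing = 0", "BadThing = 0"))    # joint test of both restrictions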

Tests of coefficients in Stata can generally be performed using the built-in test command.


3.6 - The General Linear Test

This is just a general representation of an F -test based on a full and a reduced model. We will use this frequently when we look at more complex models.

Let's illustrate the general linear test here for the single factor experiment:

First we write the full model, \(Y_{ij} = \mu + \tau_i + \epsilon_{ij}\) and then the reduced model, \(Y_{ij} = \mu + \epsilon_{ij}\) where you don't have a \(\tau_i\) term, you just have an overall mean, \(\mu\). This is a pretty degenerate model that just says all the observations are just coming from one group. But the reduced model is equivalent to what we are hypothesizing when we say the \(\mu_i\) would all be equal, i.e.:

\(H_0 \colon \mu_1 = \mu_2 = \dots = \mu_a\)

This is equivalent to our null hypothesis where the \(\tau_i\)'s are all equal to 0.

The reduced model is just another way of stating our hypothesis. But in more complex situations this is not the only reduced model that we can write, there are others we could look at.

The general linear test is stated as an F ratio:

\(F=\dfrac{(SSE(R)-SSE(F))/(dfR-dfF)}{SSE(F)/dfF}\)

This is a very general test. You can apply any full and reduced model and test whether or not the difference between the full and the reduced model is significant just by looking at the difference in the SSE appropriately. This has an F distribution with (dfR - dfF, dfF) degrees of freedom, which correspond to the numerator and the denominator degrees of freedom of this F ratio.

Let's take a look at this general linear test using Minitab...

Example 3.5: Cotton Weight


Remember this experiment had treatment levels 15, 20, 25, 30, 35 % cotton weight and the observations were the tensile strength of the material.

The full model allows a different mean for each level of cotton weight %.

We can demonstrate the General Linear Test by viewing the ANOVA table from Minitab:

STAT > ANOVA > Balanced ANOVA

The \(SSE(R) = 636.96\) with a \(dfR = 24\), and \(SSE(F) = 161.20\) with \(dfF = 20\). Therefore:

\(F^\ast =\dfrac{(636.96-161.20)/(24-20)}{161.20/20}=\dfrac{118.94}{8.06}\approx 14.76\)
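The same calculation can be reproduced in R with a small helper function (a sketch; the function name is ours, not Minitab's):

general_linear_test <- function(sse_r, sse_f, df_r, df_f) {
  F_star  <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
  p_value <- pf(F_star, df_r - df_f, df_f, lower.tail = FALSE)
  c(F = F_star, num_df = df_r - df_f, den_df = df_f, p = p_value)
}
general_linear_test(636.96, 161.20, 24, 20)   # F is about 14.76 on (4, 20) df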

This demonstrates the equivalence of this test to the F -test. We now use the General Linear Test (GLT) to test for Lack of Fit when fitting a series of polynomial regression models to determine the appropriate degree of polynomial.

We can demonstrate the General Linear Test by comparing the quadratic polynomial model (Reduced model), with the full ANOVA model (Full model). Let \(Y_{ij} = \mu + \beta_{1}x_{ij} + \beta_{2}x_{ij}^{2} + \epsilon_{ij}\) be the reduced model, where \(x_{ij}\) is the cotton weight percent. Let \(Y_{ij} = \mu + \tau_i + \epsilon_{ij}\) be the full model.

The General Linear Test - Cotton Weight Example (no sound)

The video above shows SSE(R) = 260.126 with dfR = 22 for the quadratic regression model. The ANOVA shows the full model with SSE(F) = 161.20 and dfF = 20.

Therefore the GLT is:

\(\begin{eqnarray} F^\ast &=&\dfrac{(SSE(R)-SSE(F))/(dfR-dfF)}{SSE(F)/dfF} \nonumber\\ &=&\dfrac{(260.126-161.200)/(22-20)}{161.20/20}\nonumber\\ &=&\dfrac{98.926/2}{8.06}\nonumber\\ &=&\dfrac{49.46}{8.06}\nonumber\\&=&6.14 \nonumber \end{eqnarray}\)

We reject \(H_0\colon \) Quadratic Model and claim there is Lack of Fit if \(F^{*} > F_{1-\alpha}(2, 20) = 3.49\).

Therefore, since 6.14 is > 3.49 we reject the null hypothesis of no Lack of Fit from the quadratic equation and fit a cubic polynomial. From the viewlet above we noticed that the cubic term in the equation was indeed significant with p -value = 0.015.

We can apply the General Linear Test again, now testing whether the cubic equation is adequate. The reduced model is:

\(Y_{ij} = \mu + \beta_{1}x_{ij} + \beta_{2}x_{ij}^{2} + \beta_{3}x_{ij}^{3} + \epsilon_{ij}\)

and the full model is the same as before, the full ANOVA model:

\(Y_{ij} = \mu + \tau_i + \epsilon_{ij}\)

The General Linear Test is now a test for Lack of Fit from the cubic model:

\begin{aligned} F^{*} &=\frac{(\operatorname{SSE}(R)-\operatorname{SSE}(F)) /(d f R-d f F)}{\operatorname{SSE}(F) / d f F} \\ &=\frac{(195.146-161.200) /(21-20)}{161.20 / 20} \\ &=\frac{33.95 / 1}{8.06} \\ &=4.21 \end{aligned}

We reject if \(F^{*} > F_{0.95} (1, 20) = 4.35\).

Therefore, since 4.21 is not greater than 4.35, we do not reject the null hypothesis of no Lack of Fit and conclude that the data are consistent with the cubic regression model; higher-order terms are not necessary.

Common statistical tests are linear models (or: how to teach stats)

By Jonas Kristoffer Lindeløv (blog, profile). Last updated: 28 June, 2019 (see changelog). Check out the Python version and the Twitter summary.


This document is summarised in the table below. It shows the linear models underlying common parametric and “non-parametric” tests. Formulating all the tests in the same language highlights the many similarities between them. Get it as an image or as a PDF .

[Figure: common hypothesis tests as linear models]

1 The simplicity underlying common tests

Most of the common statistical models (t-test, correlation, ANOVA; chi-square, etc.) are special cases of linear models or a very close approximation. This beautiful simplicity means that there is less to learn. In particular, it all comes down to \(y = a \cdot x + b\) which most students know from highschool. Unfortunately, stats intro courses are usually taught as if each test is an independent tool, needlessly making life more complicated for students and teachers alike.

This needless complexity multiplies when students try to rote learn the parametric assumptions underlying each test separately rather than deducing them from the linear model.

For this reason, I think that teaching linear models first and foremost and then name-dropping the special cases along the way makes for an excellent teaching strategy, emphasizing understanding over rote learning. Since linear models are the same across frequentist, Bayesian, and permutation-based inferences, I’d argue that it’s better to start with modeling than p-values, type-1 errors, Bayes factors, or other inferences.

Concerning the teaching of “non-parametric” tests in intro-courses, I think that we can justify lying-to-children and teach “non-parametric” tests as if they are merely ranked versions of the corresponding parametric tests. It is much better for students to think “ranks!” than to believe that you can magically throw away assumptions. Indeed, the Bayesian equivalents of “non-parametric” tests implemented in JASP literally just do (latent) ranking and that’s it. For the frequentist “non-parametric” tests considered here, this approach is highly accurate for N > 15.

[Figure: common hypothesis tests as linear models]

Use the menu to jump to your favourite section. There are links to lots of similar (though more scattered) stuff under sources and teaching materials . I hope that you will join in suggesting improvements or submitting improvements yourself in the Github repo to this page . Let’s make it awesome!

2 Settings and toy data

For a start, we’ll keep it simple and play with three standard normals in wide ( a , b , c ) and long format ( value , group ):

3 Pearson and Spearman correlation

3.0.1 Theory: As linear models

Model: the recipe for \(y\) is a slope ( \(\beta_1\) ) times \(x\) plus an intercept ( \(\beta_0\) , aka a straight line).

\(y = \beta_0 + \beta_1 x \qquad \mathcal{H}_0: \beta_1 = 0\)

… which is a math-y way of writing the good old \(y = ax + b\) (here ordered as \(y = b + ax\) ). In R we are lazy and write y ~ 1 + x which R reads like y = 1*number + x*othernumber and the task of t-tests, lm, etc., is simply to find the numbers that best predict \(y\) .

Either way you write it, it’s an intercept ( \(\beta_0\) ) and a slope ( \(\beta_1\) ) yielding a straight line:

[Figure: intercept and slope of a straight-line model]

This is often simply called a regression model which can be extended to multiple regression where there are several \(\beta\) s and on the right-hand side multiplied with the predictors. Everything below, from one-sample t-test to two-way ANOVA are just special cases of this system. Nothing more, nothing less.

As the name implies, the Spearman rank correlation is a Pearson correlation on rank-transformed \(x\) and \(y\) :

\(rank(y) = \beta_0 + \beta_1 \cdot rank(x) \qquad \mathcal{H}_0: \beta_1 = 0\)

I’ll introduce ranks in a minute. For now, notice that the correlation coefficient of the linear model is identical to a “real” Pearson correlation, but p-values are an approximation which is appropriate for samples greater than N=10 and almost perfect when N > 20 .

Such a nice and non-mysterious equivalence that many students are left unaware of! Visualizing them side by side including data labels, we see this rank-transformation in action:

[Figure: Pearson and Spearman correlations side by side, showing the rank-transformation]

3.0.2 Theory: rank-transformation

rank simply takes a list of numbers and “replace” them with the integers of their rank (1st smallest, 2nd smallest, 3rd smallest, etc.). So the result of the rank-transformation rank(c(3.6, 3.4, -5.0, 8.2)) is 3, 2, 1, 4 . See that in the figure above?

A signed rank is the same, just where we rank according to absolute size first and then add in the sign second. So the signed rank here would be 2, 1, -3, 4 . Or in code:

I hope I don’t offend anyone when I say that ranks are easy; yet it’s all you need to do to convert most parametric tests into their “non-parametric” counterparts! One interesting implication is that many “non-parametric tests” are about as parametric as their parametric counterparts with means, standard deviations, homogeneity of variance, etc. - just on rank-transformed data . That’s why I put “non-parametric” in quotation marks.

3.0.3 R code: Pearson correlation

It couldn’t be much simpler to run these models in R. They yield identical p and t , but there’s a catch: lm gives you the slope and even though that is usually much more interpretable and informative than the correlation coefficient r , you may still want r . Luckily, the slope becomes r if x and y have identical standard deviations. For now, we will use scale(x) to make \(SD(x) = 1.0\) and \(SD(y) = 1.0\) :
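A minimal self-contained sketch of that comparison (toy data, not the article's original dataset):

set.seed(42)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
cor.test(x, y, method = "pearson")       # built-in Pearson correlation test
summary(lm(scale(y) ~ 1 + scale(x)))     # same t and p; the slope equals r because SD(x) = SD(y) = 1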

The CIs are not exactly identical, but very close.

3.0.4 R code: Spearman correlation

Note that we can interpret the slope which is the number of ranks \(y\) change for each rank on \(x\) . I think that this is a pretty interesting number. However, the intercept is less interpretable since it lies at \(rank(x) = 0\) which is impossible since x starts at 1.

See the identical r (now “rho”) and p :
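And the Spearman version as a sketch on the same kind of toy data (the lm p-value is a close approximation rather than an exact match):

set.seed(42)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
cor.test(x, y, method = "spearman")      # built-in Spearman correlation
summary(lm(rank(y) ~ 1 + rank(x)))       # slope equals rho since rank(x) and rank(y) have equal SDs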

4.1 One sample t-test and Wilcoxon signed-rank

4.1.1 Theory: As linear models

t-test model: A single number predicts \(y\) .

\(y = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0\)

In other words, it’s our good old \(y = \beta_0 + \beta_1*x\) where the last term is gone since there is no \(x\) (essentially \(x=0\) , see left figure below).

The same is, to a very close approximation, true for the Wilcoxon signed-rank test , just with the signed ranks of \(y\) instead of \(y\) itself (see right panel below).

\(signed\_rank(y) = \beta_0\)

This approximation is good enough when the sample size is larger than 14 and almost perfect if the sample size is larger than 50 .

[Figure: one-sample t-test (left) and Wilcoxon signed-rank (right) as intercept-only models]

4.1.2 R code: One-sample t-test

Try running the R code below and see that the linear model ( lm ) produces the same \(t\) , \(p\) , and \(r\) as the built-in t.test . The confidence interval is not presented in the output of lm but is also identical if you use confint(lm(...)) :
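A minimal sketch of that comparison (toy data):

set.seed(42)
y <- rnorm(30, mean = 0.5)
t.test(y)               # built-in one-sample t-test
summary(lm(y ~ 1))      # intercept-only model: identical estimate, t and p
confint(lm(y ~ 1))      # matching confidence interval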

4.1.3 R code: Wilcoxon signed-rank test

In addition to matching p -values, lm also gives us the mean signed rank, which I find to be an informative number.
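A matching sketch for the signed-rank version (the signed_rank helper follows the definition given in the rank-transformation section above):

signed_rank <- function(x) sign(x) * rank(abs(x))
set.seed(42)
y <- rnorm(30, mean = 0.5)
wilcox.test(y)                      # built-in Wilcoxon signed-rank test
summary(lm(signed_rank(y) ~ 1))     # close approximation for samples larger than ~14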

4.2 Paired samples t-test and Wilcoxon matched pairs

4.2.1 Theory: As linear models

t-test model: a single number (intercept) predicts the pairwise differences.

\(y_2-y_1 = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0\)

This means that there is just one \(y = y_2 - y_1\) to predict and it becomes a one-sample t-test on the pairwise differences. The visualization is therefore also the same as for the one-sample t-test. At the risk of overcomplicating a simple subtraction, you can think of these pairwise differences as slopes (see left panel of the figure), which we can represent as y-offsets (see right panel of the figure):

[Figure: pairwise differences shown as slopes (left) and as y-offsets (right)]

Similarly, the Wilcoxon matched pairs only differ from Wilcoxon signed-rank in that it’s testing the signed ranks of the pairwise \(y-x\) differences.

\(signed\_rank(y_2-y_1) = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0\)

4.2.2 R code: Paired sample t-test
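A minimal sketch with toy paired data:

set.seed(42)
y1 <- rnorm(25)
y2 <- y1 + 0.5 + rnorm(25)
t.test(y2, y1, paired = TRUE)     # built-in paired t-test
d <- y2 - y1
summary(lm(d ~ 1))                # one-sample t-test on the pairwise differences: same t and p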

4.2.3 R code: Wilcoxon matched pairs

Again, we do the signed-ranks trick. This is still an approximation, but a close one:
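A minimal sketch with toy paired data (signed_rank as defined in the rank-transformation section):

signed_rank <- function(x) sign(x) * rank(abs(x))
set.seed(42)
y1 <- rnorm(25)
y2 <- y1 + 0.5 + rnorm(25)
wilcox.test(y2, y1, paired = TRUE)           # built-in Wilcoxon matched-pairs test
summary(lm(signed_rank(y2 - y1) ~ 1))        # close approximation on the signed ranks of the differences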

For large sample sizes (N >> 100), this approaches the sign test to a reasonable degree, but this approximation is too inaccurate to flesh out here.

5 Two means

5.1 Independent t-test and Mann-Whitney U

5.1.1 Theory: As linear models

Independent t-test model: two means predict \(y\) .

\(y_i = \beta_0 + \beta_1 x_i \qquad \mathcal{H}_0: \beta_1 = 0\)

where \(x_i\) is an indicator (0 or 1) saying whether data point \(i\) was sampled from one or the other group. Indicator variables (also called “dummy coding”) underly a lot of linear models and we’ll take an aside to see how it works in a minute.

Mann-Whitney U (also known as Wilcoxon rank-sum test for two independent groups; no signed rank this time) is the same model to a very close approximation, just on the ranks of \(x\) and \(y\) instead of the actual values:

\(rank(y_i) = \beta_0 + \beta_1 x_i \qquad \mathcal{H}_0: \beta_1 = 0\)

To me, equivalences like this make “non-parametric” statistics much easier to understand. The approximation is appropriate when the sample size is larger than 11 in each group and virtually perfect when N > 30 in each group .

5.1.2 Theory: Dummy coding

Dummy coding can be understood visually. The indicator is on the x-axis so data points from the first group are located at \(x = 0\) and data points from the second group is located at \(x = 1\) . Then \(\beta_0\) is the intercept (blue line) and \(\beta_1\) is the slope between the two means (red line). Why? Because when \(\Delta x = 1\) the slope equals the difference because:

\(slope = \Delta y / \Delta x = \Delta y / 1 = \Delta y = difference\)

Magic! Even categorical differences can be modelled using linear models! It’s a true Swiss army knife.

[Figure: dummy coding with the two group means at x = 0 and x = 1]

5.1.3 Theory: Dummy coding (continued)

If you feel like you get dummy coding now, just skip ahead to the next section. Here is a more elaborate explanation of dummy coding:

If a data point was sampled from the first group, i.e., when \(x_i = 0\) , the model simply becomes \(y = \beta_0 + \beta_1 \cdot 0 = \beta_0\) . In other words, the model predicts that that data point is \(\beta_0\) . It turns out that the \(\beta\) which best predicts a set of data points is the mean of those data points, so \(\beta_0\) is the mean of group 1.

On the other hand, data points sampled from the second group would have \(x_i = 1\) so the model becomes \(y_i = \beta_0 + \beta_1\cdot 1 = \beta_0 + \beta_1\) . In other words, we add \(\beta_1\) to “shift” from the mean of the first group to the mean of the second group. Thus \(\beta_1\) becomes the mean difference between the groups.

As an example, say group 1 is 25 years old ( \(\beta_0 = 25\) ) and group 2 is 28 years old ( \(\beta_1 = 3\) ), then the model for a person in group 1 is \(y = 25 + 3 \cdot 0 = 25\) and the model for a person in group 2 is \(y = 25 + 3 \cdot 1 = 28\) .

Hooray, it works! For first-timers it takes a few moments to understand dummy coding, but you only need to know addition and multiplication to get there!

5.1.4 R code: independent t-test

As a reminder, when we write y ~ 1 + x in R, it is shorthand for \(y = \beta_0 \cdot 1 + \beta_1 \cdot x\) and R goes on computing the \(\beta\) s for you. Thus y ~ 1 + x is the R-way of writing \(y = a \cdot x + b\) .
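A minimal sketch of that comparison (toy data with a two-level group factor, which lm dummy-codes automatically):

set.seed(42)
D <- data.frame(value = c(rnorm(30), rnorm(30, mean = 0.8)),
                group = factor(rep(c("g1", "g2"), each = 30)))
t.test(value ~ group, data = D, var.equal = TRUE)   # Student's independent t-test
summary(lm(value ~ 1 + group, data = D))            # same t and p; the slope is the mean difference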

Notice the identical t , df , p , and estimates. We can get the confidence interval by running confint(lm(...)) .

5.1.5 R code: Mann-Whitney U
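And the rank-based version as a sketch (same kind of toy data; the p-values are close but not identical):

set.seed(42)
D <- data.frame(value = c(rnorm(30), rnorm(30, mean = 0.8)),
                group = factor(rep(c("g1", "g2"), each = 30)))
wilcox.test(value ~ group, data = D)              # built-in Mann-Whitney U / Wilcoxon rank-sum test
summary(lm(rank(value) ~ 1 + group, data = D))    # close approximation on the ranks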

5.2 Welch's t-test

This is identical to the (Student’s) independent t-test above except that Student’s assumes identical variances and Welch’s t-test does not. So the linear model is the same but we model one variance per group. We can do this using the nlme package ( see more details here ):
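A hedged sketch (toy data; gls() from nlme fits one variance per group, and the t and p for its group term closely match Welch's t-test):

set.seed(42)
D <- data.frame(value = c(rnorm(30, sd = 1), rnorm(30, mean = 0.8, sd = 2)),
                group = factor(rep(c("g1", "g2"), each = 30)))
t.test(value ~ group, data = D)                   # Welch's t-test (R's default, var.equal = FALSE)
library(nlme)                                     # for gls() and varIdent()
m <- gls(value ~ 1 + group, data = D,
         weights = varIdent(form = ~ 1 | group), method = "ML")
summary(m)                                        # group coefficient: mean difference with per-group variances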

6 Three or more means

ANOVAs are linear models with (only) categorical predictors so they simply extend everything we did above, relying heavily on dummy coding. Do make sure to read the section on dummy coding if you haven’t already.

6.1 One-way ANOVA and Kruskal-Wallis

6.1.1 Theory: As linear models

Model: One mean for each group predicts \(y\) .

\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 +... \qquad \mathcal{H}_0: y = \beta_0\)

where \(x_i\) are indicators ( \(x=0\) or \(x=1\) ) where at most one \(x_i=1\) while all others are \(x_i=0\) .

Notice how this is just “more of the same” of what we already did in other models above. When there are only two groups, this model is \(y = \beta_0 + \beta_1*x\) , i.e. the independent t-test . If there is only one group, it is \(y = \beta_0\) , i.e. the one-sample t-test . This is easy to see in the visualization below - just cover up a few groups and see that it matches the other visualizations above.

[Figure: one-way ANOVA as dummy-coded group means]

A one-way ANOVA has a log-linear counterpart called goodness-of-fit test which we’ll return to. By the way, since we now regress on more than one \(x\) , the one-way ANOVA is a multiple regression model.

The Kruskal-Wallis test is simply a one-way ANOVA on the rank-transformed \(y\) ( value ):

\(rank(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 +...\)

This approximation is good enough for 12 or more data points . Again, if you do this for just one or two groups, we’re already acquainted with those equations, i.e. the Wilcoxon signed-rank test or the Mann-Whitney U test respectively.

6.1.2 Example data

We make a three-level factor with the levels a , b , and c so that the one-way ANOVA basically becomes a “three-sample t-test”. Then we manually do the dummy coding of the groups.

With group a’s intercept omni-present, see how exactly one other parameter is added to predict value for group b and c in a given row (scroll to the end). Thus data points in group b never affect the estimates in group c.

6.1.3 R code: one-way ANOVA

OK, let’s see the identity between a dedicated ANOVA function ( car::Anova ) and the dummy-coded in-your-face linear model in lm .
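A self-contained sketch of this comparison (toy data; base aov() stands in for car::Anova here so the example runs without extra packages):

set.seed(42)
D <- data.frame(value = rnorm(60),
                group = factor(rep(c("a", "b", "c"), each = 20)))
summary(aov(value ~ group, data = D))           # dedicated one-way ANOVA table
# The same F-test as an explicit dummy-coded linear model, via a full-vs-null comparison:
D$group_b <- as.numeric(D$group == "b")
D$group_c <- as.numeric(D$group == "c")
anova(lm(value ~ 1, data = D),
      lm(value ~ 1 + group_b + group_c, data = D))   # identical F and p
# car::Anova(lm(value ~ group, data = D))            # same table if the car package is installed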

Actually, car::Anova and aov are wrappers around lm so the identity comes as no surprise. It only shows that the dummy-coded formula, which had a direct interpretation as a linear model, is the one that underlies the shorthand notation syntax y ~ factor . Indeed, the only real reason to use aov and car::Anova rather than lm is to get a nicely formatted ANOVA table.

The default output of lm returns parameter estimates as well (bonus!), which you can see if you unfold the R output above. However, because this IS the ANOVA model, you can also get parameter estimates out into the open by calling coefficients(aov(...)) .

Note that I do not use the aov function because it computes type-I sum of squares, which is widely discouraged. There is a BIG polarized debate about whether to use type-II (as car::Anova does by default) or type-III sum of squares (set car::Anova(..., type=3) ), but let’s skip that for now.

6.1.4 R code: Kruskal-Wallis
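A matching sketch for the rank-based version (toy data; the lm p-value is a close approximation, not an exact match):

set.seed(42)
D <- data.frame(value = rnorm(60),
                group = factor(rep(c("a", "b", "c"), each = 20)))
kruskal.test(value ~ group, data = D)        # built-in Kruskal-Wallis test
anova(lm(rank(value) ~ group, data = D))     # one-way ANOVA on the ranks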

6.2 Two-way ANOVA (plot in progress)

6.2.1 Theory: As linear models

Model: one mean per group (main effects) plus these means multiplied across factors (interaction effects). The main effects are the one-way ANOVA s above, though in the context of a larger model. The interaction effect is harder to explain in the abstract even though it’s just a few numbers multiplied with each other. I will leave that to the teachers to keep focus on equivalences here :-)

Switching to matrix notation:

\(y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \qquad \mathcal{H}_0: \beta_3 = 0\)

Here \(\beta_i\) are vectors of betas of which only one is selected by the indicator vector \(X_i\) . The \(\mathcal{H}_0\) shown here is the interaction effect. Note that the intercept \(\beta_0\) , to which all other \(\beta\) s are relative, is now the mean for the first level of all factors.

Continuing with the dataset from the one-way ANOVA above, let’s add a crossing factor mood so that we can test the group:mood interaction (a 3x2 ANOVA). We also do the dummy coding of this factor needed for the linear model.

\(\beta_0\) is now the happy guys from group a!

[Figure: two-way ANOVA with dummy-coded group and mood factors]

6.2.2 R code: Two-way ANOVA

Now let’s turn to the actual modeling in R. We compare a dedicated ANOVA function ( car::Anova ; see the one-way ANOVA section for why) to the linear model ( lm ). Notice that in ANOVA, we are testing a full factor interaction all at once, which involves many parameters (two in this case), so we can’t look at the overall model fit nor any particular parameter for the result. Therefore, I use a likelihood-ratio test to compare a full two-way ANOVA model (“saturated”) to one without the interaction effect(s). The anova function does this test. Even though that looks like cheating, it’s just computing likelihoods, p-values, etc. on the models that were already fitted, so it’s legit!
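A self-contained sketch of that model comparison (toy data in the spirit of the group/mood example):

set.seed(42)
D <- data.frame(value = rnorm(60),
                group = factor(rep(c("a", "b", "c"), each = 20)),
                mood  = factor(rep(c("happy", "sad"), times = 30)))
full <- lm(value ~ group * mood, data = D)    # main effects plus the group:mood interaction
null <- lm(value ~ group + mood, data = D)    # interaction removed
anova(null, full)                             # F-test of the interaction
# car::Anova(full)                            # reports the same interaction test if car is installed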

Below, I present approximate main effect models, though exact calculation of ANOVA main effects is more involved if it is to be accurate and furthermore depend on whether type-II or type-III sum of squares are used for inference.

Look at the model summary statistics to find values comparable to the Anova -estimated main effects above.

6.3 ANCOVA

This is simply ANOVA with a continuous regressor added so that it now contains continuous and (dummy-coded) categorical predictors. For example, if we continue with the one-way ANOVA example, we can add age and it is now called a one-way ANCOVA:

\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_3 age\)

… where \(x_i\) are our usual dummy-coded indicator variables. \(\beta_0\) is now the mean for the first group at \(age=0\) . You can turn all ANOVAs into ANCOVAs this way, e.g. by adding \(\beta_N \cdot age\) to our two-way ANOVA in the previous section. But let us go ahead with our one-way ANCOVA, starting by adding \(age\) to our dataset:

This is best visualized using colors for groups instead of x-position. The \(\beta\) s are still the average \(y\) -offset of the data points, only now we model each group using a slope instead of an intercept. In other words, the one-way ANOVA is sort of a one-sample t-test model for each group ( \(y = \beta_0\) ) while the one-way ANCOVA is sort of a Pearson correlation model for each group ( \(y_i = \beta_0 + \beta_i + \beta_1 \cdot age\) ):

[Figure: one-way ANCOVA with one slope on age per group, groups shown in color]

And now some R code to run the one-way ANCOVA as a linear model:
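A hedged sketch of the one-way ANCOVA (toy data; the nested-model comparison tests the group effect adjusted for age):

set.seed(42)
D <- data.frame(value = rnorm(60),
                group = factor(rep(c("a", "b", "c"), each = 20)),
                age   = rnorm(60, mean = 30, sd = 5))
summary(lm(value ~ 1 + group + age, data = D))   # common age slope plus group offsets
anova(lm(value ~ 1 + age, data = D),
      lm(value ~ 1 + group + age, data = D))     # F-test for group, adjusted for age
# car::Anova(lm(value ~ group + age, data = D))  # dedicated ANCOVA-style table with the car package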

7 Proportions: Chi-square is a log-linear model

Recall that when you take the logarithm, you can easily make statements about proportions , i.e., that for every increase in \(x\) , \(y\) increases a certain percentage. This turns out to be one of the simplest (and therefore best!) ways to make count data and contingency tables intelligible. See this nice introduction to Chi-Square tests as linear models.

7.1 Goodness of fit

7.1.1 Theory: As log-linear model

Model: a single intercept predicts \(log(y)\) .

I’ll refer you to take a look at the section on contingency tables which is basically a “two-way goodness of fit”.

7.1.2 Example data

For this, we need some wide count data:

7.1.3 R code: Goodness of fit

Now let’s see that the Goodness of fit is just a log-linear equivalent to a one-way ANOVA. We set family = poisson() which defaults to setting a logarithmic link function ( family = poisson(link='log') ).
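A self-contained sketch with made-up counts (the column names counts and group are ours):

D <- data.frame(group  = factor(c("a", "b", "c")),
                counts = c(25, 35, 30))
chisq.test(D$counts)                                           # built-in goodness-of-fit test (equal expected counts)
full <- glm(counts ~ 1 + group, data = D, family = poisson())
null <- glm(counts ~ 1, data = D, family = poisson())
anova(null, full, test = "Rao")                                # log-linear equivalent; p-values match closely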

Let’s look at the results:

Note the strange anova(..., test='Rao') which merely states that p-values should be computed using the (Rao) score test . We could also have jotted in test='Chisq' or test='LRT' which would have yielded approximate p-values. You may think that we’re cheating here, sneaking in some sort of Chi-Square model post-hoc. However, anova only specifies how p-values are calculated whereas all the log-linear modeling happened in glm .

By the way, if there are only two counts and a large sample size (N > 100), this model begins to approximate the binomial test , binom.test , to a reasonable degree. But this sample size is larger than most use cases, so I won’t raise to a rule-of-thumb and won’t dig deeper into it here.

7.2 Contingency tables

7.2.1 Theory: As log-linear model

The theory here will be a bit more convoluted, and I mainly write it up so that you can get the feeling that it really is just a log-linear two-way ANOVA model . Let’s get started…

For a two-way contingency table, the model of the count variable \(y\) is a modeled using the marginal proportions of a contingency table. Why this makes sense, is too involved to go into here, but see the relevant slides by Christoph Scheepers here for an excellent exposition. The model is composed of a lot of counts and the regression coefficients \(A_i\) and \(B_i\) :

\(y_i = N \cdot x_i(A_i/N) \cdot z_j(B_j/N) \cdot x_{ij}/((A_i x_i)/(B_j z_j)/N)\)

What a mess!!! Here, \(i\) is the row index, \(j\) is the column index, \(x_{something}\) is the sum of that row and/or column, \(N = sum(y)\) . Remember that \(y\) is a count variable, so \(N\) is just the total count.

We can simplify the notation by defining the proportions : \(\alpha_i = x_i(A_i/N)\) , \(\beta_i = x_j(B_i/N)\) and \(\alpha_i\beta_j = x_{ij}/(A_i x_i)/(B_j z_j)/N\) . Let’s write the model again:

\(y_i = N \cdot \alpha_i \cdot \beta_j \cdot \alpha_i\beta_j\)

Ah, much prettier. However, there is still lots of multiplication which makes it hard to get an intuition about how the actual numbers interact. We can make it much more intelligible when we remember that \(log(A \cdot B) = log(A) + log(B)\) . Doing logarithms on both sides, we get:

\(log(y_i) = log(N) + log(\alpha_i) + log(\beta_j) + log(\alpha_i\beta_j)\)

Snuggly! Now we can get a better grasp on how the regression coefficients (which are proportions) independently contribute to \(y\) . This is why logarithms are so nice for proportions. Note that this is just the two-way ANOVA model with some logarithms added, so we are back to our good old linear models - only the interpretation of the regression coefficients has changed! And we cannot use lm anymore in R.

7.2.2 Example data

Here we need some long data and we need it in table format for chisq.test :

7.2.3 R code: Chi-square test

Now let’s show the equivalence between a chi-square model and a log-linear model. This is very similar to our two-way ANOVA above:
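A self-contained sketch with a made-up 2x3 table (the mood and group labels echo the toy data above, but the counts are invented):

tab <- as.table(rbind(c(20, 30, 25),
                      c(35, 25, 40)))
dimnames(tab) <- list(mood = c("happy", "sad"), group = c("a", "b", "c"))
chisq.test(tab)                                    # built-in chi-square test of independence
D <- as.data.frame(tab)                            # long format with a Freq column
full <- glm(Freq ~ mood * group, data = D, family = poisson())
null <- glm(Freq ~ mood + group, data = D, family = poisson())
anova(null, full, test = "Rao")                    # log-linear test of the interaction; p close to chisq.test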

If you unfold the raw R output, I’ve included summary(full) so that you can see the raw regression coefficients. Being a log-linear model, these are the percentage increase in \(y\) over and above the intercept if that category obtains.

8 Sources and further equivalences

Here are links to other sources who have exposed bits and pieces of this puzzle, including many further equivalences not covered here:

  • My original exposition of the idea at Cross Validated
  • An earlier question by me about non-parametric tests and a helpful answer.
  • This question and replies on t-tests and ANOVA at StackOverflow
  • These slides by Christoph Scheepers on Chi-Square as log-linear models.
  • This notebook by Philip M. Alday on Chi-square, binomial, multinomial, and poisson tests as log-linear and logistic models. These “equivalences” are less exact than what I presented above, and were therefore not included here. They are still great for a conceptual understanding of these tests, though!
  • This article by Kristoffer Magnusson on RM-ANOVA and growth models using lme4::lmer mixed models.
  • This post by Thom Baguley on the Friedman test. That post was actually the one that initiated my exploration of linear equivalences to “non-parametric” tests which ultimately pushed me over the edge to write up the present article.

9 Teaching materials and a course outline

Most advanced stats books (and some intro-books) take the “everything is GLMM” approach as well. However, the “linear model” part often stays at the conceptual level, rather than being made explicit. I wanted to make linear models the tool in a concise way. Luckily, more beginner-friendly materials have emerged lately:

  • Russ Poldrack’s open-source book “Statistical Thinking for the 21st century” (start at chapter 5 on modeling )
  • Jeff Rouder’s course notes , introducing model comparison using just \(R^2\) and BIC. It avoids all the jargon on p-values, F-values, etc. The full materials and slides are available here .

Here are my own thoughts on what I’d do. I’ve taught parts of this with great success already, but not the whole program since I’m not assigned to teach a full course yet.

I would spend 50% of the time on linear modeling of data since this contains 70% of what students need to know (bullet 1 below). The rest of the course is fleshing out what happens when you have one group, two groups, etc.

Note that whereas the understanding of sampling and hypothesis testing is usually the first focus of mainstream stats courses, it is saved for later here to build upon students’ prior knowledge, rather than throwing a lot of conceptually novel material at them.

Fundamentals of regression:

Recall from high-school: \(y = a \cdot x + b\) , and getting a really good intuition about slopes and intercepts. Understanding that this can be written using all variable names, e.g., money = profit * time + starting_money or \(y = \beta_1x + \beta_2*1\) or, suppressing the coefficients, as y ~ x + 1 . If the audience is receptive, convey the idea of these models as a solution to differential equations , specifying how \(y\) changes with \(x\) .

Extend to a few multiple regression models. Make sure to include plenty of real-life examples and exercises at this point to make all of this really intuitive. Marvel at how briefly these models allow us to represent large datasets.

Introduce the idea of rank-transforming non-metric data and try it out.

Teach the three assumptions: independence of data points, normality of residuals, and homoscedasticity.

Confidence/credible intervals on the parameters. Stress that the Maximum-Likelihood estimate is extremely unlikely, so intervals are more important.

Briefly introduce \(R^2\) for the simple regression models above. Mention in passing that this is called the Pearson and Spearman correlation coefficients .

Special case #1: One or two means (t-tests, Wilcoxon, Mann-Whitney):

One mean: When there is only one x-value, the regression model simplifies to \(y = b\) . If \(y\) is non-metric, you can rank-transform it. Apply the assumptions (homoscedasticity doesn’t apply since there is only one \(x\) ). Mention in passing that these intercept-only models are called one-sample t-test and Wilcoxon Signed Rank test respectively .

Two means: If we put two variables 1 apart on the x-axis, the difference between the means is the slope. Great! It is accessible to our Swiss army knife called linear modeling. Apply the assumption checks to see that homoscedasticity reduces to equal variance between groups. This is called an independent t-test . Do a few worked examples and exercises, maybe adding Welch’s test, and do the rank-transformed version, called Mann-Whitney U.

Paired samples: Violates the independence assumption. After computing pairwise differences, this is equivalent to 2.1 (one intercept), though it is called the paired t-test and Wilcoxon’s matched pairs .

Special case #2: Three or more means (ANOVAs)

Dummy coding of categories: How one regression coefficient for each level of a factor models an intercept for each level when multiplied by a binary indicator. This is just extending what we did in 2.1. to make this data accessible to linear modeling.

Means of one variable: One-way ANOVA .

Means of two variables: Two-way ANOVA .

Special case #3: Three or more proportions (Chi-Square)

Logarithmic transformation: Making multiplicative models linear using logarithms, thus modeling proportions. See this excellent introduction to the equivalence of log-linear models and Chi-Square tests as models of proportions. Also needs to introduce (log-)odds ratios. When the multiplicative model is made summative using logarithms, we just add the dummy-coding trick from 3.1, and see that the models are identical to the ANOVA models in 3.2 and 3.3, only the interpretation of the coefficients have changed.

Proportions of one variable: Goodness of fit .

Proportions of two variables: Contingency tables .

Hypothesis testing:

Hypothesis testing as model comparisons: Hypothesis testing is the act of choosing between a full model and one where a parameter is fixed to a particular value (often zero, i.e., effectively excluded from the model) instead of being estimated. For example, when fixing one of the two means to zero in the t-test , we study how well a single mean (a one-sample t-test ) explains all the data from both groups. If it does a good job, we prefer this model over the two-mean model because it is simpler. So hypothesis testing is just comparing linear models to make more qualitative statements than the truly quantitative statements which were covered in bullets 1-4 above. As tests of single parameters, hypothesis testing is therefore less informative. However, when testing multiple parameters at the same time (e.g., a factor in ANOVA), model comparison becomes invaluable.

Likelihood ratios: Likelihood ratios are the Swiss army knife which will do model comparison all the way from the one-sample t-test to GLMMs. BIC penalizes model complexity. Moreover, add priors and you’ve got Bayes Factors. One tool, and you’re done. I’ve used LRTs in the ANOVAs above.

10 Limitations

I have made a few simplifications for clarity:

I have not covered assumptions in the examples. This will be another post! But all assumptions of all tests come down to the usual three: a) independence of data points, b) normally distributed residuals, and c) homoscedasticity.

I assume that all null hypotheses are the absence of an effect, but everything works the same for non-zero null hypotheses.

I have not discussed inference. I am only including p-values in the comparisons as a crude way to show the equivalences between the underlying models, since people care about p-values. Parameter estimates would show the same equivalences. How to do inference is another matter. Personally, I'm a Bayesian, but going Bayesian here would make the material less accessible to the wider audience. Also, robust models would be preferable, but they fail to show the equivalences.

Several named tests are still missing from the list and may be added at a later time. These include the sign test (which requires large N to be reasonably approximated by a linear model), the Friedman test as an RM-ANOVA on rank(y), McNemar, and the binomial/multinomial tests. See the section on links to further equivalences for material on these. If you think they should be included here, feel free to submit "solutions" to the GitHub repo of this doc!

Understanding the Null Hypothesis for Linear Regression

Linear regression is a technique we can use to understand the relationship between one or more predictor variables and a response variable .

If we only have one predictor variable and one response variable, we can use simple linear regression , which uses the following formula to estimate the relationship between the variables:

ŷ = β₀ + β₁x

  • ŷ: The estimated response value.
  • β₀: The average value of y when x is zero.
  • β₁: The average change in y associated with a one-unit increase in x.
  • x: The value of the predictor variable.

Simple linear regression uses the following null and alternative hypotheses:

  • H₀: β₁ = 0
  • Hₐ: β₁ ≠ 0

The null hypothesis states that the coefficient β₁ is equal to zero. In other words, there is no statistically significant relationship between the predictor variable, x, and the response variable, y.

The alternative hypothesis states that β₁ is not equal to zero. In other words, there is a statistically significant relationship between x and y.
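For illustration, a short sketch with simulated data (not the article's dataset) showing where the estimate, t statistic, and p-value for β₁ come from:

```python
# Sketch with simulated data: testing H0: beta_1 = 0 in simple linear regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"x": rng.uniform(0, 10, 30)})
df["y"] = 2.0 + 0.5 * df["x"] + rng.normal(scale=1.5, size=30)

fit = smf.ols("y ~ x", data=df).fit()

print(fit.params["x"])    # estimate of beta_1
print(fit.tvalues["x"])   # t statistic for H0: beta_1 = 0
print(fit.pvalues["x"])   # reject H0 if this is below the chosen significance level
```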

If we have multiple predictor variables and one response variable, we can use multiple linear regression , which uses the following formula to estimate the relationship between the variables:

ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ

  • β₀: The average value of y when all predictor variables are equal to zero.
  • βᵢ: The average change in y associated with a one-unit increase in xᵢ.
  • xᵢ: The value of the predictor variable xᵢ.

Multiple linear regression uses the following null and alternative hypotheses:

  • H₀: β₁ = β₂ = … = βₖ = 0
  • Hₐ: at least one βᵢ ≠ 0

The null hypothesis states that all coefficients in the model are equal to zero. In other words, none of the predictor variables have a statistically significant relationship with the response variable, y.

The alternative hypothesis states that not every coefficient is simultaneously equal to zero.
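A corresponding sketch for the multiple-regression case (simulated data with two predictors); the overall F statistic and its p-value test the joint null above.

```python
# Sketch with simulated data: overall F test of H0: beta_1 = ... = beta_k = 0.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"x1": rng.normal(size=40), "x2": rng.normal(size=40)})
df["y"] = 1.0 + 0.8 * df["x1"] - 0.3 * df["x2"] + rng.normal(size=40)

fit = smf.ols("y ~ x1 + x2", data=df).fit()

print(fit.fvalue, fit.f_pvalue)   # overall F statistic and its p-value
```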

The following examples show how to decide to reject or fail to reject the null hypothesis in both simple linear regression and multiple linear regression models.

Example 1: Simple Linear Regression

Suppose a professor would like to use the number of hours studied to predict the exam score that students will receive in his class. He collects data for 20 students and fits a simple linear regression model.

The following screenshot shows the output of the regression model:

Output of simple linear regression in Excel

The fitted simple linear regression model is:

Exam Score = 67.1617 + 5.2503*(hours studied)

To determine if there is a statistically significant relationship between hours studied and exam score, we need to analyze the overall F value of the model and the corresponding p-value:

  • Overall F-Value:  47.9952
  • P-value:  0.000

Since this p-value is less than .05, we can reject the null hypothesis. In other words, there is a statistically significant relationship between hours studied and exam score received.

Example 2: Multiple Linear Regression

Suppose a professor would like to use the number of hours studied and the number of prep exams taken to predict the exam score that students will receive in his class. He collects data for 20 students and fits a multiple linear regression model.

Multiple linear regression output in Excel

The fitted multiple linear regression model is:

Exam Score = 67.67 + 5.56*(hours studied) – 0.60*(prep exams taken)

To determine if there is a jointly statistically significant relationship between the two predictor variables and the response variable, we need to analyze the overall F value of the model and the corresponding p-value:

  • Overall F-Value:  23.46
  • P-value:  0.00

Since this p-value is less than .05, we can reject the null hypothesis. In other words, hours studied and prep exams taken have a jointly statistically significant relationship with exam score.

Note: Although the p-value for prep exams taken (p = 0.52) is not significant, prep exams combined with hours studied has a significant relationship with exam score.
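The two examples above use Excel; the same quantities could be obtained with a few lines of Python. This is only a sketch: the file name and column names (exam_scores.csv, hours, prep_exams, score) are hypothetical and not taken from the article.

```python
# Sketch only: file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("exam_scores.csv")                        # hypothetical data file
fit = smf.ols("score ~ hours + prep_exams", data=df).fit()

print(fit.params)                 # fitted intercept and slopes
print(fit.fvalue, fit.f_pvalue)   # overall F value and its p-value
print(fit.summary())              # per-coefficient t tests (e.g., prep_exams)
```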

Additional Resources

  • Understanding the F-Test of Overall Significance in Regression
  • How to Read and Interpret a Regression Table
  • How to Report Regression Results
  • How to Perform Simple Linear Regression in Excel
  • How to Perform Multiple Linear Regression in Excel


Linear hypothesis testing in ultra high dimensional generalized linear mixed models

  • Research Article
  • Published: 18 May 2024

  • Xiyun Zhang
  • Zaixing Li (ORCID: orcid.org/0000-0003-0129-4162)

This paper is concerned with linear hypothesis testing problems in ultra high dimensional generalized linear mixed models where the response and the random effects are distribution-free. The constrained-partial-regularization-based penalized quasi-likelihood method is proposed and the corresponding statistical properties are studied. To test linear hypotheses, we propose a partial penalized quasi-likelihood ratio test, a partial penalized quasi-score test, and a partial penalized Wald test. The theoretical properties of these three tests are established under both the null and the alternatives. The finite-sample performance of the proposed tests is shown in simulation studies, and the procedure is illustrated on the forest health data.

Data availability

The data are available from the R2BayesX package in R.

Acknowledgements

The authors thank the Editor, the Associate Editor and the referees for their constructive comments and suggestions that substantially improved an earlier manuscript. This work was supported by the National Natural Science Foundation of China (no. 11671398), the State Key Lab of Coal Resources and Safe Mining (China University of Mining and Technology) (no. SKLCRSM16KFB03) and the Fundamental Research Funds for the Central Universities in China (no. 2009QS02).

Author information

Authors and affiliations

School of Science, China University of Mining and Technology (Beijing), Beijing, China

Xiyun Zhang & Zaixing Li

Corresponding author: Zaixing Li.

Zhang, X., Li, Z. Linear hypothesis testing in ultra high dimensional generalized linear mixed models. J. Korean Stat. Soc. (2024). https://doi.org/10.1007/s42952-024-00268-1

Received: 25 February 2024

Accepted: 23 April 2024

Published: 18 May 2024

DOI: https://doi.org/10.1007/s42952-024-00268-1

Keywords:

  • Linear hypothesis testing
  • Quasi-likelihood
  • Ultra high dimension
  • Proxy matrix
