Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

10.26: Hypothesis Test for a Population Mean (5 of 5)

  • Last updated
  • Save as PDF
  • Page ID 14164

Learning Objectives

  • Interpret the P-value as a conditional probability.

We finish our discussion of the hypothesis test for a population mean with a review of the meaning of the P-value, along with a review of type I and type II errors.

Review of the Meaning of the P-value

At this point, we assume you know how to use a P-value to make a decision in a hypothesis test. The logic is always the same. If we pick a level of significance (α), then we compare the P-value to α.

  • If the P-value ≤ α, reject the null hypothesis. The data supports the alternative hypothesis.
  • If the P-value > α, do not reject the null hypothesis. The data is not strong enough to support the alternative hypothesis.

In fact, we find that we treat these as “rules” and apply them without thinking about what the P-value means. So let’s pause here and review the meaning of the P-value, since it is the connection between probability and decision-making in inference.

Birth Weights in a Town

Let’s return to the familiar context of birth weights for babies in a town. Suppose that babies in the town had a mean birth weight of 3,500 grams in 2010. This year, a random sample of 50 babies has a mean weight of about 3,400 grams with a standard deviation of about 500 grams. Here is the distribution of birth weights in the sample.

Dot plot of birth weights, ranging from around 2,000 grams to 4,000 grams.

Obviously, this sample weighs less on average than the population of babies in the town in 2010. A decrease in the town’s mean birth weight could indicate a decline in overall health of the town. But does this sample give strong evidence that the town’s mean birth weight is less than 3,500 grams this year?

We now know how to answer this question with a hypothesis test. Let’s use a significance level of 5%.

Let μ = mean birth weight in the town this year. The null hypothesis says there is “no change from 2010.”

  • H 0 : μ < 3,500
  • H a : μ = 3,500

Since the sample is large, we can conduct the T-test (without worrying about the shape of the distribution of birth weights for individual babies.)

Statistical software tells us the P-value is 0.082 = 8.2%. Since the P-value is greater than 0.05, we fail to reject the null hypothesis.

Our conclusion: This sample does not suggest that the mean birth weight this year is less than 3,500 grams ( P -value = 0.082). The sample from this year has a mean of 3,400 grams, which is 100 grams lower than the mean in 2010. But this difference is not statistically significant. It can be explained by the chance fluctuation we expect to see in random sampling.

What Does the P-Value of 0.082 Tell Us?

A simulation can help us understand the P-value. In a simulation, we assume that the population mean is 3,500 grams. This is the null hypothesis. We assume the null hypothesis is true and select 1,000 random samples from a population with a mean of 3,500 grams. The mean of the sampling distribution is at 3,500 (as predicted by the null hypothesis.) We see this in the simulated sampling distribution.

If the mean = 3,500 then 86 out of the 1,000 random samples have a sample mean less than 3,400. This is 0.086 = 8.6%

In the simulation, we can see that about 8.6% of the samples have a mean less than 3,400. Since probability is the relative frequency of an event in the long run, we say there is an 8.6% chance that a random sample of 500 babies has a mean less than 3,400 if the population mean is 3,500. We can see that the corresponding area to the left of T = −1.41 in the T-model (with df = 49) also gives us a good estimate of the probability. This area is the P-value, about 8.2%.

If we generalize this statement, we say the P-value is the probability that random samples have results more extreme than the data if the null hypothesis is true. (By more extreme, we mean further from value of the parameter, in the direction of the alternative hypothesis.) We can also describe the P-value in terms of T-scores. The P-value is the probability that the test statistic from a random sample has a value more extreme than that associated with the data if the null hypothesis is true.

What Does a P-Value Mean?

Do women who smoke run the risk of shorter pregnancy and premature birth? The mean pregnancy length is 266 days. We test the following hypotheses.

  • H 0 : μ = 266
  • H a : μ < 266

Suppose a random sample of 40 women who smoke during their pregnancy have a mean pregnancy length of 260 days with a standard deviation of 21 days. The P-value is 0.04.

What probability does the P-value of 0.04 describe? Label each of the following interpretations as valid or invalid.

https://assessments.lumenlearning.co...sessments/3654

https://assessments.lumenlearning.co...sessments/3655

https://assessments.lumenlearning.co...sessments/3656

Review of Type I and Type II Errors

We know that statistical inference is based on probability, so there is always some chance of making a wrong decision. Recall that there are two types of wrong decisions that can be made in hypothesis testing. When we reject a null hypothesis that is true, we commit a type I error. When we fail to reject a null hypothesis that is false, we commit a type II error.

The following table summarizes the logic behind type I and type II errors.

A table that summarizes the logic behind type I and type II errors. If Ho is true and we reject Ho (accept Ha), this is a correct decision. If Ho is true and we fail to reject Ho (not enough evidence to accept Ha), this is a correct decision. If Ho is false (Ha is true) and we reject Ho (accept Ha), this is a correct decision. If Ho is false (Ha is true) and we fail to reject Ho (not enough evidence to accept Ha), this is a type II error.

It is possible to have some influence over the likelihoods of committing these errors. But decreasing the chance of a type I error increases the chance of a type II error. We have to decide which error is more serious for a given situation. Sometimes a type I error is more serious. Other times a type II error is more serious. Sometimes neither is serious.

Recall that if the null hypothesis is true, the probability of committing a type I error is α. Why is this? Well, when we choose a level of significance (α), we are choosing a benchmark for rejecting the null hypothesis. If the null hypothesis is true, then the probability that we will reject a true null hypothesis is α. So the smaller α is, the smaller the probability of a type I error.

It is more complicated to calculate the probability of a type II error. The best way to reduce the probability of a type II error is to increase the sample size. But once the sample size is set, larger values of α will decrease the probability of a type II error (while increasing the probability of a type I error).

General Guidelines for Choosing a Level of Significance

  • If the consequences of a type I error are more serious, choose a small level of significance (α).
  • If the consequences of a type II error are more serious, choose a larger level of significance (α). But remember that the level of significance is the probability of committing a type I error.
  • In general, we pick the largest level of significance that we can tolerate as the chance of a type I error.

Let’s return to the investigation of the impact of smoking on pregnancy length.

Recap of the hypothesis test: The mean human pregnancy length is 266 days. We test the following hypotheses.

https://assessments.lumenlearning.co...sessments/3778

https://assessments.lumenlearning.co...sessments/3779

https://assessments.lumenlearning.co...sessments/3780

Let’s Summarize

In this “Hypothesis Test for a Population Mean,” we looked at the four steps of a hypothesis test as they relate to a claim about a population mean.

Step 1: Determine the hypotheses.

  • The hypotheses are claims about the population mean, µ.
  • The null hypothesis is a hypothesis that the mean equals a specific value, µ 0 .

Step 2: Collect the data.

Since the hypothesis test is based on probability, random selection or assignment is essential in data production. Additionally, we need to check whether the t-model is a good fit for the sampling distribution of sample means. To use the t-model, the variable must be normally distributed in the population or the sample size must be more than 30. In practice, it is often impossible to verify that the variable is normally distributed in the population. If this is the case and the sample size is not more than 30, researchers often use the t-model if the sample is not strongly skewed and does not have outliers.

Step 3: Assess the evidence.

  • If a t-model is appropriate, determine the t-test statistic for the data’s sample mean.
  • Use the test statistic, together with the alternative hypothesis, to determine the P-value.
  • The P-value is the probability of finding a random sample with a mean at least as extreme as our sample mean, assuming that the null hypothesis is true.
  • As in all hypothesis tests, if the alternative hypothesis is greater than, the P-value is the area to the right of the test statistic. If the alternative hypothesis is less than, the P-value is the area to the left of the test statistic. If the alternative hypothesis is not equal to, the P-value is equal to double the tail area beyond the test statistic.

Step 4: Give the conclusion.

The logic of the hypothesis test is always the same. To state a conclusion about H 0 , we compare the P-value to the significance level, α.

  • If P ≤ α, we reject H 0 . We conclude there is significant evidence in favor of H a .
  • If P > α, we fail to reject H 0 . We conclude the sample does not provide significant evidence in favor of H a .
  • We write the conclusion in the context of the research question. Our conclusion is usually a statement about the alternative hypothesis (we accept H a or fail to acceptH a ) and should include the P-value.

Other Hypothesis Testing Notes

  • Remember that the P-value is the probability of seeing a sample mean at least as extreme as the one from the data if the null hypothesis is true. The probability is about the random sample; it is not a “chance” statement about the null or alternative hypothesis.
  • If our test results in rejecting a null hypothesis that is actually true, then it is called a type I error.
  • If our test results in failing to reject a null hypothesis that is actually false, then it is called a type II error.
  • If rejecting a null hypothesis would be very expensive, controversial, or dangerous, then we really want to avoid a type I error. In this case, we would set a strict significance level (a small value of α, such as 0.01).
  • Finally, remember the phrase “garbage in, garbage out.” If the data collection methods are poor, then the results of a hypothesis test are meaningless.

Contributors and Attributions

  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution

Logo for Open Library Publishing Platform

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

8.7 Hypothesis Tests for a Population Mean with Unknown Population Standard Deviation

Learning objectives.

  • Conduct and interpret hypothesis tests for a population mean with unknown population standard deviation.

Some notes about conducting a hypothesis test:

  • The null hypothesis [latex]H_0[/latex] is always an “equal to.”  The null hypothesis is the original claim about the population parameter.
  • The alternative hypothesis [latex]H_a[/latex] is a “less than,” “greater than,” or “not equal to.”  The form of the alternative hypothesis depends on the context of the question.
  • If the alternative hypothesis is a “less than”,  then the test is left-tail.  The p -value is the area in the left-tail of the distribution.
  • If the alternative hypothesis is a “greater than”, then the test is right-tail.  The p -value is the area in the right-tail of the distribution.
  • If the alternative hypothesis is a “not equal to”, then the test is two-tail.  The p -value is the sum of the area in the two-tails of the distribution.  Each tail represents exactly half of the p -value.
  • Think about the meaning of the p -value.  A data analyst (and anyone else) should have more confidence that they made the correct decision to reject the null hypothesis with a smaller p -value (for example, 0.001 as opposed to 0.04) even if using a significance level of  0.05.  Similarly, for a large p -value such as 0.4, as opposed to a p -value of 0.056 (a significance level of 0.05 is less than either number), a data analyst should have more confidence that they made the correct decision in not rejecting the null hypothesis.  This makes the data analyst use judgment rather than mindlessly applying rules.
  • The significance level must be identified before collecting the sample data and conducting the test.  Generally, the significance level will be included in the question.  If no significance level is given, a common standard is to use a significance level of 5%.
  • An alternative approach for hypothesis testing is to use what is called the critical value approach .  In this book, we will only use the p -value approach.  Some of the videos below may mention the critical value approach, but this approach will not be used in this book.

Steps to Conduct a Hypothesis Test for a Population Mean with Unknown Population Standard Deviation

  • Write down the null and alternative hypotheses in terms of the population mean [latex]\mu[/latex].  Include appropriate units with the values of the mean.
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ \\ df & = & n-1 \\ \\ \end{eqnarray*}[/latex]

  • The results of the sample data are significant. There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.

USING EXCEL TO CALCULE THE P -VALUE FOR A HYPOTHESIS TEST ON A POPULATION MEAN WITH UNKNOWN POPULATION STANDARD DEVIATION

The p -value for a hypothesis test on a population mean is the area in the tail(s) of the distribution of the sample mean.  When the population standard deviation is unknown, use the [latex]t[/latex]-distribution to find the p -value.

If the p -value is the area in the left-tail:

  • For t-score , enter the value of [latex]t[/latex] calculated from [latex]\displaystyle{t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}}[/latex].
  • For degrees of freedom , enter the degrees of freedom for the [latex]t[/latex]-distribution [latex]n-1[/latex].
  • For the logic operator , enter true .  Note:  Because we are calculating the area under the curve, we always enter true for the logic operator.
  • The output from the t.dist function is the area under the [latex]t[/latex]-distribution to the left of the entered [latex]t[/latex]-score.
  • Visit the Microsoft page for more information about the t.dist function.

If the p -value is the area in the right-tail:

  • The output from the t.dist.rt function is the area under the [latex]t[/latex]-distribution to the right of the entered [latex]t[/latex]-score.
  • Visit the Microsoft page for more information about the t.dist.rt function.

If the p -value is the sum of area in the tails:

  • For t-score , enter the absolute value of [latex]t[/latex] calculated from [latex]\displaystyle{t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}}[/latex].  Note:  In the t.dist.2t function, the value of the [latex]t[/latex]-score must be a positive number.  If the [latex]t[/latex]-score is negative, enter the absolute value of the [latex]t[/latex]-score into the t.dist.2t function.
  • The output from the t.dist.2t function is the sum of areas in the tails under the [latex]t[/latex]-distribution.
  • Visit the Microsoft page for more information about the t.dist.2t function.

Statistics students believe that the mean score on the first statistics test is 65.  A statistics instructor thinks the mean score is higher than 65.  He samples ten statistics students and obtains the following scores:

The instructor performs a hypothesis test using a 1% level of significance. The test scores are assumed to be from a normal distribution.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \mu=65  \\ H_a: & & \mu \gt 65  \end{eqnarray*}[/latex]

From the question, we have [latex]n=10[/latex], [latex]\overline{x}=67[/latex], [latex]s=3.1972...[/latex] and [latex]\alpha=0.01[/latex].

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation [latex]s=3.1972...[/latex]).  So we use a [latex]t[/latex]-distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right-tail of the distribution.

This is a t-distribution curve. The peak of the curve is at 0 on the horizontal axis. The point t is also labeled. A vertical line extends from point t to the curve with the area to the right of this vertical line shaded. The p-value equals the area of this shaded region.

To use the t.dist.rt function, we need to calculate out the [latex]t[/latex]-score:

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ & = & \frac{67-65}{\frac{3.1972...}{\sqrt{10}}} \\ & = & 1.9781... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]t[/latex]-distribution is [latex]n-1=10-1=9[/latex].

So the p -value[latex]=0.0396[/latex].

Conclusion:

Because p -value[latex]=0.0396 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that mean score on the test is greater than 65.

  • The null hypothesis [latex]\mu=65[/latex] is the claim that the mean test score is 65.
  • The alternative hypothesis [latex]\mu \gt 65[/latex] is the claim that the mean test score is greater than 65.
  • Keep all of the decimals throughout the calculation (i.e. in the sample standard deviation, the [latex]t[/latex]-score, etc.) to avoid any round-off error in the calculation of the p -value.  This ensures that we get the most accurate value for the p -value.
  • The p -value is the area in the right-tail of the [latex]t[/latex]-distribution, to the right of [latex]t=1.9781...[/latex].
  • The p -value of 0.0396 tells us that under the assumption that the mean test score is 65 (the null hypothesis), there is a 3.96% chance that the mean test score is 65 or more.  Compared to the 1% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.

A company claims that the average change in the value of their stock is $3.50 per week.  An investor believes this average is too high. The investor records the changes in the company’s stock price over 30 weeks and finds the average change in the stock price is $2.60 with a standard deviation of $1.80.  At the 5% significance level, is the average change in the company’s stock price lower than the company claims?

[latex]\begin{eqnarray*} H_0: & & \mu=$3.50  \\ H_a: & & \mu \lt $3.50  \end{eqnarray*}[/latex]

From the question, we have [latex]n=30[/latex], [latex]\overline{x}=2.6[/latex], [latex]s=1.8[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation [latex]s=1.8.[/latex]).  So we use a [latex]t[/latex]-distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left-tail of the distribution.

his is a t-distribution curve. The peak of the curve is at 0 on the horizontal axis. The point t is also labeled. A vertical line extends from point t to the curve with the area to the left of this vertical line shaded. The p-value equals the area of this shaded region.

To use the t.dist function, we need to calculate out the [latex]t[/latex]-score:

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ & = & \frac{2.6-3.5}{\frac{1.8}{\sqrt{30}}} \\ & = & -1.5699... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]t[/latex]-distribution is [latex]n-1=30-1=29[/latex].

So the p -value[latex]=0.0636[/latex].

Because p -value[latex]=0.0636 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that average change in the stock price is lower than $3.50.

  • The null hypothesis [latex]\mu=$3.50[/latex] is the claim that the average change in the company’s stock is $3.50 per week.
  • The alternative hypothesis [latex]\mu \lt $3.50[/latex] is the claim that the average change in the company’s stock is less than $3.50 per week.
  • The p -value is the area in the left-tail of the [latex]t[/latex]-distribution, to the left of [latex]t=-1.5699...[/latex].
  • The p -value of 0.0636 tells us that under the assumption that the average change in the stock is $3.50 (the null hypothesis), there is a 6.36% chance that the average change is $3.50 or less.  Compared to the 5% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the company’s claim that the average change in their stock price is $3.50 per week is most likely correct.

A paint manufacturer has their production line set-up so that the average volume of paint in a can is 3.78 liters.  The quality control manager at the plant believes that something has happened with the production and the average volume of paint in the cans has changed.  The quality control department takes a sample of 100 cans and finds the average volume is 3.62 liters with a standard deviation of 0.7 liters.  At the 5% significance level, has the volume of paint in a can changed?

[latex]\begin{eqnarray*} H_0: & & \mu=3.78 \mbox{ liters}  \\ H_a: & & \mu \neq 3.78 \mbox{ liters}  \end{eqnarray*}[/latex]

From the question, we have [latex]n=100[/latex], [latex]\overline{x}=3.62[/latex], [latex]s=0.7[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation [latex]s=0.7[/latex]).  So we use a [latex]t[/latex]-distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of area in the tails of the distribution.

This is a t distribution curve. The peak of the curve is at 0 on the horizontal axis. The point -t and t are also labeled. A vertical line extends from point t to the curve with the area to the right of this vertical line shaded with the shaded area labeled half of the p-value. A vertical line extends from -t to the curve with the area to the left of this vertical line shaded with the shaded area labeled half of the p-value. The p-value equals the area of these two shaded regions.

To use the t.dist.2t function, we need to calculate out the [latex]t[/latex]-score:

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ & = & \frac{3.62-3.78}{\frac{0.07}{\sqrt{100}}} \\ & = & -2.2857... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]t[/latex]-distribution is [latex]n-1=100-1=99[/latex].

So the p -value[latex]=0.0244[/latex].

Because p -value[latex]=0.0244 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that average volume of paint in the cans has changed.

  • The null hypothesis [latex]\mu=3.78[/latex] is the claim that the average volume of paint in the cans is 3.78.
  • The alternative hypothesis [latex]\mu \neq 3.78[/latex] is the claim that the average volume of paint in the cans is not 3.78.
  • Keep all of the decimals throughout the calculation (i.e. in the [latex]t[/latex]-score) to avoid any round-off error in the calculation of the p -value.  This ensures that we get the most accurate value for the p -value.
  • The p -value is the sum of the area in the two tails.  The output from the t.dist.2t function is exactly the sum of the area in the two tails, and so is the p -value required for the test.  No additional calculations are required.
  • The t.dist.2t function requires that the value entered for the [latex]t[/latex]-score is positive .  A negative [latex]t[/latex]-score entered into the t.dist.2t function generates an error in Excel.  In this case, the value of the [latex]t[/latex]-score is negative, so we must enter the absolute value of this [latex]t[/latex]-score into field 1.
  • The p -value of 0.0244 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the average volume of paint in the cans has most likely changed from 3.78 liters.

Watch this video: Hypothesis Testing: t -test, right tail by ExcelIsFun [11:02]

Watch this video: Hypothesis Testing: t -test, left tail by ExcelIsFun [7:48]

Watch this video: Hypothesis Testing: t -test, two tail by ExcelIsFun [8:54]

Concept Review

The hypothesis test for a population mean is a well established process:

  • Collect the sample information for the test and identify the significance level.
  • When the population standard deviation is unknown, find the p -value (the area in the corresponding tail) for the test using the [latex]t[/latex]-distribution with [latex]\displaystyle{t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}}[/latex] and [latex]df=n-1[/latex].
  • Compare the p -value to the significance level and state the outcome of the test.

Attribution

“ 9.6   Hypothesis Testing of a Single Mean and Single Proportion “ in Introductory Statistics by OpenStax  is licensed under a  Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Teach yourself statistics

Hypothesis Test for a Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:

  • The sampling method is simple random sampling .
  • The sampling distribution is normal or nearly normal.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

  • The population distribution is normal.
  • The population distribution is symmetric , unimodal , without outliers , and the sample size is 15 or less.
  • The population distribution is moderately skewed , unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M . (In the table, the symbol ≠ means " not equal to ".)

The first set of hypotheses (Set 1) is an example of a two-tailed test , since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests , since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }

SE = s / sqrt( n )

  • Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.

t = ( x - μ) / SE

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See sample problems at the end of this lesson for examples of how this is done.)

Sample Size Calculator

As you probably noticed, the process of hypothesis testing can be complex. When you need to test a hypothesis about a mean score, consider using the Sample Size Calculator. The calculator is fairly easy to use, and it is free. You can find the Sample Size Calculator in Stat Trek's main menu under the Stat Tools tab. Or you can tap the button below.

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

Null hypothesis: μ = 300

Alternative hypothesis: μ ≠ 300

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. The test method is a one-sample t-test .

SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83

DF = n - 1 = 50 - 1 = 49

t = ( x - μ) / SE = (295 - 300)/2.83 = -1.77

where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Since we have a two-tailed test , the P-value is the probability that the t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find P(t < -1.77) is about 0.04.

  • If you enter 1.77 as the sample mean in the t Distribution Calculator, you will find the that the P(t < 1.77) is about 0.04. Therefore, P(t >  1.77) is 1 minus 0.96 or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
  • Interpret results . Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).

Problem 2: One-Tailed Test

Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of engines are normally distributed.)

Null hypothesis: μ >= 110

Alternative hypothesis: μ < 110

  • Formulate an analysis plan . For this analysis, the significance level is 0.01. The test method is a one-sample t-test .

SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236

DF = n - 1 = 20 - 1 = 19

t = ( x - μ) / SE = (108 - 110)/2.236 = -0.894

Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a t statistic test statistic of -0.894. We use the t Distribution Calculator to find P(t < -0.894) is about 0.19.

  • This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples, if the true population IQ were 110. Thus the P-value in this analysis is 0.19.
  • Interpret results . Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.

If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

To log in and use all the features of Khan Academy, please enable JavaScript in your browser.

Unit 12: Significance tests (hypothesis testing)

About this unit, the idea of significance tests.

  • Simple hypothesis testing (Opens a modal)
  • Idea behind hypothesis testing (Opens a modal)
  • Examples of null and alternative hypotheses (Opens a modal)
  • P-values and significance tests (Opens a modal)
  • Comparing P-values to different significance levels (Opens a modal)
  • Estimating a P-value from a simulation (Opens a modal)
  • Using P-values to make conclusions (Opens a modal)
  • Simple hypothesis testing Get 3 of 4 questions to level up!
  • Writing null and alternative hypotheses Get 3 of 4 questions to level up!
  • Estimating P-values from simulations Get 3 of 4 questions to level up!

Error probabilities and power

  • Introduction to Type I and Type II errors (Opens a modal)
  • Type 1 errors (Opens a modal)
  • Examples identifying Type I and Type II errors (Opens a modal)
  • Introduction to power in significance tests (Opens a modal)
  • Examples thinking about power in significance tests (Opens a modal)
  • Consequences of errors and significance (Opens a modal)
  • Type I vs Type II error Get 3 of 4 questions to level up!
  • Error probabilities and power Get 3 of 4 questions to level up!

Tests about a population proportion

  • Constructing hypotheses for a significance test about a proportion (Opens a modal)
  • Conditions for a z test about a proportion (Opens a modal)
  • Reference: Conditions for inference on a proportion (Opens a modal)
  • Calculating a z statistic in a test about a proportion (Opens a modal)
  • Calculating a P-value given a z statistic (Opens a modal)
  • Making conclusions in a test about a proportion (Opens a modal)
  • Writing hypotheses for a test about a proportion Get 3 of 4 questions to level up!
  • Conditions for a z test about a proportion Get 3 of 4 questions to level up!
  • Calculating the test statistic in a z test for a proportion Get 3 of 4 questions to level up!
  • Calculating the P-value in a z test for a proportion Get 3 of 4 questions to level up!
  • Making conclusions in a z test for a proportion Get 3 of 4 questions to level up!

Tests about a population mean

  • Writing hypotheses for a significance test about a mean (Opens a modal)
  • Conditions for a t test about a mean (Opens a modal)
  • Reference: Conditions for inference on a mean (Opens a modal)
  • When to use z or t statistics in significance tests (Opens a modal)
  • Example calculating t statistic for a test about a mean (Opens a modal)
  • Using TI calculator for P-value from t statistic (Opens a modal)
  • Using a table to estimate P-value from t statistic (Opens a modal)
  • Comparing P-value from t statistic to significance level (Opens a modal)
  • Free response example: Significance test for a mean (Opens a modal)
  • Writing hypotheses for a test about a mean Get 3 of 4 questions to level up!
  • Conditions for a t test about a mean Get 3 of 4 questions to level up!
  • Calculating the test statistic in a t test for a mean Get 3 of 4 questions to level up!
  • Calculating the P-value in a t test for a mean Get 3 of 4 questions to level up!
  • Making conclusions in a t test for a mean Get 3 of 4 questions to level up!

More significance testing videos

  • Hypothesis testing and p-values (Opens a modal)
  • One-tailed and two-tailed tests (Opens a modal)
  • Z-statistics vs. T-statistics (Opens a modal)
  • Small sample hypothesis test (Opens a modal)
  • Large sample proportion hypothesis testing (Opens a modal)

hypothesis test about population mean

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

S.3 hypothesis testing.

In reviewing hypothesis tests, we start first with the general idea. Then, we keep returning to the basic procedures of hypothesis testing, each time adding a little more detail.

The general idea of hypothesis testing involves:

  • Making an initial assumption.
  • Collecting evidence (data).
  • Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

Every hypothesis test — regardless of the population parameter involved — requires the above three steps.

Example S.3.1

Is normal body temperature really 98.6 degrees f section  .

Consider the population of many, many adults. A researcher hypothesized that the average adult body temperature is lower than the often-advertised 98.6 degrees F. That is, the researcher wants an answer to the question: "Is the average adult body temperature 98.6 degrees? Or is it lower?" To answer his research question, the researcher starts by assuming that the average adult body temperature was 98.6 degrees F.

Then, the researcher went out and tried to find evidence that refutes his initial assumption. In doing so, he selects a random sample of 130 adults. The average body temperature of the 130 sampled adults is 98.25 degrees.

Then, the researcher uses the data he collected to make a decision about his initial assumption. It is either likely or unlikely that the researcher would collect the evidence he did given his initial assumption that the average adult body temperature is 98.6 degrees:

  • If it is likely , then the researcher does not reject his initial assumption that the average adult body temperature is 98.6 degrees. There is not enough evidence to do otherwise.
  • either the researcher's initial assumption is correct and he experienced a very unusual event;
  • or the researcher's initial assumption is incorrect.

In statistics, we generally don't make claims that require us to believe that a very unusual event happened. That is, in the practice of statistics, if the evidence (data) we collected is unlikely in light of the initial assumption, then we reject our initial assumption.

Example S.3.2

Criminal trial analogy section  .

One place where you can consistently see the general idea of hypothesis testing in action is in criminal trials held in the United States. Our criminal justice system assumes "the defendant is innocent until proven guilty." That is, our initial assumption is that the defendant is innocent.

In the practice of statistics, we make our initial assumption when we state our two competing hypotheses -- the null hypothesis ( H 0 ) and the alternative hypothesis ( H A ). Here, our hypotheses are:

  • H 0 : Defendant is not guilty (innocent)
  • H A : Defendant is guilty

In statistics, we always assume the null hypothesis is true . That is, the null hypothesis is always our initial assumption.

The prosecution team then collects evidence — such as finger prints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, and handwriting samples — with the hopes of finding "sufficient evidence" to make the assumption of innocence refutable.

In statistics, the data are the evidence.

The jury then makes a decision based on the available evidence:

  • If the jury finds sufficient evidence — beyond a reasonable doubt — to make the assumption of innocence refutable, the jury rejects the null hypothesis and deems the defendant guilty. We behave as if the defendant is guilty.
  • If there is insufficient evidence, then the jury does not reject the null hypothesis . We behave as if the defendant is innocent.

In statistics, we always make one of two decisions. We either "reject the null hypothesis" or we "fail to reject the null hypothesis."

Errors in Hypothesis Testing Section  

Did you notice the use of the phrase "behave as if" in the previous discussion? We "behave as if" the defendant is guilty; we do not "prove" that the defendant is guilty. And, we "behave as if" the defendant is innocent; we do not "prove" that the defendant is innocent.

This is a very important distinction! We make our decision based on evidence not on 100% guaranteed proof. Again:

  • If we reject the null hypothesis, we do not prove that the alternative hypothesis is true.
  • If we do not reject the null hypothesis, we do not prove that the null hypothesis is true.

We merely state that there is enough evidence to behave one way or the other. This is always true in statistics! Because of this, whatever the decision, there is always a chance that we made an error .

Let's review the two types of errors that can be made in criminal trials:

Table S.3.2 shows how this corresponds to the two types of errors in hypothesis testing.

Note that, in statistics, we call the two types of errors by two different  names -- one is called a "Type I error," and the other is called  a "Type II error." Here are the formal definitions of the two types of errors:

There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!

Making the Decision Section  

Recall that it is either likely or unlikely that we would observe the evidence we did given our initial assumption. If it is likely , we do not reject the null hypothesis. If it is unlikely , then we reject the null hypothesis in favor of the alternative hypothesis. Effectively, then, making the decision reduces to determining "likely" or "unlikely."

In statistics, there are two ways to determine whether the evidence is likely or unlikely given the initial assumption:

  • We could take the " critical value approach " (favored in many of the older textbooks).
  • Or, we could take the " P -value approach " (what is used most often in research, journal articles, and statistical software).

In the next two sections, we review the procedures behind each of these two approaches. To make our review concrete, let's imagine that μ is the average grade point average of all American students who major in mathematics. We first review the critical value approach for conducting each of the following three hypothesis tests about the population mean $\mu$:

In Practice

  • We would want to conduct the first hypothesis test if we were interested in concluding that the average grade point average of the group is more than 3.
  • We would want to conduct the second hypothesis test if we were interested in concluding that the average grade point average of the group is less than 3.
  • And, we would want to conduct the third hypothesis test if we were only interested in concluding that the average grade point average of the group differs from 3 (without caring whether it is more or less than 3).

Upon completing the review of the critical value approach, we review the P -value approach for conducting each of the above three hypothesis tests about the population mean \(\mu\). The procedures that we review here for both approaches easily extend to hypothesis tests about any other population parameter.

Module 10: Inference for Means

Hypothesis Test for a Population Mean (1 of 5)

Learning outcomes.

  • Recognize when to use a hypothesis test or a confidence interval to draw a conclusion about a population mean.
  • Under appropriate conditions, conduct a hypothesis test about a population mean. State a conclusion in context.

Introduction

In Inference for Means , our focus is on inference when the variable is quantitative, so the parameters and statistics are means. In “Estimating a Population Mean,” we learned how to use a sample mean to calculate a confidence interval. The confidence interval estimates a population mean. In “Hypothesis Test for a Population Mean,” we learn to use a sample mean to test a hypothesis about a population mean.

We did hypothesis tests in earlier modules. In Inference for One Proportion , each claim involved a single population proportion. In Inference for Two Proportions , the claim was a statement about a treatment effect or a difference in population proportions. In “Hypothesis Test for a Population Mean,” the claims are statements about a population mean. But we will see that the steps and the logic of the hypothesis test are the same. Before we get into the details, let’s practice identifying research questions and studies that involve a population mean.

Cell Phone Data

Cell phones and cell phone plans can be very expensive, so consumers must think carefully when choosing a cell phone and service. This decision is as much about choosing the right cellular company as it is about choosing the right phone. Many people use the data/Internet capabilities of a phone as much as, if not more than, they use voice capability. The data service of a cell company is therefore an important factor in this decision. In the following example, a student named Melanie from Los Angeles applies what she learned in her statistics class to help her make a decision about buying a data plan for her smartphone.

Melanie read an advertisement from the Cell Phone Giants (CPG, for short, and yes, we’re using a fictitious company name) that she thinks is too good to be true. The CPG ad states that customers in Los Angeles get average data download speeds of 4 Mbps. With this speed, the ad claims, it takes, on average, only 12 seconds to download a typical 3-minute song from iTunes.

Only 12 seconds on average to download a 3-minute song from iTunes! Melanie has her doubts about this claim, so she gathers data to test it. She asks a friend who uses the CPG plan to download a song, and it takes 13 seconds to download a 3-minute song using the CPG network. Melanie decides to gather more evidence. She uses her friend’s phone and times the download of the same 3-minute song from various locations in Los Angeles. She gets a mean download time of 13.5 seconds for her sample of downloads.

What can Melanie conclude? Her sample has a mean download time that is greater than 12 seconds. Isn’t this evidence that the CPG claim is wrong? Why is a hypothesis test necessary? Isn’t the conclusion clear?

Let’s review the reason Melanie needs to do a hypothesis test before she can reach a conclusion.

Why should Melanie do a hypothesis test?

Melanie’s data (with a mean of 13.5 seconds) suggest that the average download time overall is greater than the 12 seconds claimed by the manufacturer. But wait. We know that samples will vary. If the CPG claim is correct, we don’t expect all samples to have a mean download time exactly equal to 12 seconds. There will be variability in the sample means. But if the overall average download time is 12 seconds, how much variability in sample means do we expect to see? We need to determine if the difference Melanie observed can be explained by chance.

We have to judge Melanie’s data against random samples that come from a population with a mean of 12. For this reason, we must do a simulation or use a mathematical model to examine the sampling distribution of sample means. Based on the sampling distribution, we ask, Is it likely that the samples will have mean download times that are greater than 13.5 seconds if the overall mean is 12 seconds? This probability (the P-value) determines whether Melanie’s data provides convincing evidence against the CPG claim.

Now let’s do the hypothesis test.

Step 1: Determine the hypotheses.

As always, hypotheses come from the research question. The null hypothesis is a hypothesis that the population mean equals a specific value. The alternative hypothesis reflects our claim. The alternative hypothesis says the population mean is “greater than” or “less than” or “not equal to” the value we assume is true in the null hypothesis.

Melanie’s hypotheses:

  • H 0 : It takes 12 seconds on average to download Melanie’s song from iTunes with the CPG network in Los Angeles.
  • H a : It takes more than 12 seconds on average to download Melanie’s song from iTunes using the CPG network in Los Angeles.

We can write the hypotheses in terms of µ. When we do so, we should always define µ. Here μ = the average number of seconds it takes to download Melanie’s song on the CPG network in Los Angeles.

  • H 0 : μ = 12
  • H a : μ > 12

Step 2: Collect the data.

To conduct a hypothesis test, Melanie knows she has to use a t-model of the sampling distribution. She thinks ahead to the conditions required, which helps her collect a useful sample.

Recall the conditions for use of a t-model.

  • There is no reason to think the download times are normally distributed (they might be, but this isn’t something Melanie could know for sure). So the sample has to be large (more than 30).
  • The sample has to be random. Melanie decides to use one phone but randomly selects days, times, and locations in Los Angeles.

Melanie collects a random sample of 45 downloads by using her friend’s phone to download her song from iTunes according to the randomly selected days, times, and locations.

Melanie’s sample of size 45 downloads has an average download time of 13.5 seconds. The standard deviation for the sample is 3.2 seconds. Now Melanie needs to determine how unlikely this data is if CPG’s claim is actually true.

Step 3: Assess the evidence.

Assuming the average download time for Melanie’s song is really 12 seconds, what is the probability that 45 random downloads of this song will have a mean of 13.5 seconds or more?

This is a question about sampling variability. Melanie must determine the standard error. She knows the standard error of random sample means is [latex]\sigma/\sqrt{n}[/latex]. Since she has no way of knowing the population standard deviation, σ, Melanie uses the sample standard deviation, s = 3.2, as an approximation. Therefore, Melanie approximates the standard error of all sample means ( n = 45) to be

[latex]s/\sqrt{n} = 3.2/\sqrt{45}=0.48[/latex]

Now she can assess how far away her sample is from the claimed mean in terms of standard errors. That is, she can compute the t-score of her sample mean.

T = [latex]\frac{statistic-parameter}{standarderror} = \frac{\bar{x}-\mu}{s/\sqrt{n}}=\frac{13.5-12}{0.48}[/latex]=3.14

The sample mean for Melanie’s random sample is approximately 3.14 standard errors above the overall mean of 12. We know from previous experience that a sample mean this far above µ is very unlikely. With a t-score this large, the P-value is very small. We use a simulation of the t-model for 44 degrees of freedom to verify this.

The green area to the left of the T-value is 0.9985. The blue area to the right of the T-value is 0.0015.

We want the probability that the sample mean is greater than 13.5. This corresponds to the probability that T is greater than 3.14. The P-value is 0.0015.

Step 4: State a conclusion.

Here the logic is the same as for other hypothesis tests. We use the P-value to make a decision. The P-value helps us determine if the difference we see between the data and the hypothesized value of µ is statistically significant or due to chance. One of two outcomes can occur:

  • One possibility is that results similar to the actual sample are extremely unlikely. This means the data does not fit with results from random samples selected from the population described by the null hypothesis. In this case, it is unlikely that the data came from this population. The probability as measured by the P-value is small, so we view this as strong evidence against the null hypothesis. We reject the null hypothesis in favor of the alternative hypothesis.
  • The other possibility is that results similar to the actual sample are fairly likely (not unusual). This means the data fits with typical results from random samples selected from the population described by the null hypothesis. The probability as measured by the P-value is large. In this case, we do not have evidence against the null hypothesis, so we cannot reject it in favor of the alternative hypothesis.

Melanie’s data is very unlikely if µ = 12. The probability is essentially zero (P-value = 0.0015). This means we will rarely see sample means greater than 13.5 if µ = 12. So we reject the null and accept the alternative hypothesis. In other words, this sample provides strong evidence that CPG has overstated the speed of its data download capability.

The following activities give you an opportunity to practice parts of the hypothesis testing process for a population mean. Later you will have the opportunity to practice the hypothesis test from start to finish.

For the following scenarios, give the null and alternative hypotheses and state in words what µ represents in your hypotheses. A good definition of µ describes both the variable and the population.

In the previous example, Melanie did not state a significance level for her test. If she had, the logic is the same as we used for hypothesis tests in Modules 8 and 9. To come to a conclusion about H 0 , we compare the P-value to the significance level α.

  • If P ≤ α, we reject H 0 . We conclude there is significant evidence in favor of H a .
  • If P > α, we fail to reject H 0 . We conclude the sample does not provide significant evidence in favor of H a .

Use this simulation when needed to answer questions below.

  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution

Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

Statistics Tutorial

Descriptive statistics, inferential statistics, stat reference, statistics - hypothesis testing a mean.

A population mean is an average of value a population.

Hypothesis tests are used to check a claim about the size of that population mean.

Hypothesis Testing a Mean

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic

For example:

  • Population : Nobel Prize winners
  • Category : Age when they received the prize.

And we want to check the claim:

"The average age of Nobel Prize winners when they received the prize is more than 55"

By taking a sample of 30 randomly selected Nobel Prize winners we could find that:

The mean age in the sample (\(\bar{x}\)) is 62.1

The standard deviation of age in the sample (\(s\)) is 13.46

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for calculating a confidence interval for a proportion are:

  • The sample is randomly selected
  • The population data is normally distributed
  • Sample size is large enough

A moderately large sample size, like 30, is typically large enough.

In the example, the sample size was 30 and it was randomly selected, so the conditions are fulfilled.

Note: Checking if the data is normally distributed can be done with specialized statistical tests.

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was:

In this case, the parameter is the mean age of Nobel Prize winners when they received the prize (\(\mu\)).

The null and alternative hypothesis are then:

Null hypothesis : The average age was 55.

Alternative hypothesis : The average age was more than 55.

Which can be expressed with symbols as:

\(H_{0}\): \(\mu = 55 \)

\(H_{1}\): \(\mu > 55 \)

This is a ' right tailed' test, because the alternative hypothesis claims that the proportion is more than in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.

Advertisement

3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is a percentage probability of accidentally making the wrong conclusion.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when we reject a null hypothesis:

We expect to reject a true null hypothesis 5 out of 100 times.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population mean is:

\(\displaystyle \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} \)

\(\bar{x}-\mu\) is the difference between the sample mean (\(\bar{x}\)) and the claimed population mean (\(\mu\)).

\(s\) is the sample standard deviation .

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population mean (\(\mu\)) was \( 55 \)

The sample mean (\(\bar{x}\)) was \(62.1\)

The sample standard deviation (\(s\)) was \(13.46\)

The sample size (\(n\)) was \(30\)

So the test statistic (TS) is then:

\(\displaystyle \frac{62.1-55}{13.46} \cdot \sqrt{30} = \frac{7.1}{13.46} \cdot \sqrt{30} \approx 0.528 \cdot 5.477 = \underline{2.889}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic.

With R use built-in math and statistics functions to calculate the test statistic.

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic and with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population mean test, the critical value (CV) is a T-value from a student's t-distribution .

This critical T-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the standard normal distribution.

Because the claim is that the population mean is more than 55, the rejection region is in the right tail:

The student's t-distribution is adjusted for the uncertainty from smaller samples.

This adjustment is called degrees of freedom (df), which is the sample size \((n) - 1\)

In this case the degrees of freedom (df) is: \(30 - 1 = \underline{29} \)

Choosing a significance level (\(\alpha\)) of 0.01, or 1%, we can find the critical T-value from a T-table , or with a programming language function:

With Python use the Scipy Stats library t.ppf() function find the T-Value for an \(\alpha\) = 0.01 at 29 degrees of freedom (df).

With R use the built-in qt() function to find the t-value for an \(\alpha\) = 0.01 at 29 degrees of freedom (df).

Using either method we can find that the critical T-Value is \(\approx \underline{2.462}\)

For a right tailed test we need to check if the test statistic (TS) is bigger than the critical value (CV).

If the test statistic is bigger than the critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{2.889}\) and the critical value was \(\approx \underline{2.462}\)

Here is an illustration of this test in a graph:

Since the test statistic was bigger than the critical value we reject the null hypothesis.

This means that the sample data supports the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data supports the claim that "The average age of Nobel Prize winners when they received the prize is more than 55" at a 1% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{2.889} \)

For a population proportion test, the test statistic is a T-Value from a student's t-distribution .

Because this is a right tailed test, we need to find the P-value of a t-value bigger than 2.889.

The student's t-distribution is adjusted according to degrees of freedom (df), which is the sample size \((30) - 1 = \underline{29}\)

We can find the P-value using a T-table , or with a programming language function:

With Python use the Scipy Stats library t.cdf() function find the P-value of a T-value bigger than 2.889 at 29 degrees of freedom (df):

With R use the built-in pt() function find the P-value of a T-Value bigger than 2.889 at 29 degrees of freedom (df):

Using either method we can find that the P-value is \(\approx \underline{0.0036}\)

This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.0036, or 0.36%, to reject the null hypothesis.

This P-value is smaller than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is rejected at all of these significance levels.

The sample data supports the claim that "The average age of Nobel Prize winners when they received the prize is more than 55" at a 10%, 5%, or 1% significance level .

Note: An outcome of an hypothesis test that rejects the null hypothesis with a p-value of 0.36% means:

For this p-value, we only expect to reject a true null hypothesis 36 out of 10000 times.

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a right tailed hypothesis test for a mean.

Here, the sample size is 30, the sample mean is 62.1, the sample standard deviation is 13.46, and the test is for a mean bigger than 55.

With R use built-in math and statistics functions find the P-value for a right tailed hypothesis test for a mean.

Left-Tailed and Two-Tailed Tests

This was an example of a right tailed test, where the alternative hypothesis claimed that parameter is bigger than the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

  • Left-Tailed Test
  • Two-Tailed Test

Get Certified

COLOR PICKER

colorpicker

Report Error

If you want to report an error, or if you want to make a suggestion, do not hesitate to send us an e-mail:

[email protected]

Top Tutorials

Top references, top examples, get certified.

Module 10: Inference for Means

Hypothesis test for a population mean (2 of 5), learning outcomes.

  • Under appropriate conditions, conduct a hypothesis test about a population mean. State a conclusion in context.

More on Checking Conditions for a T-Test

In practice, you will often see the use of a t-test with small samples. Technically, we can use the t-test with small samples only if we know the variable has a normal distribution in the population. But this is hard to verify. In addition, no variable has a perfect normal distribution. So what does the requirement that the “variable be normally distributed in the population” really mean?

We call a confidence interval or a hypothesis test robust if the confidence level or P-value does not change very much when the conditions for use of the procedure are not met.

T-procedures are robust when the variable is not normally distributed in the population, as long as the distribution is not heavily skewed. But how can we determine if the distribution of the variable in the population is heavily skewed? In this introductory course, we examine the distribution of the variable in the sample and make an educated guess about what is going on in the population.

Now we investigate this question: Can we tell from a sample whether the variable is normally distributed in the population?

Variable Skewed in the Population

Let’s start with a skewed distribution in the population. Can we tell that this distribution is not normal by looking at random samples?

The following figure shows the monthly payment on first home mortgages for 5,000 people, as reported in the 2000 U.S. Census. Think of this as data from the population of a small town. From this population, we randomly selected 20 people. We did this three times. Notice that for each random sample, the shape of the distribution of the monthly payments in the sample is skewed to the right, just like the distribution in the population. In 2 of the 3 samples, we also see outliers, just as we see in the population. So by looking at the sample, we can get a pretty good sense that the variable is not normally distributed in the population.

First mortgage monthly payment from three random samples of 20 people. In the first (main) graph, the gray bars get smaller as the monthly cost of the mortgage payment goes up in cost. In the first random sample graph, the bars are much higher on the left. In the second graph, the bars are fairly even towards the middle of the graph and then they drop off. In graph three, the bars are more randomly spaced along the graph.

In this example, the sample size is less than 30. We can use the t-test only if the variable is normally distributed in the population. The shape of the distribution in any one of these samples suggests that the variable has a skewed distribution in the population, so we would not conduct a t-test with any of these samples.

Variable Normal in the Population

Now we look at a variable that has an approximately normal distribution in the population. Can we tell that this distribution is approximately normal by looking at random samples?

The following graphs show the heights (in centimeters) of 5,000 women. Think of this as data from the population of a small town. From this population, we randomly selected 20 women. We did this three times. Notice that for each random sample, the shape of the distribution of the heights in the sample is not skewed, and there are no outliers. By looking at the sample, we can get a pretty good sense that the variable is not skewed in the population, which suggests that the variable may be somewhat normally distributed in the population.

The graphs of three random samples of the heights of 20 women taken from the population of a small town all look fairlysimilar to each other.

In this example, the sample size is less than 30. We can only use the t-test if the variable is normally distributed in the population. The shape of the distribution in any one of these samples indicates that the variable does not have a skewed distribution in the population, suggesting that the distribution in the population is somewhat normal. Since the t-procedures are robust, we would conduct a t-test with any of these samples.

What’s the Main Point?

We previously stated the conditions for use of the t-procedures as follows:

(1) If the variable is normally distributed in the population, you can always use the t-procedures.

(2) If the variable is not normally distributed in the population (or you can’t determine this factor), the sample size must be greater than 30 for safe use of the t-procedures.

We are now loosening these conditions somewhat because the t-procedures are robust.

(3) If the sample is small ( n ≤ 30), plot the data. If the distribution in the sample is not heavily skewed and does not have outliers, then we assume the variable is somewhat normally distributed in the population, so we use t-procedures.

If we use a t-procedure for a small sample ( n ≤ 30), it is good practice to include a disclaimer with the conclusion. We might say something like, “On the basis of the sample, we are assuming that the variable is distributed without strong skew or extreme outliers in the population. The conclusion from this test is valid only if this assumption is true.”

Each histogram in the following questions represents a random sample. We do not know if the variable has a normal distribution in the population, but we want to run a t-test to test a claim about the population mean. For each histogram, choose the option that best describes how to proceed with the hypothesis test.

Recall that the sample mean and standard deviation are not resistant to outliers. An outlier in the data can make the mean and standard deviation poor measures of center and spread. So why can we use data from large samples even if the data has an outlier? Well, if the sample is large enough, the distribution of sample means will still be approximately normal. And the t-model will be a good fit when we estimate the standard error of the sample means using the sample standard deviation. This is the important point. The P-value and confidence level come from a model of the sampling distribution, not from a model of the population’s distribution.

Summary in a Diagram

Summary of the material covered so far. Shows that if distribution is skewed in the population (or data suggests this), sample means are normally distributed only if sample size is large. Use a T-Model only of sample size is large. If distribution is normal in the population (or data suggests this), Sample means are always normally distributed. Use a T-Model.

Contribute!

Improve this page Learn More

  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution

Footer Logo Lumen Waymaker

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis and alternate hypothesis (H o ) and (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.

Prevent plagiarism. Run a free check.

For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved April 10, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, understanding p values | definition and examples, what is your plagiarism score.

IMAGES

  1. PPT

    hypothesis test about population mean

  2. Hypothesis Test for Population Mean Example

    hypothesis test about population mean

  3. 13.Hypothesis testing for single Population mean video 2

    hypothesis test about population mean

  4. Hypothesis Testing for the Population Mean

    hypothesis test about population mean

  5. PPT

    hypothesis test about population mean

  6. PPT

    hypothesis test about population mean

VIDEO

  1. Hypothesis Testing

  2. Hypothesis Test for a Population Mean, Sigma Known, Two-Tailed Test

  3. Hypothesis testing

  4. Proportion Hypothesis Testing, example 2

  5. Hypothesis Testing For Single Population (σ known)

  6. Hypothesis Tests About the Mean for TWO Populations

COMMENTS

  1. 10.26: Hypothesis Test for a Population Mean (5 of 5)

    The mean pregnancy length is 266 days. We test the following hypotheses. H 0: μ = 266. H a: μ < 266. Suppose a random sample of 40 women who smoke during their pregnancy have a mean pregnancy length of 260 days with a standard deviation of 21 days. The P-value is 0.04.

  2. Hypothesis Test for a Population Mean (1 of 5)

    In "Hypothesis Test for a Population Mean," we learn to use a sample mean to test a hypothesis about a population mean. We did hypothesis tests in earlier modules. In Inference for One Proportion, each claim involved a single population proportion. In Inference for Two Proportions, the claim was a statement about a treatment effect or a ...

  3. 8.7 Hypothesis Tests for a Population Mean with Unknown Population

    The hypothesis test for a population mean is a well established process: Write down the null and alternative hypotheses in terms of the population mean [latex]\mu[/latex]. Include appropriate units with the values of the mean. Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.

  4. Hypothesis Test for a Mean

    Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

  5. Introduction to Hypothesis Test for a Population Mean

    What you'll learn to do: Conduct and interpret results from a hypothesis test about a population mean. In this section we will learn to conduct a hypothesis test about a population mean and state a conclusion in context under appropriate conditions. Matched pairs design is when there is a "before and after" situation i.e. two quantitative ...

  6. Significance tests (hypothesis testing)

    Unit test. Significance tests give us a formal process for using sample data to evaluate the likelihood of some claim about a population value. Learn how to conduct significance tests and calculate p-values to see how likely a sample result is to occur by random chance. You'll also see how we use p-values to make conclusions about hypotheses.

  7. Hypothesis Test for a Population Mean (5 of 5)

    In this "Hypothesis Test for a Population Mean," we looked at the four steps of a hypothesis test as they relate to a claim about a population mean. Step 1: Determine the hypotheses. The hypotheses are claims about the population mean, µ. The null hypothesis is a hypothesis that the mean equals a specific value, µ 0.

  8. DOC Hypothesis Test for a Population Mean

    In hypothesis testing, there is a preconceived idea about the value of the population parameter. For example, in studying the antipsychotic properties of an experimental compound, we might ask whether the average shock-avoidance response of rats treated with a specific dose of the compound is greater than 60, >60, the value that has been ...

  9. S.3 Hypothesis Testing

    Every hypothesis test — regardless of the population parameter involved — requires the above three steps. Example S.3.1. ... We first review the critical value approach for conducting each of the following three hypothesis tests about the population mean $\mu$: Type. Null.

  10. PDF Hypothesis Testing for population mean

    Hypothesis Testing for Population Mean with Known and Unknown Population Standard Deviation Hypothesis tests are used to make decisions or judgments about the value of a parameter, such as the population mean. There are two approaches for conducting a hypothesis test; the critical value approach and the P-value approach.

  11. Hypothesis Test for a Population Mean (1 of 5)

    The confidence interval estimates a population mean. In "Hypothesis Test for a Population Mean," we learn to use a sample mean to test a hypothesis about a population mean. We did hypothesis tests in earlier modules. In Inference for One Proportion, each claim involved a single population proportion. In Inference for Two Proportions, the ...

  12. Statistics

    The formula for the test statistic (TS) of a population mean is: x ¯ − μ s ⋅ n. x ¯ − μ is the difference between the sample mean ( x ¯) and the claimed population mean ( μ ). s is the sample standard deviation. n is the sample size. In our example: The claimed ( H 0) population mean ( μ) was 55.

  13. Hypothesis Test for a Population Mean (5 of 5)

    In this "Hypothesis Test for a Population Mean," we looked at the four steps of a hypothesis test as they relate to a claim about a population mean. Step 1: Determine the hypotheses. The hypotheses are claims about the population mean, µ. The null hypothesis is a hypothesis that the mean equals a specific value, µ 0.

  14. Hypothesis Test for a Population Mean (2 of 5)

    Each histogram in the following questions represents a random sample. We do not know if the variable has a normal distribution in the population, but we want to run a t-test to test a claim about the population mean. For each histogram, choose the option that best describes how to proceed with the hypothesis test.

  15. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.