Teach yourself statistics

Hypothesis Test for a Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:

  • The sampling method is simple random sampling.
  • The sampling distribution is normal or nearly normal.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

  • The population distribution is normal.
  • The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
  • The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

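The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)

Set    Null hypothesis    Alternative hypothesis    Number of tails
1      μ = M              μ ≠ M                     2
2      μ ≥ M              μ < M                     1
3      μ ≤ M              μ > M                     1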

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

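  • Standard error. Compute the standard error (SE) of the sampling distribution of the mean:

SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }

where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population is much larger than the sample (for example, when the sample is less than 5% of the population), the factor ( N - n ) / ( N - 1 ) is close to 1 and the standard error is approximately:

SE = s / sqrt( n )

  • Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.

  • Test statistic. The test statistic is a t statistic (t) defined by:

t = ( x - μ ) / SE

where x is the sample mean, μ is the population mean specified in the null hypothesis, and SE is the standard error.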

  • P-value. The P-value is the probability, computed under the assumption that the null hypothesis is true, of observing a sample statistic at least as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See sample problems at the end of this lesson for examples of how this is done.)
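The arithmetic in this step can be sketched in a few lines of Python. This is only an illustration, not part of the lesson: the function names are made up for the example, and the numbers are the ones used in Problem 1 later in this lesson (s = 20, n = 50, N = 2000, sample mean 295, hypothesized mean 300).

```python
import math

def standard_error(s, n, N=None):
    """Standard error of the sample mean.

    If a population size N is supplied, the finite population factor
    (N - n) / (N - 1) is applied; otherwise the usual s / sqrt(n) is used.
    """
    if N is None:
        return s / math.sqrt(n)
    return s * math.sqrt((1 / n) * ((N - n) / (N - 1)))

def t_statistic(sample_mean, hypothesized_mean, se):
    """t = (x - mu) / SE."""
    return (sample_mean - hypothesized_mean) / se

s, n, N = 20.0, 50, 2000
se_simple = standard_error(s, n)      # s / sqrt(n)
se_fpc = standard_error(s, n, N)      # with the finite population factor
df = n - 1                            # degrees of freedom
t = t_statistic(295.0, 300.0, se_simple)

print(f"SE (simple) = {se_simple:.3f}, SE (finite population) = {se_fpc:.3f}")
print(f"DF = {df}, t = {t:.3f}")
```

With a sample that is only 2.5% of the population, the two standard errors differ only slightly, which is why the simpler formula is commonly used in that situation.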

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

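  • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ = 300

Alternative hypothesis: μ ≠ 300

Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the sample mean is too big or too small.

  • Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method is a one-sample t-test.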

  • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83

DF = n - 1 = 50 - 1 = 49

t = ( x - μ) / SE = (295 - 300)/2.83 = -1.77

where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Since we have a two-tailed test, the P-value is the probability that a t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find that P(t < -1.77) is about 0.04.

If you enter 1.77 as the t statistic in the t Distribution Calculator, you will find that P(t < 1.77) is about 0.96. Therefore, P(t > 1.77) is 1 minus 0.96, or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.

  • Interpret results. Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).
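For readers who want to check Problem 1 without the t Distribution Calculator, here is a rough sketch in Python using only the standard library. The helpers `t_pdf` and `t_cdf` are illustrative names, not part of the lesson; the cumulative probability is approximated by numerically integrating the t density with Simpson's rule, which is an assumption of this sketch rather than the method the calculator uses.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) by Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Problem 1: two-tailed test of mu = 300 at the 0.05 level.
n, sample_mean, s, mu0, alpha = 50, 295.0, 20.0, 300.0, 0.05
se = s / math.sqrt(n)
df = n - 1
t = (sample_mean - mu0) / se
p_value = 2 * t_cdf(-abs(t), df)   # both tails
reject_null = p_value < alpha

print(f"t = {t:.2f}, DF = {df}, P-value = {p_value:.2f}, reject H0: {reject_null}")
```

The P-value comes out at roughly 0.08, in line with the worked solution, so the null hypothesis is not rejected at the 0.05 level.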

Problem 2: One-Tailed Test

Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of students are normally distributed.)

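  • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ >= 110

Alternative hypothesis: μ < 110

Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected if the sample mean is too small.

  • Formulate an analysis plan. For this analysis, the significance level is 0.01. The test method is a one-sample t-test.

  • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236

DF = n - 1 = 20 - 1 = 19

t = ( x - μ) / SE = (108 - 110)/2.236 = -0.894

where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a t statistic of -0.894. We use the t Distribution Calculator to find that P(t < -0.894) is about 0.19. This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples, if the true population mean IQ were 110. Thus the P-value in this analysis is 0.19.

  • Interpret results. Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.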
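Problem 2 can be checked the same way. The sketch below is illustrative only: it uses the Python standard library, the helper names are invented for this example, and the t cumulative probability is approximated by Simpson's rule rather than taken from the t Distribution Calculator. Because the test is one-tailed (left tail), the P-value is simply P(t < -0.894).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) by Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Problem 2: left-tailed test of mu >= 110 at the 0.01 level.
n, sample_mean, s, mu0, alpha = 20, 108.0, 10.0, 110.0, 0.01
se = s / math.sqrt(n)
df = n - 1
t = (sample_mean - mu0) / se
p_value = t_cdf(t, df)            # left tail only
reject_null = p_value < alpha

print(f"t = {t:.3f}, DF = {df}, P-value = {p_value:.2f}, reject H0: {reject_null}")
```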

8.4.3 Hypothesis Testing for the Mean

$\quad$ $H_0$: $\mu=\mu_0$, $\quad$ $H_1$: $\mu \neq \mu_0$.

$\quad$ $H_0$: $\mu \leq \mu_0$, $\quad$ $H_1$: $\mu > \mu_0$.

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$.

Two-sided Tests for the Mean:

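Therefore, we can suggest the following test. Choose a threshold, and call it $c$. If $|W| \leq c$, accept $H_0$, and if $|W|>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) &= P(\textrm{Reject }H_0 \; | \; H_0) \\ &= P(|W| > c \; | \; H_0) \leq \alpha. \end{align}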

  • As discussed above, we let \begin{align} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} Note that, assuming $H_0$, $W \sim N(0,1)$. We will choose a threshold, $c$. If $|W| \leq c$, we accept $H_0$, and if $|W|>c$, accept $H_1$. To choose $c$, we let \begin{align} P(|W| > c \; | \; H_0) =\alpha. \end{align} Since the standard normal PDF is symmetric around $0$, we have \begin{align} P(|W| > c \; | \; H_0) = 2 P(W>c | \; H_0). \end{align} Thus, we conclude $P(W>c | \; H_0)=\frac{\alpha}{2}$. Therefore, \begin{align} c=z_{\frac{\alpha}{2}}. \end{align} Therefore, we accept $H_0$ if \begin{align} \left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \leq z_{\frac{\alpha}{2}}, \end{align} and reject it otherwise.
  • We have \begin{align} \beta (\mu) &=P(\textrm{type II error}) = P(\textrm{accept }H_0 \; | \; \mu) \\ &= P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right). \end{align} If $X_i \sim N(\mu,\sigma^2)$, then $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$. Thus, \begin{align} \beta (\mu)&=P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right)\\ &=P\left(\mu_0- z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \overline{X} \leq \mu_0+ z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)\\ &=\Phi\left(z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right)-\Phi\left(-z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right). \end{align}
  • Let $S^2$ be the sample variance for this random sample. Then, the random variable $W$ defined as \begin{equation} W(X_1,X_2, \cdots, X_n)=\frac{\overline{X}-\mu_0}{S / \sqrt{n}} \end{equation} has a $t$-distribution with $n-1$ degrees of freedom, i.e., $W \sim T(n-1)$. Thus, we can repeat the analysis of Example 8.24 here. The only difference is that we need to replace $\sigma$ by $S$ and $z_{\frac{\alpha}{2}}$ by $t_{\frac{\alpha}{2},n-1}$. Therefore, we accept $H_0$ if \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}, \end{align} and reject it otherwise. Let us look at a numerical example of this case.

$\quad$ $H_0$: $\mu=170$, $\quad$ $H_1$: $\mu \neq 170$.

  • Let's first compute the sample mean and the sample standard deviation. The sample mean is \begin{align} \overline{X}&=\frac{X_1+X_2+X_3+X_4+X_5+X_6+X_7+X_8+X_9}{9}\\ &=165.8 \end{align} The sample variance is given by \begin{align} {S}^2=\frac{1}{9-1} \sum_{k=1}^9 (X_k-\overline{X})^2&=68.01 \end{align} The sample standard deviation is given by \begin{align} S&= \sqrt{S^2}=8.25 \end{align} The following MATLAB code can be used to obtain these values: x = [176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2]; m = mean(x); v = var(x); s = std(x); Now, our test statistic is \begin{align} W(X_1,X_2, \cdots, X_9)&=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}\\ &=\frac{165.8-170}{8.25 / 3} \approx -1.53 \end{align} Thus, $|W| \approx 1.53$. Also, we have \begin{align} t_{\frac{\alpha}{2},n-1} = t_{0.025,8} \approx 2.31 \end{align} The above value can be obtained in MATLAB using the command $\mathtt{tinv(0.975,8)}$. Thus, we conclude \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}. \end{align} Therefore, we accept $H_0$. In other words, we do not have enough evidence to conclude that the average height in the city is different from the average height in the country.
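For readers without MATLAB, the same computation can be sketched in Python with the standard library only. This is an illustrative translation, not the text's own code: the variable names are chosen for this example, and the critical value $t_{0.025,8} \approx 2.306$ is entered by hand from a t table because the standard library has no inverse t distribution.

```python
import math
import statistics

# Height data from the example above.
x = [176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2]
mu0 = 170.0            # mean under the null hypothesis
n = len(x)

m = statistics.mean(x)         # sample mean
v = statistics.variance(x)     # sample variance (divides by n - 1)
s = statistics.stdev(x)        # sample standard deviation

w = (m - mu0) / (s / math.sqrt(n))   # test statistic W

t_crit = 2.306                 # t_{0.025, 8}, taken from a t table
accept_h0 = abs(w) <= t_crit

print(f"mean = {m:.1f}, variance = {v:.2f}, std = {s:.2f}")
print(f"W = {w:.2f}, accept H0: {accept_h0}")
```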

Let us summarize what we have obtained for the two-sided test for the mean.
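As a compact restatement of the known-variance case, the following Python sketch implements the two-sided decision rule $|W| \leq z_{\frac{\alpha}{2}}$ and the type II error $\beta(\mu)$ derived above. It relies on `statistics.NormalDist` from the standard library; the function names and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def two_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Return (W, z_{alpha/2}, accept_H0) for H0: mu = mu0 with known sigma."""
    w = (xbar - mu0) / (sigma / math.sqrt(n))
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)   # threshold z_{alpha/2}
    return w, c, abs(w) <= c

def type_ii_error(mu, mu0, sigma, n, alpha):
    """beta(mu) = Phi(z + d) - Phi(-z + d), with d = (mu0 - mu) / (sigma / sqrt(n))."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    d = (mu0 - mu) / (sigma / math.sqrt(n))
    return STD_NORMAL.cdf(z + d) - STD_NORMAL.cdf(-z + d)

# Hypothetical numbers: mu0 = 0, sigma = 1, n = 25, observed mean 0.3.
w, c, accept_h0 = two_sided_z_test(xbar=0.3, mu0=0.0, sigma=1.0, n=25, alpha=0.05)
beta_at_null = type_ii_error(mu=0.0, mu0=0.0, sigma=1.0, n=25, alpha=0.05)
beta_at_half = type_ii_error(mu=0.5, mu0=0.0, sigma=1.0, n=25, alpha=0.05)

print(f"W = {w:.2f}, threshold = {c:.2f}, accept H0: {accept_h0}")
print(f"beta(0) = {beta_at_null:.3f}, beta(0.5) = {beta_at_half:.3f}")
```

At $\mu = \mu_0$ the formula gives $\beta = 1 - \alpha$, as expected, and $\beta$ shrinks as the true mean moves away from $\mu_0$.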

One-sided Tests for the Mean:

  • As before, we define the test statistic as \begin{align} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} If $H_0$ is true (i.e., $\mu \leq \mu_0$), we expect $\overline{X}$ (and thus $W$) to be relatively small, while if $H_1$ is true, we expect $\overline{X}$ (and thus $W$) to be larger. This suggests the following test: Choose a threshold, and call it $c$. If $W \leq c$, accept $H_0$, and if $W>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) &= P(\textrm{Reject }H_0 \; | \; H_0) \\ &= P(W > c \; | \; \mu \leq \mu_0) \leq \alpha. \end{align} Here, the probability of type I error depends on $\mu$. More specifically, for any $\mu \leq \mu_0$, we can write \begin{align} P(\textrm{type I error} \; | \; \mu) &= P(\textrm{Reject }H_0 \; | \; \mu) \\ &= P(W > c \; | \; \mu)\\ &=P \left(\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}+\frac{\mu-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c+\frac{\mu_0-\mu}{\sigma / \sqrt{n}} \; | \; \mu\right)\\ &\leq P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c \; | \; \mu\right) \quad (\textrm{ since }\mu \leq \mu_0)\\ &=1-\Phi(c) \quad \big(\textrm{ since given }\mu, \frac{\overline{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \big). \end{align} Thus, we can choose $\alpha=1-\Phi(c)$, which results in \begin{align} c=z_{\alpha}. \end{align} Therefore, we accept $H_0$ if \begin{align} \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \leq z_{\alpha}, \end{align} and reject it otherwise.
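The one-sided rule above can be sketched in Python as follows. The code is illustrative: function names and the example numbers are assumptions, and `statistics.NormalDist` from the standard library supplies $\Phi$ and its inverse. The second function evaluates $P(W > c \; | \; \mu) = 1-\Phi\left(c+\frac{\mu_0-\mu}{\sigma/\sqrt{n}}\right)$, which makes it easy to see that the type I error never exceeds $\alpha$ when $\mu \leq \mu_0$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def one_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Test H0: mu <= mu0 against H1: mu > mu0 with known sigma.

    Returns (W, z_alpha, accept_H0), accepting H0 when W <= z_alpha.
    """
    w = (xbar - mu0) / (sigma / math.sqrt(n))
    c = STD_NORMAL.inv_cdf(1 - alpha)       # threshold z_alpha
    return w, c, w <= c

def rejection_probability(mu, mu0, sigma, n, alpha):
    """P(W > z_alpha | mu) = 1 - Phi(z_alpha + (mu0 - mu) / (sigma / sqrt(n)))."""
    c = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu0 - mu) / (sigma / math.sqrt(n))
    return 1 - STD_NORMAL.cdf(c + shift)

# Hypothetical numbers: mu0 = 0, sigma = 1, n = 25, observed mean 0.4.
alpha = 0.05
w, c, accept_h0 = one_sided_z_test(xbar=0.4, mu0=0.0, sigma=1.0, n=25, alpha=alpha)
err_at_boundary = rejection_probability(0.0, 0.0, 1.0, 25, alpha)   # mu = mu0
err_inside_null = rejection_probability(-0.2, 0.0, 1.0, 25, alpha)  # mu < mu0

print(f"W = {w:.2f}, threshold = {c:.3f}, accept H0: {accept_h0}")
print(f"type I error at mu0: {err_at_boundary:.3f}, at mu = -0.2: {err_inside_null:.4f}")
```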

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$,

t-test Calculator

Welcome to our t-test calculator! Here you can not only easily perform one-sample t-tests, but also two-sample t-tests, as well as paired t-tests.

Do you prefer to find the p-value from t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊

What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.

When to use a t-test?

A t-test is one of the most popular statistical tests for location, i.e., it deals with the population(s) mean value(s).

There are different types of t-tests that you can perform:

  • A one-sample t-test;
  • A two-sample t-test; and
  • A paired t-test.

In the next section, we explain when to use which. Remember that a t-test can only be used for one or two groups. If you need to compare three (or more) means, use the analysis of variance (ANOVA) method.

The t-test is a parametric test, meaning that your data has to fulfill some assumptions:

  • The data points are independent; AND
  • The data, at least approximately, follow a normal distribution.

If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.

Which t-test?

Your choice of t-test depends on whether you are studying one group or two groups:

One sample t-test

Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value.

The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?

The average weight of people from a specific city — is it different from the national average?

Two-sample t-test

Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.

In particular, you can use this test to check whether the two groups are different from one another.

The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.

The average difference in the results of a math test from students at two different universities.

This test is sometimes referred to as an independent samples t-test, or an unpaired samples t-test.

Paired t-test

A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention, based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.

In particular, you can use this test to check whether, on average, the treatment has had any effect on the population.

The change in student test performance before and after taking a course.

The change in blood pressure in patients before and after administering some drug.

How to do a t-test?

So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis.

Decide on the alternative hypothesis:

Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.

Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.

Compute your T-score value:

Formulas for the test statistic in t-tests include the sample size, as well as its mean and standard deviation. The exact formula depends on the t-test type; check the sections dedicated to each particular test for more details.

Determine the degrees of freedom for the t-test:

The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate. Again, the exact formula depends on the t-test you want to perform; check the sections below for details.

The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistic is the t-Student distribution with d degrees of freedom. This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails. If the number of degrees of freedom is large (>30), which generally happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).

💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺

p-value from t-test

Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample. As probabilities correspond to areas under the density function, the p-value from a t-test can be pictured as a tail area under the density curve:

[Figure: p-value from t-test]

The following formulae say how to calculate the p-value from a t-test. By $\mathrm{cdf}_{t,d}$ we denote the cumulative distribution function of the t-Student distribution with $d$ degrees of freedom:

p-value from left-tailed t-test:

p-value = $\mathrm{cdf}_{t,d}(t_{\mathrm{score}})$

p-value from right-tailed t-test:

p-value = $1 - \mathrm{cdf}_{t,d}(t_{\mathrm{score}})$

p-value from two-tailed t-test:

p-value = $2 \times \mathrm{cdf}_{t,d}(-|t_{\mathrm{score}}|)$

or, equivalently: p-value = $2 - 2 \times \mathrm{cdf}_{t,d}(|t_{\mathrm{score}}|)$

However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!
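If neither tables nor statistical software are at hand, the three p-value formulas can be approximated with a short Python sketch that uses only the standard library. Everything here is an assumption made for illustration: the function names are invented, and the cdf is approximated by integrating the t density with Simpson's rule, which is not necessarily how any particular calculator computes it.

```python
import math

def t_pdf(x, d):
    """Density of the t-Student distribution with d degrees of freedom."""
    log_c = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
             - 0.5 * math.log(d * math.pi))
    return math.exp(log_c - (d + 1) / 2 * math.log1p(x * x / d))

def t_cdf(x, d, steps=2000):
    """Approximate cdf via Simpson's rule on [0, |x|] and symmetry around 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, d) + t_pdf(a, d)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, d)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def p_value(t_score, d, tail="two"):
    """p-value for a left-, right-, or two-tailed t-test."""
    if tail == "left":
        return t_cdf(t_score, d)
    if tail == "right":
        return 1.0 - t_cdf(t_score, d)
    if tail == "two":
        return 2.0 * t_cdf(-abs(t_score), d)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical T-score of -1.53 with 8 degrees of freedom.
for tail in ("left", "right", "two"):
    print(tail, round(p_value(-1.53, 8, tail), 3))
```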

t-test critical values

Recall that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values, which in turn give rise to critical regions (a.k.a. rejection regions).

Formulas for critical values employ the quantile function of the t-distribution, i.e., the inverse of the cdf:

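Critical value for left-tailed t-test: $\mathrm{cdf}_{t,d}^{-1}(\alpha)$

critical region: $(-\infty, \mathrm{cdf}_{t,d}^{-1}(\alpha)]$

Critical value for right-tailed t-test: $\mathrm{cdf}_{t,d}^{-1}(1-\alpha)$

critical region: $[\mathrm{cdf}_{t,d}^{-1}(1-\alpha), \infty)$

Critical values for two-tailed t-test: $\pm\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)$

critical region: $(-\infty, -\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)] \cup [\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2), \infty)$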

To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:

If your T-score belongs to the critical region, reject the null hypothesis and accept the alternative hypothesis.

If your T-score is outside the critical region, then you don't have enough evidence to reject the null hypothesis.
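The critical-value approach can be sketched in Python as well. The code below is a rough illustration using only the standard library: the quantile function is obtained by bisection on a numerically integrated cdf, and all function names are assumptions introduced for this example.

```python
import math

def t_pdf(x, d):
    """Density of the t-Student distribution with d degrees of freedom."""
    log_c = (math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
             - 0.5 * math.log(d * math.pi))
    return math.exp(log_c - (d + 1) / 2 * math.log1p(x * x / d))

def t_cdf(x, d, steps=2000):
    """Approximate cdf via Simpson's rule on [0, |x|] and symmetry around 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, d) + t_pdf(a, d)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, d)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, d, tol=1e-8):
    """Inverse cdf by bisection; uses symmetry for p < 0.5."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return -t_quantile(1.0 - p, d, tol)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, d) < p:      # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if t_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def in_critical_region(t_score, d, alpha, tail="two"):
    """True if the T-score falls in the rejection region for the given tail."""
    if tail == "left":
        return t_score <= t_quantile(alpha, d)
    if tail == "right":
        return t_score >= t_quantile(1.0 - alpha, d)
    if tail == "two":
        return abs(t_score) >= t_quantile(1.0 - alpha / 2.0, d)
    raise ValueError("tail must be 'left', 'right', or 'two'")

critical = t_quantile(0.975, 8)   # two-tailed critical value for alpha = 0.05, d = 8
print(f"critical values: +/-{critical:.3f}")
print("reject H0 for T-score -1.53:", in_critical_region(-1.53, 8, 0.05, "two"))
```

Bisection is slow compared with the closed-form approximations used in statistical libraries, but it keeps the sketch short and depends only on the cdf.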

How to use our t-test calculator

Choose the type of t-test you wish to perform:

  • A one-sample t-test (to test the mean of a single group against a hypothesized mean);
  • A two-sample t-test (to compare the means for two groups); or
  • A paired t-test (to check how the mean from the same group changes after some intervention).

Decide on the alternative hypothesis:

  • Two-tailed;
  • Left-tailed; or
  • Right-tailed.

This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!

Enter your T-score and the number of degrees of freedom. If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you.

Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!

One-sample t-test

The null hypothesis is that the population mean is equal to some value $\mu_0$.

The alternative hypothesis is that the population mean is:

  • different from $\mu_0$;
  • smaller than $\mu_0$; or
  • greater than $\mu_0$.

One-sample t-test formula:

$t = \frac{\bar{x} - \mu_0}{s} \cdot \sqrt{n}$

where:

  • $\mu_0$: Mean postulated in the null hypothesis;
  • $n$: Sample size;
  • $\bar{x}$: Sample mean; and
  • $s$: Sample standard deviation.

Number of degrees of freedom in t-test (one-sample) = $n - 1$.

Two-sample t-test

The null hypothesis is that the actual difference between these groups' means, $\mu_1$ and $\mu_2$, is equal to some pre-set value, $\Delta$.

The alternative hypothesis is that the difference $\mu_1 - \mu_2$ is:

  • Different from $\Delta$;
  • Smaller than $\Delta$; or
  • Greater than $\Delta$.

In particular, if this pre-determined difference is zero ($\Delta = 0$):

The null hypothesis is that the population means are equal.

The alternative hypothesis is one of the following:

  • $\mu_1$ and $\mu_2$ are different from one another;
  • $\mu_1$ is smaller than $\mu_2$; or
  • $\mu_1$ is greater than $\mu_2$.

Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance).

There is a version of a t-test that can be applied without the assumption of homogeneity of variance: it is called Welch's t-test. For your convenience, we describe both versions.

Two-sample t-test if variances are equal

Use this test if you know that the two populations' variances are the same (or very similar).

Two-sample t-test formula (with equal variances):

$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

where $s_p$ is the so-called pooled standard deviation, which we compute as:

$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$

and:

  • $\Delta$: Mean difference postulated in the null hypothesis;
  • $n_1$: First sample size;
  • $\bar{x}_1$: Mean for the first sample;
  • $s_1$: Standard deviation in the first sample;
  • $n_2$: Second sample size;
  • $\bar{x}_2$: Mean for the second sample; and
  • $s_2$: Standard deviation in the second sample.

Number of degrees of freedom in t-test (two samples, equal variances) = $n_1 + n_2 - 2$.
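As a quick illustration of the equal-variance case, the sketch below computes the pooled standard deviation, the T-score, and the degrees of freedom from summary statistics. The function name and the sample numbers are hypothetical; only the Python standard library is used.

```python
import math

def pooled_two_sample_t(mean1, s1, n1, mean2, s2, n2, delta=0.0):
    """Two-sample t-test with equal variances.

    Returns (T-score, degrees of freedom, pooled standard deviation).
    """
    df = n1 + n2 - 2
    s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    t = (mean1 - mean2 - delta) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, df, s_p

# Hypothetical summary statistics for two groups of 10 observations each.
t, df, s_p = pooled_two_sample_t(mean1=5.0, s1=2.0, n1=10,
                                 mean2=3.0, s2=2.0, n2=10)
print(f"pooled sd = {s_p:.3f}, T-score = {t:.3f}, df = {df}")
```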

Two-sample t-test if variances are unequal (Welch's t-test)

Use this test if the variances of your populations are different.

Two-sample Welch's t-test formula if variances are unequal:

$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

where:

  • $s_1$: Standard deviation in the first sample;
  • $s_2$: Standard deviation in the second sample;

and the remaining symbols are the same as in the equal-variance formula.

The number of degrees of freedom in a Welch's t-test (two-sample t-test with unequal variances) is very difficult to compute exactly. We can approximate it with the help of the following Satterthwaite formula:

$d \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

Alternatively, you can take the smaller of $n_1 - 1$ and $n_2 - 1$ as a conservative estimate for the number of degrees of freedom.

🔎 The Satterthwaite formula for the degrees of freedom can be rewritten as a scaled weighted harmonic mean of the degrees of freedom of the respective samples, $n_1 - 1$ and $n_2 - 1$, where the weights, $(s_1^2/n_1)^2$ and $(s_2^2/n_2)^2$, are determined by the standard deviations and sizes of the corresponding samples.
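A minimal Python sketch of Welch's test statistic and the Satterthwaite approximation follows. The function name and the example numbers are hypothetical, and only the standard library is used.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2, delta=0.0):
    """Welch's t-test: returns (T-score, Satterthwaite degrees of freedom)."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    t = (mean1 - mean2 - delta) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics with clearly different spreads and sizes.
t, df = welch_t(mean1=5.0, s1=1.0, n1=5, mean2=3.0, s2=3.0, n2=20)
conservative_df = min(5 - 1, 20 - 1)
print(f"T-score = {t:.3f}, Satterthwaite df = {df:.2f}, conservative df = {conservative_df}")
```

In general the Satterthwaite value is not an integer and lies between the conservative estimate, the smaller of $n_1 - 1$ and $n_2 - 1$, and the pooled value $n_1 + n_2 - 2$.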

Paired t-test

As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.

The null hypothesis is that the true difference between the means of pre- and post-populations is equal to some pre-set value, $\Delta$.

The alternative hypothesis is that the actual difference between these means is:

  • Different from $\Delta$;
  • Smaller than $\Delta$; or
  • Greater than $\Delta$.

Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:

The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population.

The alternative hypothesis:

  • The pre- and post-means are different from one another (treatment has some effect);
  • The pre-mean is smaller than the post-mean (treatment increases the result); or
  • The pre-mean is greater than the post-mean (treatment decreases the result).

Paired t-test formula

In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why it is so. Let $x_1, \ldots, x_n$ be the pre observations and $y_1, \ldots, y_n$ the respective post observations. That is, $x_i, y_i$ are the before and after measurements of the $i$-th subject.

For each subject, compute the difference, $d_i := x_i - y_i$. All that happens next is just a one-sample t-test performed on the sample of differences $d_1, \ldots, d_n$. Take a look at the formula for the T-score:

$t = \frac{\bar{x} - \Delta}{s} \cdot \sqrt{n}$

where:

  • $\Delta$: Mean difference postulated in the null hypothesis;
  • $n$: Size of the sample of differences, i.e., the number of pairs;
  • $\bar{x}$: Mean of the sample of differences; and
  • $s$: Standard deviation of the sample of differences.

Number of degrees of freedom in t-test (paired): $n - 1$
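The reduction of a paired t-test to a one-sample t-test on the differences can be sketched in Python as follows. The function name and the small data set are hypothetical and serve only as an illustration; the standard library `statistics` module supplies the mean and the sample standard deviation.

```python
import math
import statistics

def paired_t(pre, post, delta=0.0):
    """Paired t-test as a one-sample t-test on the differences pre - post.

    Returns (T-score, degrees of freedom).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same number of observations")
    d = [x - y for x, y in zip(pre, post)]     # per-subject differences
    n = len(d)
    t = (statistics.mean(d) - delta) / statistics.stdev(d) * math.sqrt(n)
    return t, n - 1

# Hypothetical before/after measurements for four subjects.
pre = [10, 12, 9, 11]
post = [8, 11, 9, 8]
t, df = paired_t(pre, post)
print(f"T-score = {t:.3f}, df = {df}")
```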

t-test vs Z-test

We use a Z-test when we want to test the population mean of a normally distributed dataset, which has a known population variance. If the number of degrees of freedom is large, then the t-Student distribution is very close to N(0,1).

Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test because, in such cases, the t-Student distribution differs significantly from the N(0,1)!

🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator!

What is a t-test?

A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.

What are different types of t-tests?

Different types of t-tests are:

  • One-sample t-test;
  • Two-sample t-test; and
  • Paired t-test.

How to find the t value in a one sample t-test?

To find the t-value:

  • Subtract the null hypothesis mean from the sample mean value.
  • Divide the difference by the standard deviation of the sample.
  • Multiply the resultant with the square root of the sample size.

format('woff2'),url(/katex-fonts/KaTeX_Size4-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Size4-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Typewriter';src:url(/katex-fonts/KaTeX_Typewriter-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Typewriter-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Typewriter-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}.css-4okk7a .katex{font:normal 1.21em KaTeX_Main,Times New Roman,serif;line-height:1.2;text-indent:0;text-rendering:auto;}.css-4okk7a .katex *{-ms-high-contrast-adjust:none!important;border-color:currentColor;}.css-4okk7a .katex .katex-version::after{content:'0.13.13';}.css-4okk7a .katex .katex-mathml{position:absolute;clip:rect(1px, 1px, 1px, 1px);padding:0;border:0;height:1px;width:1px;overflow:hidden;}.css-4okk7a .katex .katex-html>.newline{display:block;}.css-4okk7a .katex .base{position:relative;display:inline-block;white-space:nowrap;width:-webkit-min-content;width:-moz-min-content;width:-webkit-min-content;width:-moz-min-content;width:min-content;}.css-4okk7a .katex .strut{display:inline-block;}.css-4okk7a .katex .textbf{font-weight:bold;}.css-4okk7a .katex .textit{font-style:italic;}.css-4okk7a .katex .textrm{font-family:KaTeX_Main;}.css-4okk7a .katex .textsf{font-family:KaTeX_SansSerif;}.css-4okk7a .katex .texttt{font-family:KaTeX_Typewriter;}.css-4okk7a .katex .mathnormal{font-family:KaTeX_Math;font-style:italic;}.css-4okk7a .katex .mathit{font-family:KaTeX_Main;font-style:italic;}.css-4okk7a .katex .mathrm{font-style:normal;}.css-4okk7a .katex .mathbf{font-family:KaTeX_Main;font-weight:bold;}.css-4okk7a .katex .boldsymbol{font-family:KaTeX_Math;font-weight:bold;font-style:italic;}.css-4okk7a .katex .amsrm{font-family:KaTeX_AMS;}.css-4okk7a .katex .mathbb,.css-4okk7a .katex .textbb{font-family:KaTeX_AMS;}.css-4okk7a .katex .mathcal{font-family:KaTeX_Caligraphic;}.css-4okk7a .katex .mathfrak,.css-4okk7a .katex 
.textfrak{font-family:KaTeX_Fraktur;}.css-4okk7a .katex .mathtt{font-family:KaTeX_Typewriter;}.css-4okk7a .katex .mathscr,.css-4okk7a .katex .textscr{font-family:KaTeX_Script;}.css-4okk7a .katex .mathsf,.css-4okk7a .katex .textsf{font-family:KaTeX_SansSerif;}.css-4okk7a .katex .mathboldsf,.css-4okk7a .katex .textboldsf{font-family:KaTeX_SansSerif;font-weight:bold;}.css-4okk7a .katex .mathitsf,.css-4okk7a .katex .textitsf{font-family:KaTeX_SansSerif;font-style:italic;}.css-4okk7a .katex .mainrm{font-family:KaTeX_Main;font-style:normal;}.css-4okk7a .katex .vlist-t{display:inline-table;table-layout:fixed;border-collapse:collapse;}.css-4okk7a .katex .vlist-r{display:table-row;}.css-4okk7a .katex .vlist{display:table-cell;vertical-align:bottom;position:relative;}.css-4okk7a .katex .vlist>span{display:block;height:0;position:relative;}.css-4okk7a .katex .vlist>span>span{display:inline-block;}.css-4okk7a .katex .vlist>span>.pstrut{overflow:hidden;width:0;}.css-4okk7a .katex .vlist-t2{margin-right:-2px;}.css-4okk7a .katex .vlist-s{display:table-cell;vertical-align:bottom;font-size:1px;width:2px;min-width:2px;}.css-4okk7a .katex .vbox{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-flex-direction:column;-ms-flex-direction:column;flex-direction:column;-webkit-align-items:baseline;-webkit-box-align:baseline;-ms-flex-align:baseline;align-items:baseline;}.css-4okk7a .katex .hbox{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;width:100%;}.css-4okk7a .katex .thinbox{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;width:0;max-width:0;}.css-4okk7a .katex .msupsub{text-align:left;}.css-4okk7a .katex .mfrac>span>span{text-align:center;}.css-4okk7a .katex .mfrac 
.frac-line{display:inline-block;width:100%;border-bottom-style:solid;}.css-4okk7a .katex .mfrac .frac-line,.css-4okk7a .katex .overline .overline-line,.css-4okk7a .katex .underline .underline-line,.css-4okk7a .katex .hline,.css-4okk7a .katex .hdashline,.css-4okk7a .katex .rule{min-height:1px;}.css-4okk7a .katex .mspace{display:inline-block;}.css-4okk7a .katex .llap,.css-4okk7a .katex .rlap,.css-4okk7a .katex .clap{width:0;position:relative;}.css-4okk7a .katex .llap>.inner,.css-4okk7a .katex .rlap>.inner,.css-4okk7a .katex .clap>.inner{position:absolute;}.css-4okk7a .katex .llap>.fix,.css-4okk7a .katex .rlap>.fix,.css-4okk7a .katex .clap>.fix{display:inline-block;}.css-4okk7a .katex .llap>.inner{right:0;}.css-4okk7a .katex .rlap>.inner,.css-4okk7a .katex .clap>.inner{left:0;}.css-4okk7a .katex .clap>.inner>span{margin-left:-50%;margin-right:50%;}.css-4okk7a .katex .rule{display:inline-block;border:solid 0;position:relative;}.css-4okk7a .katex .overline .overline-line,.css-4okk7a .katex .underline .underline-line,.css-4okk7a .katex .hline{display:inline-block;width:100%;border-bottom-style:solid;}.css-4okk7a .katex .hdashline{display:inline-block;width:100%;border-bottom-style:dashed;}.css-4okk7a .katex .sqrt>.root{margin-left:0.27777778em;margin-right:-0.55555556em;}.css-4okk7a .katex .sizing.reset-size1.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size1{font-size:1em;}.css-4okk7a .katex .sizing.reset-size1.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size2{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size1.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size3{font-size:1.4em;}.css-4okk7a .katex .sizing.reset-size1.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size4{font-size:1.6em;}.css-4okk7a .katex .sizing.reset-size1.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size5{font-size:1.8em;}.css-4okk7a .katex .sizing.reset-size1.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size6{font-size:2em;}.css-4okk7a 
.katex .sizing.reset-size1.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size7{font-size:2.4em;}.css-4okk7a .katex .sizing.reset-size1.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size8{font-size:2.88em;}.css-4okk7a .katex .sizing.reset-size1.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size9{font-size:3.456em;}.css-4okk7a .katex .sizing.reset-size1.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size10{font-size:4.148em;}.css-4okk7a .katex .sizing.reset-size1.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size11{font-size:4.976em;}.css-4okk7a .katex .sizing.reset-size2.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size1{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size2.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size2{font-size:1em;}.css-4okk7a .katex .sizing.reset-size2.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size3{font-size:1.16666667em;}.css-4okk7a .katex .sizing.reset-size2.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size4{font-size:1.33333333em;}.css-4okk7a .katex .sizing.reset-size2.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size5{font-size:1.5em;}.css-4okk7a .katex .sizing.reset-size2.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size6{font-size:1.66666667em;}.css-4okk7a .katex .sizing.reset-size2.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size7{font-size:2em;}.css-4okk7a .katex .sizing.reset-size2.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size8{font-size:2.4em;}.css-4okk7a .katex .sizing.reset-size2.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size9{font-size:2.88em;}.css-4okk7a .katex .sizing.reset-size2.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size10{font-size:3.45666667em;}.css-4okk7a .katex .sizing.reset-size2.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size11{font-size:4.14666667em;}.css-4okk7a .katex .sizing.reset-size3.size1,.css-4okk7a .katex 
.fontsize-ensurer.reset-size3.size1{font-size:0.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size2{font-size:0.85714286em;}.css-4okk7a .katex .sizing.reset-size3.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size3{font-size:1em;}.css-4okk7a .katex .sizing.reset-size3.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size4{font-size:1.14285714em;}.css-4okk7a .katex .sizing.reset-size3.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size5{font-size:1.28571429em;}.css-4okk7a .katex .sizing.reset-size3.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size6{font-size:1.42857143em;}.css-4okk7a .katex .sizing.reset-size3.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size7{font-size:1.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size8{font-size:2.05714286em;}.css-4okk7a .katex .sizing.reset-size3.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size9{font-size:2.46857143em;}.css-4okk7a .katex .sizing.reset-size3.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size10{font-size:2.96285714em;}.css-4okk7a .katex .sizing.reset-size3.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size11{font-size:3.55428571em;}.css-4okk7a .katex .sizing.reset-size4.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size1{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size4.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size2{font-size:0.75em;}.css-4okk7a .katex .sizing.reset-size4.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size3{font-size:0.875em;}.css-4okk7a .katex .sizing.reset-size4.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size4{font-size:1em;}.css-4okk7a .katex .sizing.reset-size4.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size5{font-size:1.125em;}.css-4okk7a .katex .sizing.reset-size4.size6,.css-4okk7a .katex 
.fontsize-ensurer.reset-size4.size6{font-size:1.25em;}.css-4okk7a .katex .sizing.reset-size4.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size7{font-size:1.5em;}.css-4okk7a .katex .sizing.reset-size4.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size8{font-size:1.8em;}.css-4okk7a .katex .sizing.reset-size4.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size9{font-size:2.16em;}.css-4okk7a .katex .sizing.reset-size4.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size10{font-size:2.5925em;}.css-4okk7a .katex .sizing.reset-size4.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size11{font-size:3.11em;}.css-4okk7a .katex .sizing.reset-size5.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size1{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size5.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size2{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size5.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size3{font-size:0.77777778em;}.css-4okk7a .katex .sizing.reset-size5.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size4{font-size:0.88888889em;}.css-4okk7a .katex .sizing.reset-size5.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size5{font-size:1em;}.css-4okk7a .katex .sizing.reset-size5.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size6{font-size:1.11111111em;}.css-4okk7a .katex .sizing.reset-size5.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size7{font-size:1.33333333em;}.css-4okk7a .katex .sizing.reset-size5.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size8{font-size:1.6em;}.css-4okk7a .katex .sizing.reset-size5.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size9{font-size:1.92em;}.css-4okk7a .katex .sizing.reset-size5.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size10{font-size:2.30444444em;}.css-4okk7a .katex .sizing.reset-size5.size11,.css-4okk7a .katex 
.fontsize-ensurer.reset-size5.size11{font-size:2.76444444em;}.css-4okk7a .katex .sizing.reset-size6.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size1{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size6.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size2{font-size:0.6em;}.css-4okk7a .katex .sizing.reset-size6.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size3{font-size:0.7em;}.css-4okk7a .katex .sizing.reset-size6.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size4{font-size:0.8em;}.css-4okk7a .katex .sizing.reset-size6.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size5{font-size:0.9em;}.css-4okk7a .katex .sizing.reset-size6.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size6{font-size:1em;}.css-4okk7a .katex .sizing.reset-size6.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size7{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size6.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size8{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size6.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size9{font-size:1.728em;}.css-4okk7a .katex .sizing.reset-size6.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size10{font-size:2.074em;}.css-4okk7a .katex .sizing.reset-size6.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size11{font-size:2.488em;}.css-4okk7a .katex .sizing.reset-size7.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size1{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size7.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size2{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size7.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size3{font-size:0.58333333em;}.css-4okk7a .katex .sizing.reset-size7.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size4{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size7.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size5{font-size:0.75em;}.css-4okk7a .katex 
.sizing.reset-size7.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size6{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size7.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size7{font-size:1em;}.css-4okk7a .katex .sizing.reset-size7.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size8{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size7.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size9{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size7.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size10{font-size:1.72833333em;}.css-4okk7a .katex .sizing.reset-size7.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size11{font-size:2.07333333em;}.css-4okk7a .katex .sizing.reset-size8.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size1{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size8.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size2{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size8.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size3{font-size:0.48611111em;}.css-4okk7a .katex .sizing.reset-size8.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size4{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size8.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size5{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size8.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size6{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size8.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size7{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size8.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size8{font-size:1em;}.css-4okk7a .katex .sizing.reset-size8.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size9{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size8.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size10{font-size:1.44027778em;}.css-4okk7a .katex .sizing.reset-size8.size11,.css-4okk7a .katex 
.fontsize-ensurer.reset-size8.size11{font-size:1.72777778em;}.css-4okk7a .katex .sizing.reset-size9.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size1{font-size:0.28935185em;}.css-4okk7a .katex .sizing.reset-size9.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size2{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size9.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size3{font-size:0.40509259em;}.css-4okk7a .katex .sizing.reset-size9.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size4{font-size:0.46296296em;}.css-4okk7a .katex .sizing.reset-size9.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size5{font-size:0.52083333em;}.css-4okk7a .katex .sizing.reset-size9.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size6{font-size:0.5787037em;}.css-4okk7a .katex .sizing.reset-size9.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size7{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size9.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size8{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size9.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size9{font-size:1em;}.css-4okk7a .katex .sizing.reset-size9.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size10{font-size:1.20023148em;}.css-4okk7a .katex .sizing.reset-size9.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size11{font-size:1.43981481em;}.css-4okk7a .katex .sizing.reset-size10.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size1{font-size:0.24108004em;}.css-4okk7a .katex .sizing.reset-size10.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size2{font-size:0.28929605em;}.css-4okk7a .katex .sizing.reset-size10.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size3{font-size:0.33751205em;}.css-4okk7a .katex .sizing.reset-size10.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size4{font-size:0.38572806em;}.css-4okk7a .katex .sizing.reset-size10.size5,.css-4okk7a .katex 
.fontsize-ensurer.reset-size10.size5{font-size:0.43394407em;}.css-4okk7a .katex .sizing.reset-size10.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size6{font-size:0.48216008em;}.css-4okk7a .katex .sizing.reset-size10.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size7{font-size:0.57859209em;}.css-4okk7a .katex .sizing.reset-size10.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size8{font-size:0.69431051em;}.css-4okk7a .katex .sizing.reset-size10.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size9{font-size:0.83317261em;}.css-4okk7a .katex .sizing.reset-size10.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size10{font-size:1em;}.css-4okk7a .katex .sizing.reset-size10.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size11{font-size:1.19961427em;}.css-4okk7a .katex .sizing.reset-size11.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size1{font-size:0.20096463em;}.css-4okk7a .katex .sizing.reset-size11.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size2{font-size:0.24115756em;}.css-4okk7a .katex .sizing.reset-size11.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size3{font-size:0.28135048em;}.css-4okk7a .katex .sizing.reset-size11.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size4{font-size:0.32154341em;}.css-4okk7a .katex .sizing.reset-size11.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size5{font-size:0.36173633em;}.css-4okk7a .katex .sizing.reset-size11.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size6{font-size:0.40192926em;}.css-4okk7a .katex .sizing.reset-size11.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size7{font-size:0.48231511em;}.css-4okk7a .katex .sizing.reset-size11.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size8{font-size:0.57877814em;}.css-4okk7a .katex .sizing.reset-size11.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size9{font-size:0.69453376em;}.css-4okk7a .katex 
.sizing.reset-size11.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size10{font-size:0.83360129em;}.css-4okk7a .katex .sizing.reset-size11.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size11{font-size:1em;}.css-4okk7a .katex .delimsizing.size1{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.size2{font-family:KaTeX_Size2;}.css-4okk7a .katex .delimsizing.size3{font-family:KaTeX_Size3;}.css-4okk7a .katex .delimsizing.size4{font-family:KaTeX_Size4;}.css-4okk7a .katex .delimsizing.mult .delim-size1>span{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.mult .delim-size4>span{font-family:KaTeX_Size4;}.css-4okk7a .katex .nulldelimiter{display:inline-block;width:0.12em;}.css-4okk7a .katex .delimcenter{position:relative;}.css-4okk7a .katex .op-symbol{position:relative;}.css-4okk7a .katex .op-symbol.small-op{font-family:KaTeX_Size1;}.css-4okk7a .katex .op-symbol.large-op{font-family:KaTeX_Size2;}.css-4okk7a .katex .op-limits>.vlist-t{text-align:center;}.css-4okk7a .katex .accent>.vlist-t{text-align:center;}.css-4okk7a .katex .accent .accent-body{position:relative;}.css-4okk7a .katex .accent .accent-body:not(.accent-full){width:0;}.css-4okk7a .katex .overlay{display:block;}.css-4okk7a .katex .mtable .vertical-separator{display:inline-block;min-width:1px;}.css-4okk7a .katex .mtable .arraycolsep{display:inline-block;}.css-4okk7a .katex .mtable .col-align-c>.vlist-t{text-align:center;}.css-4okk7a .katex .mtable .col-align-l>.vlist-t{text-align:left;}.css-4okk7a .katex .mtable .col-align-r>.vlist-t{text-align:right;}.css-4okk7a .katex .svg-align{text-align:left;}.css-4okk7a .katex svg{display:block;position:absolute;width:100%;height:inherit;fill:currentColor;stroke:currentColor;fill-rule:nonzero;fill-opacity:1;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;}.css-4okk7a .katex svg path{stroke:none;}.css-4okk7a .katex 
img{border-style:none;min-width:0;min-height:0;max-width:none;max-height:none;}.css-4okk7a .katex .stretchy{width:100%;display:block;position:relative;overflow:hidden;}.css-4okk7a .katex .stretchy::before,.css-4okk7a .katex .stretchy::after{content:'';}.css-4okk7a .katex .hide-tail{width:100%;position:relative;overflow:hidden;}.css-4okk7a .katex .halfarrow-left{position:absolute;left:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .halfarrow-right{position:absolute;right:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .brace-left{position:absolute;left:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .brace-center{position:absolute;left:25%;width:50%;overflow:hidden;}.css-4okk7a .katex .brace-right{position:absolute;right:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .x-arrow-pad{padding:0 0.5em;}.css-4okk7a .katex .cd-arrow-pad{padding:0 0.55556em 0 0.27778em;}.css-4okk7a .katex .x-arrow,.css-4okk7a .katex .mover,.css-4okk7a .katex .munder{text-align:center;}.css-4okk7a .katex .boxpad{padding:0 0.3em 0 0.3em;}.css-4okk7a .katex .fbox,.css-4okk7a .katex .fcolorbox{box-sizing:border-box;border:0.04em solid;}.css-4okk7a .katex .cancel-pad{padding:0 0.2em 0 0.2em;}.css-4okk7a .katex .cancel-lap{margin-left:-0.2em;margin-right:-0.2em;}.css-4okk7a .katex .sout{border-bottom-style:solid;border-bottom-width:0.08em;}.css-4okk7a .katex .angl{box-sizing:border-box;border-top:0.049em solid;border-right:0.049em solid;margin-right:0.03889em;}.css-4okk7a .katex .anglpad{padding:0 0.03889em 0 0.03889em;}.css-4okk7a .katex .eqn-num::before{counter-increment:katexEqnNo;content:'(' counter(katexEqnNo) ')';}.css-4okk7a .katex .mml-eqn-num::before{counter-increment:mmlEqnNo;content:'(' counter(mmlEqnNo) ')';}.css-4okk7a .katex .mtr-glue{width:50%;}.css-4okk7a .katex .cd-vert-arrow{display:inline-block;position:relative;}.css-4okk7a .katex .cd-label-left{display:inline-block;position:absolute;right:calc(50% + 0.3em);text-align:left;}.css-4okk7a .katex 
.cd-label-right{display:inline-block;position:absolute;left:calc(50% + 0.3em);text-align:right;}.css-4okk7a .katex-display{display:block;margin:1em 0;text-align:center;}.css-4okk7a .katex-display>.katex{display:block;white-space:nowrap;}.css-4okk7a .katex-display>.katex>.katex-html{display:block;position:relative;}.css-4okk7a .katex-display>.katex>.katex-html>.tag{position:absolute;right:0;}.css-4okk7a .katex-display.leqno>.katex>.katex-html>.tag{left:0;right:auto;}.css-4okk7a .katex-display.fleqn>.katex{text-align:left;padding-left:2em;}.css-4okk7a body{counter-reset:katexEqnNo mmlEqnNo;}.css-4okk7a table{width:-webkit-max-content;width:-moz-max-content;width:max-content;}.css-4okk7a .tableBlock{max-width:100%;margin-bottom:1rem;overflow-y:scroll;}.css-4okk7a .tableBlock thead,.css-4okk7a .tableBlock thead th{border-bottom:1px solid #333!important;}.css-4okk7a .tableBlock th,.css-4okk7a .tableBlock td{padding:10px;text-align:left;}.css-4okk7a .tableBlock th{font-weight:bold!important;}.css-4okk7a .tableBlock caption{caption-side:bottom;color:#555;font-size:12px;font-style:italic;text-align:center;}.css-4okk7a .tableBlock caption>p{margin:0;}.css-4okk7a .tableBlock th>p,.css-4okk7a .tableBlock td>p{margin:0;}.css-4okk7a .tableBlock [data-background-color='aliceblue']{background-color:#f0f8ff;color:#000;}.css-4okk7a .tableBlock [data-background-color='black']{background-color:#000;color:#fff;}.css-4okk7a .tableBlock [data-background-color='chocolate']{background-color:#d2691e;color:#fff;}.css-4okk7a .tableBlock [data-background-color='cornflowerblue']{background-color:#6495ed;color:#fff;}.css-4okk7a .tableBlock [data-background-color='crimson']{background-color:#dc143c;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkblue']{background-color:#00008b;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkseagreen']{background-color:#8fbc8f;color:#000;}.css-4okk7a .tableBlock 
Test setup

Choose test type

t-test for the population mean, μ, based on one independent sample. Null hypothesis H 0 : μ = μ 0

Alternative hypothesis H 1

Test details

Significance level α

The probability that we reject a true H 0 (type I error).

Degrees of freedom

Calculated as sample size minus one.

Test results

Hypothesis Testing for Means & Proportions

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This is the first of three modules that address the second area of statistical inference, hypothesis testing, in which a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The process of hypothesis testing involves setting up two competing hypotheses, the null hypothesis and the alternate hypothesis. One selects a random sample (or multiple samples when there are more comparison groups), computes summary statistics and then assesses the likelihood that the sample data support the research or alternative hypothesis. Similar to estimation, the process of hypothesis testing is based on probability theory and the Central Limit Theorem.

This module will focus on hypothesis testing for means and proportions. The next two modules in this series will address analysis of variance and chi-squared tests. 

Learning Objectives

After completing this module, the student will be able to:

  • Define null and research hypothesis, test statistic, level of significance and decision rule
  • Distinguish between Type I and Type II errors and discuss the implications of each
  • Explain the difference between one and two sided tests of hypothesis
  • Estimate and interpret p-values
  • Explain the relationship between confidence interval estimates and p-values in drawing inferences
  • Differentiate hypothesis testing procedures based on type of outcome variable and number of samples

Introduction to Hypothesis Testing


The techniques for hypothesis testing depend on

  • the type of outcome variable being analyzed (continuous, dichotomous, discrete)
  • the number of comparison groups in the investigation
  • whether the comparison groups are independent (i.e., physically separate such as men versus women) or dependent (i.e., matched or paired such as pre- and post-assessments on the same participants).

In estimation we focused explicitly on techniques for one and two samples and discussed estimation for a specific parameter (e.g., the mean or proportion of a population), for differences (e.g., difference in means, the risk difference) and ratios (e.g., the relative risk and odds ratio). Here we will focus on procedures for one and two samples when the outcome is either continuous (and we focus on means) or dichotomous (and we focus on proportions).

General Approach: A Simple Example

The Centers for Disease Control (CDC) reported on trends in weight, height and body mass index from the 1960's through 2002. 1 The general trend was that Americans were much heavier and slightly taller in 2002 as compared to 1960; both men and women gained approximately 24 pounds, on average, between 1960 and 2002.   In 2002, the mean weight for men was reported at 191 pounds. Suppose that an investigator hypothesizes that weights are even higher in 2006 (i.e., that the trend continued over the subsequent 4 years). The research hypothesis is that the mean weight in men in 2006 is more than 191 pounds. The null hypothesis is that there is no change in weight, and therefore the mean weight is still 191 pounds in 2006.  

In order to test the hypotheses, we select a random sample of American males in 2006 and measure their weights. Suppose we have resources available to recruit n=100 men into our sample. We weigh each participant and compute summary statistics on the sample data. Suppose in the sample we determine the following: n = 100 and x̄ = 197.1 pounds, along with the sample standard deviation s.

Do the sample data support the null or research hypothesis? The sample mean of 197.1 is numerically higher than 191. However, is this difference more than would be expected by chance? In hypothesis testing, we assume that the null hypothesis holds until proven otherwise. We therefore need to determine the likelihood of observing a sample mean of 197.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true or under the null hypothesis). We can compute this probability using the Central Limit Theorem. Specifically,

P(x̄ ≥ 197.1) = P(Z ≥ (197.1 - 191)/(s/√n)) = P(Z ≥ 2.38) = 0.0087.

(Notice that we use the sample standard deviation in computing the Z score. This is generally an appropriate substitution as long as the sample size is large, n > 30.) Thus, there is less than a 1% probability of observing a sample mean as large as 197.1 when the true population mean is 191. Do you think that the null hypothesis is likely true? Based on how unlikely it is to observe a sample mean of 197.1 under the null hypothesis (i.e., <1% probability), we might infer, from our data, that the null hypothesis is probably not true.

Suppose that the sample data had turned out differently. Suppose that we instead observed the following in 2006: n = 100 and x̄ = 192.1 pounds.

How likely is it to observe a sample mean of 192.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true)? We can again compute this probability using the Central Limit Theorem. Specifically,

P(x̄ ≥ 192.1) = P(Z ≥ (192.1 - 191)/(s/√n)) = P(Z ≥ 0.43) = 0.334.

There is a 33.4% probability of observing a sample mean as large as 192.1 when the true population mean is 191. Do you think that the null hypothesis is likely true?  

Neither of the sample means that we obtained allows us to know with certainty whether the null hypothesis is true or not. However, our computations suggest that, if the null hypothesis were true, the probability of observing a sample mean >197.1 is less than 1%. In contrast, if the null hypothesis were true, the probability of observing a sample mean >192.1 is about 33%. We can't know whether the null hypothesis is true, but the sample that provided a mean value of 197.1 provides much stronger evidence in favor of rejecting the null hypothesis, than the sample that provided a mean value of 192.1. Note that this does not mean that a sample mean of 192.1 indicates that the null hypothesis is true; it just doesn't provide compelling evidence to reject it.
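The two tail probabilities discussed above can be reproduced with a short script. This is only a minimal sketch using Python's standard library. The sample standard deviation s = 25.6 is an assumption: it is not stated in the text, but it is consistent with the Z scores of 2.38 and 0.43 reported in this module. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_probability(xbar, mu0, s, n):
    """Return (z, P(Z >= z)) for a sample mean xbar under H0: mu = mu0."""
    z = (xbar - mu0) / (s / sqrt(n))   # standardize using the standard error
    p = 1.0 - NormalDist().cdf(z)      # area of the standard normal above z
    return z, p

S_ASSUMED = 25.6  # assumption: back-derived from the reported Z scores

z1, p1 = upper_tail_probability(197.1, 191, S_ASSUMED, 100)
z2, p2 = upper_tail_probability(192.1, 191, S_ASSUMED, 100)

print(f"x-bar = 197.1: Z = {z1:.2f}, upper-tail probability = {p1:.4f}")
print(f"x-bar = 192.1: Z = {z2:.2f}, upper-tail probability = {p2:.4f}")
```

Because the script does not round Z before looking up the probability, its output can differ from table-based values in the last digit.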

In essence, hypothesis testing is a procedure to compute a probability that reflects the strength of the evidence (based on a given sample) for rejecting the null hypothesis. In hypothesis testing, we determine a threshold or cut-off point (called the critical value) to decide when to believe the null hypothesis and when to believe the research hypothesis. It is important to note that it is possible to observe any sample mean when the null hypothesis is true (in this example, when the true population mean is 191), but some sample means are very unlikely. Based on the two samples above it would seem reasonable to believe the research hypothesis when x̄ = 197.1, but to believe the null hypothesis when x̄ = 192.1. What we need is a threshold value such that if x̄ is above that threshold then we believe that H 1 is true and if x̄ is below that threshold then we believe that H 0 is true. The difficulty in determining a threshold for x̄ is that it depends on the scale of measurement. In this example, the threshold, sometimes called the critical value, might be 195 (i.e., if the sample mean is 195 or more then we believe that H 1 is true and if the sample mean is less than 195 then we believe that H 0 is true). If we were instead interested in assessing an increase in blood pressure over time, the critical value would be different because blood pressures are measured in millimeters of mercury (mmHg) rather than in pounds. In the following we will explain how the critical value is determined and how we handle the issue of scale.

First, to address the issue of scale in determining the critical value, we convert our sample data (in particular the sample mean) into a Z score. We know from the module on probability that the center of the Z distribution is zero and extreme values are those that exceed 2 or fall below -2. Z scores above 2 and below -2 represent approximately 5% of all Z values. If the observed sample mean is close to the mean specified in H 0 (here μ = 191), then Z will be close to zero. If the observed sample mean is much larger than the mean specified in H 0 , then Z will be large.

In hypothesis testing, we select a critical value from the Z distribution. This is done by first determining what is called the level of significance, denoted α ("alpha"). What we are doing here is drawing a line at extreme values. The level of significance is the probability that we reject the null hypothesis (in favor of the alternative) when it is actually true and is also called the Type I error rate.

α = Level of significance = P(Type I error) = P(Reject H 0 | H 0 is true).

Because α is a probability, it ranges between 0 and 1. The most commonly used value in the medical literature for α is 0.05, or 5%. Thus, if an investigator selects α=0.05, then they are allowing a 5% probability of incorrectly rejecting the null hypothesis in favor of the alternative when the null is in fact true. Depending on the circumstances, one might choose to use a level of significance of 1% or 10%. For example, if an investigator wanted to reject the null only if there were even stronger evidence than that ensured with α=0.05, they could choose α=0.01 as their level of significance. The typical values for α are 0.01, 0.05 and 0.10, with α=0.05 the most commonly used value.

Suppose in our weight study we select α=0.05. We need to determine the value of Z that holds 5% of the values above it (see below).

Standard normal distribution curve showing an upper tail at z=1.645 where alpha=0.05

The critical value of Z for α =0.05 is Z = 1.645 (i.e., 5% of the distribution is above Z=1.645). With this value we can set up what is called our decision rule for the test. The rule is to reject H 0 if the Z score is 1.645 or more.  

With the first sample we have Z = (x̄ - μ 0 )/(s/√n) = 2.38.

Because 2.38 > 1.645, we reject the null hypothesis. (The same conclusion can be drawn by comparing the 0.0087 probability of observing a sample mean as extreme as 197.1 to the level of significance of 0.05. If the observed probability is smaller than the level of significance we reject H 0 ). Because the Z score exceeds the critical value, we conclude that the mean weight for men in 2006 is more than 191 pounds, the value reported in 2002. If we observed the second sample (i.e., sample mean =192.1), we would not be able to reject the null hypothesis because the Z score is 0.43 which is not in the rejection region (i.e., the region in the tail end of the curve above 1.645). With the second sample we do not have sufficient evidence (because we set our level of significance at 5%) to conclude that weights have increased. Again, the same conclusion can be reached by comparing probabilities. The probability of observing a sample mean as extreme as 192.1 is 33.4% which is not below our 5% level of significance.

Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests

The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data support the null or alternative hypotheses. The procedure can be broken down into the following five steps.

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:

Z = (x̄ - μ 0 )/(s/√n)

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Standard normal distribution with lower tail at -1.645 and alpha=0.05

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.645.

Standard normal distribution with two tails

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.
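The critical values quoted in these decision rules can be recovered from the standard normal distribution rather than read from a table. The sketch below uses only Python's standard library. The function name and the "upper"/"lower"/"two" labels are illustrative choices, not part of the module.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical value of Z for a given significance level and test direction."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "upper":
        return std.inv_cdf(1 - alpha)      # reject H0 if Z is above this value
    if tail == "lower":
        return std.inv_cdf(alpha)          # reject H0 if Z is below this value
    if tail == "two":
        return std.inv_cdf(1 - alpha / 2)  # reject H0 if |Z| is above this value
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

for tail in ("upper", "lower", "two"):
    print(f"{tail:>5}-tailed, alpha = 0.05: critical value = {z_critical(0.05, tail):.3f}")
```

For a two-tailed test the function returns the upper critical value, and the lower one is its negative.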

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .
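As a rough illustration of how a computing package condenses these steps, the sketch below wraps a large-sample Z test in a single function and applies the rule "reject H 0 if p < α". The function name, its arguments, and the numbers in the usage lines are hypothetical, chosen only so the arithmetic is easy to follow. Real packages usually report a t statistic instead.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, s, n, alpha=0.05, tail="upper"):
    """Large-sample Z test of H0: mu = mu0 (a sketch, assuming n > 30)."""
    std = NormalDist()
    z = (xbar - mu0) / (s / sqrt(n))        # Steps 2 and 4: test statistic
    if tail == "upper":                     # H1: mu > mu0
        p_value = 1 - std.cdf(z)
    elif tail == "lower":                   # H1: mu < mu0
        p_value = std.cdf(z)
    elif tail == "two":                     # H1: mu != mu0
        p_value = 2 * (1 - std.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    # Steps 3 and 5: comparing p to alpha is equivalent to the critical-value rule
    return {"z": z, "p_value": p_value, "reject_h0": p_value < alpha}

# Hypothetical data: sample mean 52, null mean 50, s = 8, n = 64
result = one_sample_z_test(xbar=52.0, mu0=50.0, s=8.0, n=64, alpha=0.05, tail="upper")
print(result)
```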

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191 H 1 : μ > 191                 α =0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is Z = (x̄ - μ 0 )/(s/√n).

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05.   Reject H 0 if Z > 1.645.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2: Z = (x̄ - μ 0 )/(s/√n) = 2.38.

  • Step 5. Conclusion.

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α=0.05 to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.96, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
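The table-based approximation of the p-value can be checked numerically. The sketch below, using only Python's standard library, steps through the same significance levels and then computes the upper-tail area beyond Z = 2.38 directly. Variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()
z_observed = 2.38  # test statistic reported for the weight example

# Walk down the significance levels used in the text
for alpha in (0.05, 0.025, 0.010, 0.005):
    z_crit = std.inv_cdf(1 - alpha)  # upper-tailed critical value
    decision = "reject H0" if z_observed > z_crit else "do not reject H0"
    print(f"alpha = {alpha:.3f}: critical value = {z_crit:.3f} -> {decision}")

# Upper-tail area beyond the observed Z
p_value = 1 - std.cdf(z_observed)
print(f"p-value = {p_value:.4f}")
```

The computed p-value falls between 0.005 and 0.010, in line with the reported p < 0.010.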

Type I and Type II Errors

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

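Table - Conclusions in Test of Hypothesis

                   Do Not Reject H 0      Reject H 0
H 0 is True        Correct Decision       Type I Error
H 0 is False       Type II Error          Correct Decision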

In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, then whenever H 0 is true there is only a 5% probability that our test leads us to reject it and thereby commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true.

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.


 The most common reason for a Type II error is a small sample size.
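To see how β depends on sample size, the sketch below computes the Type II error probability for an upper-tailed Z test under simplifying assumptions: the standard deviation is treated as known, and a specific true mean is supposed. The true mean of 195 and the standard deviation of 25.6 are hypothetical values chosen for illustration and are not results from the module.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """beta = P(do not reject H0 | true mean is mu_true), upper-tailed Z test."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)               # reject H0 if Z > z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # mean of Z when mu = mu_true
    return std.cdf(z_crit - shift)                # P(Z <= z_crit) under mu_true

# Hypothetical scenario: H0 mean 191, true mean 195, sigma 25.6
betas = {n: type_ii_error(191, 195, 25.6, n) for n in (25, 100, 400)}
for n, beta in betas.items():
    print(f"n = {n:>3}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With these hypothetical inputs, β shrinks as n grows, which is why a small sample is a common source of Type II errors.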

Tests with One Sample, Continuous Outcome

Hypothesis testing applications with a continuous outcome variable in a single population are performed according to the five-step procedure outlined above. A key component is setting up the null and research hypotheses. The objective is to compare the mean in a single population to a known mean (μ 0 ). The known value is generally derived from another study or report, for example a study in a similar, but not identical, population or a study performed some years ago. The latter is called a historical control. It is important in setting up the hypotheses in a one sample test that the mean specified in the null hypothesis is a fair and reasonable comparator. This will be discussed in the examples that follow.

Test Statistics for Testing H 0 : μ= μ 0

  • if n > 30: Z = (x̄ - μ 0 )/(s/√n)
  • if n < 30: t = (x̄ - μ 0 )/(s/√n), where df = n-1

Note that statistical computing packages will use the t statistic exclusively and make the necessary adjustments for comparing the test statistic to appropriate values from probability tables to produce a p-value. 

The National Center for Health Statistics (NCHS) published a report in 2005 entitled Health, United States, containing extensive information on major trends in the health of Americans. Data are provided for the US population as a whole and for specific ages, sexes and races. The NCHS report indicated that in 2002 Americans paid an average of $3,302 per year on health care and prescription drugs. An investigator hypothesizes that in 2005 expenditures have decreased primarily due to the availability of generic drugs. To test the hypothesis, a sample of 100 Americans is selected and their expenditures on health care and prescription drugs in 2005 are measured. The sample data are summarized as follows: n=100, x̄ =$3,190 and s=$890. Is there statistical evidence of a reduction in expenditures on health care and prescription drugs in 2005? Is the sample mean of $3,190 evidence of a true reduction in the mean or is it within chance fluctuation? We will run the test using the five-step approach.

  • Step 1.  Set up hypotheses and determine level of significance

H 0 : μ = 3,302 H 1 : μ < 3,302           α =0.05

The research hypothesis is that expenditures have decreased, and therefore a lower-tailed test is used.

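  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is Z = (x̄ - μ 0 )/(s/√n).

  • Step 3. Set up decision rule.

This is a lower tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

  • Step 4. Compute the test statistic.

Z = (3,190 - 3,302)/(890/√100) = -112/89 = -1.26

  • Step 5. Conclusion.

We do not reject H 0 because -1.26 > -1.645. We do not have statistically significant evidence at α=0.05 to show that the mean expenditures on health care and prescription drugs are lower in 2005 than the mean of $3,302 reported in 2002.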

Recall that when we fail to reject H 0 in a test of hypothesis that either the null hypothesis is true (here the mean expenditures in 2005 are the same as those in 2002 and equal to $3,302) or we committed a Type II error (i.e., we failed to reject H 0 when in fact it is false). In summarizing this test, we conclude that we do not have sufficient evidence to reject H 0 . We do not conclude that H 0 is true, because there may be a moderate to high probability that we committed a Type II error. It is possible that the sample size is not large enough to detect a difference in mean expenditures.      
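The expenditure example can be verified in a few lines. This sketch uses only the summary statistics given in the text (n=100, x̄=$3,190, s=$890, μ 0 =$3,302) and Python's standard library. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 100, 3190, 890   # sample summary statistics from the example
mu0, alpha = 3302, 0.05       # null-hypothesis mean and significance level

std = NormalDist()
z = (xbar - mu0) / (s / sqrt(n))   # test statistic
z_crit = std.inv_cdf(alpha)        # lower-tailed critical value
p_value = std.cdf(z)               # area below the observed Z
reject_h0 = z < z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The lower-tail p-value of roughly 0.10 exceeds α=0.05, so the decision not to reject H 0 is the same whether one compares Z to the critical value or p to α.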

The NCHS reported that the mean total cholesterol level in 2002 for all adults was 203. Total cholesterol levels in participants who attended the seventh examination of the Offspring in the Framingham Heart Study are summarized as follows: n=3,310, x̄ =200.3, and s=36.8. Is there statistical evidence of a difference in mean cholesterol levels in the Framingham Offspring?

Here we want to assess whether the sample mean of 200.3 in the Framingham sample is statistically significantly different from 203 (i.e., beyond what we would expect by chance). We will run the test using the five-step approach.

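  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ= 203 H 1 : μ≠ 203                       α=0.05

The research hypothesis is that cholesterol levels are different in the Framingham Offspring, and therefore a two-tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is Z = (x̄ - μ 0 )/(s/√n).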

  •   Step 3. Set up decision rule.  

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

  • Step 4. Compute the test statistic.

Z = (200.3 - 203)/(36.8/√3,310) = -4.22

  • Step 5. Conclusion.

We reject H 0 because -4.22 < -1.960. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level in the Framingham Offspring is different from the national average of 203 reported in 2002. Because we reject H 0 , we also approximate a p-value. Using the two-sided significance levels, p < 0.0001.

Statistical Significance versus Clinical (Practical) Significance

This example raises an important concept of statistical versus clinical or practical significance. From a statistical standpoint, the total cholesterol levels in the Framingham sample are highly statistically significantly different from the national average with p < 0.0001 (i.e., if the null hypothesis were true, there would be less than a 0.01% chance of observing a difference this large). However, the sample mean in the Framingham Offspring study is 200.3, less than 3 units different from the national mean of 203. The data are so highly statistically significant because of the very large sample size. It is always important to assess both statistical and clinical significance of data. This is particularly relevant when the sample size is large. Is a 3 unit difference in total cholesterol a meaningful difference?
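The role of sample size in this example can be made concrete. The sketch below runs the two-tailed Z test on the Framingham summary statistics and then repeats it with the same mean and standard deviation but a hypothetical sample of only 100 participants. That second scenario is an assumption introduced purely for contrast.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, s, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 vs H1: mu != mu0."""
    z = (xbar - mu0) / (s / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value

# Framingham Offspring summary statistics from the example
z_large, p_large = two_sided_z_test(200.3, 203, 36.8, 3310)

# Hypothetical: identical mean and standard deviation, but only n = 100
z_small, p_small = two_sided_z_test(200.3, 203, 36.8, 100)

print(f"n = 3310: Z = {z_large:.2f}, p = {p_large:.6f}")
print(f"n =  100: Z = {z_small:.2f}, p = {p_small:.3f}")
```

The same 2.7-unit difference is highly significant with 3,310 participants and not significant with 100, so the p-value alone says little about whether the difference matters clinically.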

Consider again the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. Suppose a new drug is proposed to lower total cholesterol. A study is designed to evaluate the efficacy of the drug in lowering cholesterol.   Fifteen patients are enrolled in the study and asked to take the new drug for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows:   n=15, x̄ =195.9 and s=28.7. Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new drug for 6 weeks? We will run the test using the five-step approach. 

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ= 203 H 1 : μ< 203                   α=0.05

  •  Step 2. Select the appropriate test statistic.  

Because the sample size is small (n<30) the appropriate test statistic is t = (x̄ - μ 0 )/(s/√n).

  • Step 3. Set up decision rule.

This is a lower tailed test, using a t statistic and a 5% level of significance. In order to determine the critical value of t, we need degrees of freedom, df, defined as df=n-1. In this example df=15-1=14. The critical value for a lower tailed test with df=14 and α=0.05 is -1.761 (the value 2.145 applies to a two-tailed test at α=0.05) and the decision rule is as follows: Reject H 0 if t < -1.761.

  • Step 4. Compute the test statistic.

t = (195.9 - 203)/(28.7/√15) = -0.96

  • Step 5. Conclusion.

We do not reject H 0 because -0.96 > -1.761. We do not have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower than the national mean in patients taking the new drug for 6 weeks. Again, because we failed to reject the null hypothesis we make a weaker concluding statement allowing for the possibility that we may have committed a Type II error (i.e., failed to reject H 0 when in fact the drug is efficacious).
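The small-sample test can also be checked numerically. Python's standard library has no t distribution, so this sketch approximates the t cumulative probability by integrating the density with Simpson's rule. The helper functions are illustrative, and a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) using Simpson's rule on [0, |x|] and symmetry."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

n, xbar, s, mu0, alpha = 15, 195.9, 28.7, 203, 0.05   # values from the example
df = n - 1
t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = t_cdf(t_stat, df)            # lower-tailed test: area below t
reject_h0 = p_value < alpha

print(f"t = {t_stat:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The lower-tail p-value is well above 0.05, which agrees with the decision not to reject H 0 .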


This example raises an important issue in terms of study design. In this example we assume in the null hypothesis that the mean cholesterol level is 203. This is taken to be the mean cholesterol level in patients without treatment. Is this an appropriate comparator? Alternative and potentially more efficient study designs to evaluate the effect of the new drug could involve two treatment groups, where one group receives the new drug and the other does not, or we could measure each patient's baseline or pre-treatment cholesterol level and then assess changes from baseline to 6 weeks post-treatment. These designs are also discussed here.

Video - Comparing a Sample Mean to Known Population Mean (8:20)


Tests with One Sample, Dichotomous Outcome

Hypothesis testing applications with a dichotomous outcome variable in a single population are also performed according to the five-step procedure. Similar to tests for means, a key component is setting up the null and research hypotheses. The objective is to compare the proportion of successes in a single population to a known proportion (p 0 ). That known proportion is generally derived from another study or report and is sometimes called a historical control. It is important in setting up the hypotheses in a one sample test that the proportion specified in the null hypothesis is a fair and reasonable comparator.    

In one sample tests for a dichotomous outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the sample proportion which is computed by taking the ratio of the number of successes (x) to the sample size, p̂ = x/n.

We then determine the appropriate test statistic (Step 2) for the hypothesis test. The formula for the test statistic is given below.

Test Statistic for Testing H 0 : p = p 0

Z = (p̂ - p 0 )/√(p 0 (1-p 0 )/n)   if min(np 0 , n(1-p 0 )) ≥ 5

The formula above is appropriate for large samples, defined when the smaller of np 0 and n(1-p 0 ) is at least 5. This is similar, but not identical, to the condition required for appropriate use of the confidence interval formula for a population proportion, i.e., min(np̂, n(1-p̂)) ≥ 5.

Here we use the proportion specified in the null hypothesis as the true proportion of successes rather than the sample proportion. If we fail to satisfy the condition, then alternative procedures, called exact methods must be used to test the hypothesis about the population proportion.

Example:  

The NCHS report indicated that in 2002 the prevalence of cigarette smoking among American adults was 21.1%.  Data on prevalent smoking in n=3,536 participants who attended the seventh examination of the Offspring in the Framingham Heart Study indicated that 482/3,536 = 13.6% of the respondents were currently smoking at the time of the exam. Suppose we want to assess whether the prevalence of smoking is lower in the Framingham Offspring sample given the focus on cardiovascular health in that community. Is there evidence of a statistically lower prevalence of smoking in the Framingham Offspring study as compared to the prevalence among all Americans?

H 0 : p = 0.211 H 1 : p < 0.211                     α=0.05

We must first check that the sample size is adequate. Specifically, we need to check min(np 0 , n(1-p 0 )) = min(3,536(0.211), 3,536(1-0.211)) = min(746, 2790) = 746. The sample size is more than adequate so the following formula can be used:

\(Z = \dfrac{\hat{p} - p_{0}}{\sqrt{p_{0}(1-p_{0})/n}}\)

This is a lower tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

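Substituting the sample data into the formula gives \(Z = \dfrac{0.136 - 0.211}{\sqrt{0.211(1-0.211)/3{,}536}} = -10.93\). We reject H 0 because -10.93 < -1.645. We have statistically significant evidence at α=0.05 to show that the prevalence of smoking in the Framingham Offspring is lower than the prevalence nationally (21.1%). Here, p < 0.0001.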
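The smoking example above can be reproduced in a few lines. The following is a minimal sketch using only the Python standard library; the function name `one_sample_proportion_z` is illustrative and not part of the original module. It uses the sample proportion as rounded in the text (0.136), so it matches the reported Z of -10.93.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(p_hat, p0, n):
    """Z statistic for H0: p = p0, using p0 (not p_hat) in the standard error."""
    # Large-sample condition from the text: min(n*p0, n*(1-p0)) must be at least 5.
    if min(n * p0, n * (1 - p0)) < 5:
        raise ValueError("sample too small for the Z test; use an exact method")
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Framingham Offspring smoking example: p_hat = 0.136, p0 = 0.211, n = 3,536.
z = one_sample_proportion_z(0.136, 0.211, 3536)
p_lower = NormalDist().cdf(z)  # lower-tailed p-value, P(Z <= z)
reject = z < -1.645            # decision rule for a lower-tailed test at alpha = 0.05

print(f"Z = {z:.2f}, reject H0: {reject}")
```

Using the unrounded proportion 482/3,536 instead of 0.136 gives a slightly different Z (about -10.9), which does not change the conclusion.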

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

Calculate this on your own before checking the answer.


Tests with Two Independent Samples, Continuous Outcome

There are many applications where it is of interest to compare two independent groups with respect to their mean scores on a continuous outcome. Here we compare means between groups, but rather than generating an estimate of the difference, we will test whether the observed difference (increase, decrease or difference) is statistically significant or not. Remember, that hypothesis testing gives an assessment of statistical significance, whereas estimation gives an estimate of effect and both are important.

Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean and standard deviation in each sample and we denote these summary statistics as follows:

for sample 1: \(n_{1}\), \(\bar{X}_{1}\), \(s_{1}\)

for sample 2: \(n_{2}\), \(\bar{X}_{2}\), \(s_{2}\)

The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2.  

In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ 1 -μ 2 . The null hypothesis is always that there is no difference between groups with respect to means, i.e., H 0 : μ 1 - μ 2 = 0.

The null hypothesis can also be written as follows: H 0 : μ 1 = μ 2 . In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H 1 : μ 1 > μ 2 ), that the first mean is smaller than the second (H 1 : μ 1 < μ 2 ), or that the means are different (H 1 : μ 1 ≠ μ 2 ). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.

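Test Statistics for Testing H 0 : μ 1 = μ 2

  • \(Z = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{S_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}\) if n 1 > 30 and n 2 > 30
  • \(t = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{S_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}\) with df = n 1 + n 2 - 2, if n 1 < 30 or n 2 < 30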

NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, or \(\sigma_{1}^{2} = \sigma_{2}^{2}\)). This means that the outcome is equally variable in each of the comparison populations. For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, \(s_{1}^{2}/s_{2}^{2}\), is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances.

The test statistics include Sp, which is the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar) computed as the weighted average of the standard deviations in the samples as follows:

\(S_{p} = \sqrt{\dfrac{(n_{1}-1)s_{1}^{2} + (n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}\)

Because we are assuming equal variances between groups, we pool the information on variability (sample variances) to generate an estimate of the variability in the population. (Note: Because Sp is a weighted average of the standard deviations in the samples, Sp will always be in between s 1 and s 2 .)

Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.  

Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.  

H 0 : μ 1 = μ 2

H 1 : μ 1 ≠ μ 2                       α=0.05

Because both samples are large (> 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, \(s_{1}^{2}/s_{2}^{2}\). Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is \(17.5^{2}/20.1^{2} = 0.76\), which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is

\(Z = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{S_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}\)

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation.

Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.

Now the test statistic:

We reject H 0 because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.  

Here again we find that there is a statistically significant difference in mean systolic blood pressures between men and women at p < 0.010. Notice that there is a very small difference in the sample means (128.2-126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is 1.7 ± 1.26, or (0.44, 2.96). The confidence interval provides an assessment of the magnitude of the difference between means whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference.

Above we performed a study to evaluate a new drug designed to lower total cholesterol. The study involved one sample of patients, each patient took the new drug for 6 weeks and had their cholesterol measured. As a means of evaluating the efficacy of the new drug, the mean total cholesterol following 6 weeks of treatment was compared to the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. At the end of the example, we discussed the appropriateness of the fixed comparator as well as an alternative study design to evaluate the effect of the new drug involving two treatment groups, where one group receives the new drug and the other does not. Here, we revisit the example with a concurrent or parallel control group, which is very typical in randomized controlled trials or clinical trials (refer to the EP713 module on Clinical Trials).  

A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.

Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach.

H 0 : μ 1 = μ 2 H 1 : μ 1 < μ 2                         α=0.05

Because both samples are small (< 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances is \(s_{1}^{2}/s_{2}^{2} = 28.7^{2}/30.3^{2} = 0.90\), which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:

\(t = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{S_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}\)

This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t Table (in More Resources to the right). In order to determine the critical value of t we need degrees of freedom, df, defined as df=n 1 +n 2 -2 = 15+15-2=28. The critical value for a lower tailed test with df=28 and α=0.05 is -1.701 and the decision rule is: Reject H 0 if t < -1.701.

Now the test statistic,

We reject H 0 because -2.92 < -1.701. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005.
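The pieces of the pooled two-sample test can be sketched as small functions. This is an illustrative sketch, not the module's own code; the function names are hypothetical. The standard deviations and sample sizes for the cholesterol trial (28.7, 30.3, n = 15 per group) come from the text, but the sample means in the final call are made-up round numbers used only to show the arithmetic.

```python
from math import sqrt


def variance_ratio(s1, s2):
    """Ratio of sample variances, used for the 0.5-to-2 guideline."""
    return s1 ** 2 / s2 ** 2


def pooled_sd(s1, s2, n1, n2):
    """Pooled estimate of the common standard deviation, Sp."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def pooled_statistic(mean1, mean2, s1, s2, n1, n2):
    """Z or t statistic for H0: mu1 = mu2 under the equal-variance assumption."""
    sp = pooled_sd(s1, s2, n1, n2)
    return (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))


# Values stated in the text for the cholesterol trial.
ratio = variance_ratio(28.7, 30.3)   # the text reports 0.90
equal_var_ok = 0.5 <= ratio <= 2     # guideline for using the pooled formulas
sp = pooled_sd(28.7, 30.3, 15, 15)
df = 15 + 15 - 2

# HYPOTHETICAL means and standard deviations, for illustration only.
t_demo = pooled_statistic(100.0, 110.0, 10.0, 10.0, 8, 8)

print(f"ratio = {ratio:.2f}, Sp = {sp:.1f}, df = {df}, demo t = {t_demo:.2f}")
```

The resulting statistic is compared with the critical value from the decision rule, for example t < -1.701 for the lower-tailed test with df = 28 above.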

The clinical trial in this example finds a statistically significant reduction in total cholesterol, whereas in the previous example where we had a historical control (as opposed to a parallel control group) we did not demonstrate efficacy of the new drug. Notice that the mean total cholesterol level in patients taking placebo is 217.4 which is very different from the mean cholesterol reported among all Americans in 2002 of 203 and used as the comparator in the prior example. The historical control value may not have been the most appropriate comparator as cholesterol levels have been increasing over time. In the next section, we present another design that can be used to assess the efficacy of the new drug.

Video - Comparison of Two Independent Samples With a Continuous Outcome (8:02)

Tests with Matched Samples, Continuous Outcome

In the previous section we compared two groups with respect to their mean scores on a continuous outcome. An alternative study design is to compare matched or paired samples. The two comparison groups are said to be dependent, and the data can arise from a single sample of participants where each participant is measured twice (possibly before and after an intervention) or from two samples that are matched on specific characteristics (e.g., siblings). When the samples are dependent, we focus on difference scores in each participant or between members of a pair and the test of hypothesis is based on the mean difference, μ d . The null hypothesis again reflects "no difference" and is stated as H 0 : μ d =0 . Note that there are some instances where it is of interest to test whether there is a difference of a particular magnitude (e.g., μ d =5) but in most instances the null hypothesis reflects no difference (i.e., μ d =0).  

The appropriate formula for the test of hypothesis depends on the sample size. The formulas are shown below and are identical to those we presented for testing the mean of a single sample (e.g., when comparing against an external or historical control), except here we focus on difference scores.

Test Statistics for Testing H 0 : μ d =0

  • \(Z = \dfrac{\bar{X}_{d} - \mu_{d}}{s_{d}/\sqrt{n}}\) if n > 30
  • \(t = \dfrac{\bar{X}_{d} - \mu_{d}}{s_{d}/\sqrt{n}}\) with df = n - 1, if n < 30

A new drug is proposed to lower total cholesterol and a study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients agree to participate in the study and each is asked to take the new drug for 6 weeks. However, before starting the treatment, each patient's total cholesterol level is measured. The initial measurement is a pre-treatment or baseline value. After taking the drug for 6 weeks, each patient's total cholesterol level is measured again and the data are shown below. The rightmost column contains difference scores for each patient, computed by subtracting the 6 week cholesterol level from the baseline level. The differences represent the reduction in total cholesterol over 6 weeks. (The differences could have been computed by subtracting the baseline total cholesterol level from the level measured at 6 weeks. The way in which the differences are computed does not affect the outcome of the analysis, only the interpretation.)

Because the differences are computed by subtracting the cholesterols measured at 6 weeks from the baseline values, positive differences indicate reductions and negative differences indicate increases (e.g., participant 12 increases by 2 units over 6 weeks). The goal here is to test whether there is a statistically significant reduction in cholesterol. Because of the way in which we computed the differences, we want to look for an increase in the mean difference (i.e., a positive reduction). In order to conduct the test, we need to summarize the differences. In this sample, we have

The calculations are shown below.  

Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new medication for 6 weeks? We will run the test using the five-step approach.

H 0 : μ d = 0 H 1 : μ d > 0                 α=0.05

NOTE: If we had computed differences by subtracting the baseline level from the level measured at 6 weeks then negative differences would have reflected reductions and the research hypothesis would have been H 1 : μ d < 0. 

  • Step 2 . Select the appropriate test statistic.

This is an upper-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t Table at the right, with df=15-1=14. The critical value for an upper-tailed test with df=14 and α=0.05 is 1.761 and the decision rule is Reject H 0 if t > 1.761.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

We reject H 0 because 4.61 > 1.761. We have statistically significant evidence at α=0.05 to show that there is a reduction in cholesterol levels over 6 weeks.
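The matched-sample calculation reduces to a one-sample t statistic on the difference scores. The sketch below illustrates this with the Python standard library. The four baseline and follow-up values are hypothetical (the 15-patient data table is not reproduced here), and `paired_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, followup):
    """t statistic and df for H0: mu_d = 0, with d = baseline - followup."""
    diffs = [b - f for b, f in zip(baseline, followup)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# HYPOTHETICAL cholesterol readings for four patients, for illustration only.
baseline = [210, 224, 236, 248]
followup = [208, 220, 230, 240]

t_obs, df = paired_t(baseline, followup)
print(f"t = {t_obs:.3f} with df = {df}")
```

Because the differences are baseline minus follow-up, a positive t supports a reduction, which is why the test in the text is upper-tailed.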

Here we illustrate the use of a matched design to test the efficacy of a new drug to lower total cholesterol. We also considered a parallel design (randomized clinical trial) and a study using a historical comparator. It is extremely important to design studies that are best suited to detect a meaningful difference when one exists. There are often several alternatives and investigators work with biostatisticians to determine the best design for each application. It is worth noting that the matched design used here can be problematic in that observed differences may only reflect a "placebo" effect. All participants took the assigned medication, but is the observed reduction attributable to the medication or a result of participation in a study?


Tests with Two Independent Samples, Dichotomous Outcome

There are several approaches that can be used to test hypotheses concerning two independent proportions. Here we present one approach - the chi-square test of independence is an alternative, equivalent, and perhaps more popular approach to the same analysis. Hypothesis testing with the chi-square test is addressed in the third module in this series: BS704_HypothesisTesting-ChiSquare.

In tests of hypothesis comparing proportions between two independent groups, one test is performed and results can be interpreted to apply to a risk difference, relative risk or odds ratio. As a reminder, the risk difference is computed by taking the difference in proportions between comparison groups, the risk ratio is computed by taking the ratio of proportions, and the odds ratio is computed by taking the ratio of the odds of success in the comparison groups. Because the null values for the risk difference, the risk ratio and the odds ratio are different, the hypotheses in tests of hypothesis look slightly different depending on which measure is used. When performing tests of hypothesis for the risk difference, relative risk or odds ratio, the convention is to label the exposed or treated group 1 and the unexposed or control group 2.      

For example, suppose a study is designed to assess whether there is a significant difference in proportions in two independent comparison groups. The test of interest is as follows:

H 0 : p 1 = p 2 versus H 1 : p 1 ≠ p 2 .  

The following are the hypotheses for testing for a difference in proportions using the risk difference, the risk ratio and the odds ratio. First, the hypotheses above are equivalent to the following:

  • For the risk difference, H 0 : p 1 - p 2 = 0 versus H 1 : p 1 - p 2 ≠ 0 which are, by definition, equal to H 0 : RD = 0 versus H 1 : RD ≠ 0.
  • If an investigator wants to focus on the risk ratio, the equivalent hypotheses are H 0 : RR = 1 versus H 1 : RR ≠ 1.
  • If the investigator wants to focus on the odds ratio, the equivalent hypotheses are H 0 : OR = 1 versus H 1 : OR ≠ 1.  

Suppose a test is performed to test H 0 : RD = 0 versus H 1 : RD ≠ 0 and the test rejects H 0 at α=0.05. Based on this test we can conclude that there is significant evidence, α=0.05, of a difference in proportions, significant evidence that the risk difference is not zero, and significant evidence that the risk ratio and odds ratio are not one. The risk difference is analogous to the difference in means when the outcome is continuous. Here the parameter of interest is the difference in proportions in the population, RD = p 1 -p 2 and the null value for the risk difference is zero. In a test of hypothesis for the risk difference, the null hypothesis is always H 0 : RD = 0. This is equivalent to H 0 : RR = 1 and H 0 : OR = 1. In the research hypothesis, an investigator can hypothesize that the first proportion is larger than the second (H 1 : p 1 > p 2 , which is equivalent to H 1 : RD > 0, H 1 : RR > 1 and H 1 : OR > 1), that the first proportion is smaller than the second (H 1 : p 1 < p 2 , which is equivalent to H 1 : RD < 0, H 1 : RR < 1 and H 1 : OR < 1), or that the proportions are different (H 1 : p 1 ≠ p 2 , which is equivalent to H 1 : RD ≠ 0, H 1 : RR ≠ 1 and H 1 : OR ≠ 1). The three different alternatives represent upper-, lower- and two-tailed tests, respectively.
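The equivalence of the three null values can be seen by computing the measures directly. The following sketch uses hypothetical counts (30 of 100 versus 20 of 100, then 25 of 100 in both groups); `effect_measures` is an illustrative name, not something defined in this module.

```python
def effect_measures(x1, n1, x2, n2):
    """Risk difference, risk ratio and odds ratio for group 1 versus group 2."""
    p1, p2 = x1 / n1, x2 / n2
    rd = p1 - p2                                     # null value 0
    rr = p1 / p2                                     # null value 1
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))   # null value 1
    return rd, rr, odds_ratio


# HYPOTHETICAL counts: 30/100 successes in group 1, 20/100 in group 2.
rd, rr, odds_ratio = effect_measures(30, 100, 20, 100)
print(f"RD = {rd:.2f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}")

# When the two proportions are equal, all three measures sit at their null values.
rd0, rr0, or0 = effect_measures(25, 100, 25, 100)
print(rd0, rr0, or0)
```

When p 1 = p 2 the risk difference is 0 and both ratios are 1, which is why a single test of H 0 : p 1 = p 2 covers all three measures.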

The formula for the test of hypothesis for the difference in proportions is given below.

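Test Statistic for Testing H 0 : p 1 = p 2

\(Z = \dfrac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}\)

where \(\hat{p}_{1}\) and \(\hat{p}_{2}\) are the sample proportions in the two groups and \(\hat{p} = (x_{1}+x_{2})/(n_{1}+n_{2})\) is the overall proportion of successes.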

The formula above is appropriate for large samples, defined as at least 5 successes (np ≥ 5) and at least 5 failures (n(1-p) ≥ 5) in each of the two samples. If there are fewer than 5 successes or failures in either comparison group, then alternative procedures, called exact methods, must be used to test hypotheses about the difference in population proportions.

The following table summarizes data from n=3,799 participants who attended the fifth examination of the Offspring in the Framingham Heart Study. The outcome of interest is prevalent CVD and we want to test whether the prevalence of CVD is significantly higher in smokers as compared to non-smokers.

The prevalence of CVD (or proportion of participants with prevalent CVD) among non-smokers is 298/3,055 = 0.0975 and the prevalence of CVD among current smokers is 81/744 = 0.1089. Here smoking status defines the comparison groups and we will call the current smokers group 1 (exposed) and the non-smokers (unexposed) group 2. The test of hypothesis is conducted below using the five step approach.

H 0 : p 1 = p 2     H 1 : p 1 ≠ p 2                 α=0.05

  • Step 2.  Select the appropriate test statistic.  

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group. In this example, we have more than enough successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group. The sample size is more than adequate so the following formula can be used:

Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes:

We now substitute to compute the test statistic.

  • Step 5. Conclusion.

We do not reject H 0 because -1.960 < 0.927 < 1.960. We do not have statistically significant evidence at α=0.05 to show that there is a difference in prevalent CVD between smokers and non-smokers.  
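The calculation for the smoking and CVD example can be sketched with the counts given in the text (81 of 744 current smokers and 298 of 3,055 non-smokers with prevalent CVD). This is a minimal illustration using the Python standard library; `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2 using the overall (pooled) proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # overall proportion of successes
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Group 1 = current smokers, group 2 = non-smokers (counts from the text).
z = two_proportion_z(81, 744, 298, 3055)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > 1.960  # two-tailed decision rule at alpha = 0.05

print(f"Z = {z:.3f}, two-sided p = {p_two_sided:.3f}, reject H0: {reject}")
```

Working from unrounded proportions gives a Z of about 0.92, marginally different from the 0.927 reported above (which uses rounded intermediate values); the conclusion is the same.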

A 95% confidence interval for the difference in prevalent CVD (or risk difference) between smokers and non-smokers is 0.0114 ± 0.0247, or between -0.0133 and 0.0361. Because the 95% confidence interval for the risk difference includes zero we again conclude that there is no statistically significant difference in prevalent CVD between smokers and non-smokers.

Smoking has been shown over and over to be a risk factor for cardiovascular disease. What might explain the fact that we did not observe a statistically significant difference using data from the Framingham Heart Study? HINT: Here we consider prevalent CVD, would the results have been different if we considered incident CVD?

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

We now test whether there is a statistically significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using the five step approach.  

H 0 : p 1 = p 2     H 1 : p 1 ≠ p 2              α=0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

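We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, i.e., \(\min(n_{1}\hat{p}_{1},\, n_{1}(1-\hat{p}_{1}),\, n_{2}\hat{p}_{2},\, n_{2}(1-\hat{p}_{2})) \geq 5\).

In this example, we have min(50(0.46), 50(1-0.46), 50(0.22), 50(1-0.22)) = min(23, 27, 11, 39) = 11. The sample size is adequate so the following formula can be used:

\(Z = \dfrac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}\)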

We reject H 0 because 2.526 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in the proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever.

A 95% confidence interval for the difference in proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever is 0.24 ± 0.18, or between 0.06 and 0.42. Because the 95% confidence interval does not include zero we conclude that there is a statistically significant difference in proportions, which is consistent with the test of hypothesis result.
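The interval quoted for the pain reliever trial can be checked numerically. The confidence interval formula is not stated in this section, so the sketch below assumes the usual large-sample interval for a risk difference, with an unpooled standard error; `risk_difference_ci` is an illustrative name. The proportions (0.46 and 0.22) and group sizes (50 each) are taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(p1, n1, p2, n2, level=0.95):
    """Risk difference and its margin of error (assumed large-sample formula)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    rd = p1 - p2
    # Unpooled standard error: each group contributes its own variance term.
    margin = z_crit * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, margin


# Group 1 = new pain reliever, group 2 = standard pain reliever.
rd, margin = risk_difference_ci(0.46, 50, 0.22, 50)
lower, upper = rd - margin, rd + margin
excludes_zero = lower > 0 or upper < 0

print(f"RD = {rd:.2f} +/- {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero corresponds to rejecting H 0 in the two-sided test at α=0.05, which matches the conclusion of the hypothesis test above.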

Again, the procedures discussed here apply to applications where there are two independent comparison groups and a dichotomous outcome. There are other applications in which it is of interest to compare a dichotomous outcome in matched or paired samples. For example, in a clinical trial we might wish to test the effectiveness of a new antibiotic eye drop for the treatment of bacterial conjunctivitis. Participants use the new antibiotic eye drop in one eye and a comparator (placebo or active control treatment) in the other. The success of the treatment (yes/no) is recorded for each participant for each eye. Because the two assessments (success or failure) are paired, we cannot use the procedures discussed here. The appropriate test is called McNemar's test (sometimes called McNemar's test for dependent proportions).  


Here we presented hypothesis testing techniques for means and proportions in one and two sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. There are many details to consider in hypothesis testing. The first is to determine the appropriate test. We discussed Z and t tests here for different applications. The appropriate test depends on the distribution of the outcome variable (continuous or dichotomous), the number of comparison groups (one, two) and whether the comparison groups are independent or dependent. The following table summarizes the different tests of hypothesis discussed here.

  • Continuous Outcome, One Sample: H0: μ = μ0
  • Continuous Outcome, Two Independent Samples: H0: μ1 = μ2
  • Continuous Outcome, Two Matched Samples: H0: μd = 0
  • Dichotomous Outcome, One Sample: H0: p = p 0
  • Dichotomous Outcome, Two Independent Samples: H0: p1 = p2, RD=0, RR=1, OR=1

Once the type of test is determined, the details of the test must be specified. Specifically, the null and alternative hypotheses must be clearly stated. The null hypothesis always reflects the "no change" or "no difference" situation. The alternative or research hypothesis reflects the investigator's belief. The investigator might hypothesize that a parameter (e.g., a mean, proportion, difference in means or proportions) will increase, will decrease or will be different under specific conditions (sometimes the conditions are different experimental conditions and other times the conditions are simply different groups of participants). Once the hypotheses are specified, data are collected and summarized. The appropriate test is then conducted according to the five step approach. If the test leads to rejection of the null hypothesis, an approximate p-value is computed to summarize the significance of the findings. When tests of hypothesis are conducted using statistical computing packages, exact p-values are computed. Because the statistical tables in this textbook are limited, we can only approximate p-values. If the test fails to reject the null hypothesis, then a weaker concluding statement is made for the following reason.

In hypothesis testing, there are two types of errors that can be committed. A Type I error occurs when a test incorrectly rejects the null hypothesis. This is referred to as a false positive result, and the probability that this occurs is equal to the level of significance, α. The investigator chooses the level of significance in Step 1, and purposely chooses a small value such as α=0.05 to control the probability of committing a Type I error. A Type II error occurs when a test fails to reject the null hypothesis when in fact it is false. The probability that this occurs is equal to β. Unfortunately, the investigator cannot specify β at the outset because it depends on several factors including the sample size (smaller samples have higher β), the level of significance (β decreases as α increases), and the difference in the parameter under the null and alternative hypothesis.
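The statement that the Type I error probability equals α can be illustrated by simulation. The sketch below is not from the original module: it repeatedly draws samples from a normal population for which H 0 is true, runs a two-sided Z test with known standard deviation, and records how often H 0 is rejected. All settings (number of simulations, sample size, seed) are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(n_sims=4000, n=25, mu0=0.0, sigma=1.0, alpha=0.05, seed=1):
    """Fraction of simulated samples that reject H0: mu = mu0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        # Draw a sample from the null population, so any rejection is a false positive.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


rate = type_i_error_rate()
print(f"Observed false positive rate: {rate:.3f}")
```

The observed rate should land close to 0.05, varying somewhat with the seed and the number of simulations.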

We noted in several examples in this chapter the relationship between confidence intervals and tests of hypothesis. The approaches are different, yet related. It is possible to draw a conclusion about statistical significance by examining a confidence interval. For example, if a 95% confidence interval does not contain the null value (e.g., zero when analyzing a mean difference or risk difference, one when analyzing relative risks or odds ratios), then one can conclude that a two-sided test of hypothesis would reject the null at α=0.05. It is important to note that the correspondence between a confidence interval and test of hypothesis relates to a two-sided test and that the confidence level corresponds to a specific level of significance (e.g., 95% to α=0.05, 90% to α=0.10 and so on). The exact significance of the test, the p-value, can only be determined using the hypothesis testing approach, and the p-value provides an assessment of the strength of the evidence and not an estimate of the effect.

Answers to Selected Problems

Dental services problem - bottom of page 5.

  • Step 1: Set up hypotheses and determine the level of significance.

H 0 : p = 0.75     H 1 : p ≠ 0.75     α=0.05

  • Step 2: Select the appropriate test statistic.

First, determine whether the sample size is adequate: min(np 0 , n(1-p 0 )) = min(125(0.75), 125(1-0.75)) = min(93.75, 31.25) = 31.25.

Therefore the sample size is adequate, and we can use the following formula:

\(Z = \dfrac{\hat{p} - p_{0}}{\sqrt{p_{0}(1-p_{0})/n}}\)

  • Step 3: Set up the decision rule.

Reject H0 if Z is less than or equal to -1.96 or if Z is greater than or equal to 1.96.

  • Step 4: Compute the test statistic.

The sample proportion is 64/125 = 0.512, so \(Z = \dfrac{0.512 - 0.75}{\sqrt{0.75(1-0.75)/125}} = -6.15\).

  • Step 5: Conclusion.

We reject the null hypothesis because -6.15 < -1.96. Therefore there is a statistically significant difference in the proportion of children in Boston using dental services compared to the national proportion.


8.6: Hypothesis Test of a Single Population Mean with Examples


Steps for performing Hypothesis Test of a Single Population Mean

Step 1: State your hypotheses about the population mean.

Step 2: Summarize the data. State a significance level. State and check conditions required for the procedure.

  • Find or identify the sample size, n, the sample mean, \(\bar{x}\) and the sample standard deviation, s .

The sampling distribution for the one-mean test statistic is approximately a t-distribution if the following conditions are met:

  • Sample is random with independent observations .
  • Sample is large. The population must be Normal or the sample size must be at least 30.

Step 3: Perform the procedure based on the assumption that \(H_{0}\) is true

  • Find the Estimated Standard Error: \(SE=\frac{s}{\sqrt{n}}\).
  • Compute the observed value of the test statistic: \(T_{obs}=\frac{\bar{x}-\mu_{0}}{SE}\).
  • Check the type of the test (right-, left-, or two-tailed)
  • Find the p-value in order to measure your level of surprise.

Step 4: Make a decision about \(H_{0}\) and \(H_{a}\)

  • Do you reject or not reject your null hypothesis?

Step 5: Make a conclusion

  • What does this mean in the context of the data?
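The five steps can be sketched in Python. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, the summary statistics are made up, and the t-distribution probability is approximated by numerically integrating the density (a statistics library would normally supply it).

```python
import math

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t by Simpson's rule on the density."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: const * (1 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps                      # steps must be even for Simpson's rule
    area = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                           # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def one_mean_t_test(n, xbar, s, mu0, tail):
    """Return (T_obs, p-value); tail is 'left', 'right', or 'two'."""
    se = s / math.sqrt(n)                   # estimated standard error
    t_obs = (xbar - mu0) / se               # observed test statistic
    df = n - 1
    if tail == "left":
        p = t_cdf(t_obs, df)
    elif tail == "right":
        p = 1 - t_cdf(t_obs, df)
    else:
        p = 2 * (1 - t_cdf(abs(t_obs), df))
    return t_obs, p

# Hypothetical summary statistics, for illustration only.
t_obs, p = one_mean_t_test(n=25, xbar=52.0, s=5.0, mu0=50.0, tail="two")
alpha = 0.05
print(f"T_obs = {t_obs:.3f}, p-value = {p:.4f}, reject H0: {p < alpha}")
```

The `tail` argument mirrors the "check the type of the test" step: the same test statistic leads to a different p-value depending on the direction of \(H_{a}\).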

The following examples illustrate a left-, right-, and two-tailed test.

Example \(\PageIndex{1}\)

\(H_{0}: \mu = 5, H_{a}: \mu < 5\)

This is a test of a single population mean. \(H_{a}\) tells you the test is left-tailed. The picture of the \(p\)-value is as follows:

Normal distribution curve of a single population mean with a value of 5 on the x-axis and the p-value points to the area on the left tail of the curve.

Exercise \(\PageIndex{1}\)

\(H_{0}: \mu = 10, H_{a}: \mu < 10\)

Assume the \(p\)-value is 0.0935. What type of test is this? Draw the picture of the \(p\)-value.

left-tailed test


Example \(\PageIndex{2}\)

\(H_{0}: \mu \leq 0.2, H_{a}: \mu > 0.2\)

This is a test of a single population mean. \(H_{a}\) tells you the test is right-tailed. The picture of the \(p\)-value is as follows:

Normal distribution curve of a single population mean with the value of 0.2 on the x-axis. The p-value points to the area on the right tail of the curve.

Exercise \(\PageIndex{2}\)

\(H_{0}: \mu \leq 1, H_{a}: \mu > 1\)

Assume the \(p\)-value is 0.1243. What type of test is this? Draw the picture of the \(p\)-value.

right-tailed test


Example \(\PageIndex{3}\)

\(H_{0}: \mu = 50, H_{a}: \mu \neq 50\)

This is a test of a single population mean. \(H_{a}\) tells you the test is two-tailed. The picture of the \(p\)-value is as follows.

Normal distribution curve of a single population mean with a value of 50 on the x-axis. Half of the p-value, 1/2(p-value), is shown for each of the areas on the left and right tails of the curve.

Exercise \(\PageIndex{3}\)

\(H_{0}: \mu = 0.5, H_{a}: \mu \neq 0.5\)

Assume the \(p\)-value is 0.2564. What type of test is this? Draw the picture of the \(p\)-value.

two-tailed test


Full Hypothesis Test Examples

Example \(\PageIndex{4}\)

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65 65 70 67 66 63 63 68 72 71. He performs a hypothesis test using a 5% level of significance. The data are assumed to be from a normal distribution.

Set up the hypothesis test:

A 5% level of significance means that \(\alpha = 0.05\). This is a test of a single population mean.

\(H_{0}: \mu = 65, H_{a}: \mu > 65\)

Since the instructor thinks the average score is higher, use a "\(>\)". The "\(>\)" means the test is right-tailed.

Determine the distribution needed:

Random variable: \(\bar{X} =\) average score on the first statistics test.

Distribution for the test: If you read the problem carefully, you will notice that there is no population standard deviation given. You are only given \(n = 10\) sample data values. Notice also that the data come from a normal distribution. This means that the distribution for the test is a Student's \(t\).

Use \(t_{df}\). Therefore, the distribution for the test is \(t_{9}\) where \(n = 10\) and \(df = 10 - 1 = 9\).

The sample mean and sample standard deviation are calculated as 67 and 3.1972 from the data.

Calculate the \(p\)-value using the Student's \(t\)-distribution:

\[t_{obs} = \dfrac{\bar{x}-\mu_{0}}{\left(\dfrac{s}{\sqrt{n}}\right)}=\dfrac{67-65}{\left(\dfrac{3.1972}{\sqrt{10}}\right)} = 1.9781\]

Use the t-table or Excel's T.DIST() function to find the \(p\)-value:

\(p\text{-value} = P(\bar{x} > 67) = P(T > 1.9781) = 1 - 0.9604 = 0.0396\)

Interpretation of the \(p\)-value: If the null hypothesis is true, then there is a 0.0396 probability (3.96%) that the sample mean is 67 or more.

Normal distribution curve of average scores on the first statistic tests with 65 and 67 values on the x-axis. A vertical upward line extends from 67 to the curve. The p-value points to the area to the right of 67.

Compare \(\alpha\) and the \(p\text{-value}\):

Since \(\alpha = 0.05\) and \(p\text{-value} = 0.0396\), \(\alpha > p\text{-value}\).

Make a decision: Since \(\alpha > p\text{-value}\), reject \(H_{0}\).

This means you reject \(\mu = 65\). In other words, you believe the average test score is more than 65.

Conclusion: At a 5% level of significance, the sample data show sufficient evidence that the mean (average) test score is more than 65, just as the statistics instructor thinks.

The \(p\text{-value}\) can also be calculated on a TI-83/84 graphing calculator.

Put the data into a list. Press STAT and arrow over to TESTS. Press 2:T-Test. Arrow over to Data and press ENTER. Arrow down and enter 65 for \(\mu_{0}\), the name of the list where you put the data, and 1 for Freq:. Arrow down to \(\mu\): and arrow over to \(> \mu_{0}\). Press ENTER. Arrow down to Calculate and press ENTER. The calculator reports the \(p\text{-value}\) (\(p = 0.0396\)) along with the test statistic (\(t\)-score), the sample mean, and the sample standard deviation. \(\mu > 65\) is the alternative hypothesis. Repeat these instructions, but arrow to Draw (instead of Calculate) and press ENTER. A shaded graph appears with \(t = 1.9781\) (test statistic) and \(p = 0.0396\) (\(p\text{-value}\)). When you use Draw, make sure that no other equations are highlighted in \(Y =\) and that the plots are turned off.
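The summary figures in this example can also be reproduced in a few lines of code. The following is a minimal sketch using Python's standard `statistics` module (the variable names are illustrative); it recomputes the sample mean, the sample standard deviation, and the test statistic from the ten scores.

```python
import math
import statistics

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]
mu0 = 65                                   # hypothesized mean under H0

n = len(scores)
xbar = statistics.mean(scores)             # sample mean
s = statistics.stdev(scores)               # sample standard deviation (n - 1 divisor)
t_obs = (xbar - mu0) / (s / math.sqrt(n))  # test statistic, df = n - 1

print(f"n = {n}, mean = {xbar}, s = {s:.4f}, t = {t_obs:.4f}")
```

Note that `statistics.stdev` uses the \(n - 1\) divisor, which is the sample standard deviation the t-test requires; `statistics.pstdev` would give the population version instead.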

Exercise \(\PageIndex{4}\)

It is believed that a stock price for a particular company will grow at a rate of $5 per week with a standard deviation of $1. An investor believes the stock won’t grow as quickly. The changes in stock price are recorded for ten weeks and are as follows: $4, $3, $2, $3, $1, $7, $2, $1, $1, $2. Perform a hypothesis test using a 5% level of significance. State the null and alternative hypotheses, find the \(p\)-value, state your conclusion, and identify the Type I and Type II errors.

  • \(H_{0}: \mu = 5\)
  • \(H_{a}: \mu < 5\)
  • \(p = 0.0082\)

Because \(p < \alpha\), we reject the null hypothesis. There is sufficient evidence to suggest that the stock price of the company grows at a rate less than $5 a week.

  • Type I Error: To conclude that the stock price is growing slower than $5 a week when, in fact, the stock price is growing at $5 a week (reject the null hypothesis when the null hypothesis is true).
  • Type II Error: To conclude that the stock price is growing at a rate of $5 a week when, in fact, the stock price is growing slower than $5 a week (do not reject the null hypothesis when the null hypothesis is false).

Example \(\PageIndex{5}\)

The National Institute of Standards and Technology provides exact data on conductivity properties of materials. Following are conductivity measurements for 11 randomly selected pieces of a particular type of glass.

1.11; 1.07; 1.11; 1.07; 1.12; 1.08; 0.98; 0.98; 1.02; 0.95; 0.95

Is there convincing evidence that the average conductivity of this type of glass is greater than one? Use a significance level of 0.05. Assume the population is normal.

Let’s follow a four-step process to answer this statistical question.

1. State the Question : We need to determine whether, at a 0.05 significance level, the average conductivity of this type of glass is greater than one. The hypotheses are

  • \(H_{0}: \mu \leq 1\)
  • \(H_{a}: \mu > 1\)

2. Plan : We are testing a sample mean without a known population standard deviation. Therefore, we need to use a Student's \(t\)-distribution. Assume the underlying population is normal.

3. Do the calculations : \(p\text{-value} = 0.036\)

4. State the Conclusions : Since the \(p\text{-value}\) (\(= 0.036\)) is less than our alpha value, we reject the null hypothesis. It is reasonable to state that the data support the claim that the average conductivity level is greater than one.
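The calculation step can be spelled out. Working from the 11 measurements listed in the example (values rounded, so this is a sketch of the arithmetic rather than exact calculator output):

\[\bar{x} = \dfrac{11.44}{11} = 1.04, \qquad s \approx 0.0659, \qquad df = 11 - 1 = 10\]

\[t_{obs} = \dfrac{\bar{x}-\mu_{0}}{\left(\dfrac{s}{\sqrt{n}}\right)} = \dfrac{1.04-1}{\left(\dfrac{0.0659}{\sqrt{11}}\right)} \approx 2.01\]

The area to the right of 2.01 under the \(t_{10}\) curve is the \(p\text{-value}\) of about 0.036 quoted in the example.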

The hypothesis test itself has an established process. This can be summarized as follows:

  • Determine \(H_{0}\) and \(H_{a}\). Remember, they are contradictory.
  • Determine the random variable.
  • Determine the distribution for the test.
  • Draw a graph, calculate the test statistic, and use the test statistic to calculate the \(p\text{-value}\). (A \(t\)-score is an example of a test statistic.)
  • Compare the preconceived \(\alpha\) with the \(p\text{-value}\), make a decision (reject or do not reject \(H_{0}\)), and write a clear conclusion using English sentences.

Notice that in performing the hypothesis test, you use \(\alpha\) and not \(\beta\). \(\beta\) is needed to help determine the sample size of the data that is used in calculating the \(p\text{-value}\). Remember that the quantity \(1 - \beta\) is called the Power of the Test. A high power is desirable. If the power is low, the null hypothesis might not be rejected when it should be, so statisticians typically increase the sample size while keeping \(\alpha\) the same.
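To illustrate how power grows with sample size, here is a hedged Python sketch. As a simplifying assumption it uses a right-tailed z-test with a known population standard deviation (the power of a t-test involves the noncentral t-distribution and is not shown). The function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def power_right_tailed_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed z-test when the true mean is mu_alt."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                 # rejection cutoff under H0
    shift = (mu_alt - mu0) * n ** 0.5 / sigma     # how far the alternative moves the statistic
    return 1 - z.cdf(z_crit - shift)

# Hypothetical numbers: H0 mean 65, true mean 67, sigma 3.2.
for n in (5, 10, 20, 40):
    print(n, round(power_right_tailed_z(65, 67, 3.2, n), 3))
```

With \(\alpha\) held fixed, the printed power increases as \(n\) increases, which is why a larger sample is the usual remedy for low power.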


Section 10.4: Hypothesis Tests for a Population Standard Deviation


By the end of this lesson, you will be able to...

  • test hypotheses about a population standard deviation


Before we begin this section, we need a quick refresher of the χ² distribution.

The Chi-Square (χ²) distribution

Reminder: "chi" is pronounced "kai" as in sky, not "chai" like the tea.

If a random sample of size n is obtained from a normally distributed population with mean μ and standard deviation σ, then

χ² = (n − 1)s² / σ²

has a chi-square distribution with n − 1 degrees of freedom, where s² is the sample variance.

Properties of the χ² distribution

  • It is not symmetric.
  • The shape depends on the degrees of freedom.
  • As the number of degrees of freedom increases, the distribution becomes more symmetric.
  • χ² ≥ 0


We again have some conditions that need to be true in order to perform the test:

  • the sample was randomly selected, and
  • the population from which the sample is drawn is normally distributed

Note that in the second requirement, the population must be normally distributed. The steps in performing the hypothesis test should be familiar by now.

Performing a Hypothesis Test Regarding σ

Step 1: State the null and alternative hypotheses.

Step 2: Decide on a level of significance, α.

Step 3: Compute the test statistic, χ² = (n − 1)s² / σ₀², where σ₀ is the value of σ stated in the null hypothesis.

Step 4: Determine the P-value.

Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.

Step 6: State the conclusion.
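As an illustration of these steps, the sketch below runs a two-tailed test of σ in Python using the standard statistic χ² = (n − 1)s² / σ₀². It is a sketch under stated assumptions: the function names are hypothetical, the sample values are made up, and the chi-square probability is computed from a series for the incomplete gamma function instead of a statistics package.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the series for the incomplete gamma function."""
    a, z = df / 2, x / 2
    if z <= 0:
        return 0.0
    term = total = 1 / a
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def sigma_test(n, s, sigma0):
    """Two-tailed test of H0: sigma = sigma0; returns (chi-square statistic, P-value)."""
    chi2 = (n - 1) * s ** 2 / sigma0 ** 2
    upper = 1 - chi2_cdf(chi2, n - 1)          # area to the right of the statistic
    return chi2, 2 * min(upper, 1 - upper)     # double the smaller tail

# Hypothetical sample: n = 20 observations with s = 10.5, tested against sigma0 = 12.
chi2, p = sigma_test(n=20, s=10.5, sigma0=12)
print(f"chi-square = {chi2:.2f}, P-value = {p:.4f}")
```

Doubling the smaller tail area is one common convention for a two-tailed P-value with an asymmetric distribution; for a one-tailed alternative, the single tail area would be used instead.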

In Example 2 in Section 10.2, we assumed that the standard deviation for the resting heart rates of ECC students was 12 bpm. Later, in Example 2 in Section 10.3, we considered the actual sample data.

Based on this sample, is there enough evidence to say that the standard deviation of the resting heart rates for students in this class is different from 12 bpm?

Note: Be sure to check that the conditions for performing the hypothesis test are met.


From the earlier examples, we know that the resting heart rates could come from a normally distributed population and there are no outliers.

Step 1: H0: σ = 12, H1: σ ≠ 12

Step 2: α = 0.05

Step 4: P-value = 2P(χ² > 15.89) ≈ 0.2159

Step 5: Since P-value > α, we do not reject H0.

Step 6: There is not enough evidence at the 5% level of significance to support the claim that the standard deviation of the resting heart rates for students in this class is different from 12 bpm.

Hypothesis Testing Regarding σ Using StatCrunch

Let's look at Example 1 again, and try the hypothesis test with technology.


8.7 Hypothesis Tests for a Population Mean with Unknown Population Standard Deviation

Learning Objectives

  • Conduct and interpret hypothesis tests for a population mean with unknown population standard deviation.

Some notes about conducting a hypothesis test:

  • The null hypothesis [latex]H_0[/latex] is always an “equal to.”  The null hypothesis is the original claim about the population parameter.
  • The alternative hypothesis [latex]H_a[/latex] is a “less than,” “greater than,” or “not equal to.”  The form of the alternative hypothesis depends on the context of the question.
  • If the alternative hypothesis is a “less than”, then the test is left-tail. The p-value is the area in the left-tail of the distribution.
  • If the alternative hypothesis is a “greater than”, then the test is right-tail. The p-value is the area in the right-tail of the distribution.
  • If the alternative hypothesis is a “not equal to”, then the test is two-tail. The p-value is the sum of the area in the two tails of the distribution. Each tail represents exactly half of the p-value.
  • Think about the meaning of the p-value. A data analyst (and anyone else) should have more confidence that they made the correct decision to reject the null hypothesis with a smaller p-value (for example, 0.001 as opposed to 0.04) even if using a significance level of 0.05. Similarly, for a large p-value such as 0.4, as opposed to a p-value of 0.056 (a significance level of 0.05 is less than either number), a data analyst should have more confidence that they made the correct decision in not rejecting the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.
  • The significance level must be identified before collecting the sample data and conducting the test. Generally, the significance level will be included in the question. If no significance level is given, a common standard is to use a significance level of 5%.
  • An alternative approach for hypothesis testing is to use what is called the critical value approach. In this book, we will only use the p-value approach. Some of the videos below may mention the critical value approach, but this approach will not be used in this book.

Steps to Conduct a Hypothesis Test for a Population Mean with Unknown Population Standard Deviation

  • Write down the null and alternative hypotheses in terms of the population mean [latex]\mu[/latex].  Include appropriate units with the values of the mean.
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].

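  • Use the [latex]t[/latex]-distribution to find the p-value (the area in the corresponding tail) for the test. The [latex]t[/latex]-score and degrees of freedom are

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ \\ df & = & n-1 \end{eqnarray*}[/latex]

  • Compare the p-value to the significance level and state the outcome of the test:
      • If p-value [latex]\leq \alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex]. The results of the sample data are significant. There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
      • If p-value [latex]\gt \alpha[/latex], do not reject [latex]H_0[/latex]. The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.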
  • Write down a concluding sentence specific to the context of the question.

USING EXCEL TO CALCULATE THE P-VALUE FOR A HYPOTHESIS TEST ON A POPULATION MEAN WITH UNKNOWN POPULATION STANDARD DEVIATION

The p -value for a hypothesis test on a population mean is the area in the tail(s) of the distribution of the sample mean.  When the population standard deviation is unknown, use the [latex]t[/latex]-distribution to find the p -value.

If the p-value is the area in the left-tail:

  • Use the t.dist function, entered as =t.dist(t-score, degrees of freedom, logic operator).
  • For t-score, enter the value of [latex]t[/latex] calculated from [latex]\displaystyle{t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}}[/latex].
  • For degrees of freedom, enter the degrees of freedom for the [latex]t[/latex]-distribution, [latex]n-1[/latex].
  • For the logic operator, enter true. Note: Because we are calculating the area under the curve, we always enter true for the logic operator.
  • The output from the t.dist function is the area under the [latex]t[/latex]-distribution to the left of the entered [latex]t[/latex]-score.
  • Visit the Microsoft page for more information about the t.dist function.

If the p-value is the area in the right-tail:

  • Use the t.dist.rt function, entered as =t.dist.rt(t-score, degrees of freedom).
  • For t-score and degrees of freedom, enter the same values as for the t.dist function.
  • The output from the t.dist.rt function is the area under the [latex]t[/latex]-distribution to the right of the entered [latex]t[/latex]-score.
  • Visit the Microsoft page for more information about the t.dist.rt function.

If the p-value is the sum of area in the tails:

  • Use the t.dist.2t function, entered as =t.dist.2t(t-score, degrees of freedom).
  • For t-score, enter the absolute value of [latex]t[/latex] calculated from [latex]\displaystyle{t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}}[/latex]. Note: In the t.dist.2t function, the value of the [latex]t[/latex]-score must be a positive number. If the [latex]t[/latex]-score is negative, enter the absolute value of the [latex]t[/latex]-score into the t.dist.2t function.
  • For degrees of freedom, enter the degrees of freedom for the [latex]t[/latex]-distribution, [latex]n-1[/latex].
  • The output from the t.dist.2t function is the sum of areas in the tails under the [latex]t[/latex]-distribution.
  • Visit the Microsoft page for more information about the t.dist.2t function.
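The three Excel functions are tied together by the symmetry of the [latex]t[/latex]-distribution. As a sketch, writing [latex]F(t)[/latex] for the area to the left of [latex]t[/latex] under the [latex]t[/latex]-distribution with the given degrees of freedom (the symbol [latex]F[/latex] is introduced only for this note):

[latex]\begin{eqnarray*} \mbox{t.dist}(t, df, \mbox{true}) & = & F(t) \\ \mbox{t.dist.rt}(t, df) & = & 1-F(t) \\ \mbox{t.dist.2t}(|t|, df) & = & 2\left(1-F(|t|)\right) \end{eqnarray*}[/latex]

So a right-tail area can always be recovered from a left-tail area, and the two-tail value is twice the right-tail area beyond [latex]|t|[/latex].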

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the following scores:

65, 65, 70, 67, 66, 63, 63, 68, 72, 71

The instructor performs a hypothesis test using a 1% level of significance. The test scores are assumed to be from a normal distribution.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \mu=65  \\ H_a: & & \mu \gt 65  \end{eqnarray*}[/latex]

From the question, we have [latex]n=10[/latex], [latex]\overline{x}=67[/latex], [latex]s=3.1972...[/latex] and [latex]\alpha=0.01[/latex].

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation [latex]s=3.1972...[/latex]).  So we use a [latex]t[/latex]-distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right-tail of the distribution.

This is a t-distribution curve. The peak of the curve is at 0 on the horizontal axis. The point t is also labeled. A vertical line extends from point t to the curve with the area to the right of this vertical line shaded. The p-value equals the area of this shaded region.

To use the t.dist.rt function, we need to calculate out the [latex]t[/latex]-score:

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ & = & \frac{67-65}{\frac{3.1972...}{\sqrt{10}}} \\ & = & 1.9781... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]t[/latex]-distribution is [latex]n-1=10-1=9[/latex].

Entering the [latex]t[/latex]-score and the degrees of freedom into the t.dist.rt function gives p-value[latex]=0.0396[/latex].

Conclusion:

Because p -value[latex]=0.0396 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that mean score on the test is greater than 65.

  • The null hypothesis [latex]\mu=65[/latex] is the claim that the mean test score is 65.
  • The alternative hypothesis [latex]\mu \gt 65[/latex] is the claim that the mean test score is greater than 65.
  • Keep all of the decimals throughout the calculation (i.e. in the sample standard deviation, the [latex]t[/latex]-score, etc.) to avoid any round-off error in the calculation of the p -value.  This ensures that we get the most accurate value for the p -value.
  • The p -value is the area in the right-tail of the [latex]t[/latex]-distribution, to the right of [latex]t=1.9781...[/latex].
  • The p-value of 0.0396 tells us that under the assumption that the mean test score is 65 (the null hypothesis), there is a 3.96% chance of obtaining a sample mean test score of 67 or more. Compared to the 1% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true. The sample result is therefore not unusual enough to contradict the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis.

A company claims that the average change in the value of their stock is $3.50 per week.  An investor believes this average is too high. The investor records the changes in the company’s stock price over 30 weeks and finds the average change in the stock price is $2.60 with a standard deviation of $1.80.  At the 5% significance level, is the average change in the company’s stock price lower than the company claims?

[latex]\begin{eqnarray*} H_0: & & \mu=\$3.50  \\ H_a: & & \mu \lt \$3.50  \end{eqnarray*}[/latex]

From the question, we have [latex]n=30[/latex], [latex]\overline{x}=2.6[/latex], [latex]s=1.8[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation [latex]s=1.8.[/latex]).  So we use a [latex]t[/latex]-distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left-tail of the distribution.

This is a t-distribution curve. The peak of the curve is at 0 on the horizontal axis. The point t is also labeled. A vertical line extends from point t to the curve with the area to the left of this vertical line shaded. The p-value equals the area of this shaded region.

To use the t.dist function, we need to calculate out the [latex]t[/latex]-score:

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ & = & \frac{2.6-3.5}{\frac{1.8}{\sqrt{30}}} \\ & = & -2.7386... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]t[/latex]-distribution is [latex]n-1=30-1=29[/latex].

So the p-value[latex]=0.0052[/latex].

Because p-value[latex]=0.0052 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 5% significance level there is enough evidence to suggest that the average change in the stock price is lower than $3.50.

  • The null hypothesis [latex]\mu=\$3.50[/latex] is the claim that the average change in the company’s stock is $3.50 per week.
  • The alternative hypothesis [latex]\mu \lt \$3.50[/latex] is the claim that the average change in the company’s stock is less than $3.50 per week.
  • The p-value is the area in the left-tail of the [latex]t[/latex]-distribution, to the left of [latex]t=-2.7386...[/latex].
  • The p-value of 0.0052 tells us that under the assumption that the average change in the stock is $3.50 (the null hypothesis), there is a 0.52% chance of obtaining a sample average change of $2.60 or less. Compared to the 5% significance level, this is a small probability, and so is unlikely to happen assuming the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis. In other words, the average change in the company’s stock price is most likely lower than $3.50 per week.

A paint manufacturer has their production line set-up so that the average volume of paint in a can is 3.78 liters.  The quality control manager at the plant believes that something has happened with the production and the average volume of paint in the cans has changed.  The quality control department takes a sample of 100 cans and finds the average volume is 3.62 liters with a standard deviation of 0.7 liters.  At the 5% significance level, has the volume of paint in a can changed?

[latex]\begin{eqnarray*} H_0: & & \mu=3.78 \mbox{ liters}  \\ H_a: & & \mu \neq 3.78 \mbox{ liters}  \end{eqnarray*}[/latex]

From the question, we have [latex]n=100[/latex], [latex]\overline{x}=3.62[/latex], [latex]s=0.7[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is unknown (we only know the sample standard deviation [latex]s=0.7[/latex]).  So we use a [latex]t[/latex]-distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of area in the tails of the distribution.

This is a t distribution curve. The peak of the curve is at 0 on the horizontal axis. The point -t and t are also labeled. A vertical line extends from point t to the curve with the area to the right of this vertical line shaded with the shaded area labeled half of the p-value. A vertical line extends from -t to the curve with the area to the left of this vertical line shaded with the shaded area labeled half of the p-value. The p-value equals the area of these two shaded regions.

To use the t.dist.2t function, we need to calculate out the [latex]t[/latex]-score:

[latex]\begin{eqnarray*} t & = & \frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}} \\ & = & \frac{3.62-3.78}{\frac{0.7}{\sqrt{100}}} \\ & = & -2.2857... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]t[/latex]-distribution is [latex]n-1=100-1=99[/latex].

So the p -value[latex]=0.0244[/latex].

Because p -value[latex]=0.0244 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that average volume of paint in the cans has changed.

  • The null hypothesis [latex]\mu=3.78[/latex] is the claim that the average volume of paint in the cans is 3.78.
  • The alternative hypothesis [latex]\mu \neq 3.78[/latex] is the claim that the average volume of paint in the cans is not 3.78.
  • Keep all of the decimals throughout the calculation (i.e. in the [latex]t[/latex]-score) to avoid any round-off error in the calculation of the p -value.  This ensures that we get the most accurate value for the p -value.
  • The p -value is the sum of the area in the two tails.  The output from the t.dist.2t function is exactly the sum of the area in the two tails, and so is the p -value required for the test.  No additional calculations are required.
  • The t.dist.2t function requires that the value entered for the [latex]t[/latex]-score is positive .  A negative [latex]t[/latex]-score entered into the t.dist.2t function generates an error in Excel.  In this case, the value of the [latex]t[/latex]-score is negative, so we must enter the absolute value of this [latex]t[/latex]-score into field 1.
  • The p -value of 0.0244 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the average volume of paint in the cans has most likely changed from 3.78 liters.

Watch this video: Hypothesis Testing: t -test, right tail by ExcelIsFun [11:02]

Watch this video: Hypothesis Testing: t -test, left tail by ExcelIsFun [7:48]

Watch this video: Hypothesis Testing: t -test, two tail by ExcelIsFun [8:54]

Concept Review

The hypothesis test for a population mean is a well established process:

  • Collect the sample information for the test and identify the significance level.
  • When the population standard deviation is unknown, find the p -value (the area in the corresponding tail) for the test using the [latex]t[/latex]-distribution with [latex]\displaystyle{t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}}[/latex] and [latex]df=n-1[/latex].
  • Compare the p -value to the significance level and state the outcome of the test.

Attribution

“9.6 Hypothesis Testing of a Single Mean and Single Proportion” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


T test calculator

A t test compares the means of two groups. There are several types of two sample t tests, and this calculator focuses on the three most common: unpaired, Welch's, and paired t tests. Directions for using the calculator are listed below, along with more information about two sample t tests and help on which is appropriate for your analysis. NOTE: This is not the same as a one sample t test; for that, you need the One sample t test calculator.


What is a t test?

A t test is used to measure the difference between exactly two means. It focuses on a single numeric variable rather than on counts or on correlations between multiple variables. If you are taking the average of a sample of measurements, t tests are the most commonly used method to evaluate that data. They are particularly useful for small samples of fewer than 30 observations. For example, you might compare whether systolic blood pressure differs between a control and treated group, between men and women, or any other two groups.

This calculator uses a two-sample t test, which compares two datasets to see if their means are statistically different. That is different from a one sample t test, which compares the mean of your sample to some proposed theoretical value.

The most general formula for a t test is composed of two means (M1 and M2) and the overall standard error (SE) of the two samples:

\(t = \frac{M_{1} - M_{2}}{SE}\)

See our video on How to Perform a Two-sample t test for an intuitive explanation of t tests and an example.
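
As a rough illustration of the formula above, the sketch below computes a two-sample t statistic from summary statistics (mean, SD, and N for each group). It uses the Welch (unequal variance) form of the standard error and the Welch-Satterthwaite approximation for the degrees of freedom. The function name `two_sample_t` and the blood pressure numbers are made up for this example and are not taken from the calculator.

```python
import math


def two_sample_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's unpaired t statistic and approximate degrees of freedom,
    computed from summary statistics for two groups (hypothetical helper)."""
    v1 = sd1 ** 2 / n1          # squared standard error of mean 1
    v2 = sd2 ** 2 / n2          # squared standard error of mean 2
    se = math.sqrt(v1 + v2)     # overall standard error of the difference
    t = (m1 - m2) / se          # t = (M1 - M2) / SE
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up systolic blood pressure summaries for a treated and a control group.
t_stat, df = two_sample_t(128.0, 12.0, 20, 120.0, 12.0, 20)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Turning the t statistic into a P value requires the t distribution with `df` degrees of freedom, which is not part of the Python standard library; a statistics package or a t table would be needed for that step.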

How to use the t test calculator

  • Choose your data entry format. This will change how section 3 on the page looks. The first two options are for entering your data points themselves, either manually or by copy & paste. The last two are for entering the means for each group, along with the number of observations (N) and either the standard error of that mean (SEM) or the standard deviation of the dataset (SD). If you have already calculated these summary statistics, the latter options will save you time.
  • Choose a test from the three options: Unpaired t test, Welch's unpaired t test, or Paired t test. Use our Ultimate Guide to t tests if you are unsure which is appropriate, as it includes a section on "How do I know which t test to use?". Note that not all options are available if you enter means only.
  • Enter data for the test, based on the format you chose in Step 1.
  • Click Calculate Now and view the results. All options will perform a two-tailed test.

Common t test confusion

Beyond the several t test options themselves, t tests are often confused with completely different techniques. Here's how to keep them all straight.

Correlation and regression are used to measure how much two factors move together. While t tests are part of regression analysis, they are focused on only one factor by comparing means in different samples.

ANOVA is used for comparing means across three or more total groups. In contrast, t tests compare means between exactly two groups.

Finally, contingency tables compare counts of observations within groups rather than a calculated average. Because t tests compare means of a continuous variable between groups, contingency tables call for methods such as chi square instead of t tests.

Assumptions of t tests

Because there are several versions of t tests, it's important to check the assumptions to figure out which is best suited for your project. Here are our analysis checklists for unpaired t tests and paired t tests, which are the two most common. These (and the ultimate guide to t tests) go into detail on the basic assumptions underlying any t test:

  • Exactly two groups
  • Sample is normally distributed
  • Independent observations
  • Unequal or equal variance?
  • Paired or unpaired data?

Interpreting results

The three different options for t tests have slightly different interpretations, but they all hinge on hypothesis testing and P values. You need to select a significance threshold for your P value (often 0.05) before doing the test.

While P values can be easy to misinterpret, they are the most commonly used method to evaluate whether there is evidence of a difference between the sample of data collected and the null hypothesis. Once you have run the correct t test, look at the resulting P value. If the P value is less than your threshold, you have enough evidence to conclude that the means are significantly different.

If the P value is greater than or equal to your threshold, you cannot conclude that there is a difference. However, you cannot conclude that there was definitively no difference either. It's possible that a dataset with more observations would have resulted in a different conclusion.

Depending on the test you run, you may see other statistics that were used to calculate the P value, including the mean difference, t statistic, degrees of freedom, and standard error. The confidence interval and a review of your dataset are also given on the results page.

Graphing t tests

This calculator does not provide a chart or graph of t tests. However, graphing is an important part of analysis because it can help explain the results of the t test and highlight any potential outliers. See our Prism guide for some graphing tips for both unpaired and paired t tests.

Prism is built for customized, publication quality graphics and charts. For t tests we recommend simply plotting the datapoints themselves and the mean, or an estimation plot. Another popular approach is to use a violin plot, like those available in Prism.

For more information

Our ultimate guide to t tests includes examples, links, and intuitive explanations on the subject. It is quite simply the best place to start if you're looking for more about t tests!

Remember, this page is just for two sample t tests. If you only have one sample, you need to use this calculator instead.

Population and sample standard deviation review

Population and sample standard deviation

  • If the data is being considered a population on its own, we divide the sum of squared deviations from the mean by the number of data points, N.
  • If the data is a sample from a larger population, we divide by one fewer than the number of data points in the sample, n − 1.

In both cases, the standard deviation is the square root of the result.
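
The difference between the two divisors can be seen in a short sketch using Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n − 1. The data values below are made up for illustration.

```python
import statistics

# Made-up data set: mean is 5 and the sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the data as a whole population: divide by N = 8.
pop_sd = statistics.pstdev(data)

# Treat the data as a sample from a larger population: divide by n - 1 = 7.
samp_sd = statistics.stdev(data)

print(f"population SD = {pop_sd:.4f}")
print(f"sample SD     = {samp_sd:.4f}")
```

The sample standard deviation is always the larger of the two for the same data, because the same sum of squared deviations is divided by a smaller number.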

Statistics LibreTexts

9.E: Hypothesis Testing with One Sample (Exercises)

These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

9.1: Introduction

9.2: Null and Alternative Hypotheses

Some of the following statements refer to the null hypothesis, some to the alternate hypothesis.

State the null hypothesis, \(H_{0}\), and the alternative hypothesis, \(H_{a}\), in terms of the appropriate parameter (\(\mu\) or \(p\)).

  • The mean number of years Americans work before retiring is 34.
  • At most 60% of Americans vote in presidential elections.
  • The mean starting salary for San Jose State University graduates is at least $100,000 per year.
  • Twenty-nine percent of high school seniors get drunk each month.
  • Fewer than 5% of adults ride the bus to work in Los Angeles.
  • The mean number of cars a person owns in her lifetime is not more than ten.
  • About half of Americans prefer to live away from cities, given the choice.
  • Europeans have a mean paid vacation each year of six weeks.
  • The chance of developing breast cancer is under 11% for women.
  • Private universities' mean tuition cost is more than $20,000 per year.
  • \(H_{0}: \mu = 34; H_{a}: \mu \neq 34\)
  • \(H_{0}: p \leq 0.60; H_{a}: p > 0.60\)
  • \(H_{0}: \mu \geq 100,000; H_{a}: \mu < 100,000\)
  • \(H_{0}: p = 0.29; H_{a}: p \neq 0.29\)
  • \(H_{0}: p = 0.05; H_{a}: p < 0.05\)
  • \(H_{0}: \mu \leq 10; H_{a}: \mu > 10\)
  • \(H_{0}: p = 0.50; H_{a}: p \neq 0.50\)
  • \(H_{0}: \mu = 6; H_{a}: \mu \neq 6\)
  • \(H_{0}: p \geq 0.11; H_{a}: p < 0.11\)
  • \(H_{0}: \mu \leq 20,000; H_{a}: \mu > 20,000\)

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin? The alternative hypothesis is:

  • \(p < 0.30\)
  • \(p \leq 0.30\)
  • \(p \geq 0.30\)
  • \(p > 0.30\)

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 attended the midnight showing. An appropriate alternative hypothesis is:

  • \(p = 0.20\)
  • \(p > 0.20\)
  • \(p < 0.20\)
  • \(p \leq 0.20\)

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are:

  • \(H_{0}: \bar{x} = 4.5, H_{a}: \bar{x} > 4.5\)
  • \(H_{0}: \mu \geq 4.5, H_{a}: \mu < 4.5\)
  • \(H_{0}: \mu = 4.75, H_{a}: \mu > 4.75\)
  • \(H_{0}: \mu = 4.5, H_{a}: \mu > 4.5\)

9.3: Outcomes and the Type I and Type II Errors

State the Type I and Type II errors in complete sentences given the following statements.

  • The mean number of cars a person owns in his or her lifetime is not more than ten.
  • Private universities' mean tuition cost is more than $20,000 per year.
  • Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We conclude that the mean is 34 years, when in fact it really is not 34 years.
  • Type I error: We conclude that more than 60% of Americans vote in presidential elections, when the actual percentage is at most 60%. Type II error: We conclude that at most 60% of Americans vote in presidential elections when, in fact, more than 60% do.
  • Type I error: We conclude that the mean starting salary is less than $100,000, when it really is at least $100,000. Type II error: We conclude that the mean starting salary is at least $100,000 when, in fact, it is less than $100,000.
  • Type I error: We conclude that the proportion of high school seniors who get drunk each month is not 29%, when it really is 29%. Type II error: We conclude that the proportion of high school seniors who get drunk each month is 29% when, in fact, it is not 29%.
  • Type I error: We conclude that fewer than 5% of adults ride the bus to work in Los Angeles, when the percentage that do is really 5% or more. Type II error: We conclude that 5% or more adults ride the bus to work in Los Angeles when, in fact, fewer than 5% do.
  • Type I error: We conclude that the mean number of cars a person owns in his or her lifetime is more than 10, when in reality it is not more than 10. Type II error: We conclude that the mean number of cars a person owns in his or her lifetime is not more than 10 when, in fact, it is more than 10.
  • Type I error: We conclude that the proportion of Americans who prefer to live away from cities is not about half, though the actual proportion is about half. Type II error: We conclude that the proportion of Americans who prefer to live away from cities is half when, in fact, it is not half.
  • Type I error: We conclude that the duration of paid vacations each year for Europeans is not six weeks, when in fact it is six weeks. Type II error: We conclude that the duration of paid vacations each year for Europeans is six weeks when, in fact, it is not.
  • Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error: We conclude that the proportion of women who develop breast cancer is at least 11%, when in fact it is less than 11%.
  • Type I error: We conclude that the average tuition cost at private universities is more than $20,000, though in reality it is at most $20,000. Type II error: We conclude that the average tuition cost at private universities is at most $20,000 when, in fact, it is more than $20,000.

For statements a-j in Exercise 9.109, answer the following in complete sentences.

  • State a consequence of committing a Type I error.
  • State a consequence of committing a Type II error.

When a new drug is created, the pharmaceutical company must subject it to testing before receiving the necessary permission from the Food and Drug Administration (FDA) to market the drug. Suppose the null hypothesis is “the drug is unsafe.” What is the Type II Error?

  • To conclude the drug is safe when, in fact, it is unsafe.
  • Not to conclude the drug is safe when, in fact, it is safe.
  • To conclude the drug is safe when, in fact, it is safe.
  • Not to conclude the drug is unsafe when, in fact, it is unsafe.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing. The Type I error is to conclude that the percent of EVC students who attended is ________.

  • at least 20%, when in fact, it is less than 20%.
  • 20%, when in fact, it is 20%.
  • less than 20%, when in fact, it is at least 20%.
  • less than 20%, when in fact, it is less than 20%.

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average?

The Type II error is not to reject that the mean number of hours of sleep LTCC students get per night is at least seven when, in fact, the mean number of hours

  • is more than seven hours.
  • is at most seven hours.
  • is at least seven hours.
  • is less than seven hours.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The Type I error is:

  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is higher
  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is the same
  • to conclude that the mean hours per week currently is 4.5, when in fact, it is higher
  • to conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is not higher

9.4: Distribution Needed for Hypothesis Testing

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average? The distribution to be used for this test is \(\bar{X} \sim\) ________________

  • \(N\left(7.24, \frac{1.93}{\sqrt{22}}\right)\)
  • \(N\left(7.24, 1.93\right)\)

9.5: Rare Events, the Sample, Decision and Conclusion

The National Institute of Mental Health published an article stating that in any one-year period, approximately 9.5 percent of American adults suffer from depression or a depressive illness. Suppose that in a survey of 100 people in a certain town, seven of them suffered from depression or a depressive illness. Conduct a hypothesis test to determine if the true proportion of people in that town suffering from depression or a depressive illness is lower than the percent in the general adult American population.

  • Is this a test of one mean or proportion?
  • State the null and alternative hypotheses. \(H_{0}\) : ____________________ \(H_{a}\) : ____________________
  • Is this a right-tailed, left-tailed, or two-tailed test?
  • What symbol represents the random variable for this test?
  • In words, define the random variable for this test.
  • \(x =\) ________________
  • \(n =\) ________________
  • \(p′ =\) _____________
  • Calculate \(\sigma_{x} =\) __________. Show the formula set-up.
  • State the distribution to use for the hypothesis test.
  • Find the \(p\text{-value}\).
  • Reason for the decision:
  • Conclusion (write out in a complete sentence):

9.6: Additional Information and Full Hypothesis Test Examples

For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in [link]. Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's \(t\)-distribution for one of the following homework problems, you may assume that the underlying population is normally distributed. (In general, you must first prove that assumption, however.)

A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it needs to be replaced. From past studies of this tire, the standard deviation is known to be 8,000. A survey of owners of that tire design is conducted. From the 28 tires surveyed, the mean lifespan was 46,500 miles with a standard deviation of 9,800 miles. Using \(\alpha = 0.05\), is the data highly inconsistent with the claim?

  • \(H_{0}: \mu \geq 50,000\)
  • \(H_{a}: \mu < 50,000\)
  • Let \(\bar{X} =\) the average lifespan of a brand of tires.
  • normal distribution
  • \(z = -2.315\)
  • \(p\text{-value} = 0.0103\)
  • Check student’s solution.
  • alpha: 0.05
  • Decision: Reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the mean lifespan of the tires is less than 50,000 miles.
  • \((43,537, 49,463)\)
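
The numbers in the tire solution above can be reproduced with a short sketch. Because the population standard deviation (8,000) is treated as known, the test statistic is a z score. The variable names are chosen for this example; only Python's standard library is used.

```python
import math
from statistics import NormalDist

mu0 = 50_000      # claimed mean lifespan under H0
sigma = 8_000     # population standard deviation, treated as known
n = 28            # number of tires surveyed
xbar = 46_500     # sample mean

se = sigma / math.sqrt(n)        # standard error of the sample mean
z = (xbar - mu0) / se            # test statistic
p_value = NormalDist().cdf(z)    # left-tailed test: Ha is mu < 50,000

# 95% confidence interval for the mean, using the known sigma
z_crit = NormalDist().inv_cdf(0.975)
ci = (xbar - z_crit * se, xbar + z_crit * se)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:,.0f}, {ci[1]:,.0f})")
```

Note that the sample standard deviation of 9,800 miles is not used here, since the problem states that the population standard deviation is known.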

From generation to generation, the mean age when smokers first start to smoke varies. However, the standard deviation of that age remains constant at around 2.1 years. A survey of 40 smokers of this generation was done to see if the mean starting age is at least 19. The sample mean was 18.1 with a sample standard deviation of 1.3. Do the data support the claim at the 5% level?

The cost of a daily newspaper varies from city to city. However, the variation among prices remains steady with a standard deviation of 20¢. A study was done to test the claim that the mean cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation of 18¢. Do the data support the claim at the 1% level?

  • \(H_{0}: \mu = \$1.00\)
  • \(H_{a}: \mu \neq \$1.00\)
  • Let \(\bar{X} =\) the average cost of a daily newspaper.
  • \(z = -0.866\)
  • \(p\text{-value} = 0.3865\)
  • \(\alpha: 0.01\)
  • Decision: Do not reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is greater than 0.01.
  • Conclusion: There is insufficient evidence to reject the claim that the mean cost of daily papers is $1. The mean cost could be $1.
  • \((\$0.84, \$1.06)\)

An article in the San Jose Mercury News stated that students in the California state university system take 4.5 years, on average, to finish their undergraduate degrees. Suppose you believe that the mean time is longer. You conduct a survey of 49 students and obtain a sample mean of 5.1 with a sample standard deviation of 1.2. Do the data support your claim at the 1% level?

The mean number of sick days an employee takes per year is believed to be about ten. Members of a personnel department do not believe this figure. They randomly survey eight employees. The number of sick days they took for the past year are as follows: 12; 4; 15; 3; 11; 8; 6; 8. Let \(x =\) the number of sick days they took for the past year. Should the personnel team believe that the mean number is ten?

  • \(H_{0}: \mu = 10\)
  • \(H_{a}: \mu \neq 10\)
  • Let \(\bar{X} =\) the mean number of sick days an employee takes per year.
  • Student’s \(t\)-distribution
  • \(t = -1.12\)
  • \(p\text{-value} = 0.300\)
  • \(\alpha: 0.05\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05.
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean number of sick days is not ten.
  • \((4.9443, 11.806)\)
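
A sketch of the sick-day calculation from the raw data follows. The population standard deviation is unknown, so the sample standard deviation and a Student's t statistic with n − 1 = 7 degrees of freedom are used. The critical value 2.365 is the two-tailed 5% value for 7 degrees of freedom read from a standard t table; it is hard-coded here because the Python standard library has no t distribution.

```python
import math
import statistics

days = [12, 4, 15, 3, 11, 8, 6, 8]   # sick days from the exercise
mu0 = 10                             # hypothesized mean under H0

n = len(days)
xbar = statistics.mean(days)
s = statistics.stdev(days)           # sample SD (divides by n - 1)
se = s / math.sqrt(n)
t = (xbar - mu0) / se

# Two-tailed 5% critical value for df = 7, taken from a t table (assumption:
# alpha = 0.05 as in the solution above).
t_crit = 2.365
reject_h0 = abs(t) > t_crit

ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"mean = {xbar:.3f}, s = {s:.3f}, t = {t:.2f}")
print(f"reject H0: {reject_h0}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with alpha.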

In 1955, Life Magazine reported that the 25 year-old mother of three worked, on average, an 80 hour week. Recently, many groups have been studying whether or not the women's movement has, in fact, resulted in an increase in the average work week for women (combining employment and at-home work). Suppose a study was done to determine if the mean work week has increased. 81 women were surveyed with the following results. The sample mean was 83; the sample standard deviation was ten. Does it appear that the mean work week has increased for women at the 5% level?

Your statistics instructor claims that 60 percent of the students who take her Elementary Statistics class go through life feeling more enriched. For some reason that she can't quite figure out, most people don't believe her. You decide to check this out on your own. You randomly survey 64 of her past Elementary Statistics students and find that 34 feel more enriched as a result of her class. Now, what do you think?

  • \(H_{0}: p \geq 0.6\)
  • \(H_{a}: p < 0.6\)
  • Let \(P′ =\) the proportion of students who feel more enriched as a result of taking Elementary Statistics.
  • normal for a single proportion
  • \(p\text{-value} = 0.1308\)
  • Conclusion: There is insufficient evidence to conclude that less than 60 percent of her students feel more enriched.

The “plus-4s” confidence interval is \((0.411, 0.648)\)
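
The p-value in the solution above can be checked with a one-proportion z test, sketched below with the standard library. The standard error is computed under the null hypothesis (p = 0.6). Variable names are chosen for this example.

```python
import math
from statistics import NormalDist

x = 34       # students who feel more enriched
n = 64       # students surveyed
p0 = 0.60    # proportion claimed by the instructor

p_hat = x / n                          # sample proportion
se0 = math.sqrt(p0 * (1 - p0) / n)     # standard error assuming H0 is true
z = (p_hat - p0) / se0
p_value = NormalDist().cdf(z)          # left-tailed test: Ha is p < 0.6

print(f"p' = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```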

A Nissan Motor Corporation advertisement read, “The average man’s I.Q. is 107. The average brown trout’s I.Q. is 4. So why can’t man catch brown trout?” Suppose you believe that the brown trout’s mean I.Q. is greater than four. You catch 12 brown trout. A fish psychologist determines the I.Q.s as follows: 5; 4; 7; 3; 6; 4; 5; 3; 6; 3; 8; 5. Conduct a hypothesis test of your belief.

Refer to Exercise 9.119. Conduct a hypothesis test to see if your decision and conclusion would change if your belief were that the brown trout’s mean I.Q. is not four.

  • \(H_{0}: \mu = 4\)
  • \(H_{a}: \mu \neq 4\)
  • Let \(\bar{X} =\) the average I.Q. of a set of brown trout.
  • two-tailed Student's t-test
  • \(t = 1.95\)
  • \(p\text{-value} = 0.076\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05
  • Conclusion: There is insufficient evidence to conclude that the average IQ of brown trout is not four.
  • \((3.8865,5.9468)\)

According to an article in Newsweek, the natural ratio of girls to boys is 100:105. In China, the birth ratio is 100:114 (46.7% girls). Suppose you don’t believe the reported figures of the percent of girls born in China. You conduct a study. In this study, you count the number of girls and boys born in 150 randomly chosen recent births. There are 60 girls and 90 boys born of the 150. Based on your study, do you believe that the percent of girls born in China is 46.7?

A poll done for Newsweek found that 13% of Americans have seen or sensed the presence of an angel. A contingent doubts that the percent is really that high. It conducts its own survey. Out of 76 Americans surveyed, only two had seen or sensed the presence of an angel. As a result of the contingent’s survey, would you agree with the Newsweek poll? In complete sentences, also give three reasons why the two polls might give different results.

  • \(H_{a}: p < 0.13\)
  • Let \(P′ =\) the proportion of Americans who have seen or sensed angels
  • –2.688
  • \(p\text{-value} = 0.0036\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the percentage of Americans who have seen or sensed an angel is less than 13%.

The “plus-4s” confidence interval is \((0.0022, 0.0978)\)
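
The “plus-4s” interval quoted above adds two successes and two failures to the sample before computing an ordinary normal-approximation interval. A minimal sketch, assuming a 95% confidence level and using a hypothetical helper named `plus_four_ci`:

```python
import math
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.95):
    """Plus-four confidence interval for a proportion: add 2 successes and
    2 failures, then apply the usual normal-approximation formula."""
    p_tilde = (x + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return p_tilde - z_crit * se, p_tilde + z_crit * se


# Angel survey: 2 of 76 Americans surveyed had seen or sensed an angel.
lo, hi = plus_four_ci(2, 76)
print(f"plus-four 95% CI = ({lo:.4f}, {hi:.4f})")
```

The adjustment matters most when the observed count is very small, as it is here, because the unadjusted interval can behave poorly near 0.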

The mean work week for engineers in a start-up company is believed to be about 60 hours. A newly hired engineer hopes that it’s shorter. She asks ten engineering friends in start-ups for the lengths of their mean work weeks. Based on the results that follow, should she count on the mean work week to be shorter than 60 hours?

Data (length of mean work week): 70; 45; 55; 60; 65; 55; 55; 60; 50; 55.

Use the “Lap time” data for Lap 4 (see [link]) to test the claim that Terri finishes Lap 4, on average, in less than 129 seconds. Use all twenty races given.

  • \(H_{0}: \mu \geq 129\)
  • \(H_{a}: \mu < 129\)
  • Let \(\bar{X} =\) the average time in seconds that Terri finishes Lap 4.
  • Student's \(t\)-distribution
  • \(t = 1.209\)
  • Conclusion: There is insufficient evidence to conclude that Terri’s mean lap time is less than 129 seconds.
  • \((128.63, 130.37)\)

Use the “Initial Public Offering” data (see [link]) to test the claim that the mean offer price was $18 per share. Do not use all the data. Use your random number generator to randomly survey 15 prices.

The following questions were written by past students. They are excellent problems!

"Asian Family Reunion," by Chau Nguyen

Every two years it comes around.

We all get together from different towns.

In my honest opinion,

It's not a typical family reunion.

Not forty, or fifty, or sixty,

But how about seventy companions!

The kids would play, scream, and shout

One minute they're happy, another they'll pout.

The teenagers would look, stare, and compare

From how they look to what they wear.

The men would chat about their business

That they make more, but never less.

Money is always their subject

And there's always talk of more new projects.

The women get tired from all of the chats

They head to the kitchen to set out the mats.

Some would sit and some would stand

Eating and talking with plates in their hands.

Then come the games and the songs

And suddenly, everyone gets along!

With all that laughter, it's sad to say

That it always ends in the same old way.

They hug and kiss and say "good-bye"

And then they all begin to cry!

I say that 60 percent shed their tears

But my mom counted 35 people this year.

She said that boys and men will always have their pride,

So we won't ever see them cry.

I myself don't think she's correct,

So could you please try this problem to see if you object?

  • \(H_{0}: p = 0.60\)
  • \(H_{a}: p < 0.60\)
  • Let \(P′ =\) the proportion of family members who shed tears at a reunion.
  • –1.71
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of family members who shed tears at a reunion is less than 0.60. However, the test is weak because the \(p\text{-value}\) and alpha are quite close, so other tests should be done.
  • We are 95% confident that between 38.29% and 61.71% of family members will shed tears at a family reunion: \((0.3829, 0.6171)\). The “plus-4s” confidence interval (see chapter 8) is \((0.3861, 0.6139)\).

Note that here the “large-sample” 1-PropZTest provides the approximate \(p\text{-value}\) of 0.0438. Whenever a \(p\text{-value}\) based on a normal approximation is close to the level of significance, the exact \(p\text{-value}\) based on binomial probabilities should be calculated whenever possible. This is beyond the scope of this course.
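
For readers curious about the exact calculation mentioned in the note, the sketch below compares the normal-approximation p-value with an exact binomial tail probability. It assumes, reading from the poem, that 35 of 70 family members cried and that the null proportion is 0.60; the helper `binom_cdf` is written for this example.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


x, n, p0 = 35, 70, 0.60   # values read from the poem (assumption)

# Large-sample normal approximation (what a one-proportion z test reports)
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
approx_p = NormalDist().cdf(z)

# Exact left-tail probability under H0: P(X <= 35) when p = 0.60
exact_p = binom_cdf(x, n, p0)

print(f"z = {z:.2f}, approximate p-value = {approx_p:.4f}")
print(f"exact binomial p-value = {exact_p:.4f}")
```

When the two values fall on opposite sides of the significance level, the exact value is the one to rely on.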

"The Problem with Angels," by Cyndy Dowling

Although this problem is wholly mine,

The catalyst came from the magazine, Time.

On the magazine cover I did find

The realm of angels tickling my mind.

Inside, 69% I found to be

In angels, Americans do believe.

Then, it was time to rise to the task,

Ninety-five high school and college students I did ask.

Viewing all as one group,

Random sampling to get the scoop.

So, I asked each to be true,

"Do you believe in angels?" Tell me, do!

Hypothesizing at the start,

Totally believing in my heart

That the proportion who said yes

Would be equal on this test.

Lo and behold, seventy-three did arrive,

Out of the sample of ninety-five.

Now your job has just begun,

Solve this problem and have some fun.

"Blowing Bubbles," by Sondra Prull

Studying stats just made me tense,

I had to find some sane defense.

Some light and lifting simple play

To float my math anxiety away.

Blowing bubbles lifts me high

Takes my troubles to the sky.

POIK! They're gone, with all my stress

Bubble therapy is the best.

The label said each time I blew

The average number of bubbles would be at least 22.

I blew and blew and this I found

From 64 blows, they all are round!

But the number of bubbles in 64 blows

Varied widely, this I know.

20 per blow became the mean

They deviated by 6, and not 16.

From counting bubbles, I sure did relax

But now I give to you your task.

Was 22 a reasonable guess?

Find the answer and pass this test!

  • \(H_{0}: \mu \geq 22\)
  • \(H_{a}: \mu < 22\)
  • Let \(\bar{X} =\) the mean number of bubbles per blow.
  • –2.667
  • \(p\text{-value} = 0.00486\)
  • Conclusion: There is sufficient evidence to conclude that the mean number of bubbles per blow is less than 22.
  • \((18.501, 21.499)\)

"Dalmatian Darnation," by Kathy Sparling

A greedy dog breeder named Spreckles

Bred puppies with numerous freckles

The Dalmatians he sought

Possessed spot upon spot

The more spots, he thought, the more shekels.

His competitors did not agree

That freckles would increase the fee.

They said, “Spots are quite nice

But they don't affect price;

One should breed for improved pedigree.”

The breeders decided to prove

This strategy was a wrong move.

Breeding only for spots

Would wreak havoc, they thought.

His theory they want to disprove.

They proposed a contest to Spreckles

Comparing dog prices to freckles.

In records they looked up

One hundred one pups:

Dalmatians that fetched the most shekels.

They asked Mr. Spreckles to name

An average spot count he'd claim

To bring in big bucks.

Said Spreckles, “Well, shucks,

It's for one hundred one that I aim.”

Said an amateur statistician

Who wanted to help with this mission.

“Twenty-one for the sample

Standard deviation's ample:

They examined one hundred and one

Dalmatians that fetched a good sum.

They counted each spot,

Mark, freckle and dot

And tallied up every one.

Instead of one hundred one spots

They averaged ninety six dots

Can they muzzle Spreckles’

Obsession with freckles

Based on all the dog data they've got?

"Macaroni and Cheese, please!!" by Nedda Misherghi and Rachelle Hall

As a poor starving student I don't have much money to spend for even the bare necessities. So my favorite and main staple food is macaroni and cheese. It's high in taste and low in cost and nutritional value.

One day, as I sat down to determine the meaning of life, I got a serious craving for this, oh, so important, food of my life. So I went down the street to Greatway to get a box of macaroni and cheese, but it was SO expensive! $2.02 !!! Can you believe it? It made me stop and think. The world is changing fast. I had thought that the mean cost of a box (the normal size, not some super-gigantic-family-value-pack) was at most $1, but now I wasn't so sure. However, I was determined to find out. I went to 53 of the closest grocery stores and surveyed the prices of macaroni and cheese. Here are the data I wrote in my notebook:

Price per box of Mac and Cheese:

  • 5 stores @ $2.02
  • 15 stores @ $0.25
  • 3 stores @ $1.29
  • 6 stores @ $0.35
  • 4 stores @ $2.27
  • 7 stores @ $1.50
  • 5 stores @ $1.89
  • 8 stores @ $0.75

I could see that the cost varied but I had to sit down to figure out whether or not I was right. If it does turn out that this mouth-watering dish is at most $1, then I'll throw a big cheesy party in our next statistics lab, with enough macaroni and cheese for just me. (After all, as a poor starving student I can't be expected to feed our class of animals!)

  • \(H_{0}: \mu \leq 1\)
  • \(H_{a}: \mu > 1\)
  • Let \(\bar{X} =\) the mean cost in dollars of macaroni and cheese in a certain town.
  • Student's \(t\)-distribution
  • \(t = 0.340\)
  • \(p\text{-value} = 0.36756\)
  • Conclusion: The mean cost could be $1, or less. At the 5% significance level, there is insufficient evidence to conclude that the mean price of a box of macaroni and cheese is more than $1.
  • \((0.8291, 1.241)\)
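
When the data arrive as a frequency table, as in the notebook list above, the sample mean and standard deviation have to be weighted by the store counts before the test statistic can be formed. A minimal sketch, assuming the eight price groups as listed (with the last price read as $0.75) and a hypothesized mean of $1; the variable names are ours.

```python
import math

# (number of stores, price in dollars), copied from the notebook list above.
price_counts = [(5, 2.02), (15, 0.25), (3, 1.29), (6, 0.35),
                (4, 2.27), (7, 1.50), (5, 1.89), (8, 0.75)]

n = sum(count for count, _ in price_counts)
mean_price = sum(count * price for count, price in price_counts) / n

# Sample variance: weighted squared deviations divided by n - 1.
variance = sum(count * (price - mean_price) ** 2
               for count, price in price_counts) / (n - 1)
sd_price = math.sqrt(variance)

# One-sample t statistic against the hypothesized mean of $1.
t_mac = (mean_price - 1.00) / (sd_price / math.sqrt(n))
print(n, round(mean_price, 4), round(sd_price, 4), round(t_mac, 3))
```

The sample mean sits only a few cents above $1, so the statistic is small and the test does not reject the null hypothesis.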

"William Shakespeare: The Tragedy of Hamlet, Prince of Denmark," by Jacqueline Ghodsi

THE CHARACTERS (in order of appearance):

  • HAMLET, Prince of Denmark and student of Statistics
  • POLONIUS, Hamlet’s tutor
  • HORATIO, friend to Hamlet and fellow student

Scene: The great library of the castle, in which Hamlet does his lessons

(The day is fair, but the face of Hamlet is clouded. He paces the large room. His tutor, Polonius, is reprimanding Hamlet regarding the latter’s recent experience. Horatio is seated at the large table at right stage.)

POLONIUS: My Lord, how cans’t thou admit that thou hast seen a ghost! It is but a figment of your imagination!

HAMLET: I beg to differ; I know of a certainty that five-and-seventy in one hundred of us, condemned to the whips and scorns of time as we are, have gazed upon a spirit of health, or goblin damn’d, be their intents wicked or charitable.

POLONIUS: If thou doest insist upon thy wretched vision then let me invest your time; be true to thy work and speak to me through the reason of the null and alternate hypotheses. (He turns to Horatio.) Did not Hamlet himself say, “What piece of work is man, how noble in reason, how infinite in faculties?” Then let not this foolishness persist. Go, Horatio, make a survey of three-and-sixty and discover what the true proportion be. For my part, I will never succumb to this fantasy, but deem man to be devoid of all reason should thy proposal of at least five-and-seventy in one hundred hold true.

HORATIO (to Hamlet): What should we do, my Lord?

HAMLET: Go to thy purpose, Horatio.

HORATIO: To what end, my Lord?

HAMLET: That you must teach me. But let me conjure you by the rights of our fellowship, by the consonance of our youth, by the obligation of our ever-preserved love, be even and direct with me, whether I am right or no.

(Horatio exits, followed by Polonius, leaving Hamlet to ponder alone.)

(The next day, Hamlet awaits anxiously the presence of his friend, Horatio. Polonius enters and places some books upon the table just a moment before Horatio enters.)

POLONIUS: So, Horatio, what is it thou didst reveal through thy deliberations?

HORATIO: In a random survey, for which purpose thou thyself sent me forth, I did discover that one-and-forty believe fervently that the spirits of the dead walk with us. Before my God, I might not this believe, without the sensible and true avouch of mine own eyes.

POLONIUS: Give thine own thoughts no tongue, Horatio. (Polonius turns to Hamlet.) But look to’t I charge you, my Lord. Come Horatio, let us go together, for this is not our test. (Horatio and Polonius leave together.)

HAMLET: To reject, or not reject, that is the question: whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous statistics, or to take arms against a sea of data, and, by opposing, end them. (Hamlet resignedly attends to his task.)

(Curtain falls)

"Untitled," by Stephen Chen

I've often wondered how software is released and sold to the public. Ironically, I work for a company that sells products with known problems. Unfortunately, most of the problems are difficult to create, which makes them difficult to fix. I usually use the test program X, which tests the product, to try to create a specific problem. When the test program is run to make an error occur, the likelihood of generating an error is 1%.

So, armed with this knowledge, I wrote a new test program Y that will generate the same error that test program X creates, but more often. To find out if my test program is better than the original, so that I can convince the management that I'm right, I ran my test program to find out how often I can generate the same error. When I ran my test program 50 times, I generated the error twice. While this may not seem much better, I think that I can convince the management to use my test program instead of the original test program. Am I right?

  • \(H_{0}: p = 0.01\)
  • \(H_{a}: p > 0.01\)
  • Let \(P′ =\) the proportion of errors generated
  • Normal for a single proportion
  • Decision: Reject the null hypothesis
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of errors generated is more than 0.01.

The “plus-4s” confidence interval is \((0.004, 0.144)\).
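
The solution above relies on the normal approximation for a single proportion. With n = 50 and \(p = 0.01\), the expected number of errors under the null hypothesis is only 0.5, well below the usual rule of thumb of at least 5, so it may be worth comparing the approximation with an exact binomial tail. The sketch below is our own cross-check under that assumption, not part of the textbook solution.

```python
import math
from statistics import NormalDist

n, errors, p0 = 50, 2, 0.01

# Normal approximation, as in the solution above.
p_hat = errors / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value_normal = 1 - NormalDist().cdf(z)

# Exact binomial tail: P(X >= 2) when X ~ Binomial(50, 0.01).
p_value_exact = 1 - sum(
    math.comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(errors)
)

print(round(z, 3), round(p_value_normal, 4), round(p_value_exact, 4))
```

The approximate \(p\text{-value}\) falls below 0.05 while the exact one is roughly 0.09, so the decision to reject depends on the method here. That is a hint that 50 runs may be too few to settle the question.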

"Japanese Girls’ Names"

by Kumi Furuichi

It used to be very typical for Japanese girls’ names to end with “ko.” (The trend might have started around my grandmothers’ generation and its peak might have been around my mother’s generation.) “Ko” means “child” in Chinese characters. Parents would name their daughters with “ko” attaching to other Chinese characters which have meanings that they want their daughters to become, such as Sachiko—happy child, Yoshiko—a good child, Yasuko—a healthy child, and so on.

However, I noticed recently that only two out of nine of my Japanese girlfriends at this school have names which end with “ko.” More and more, parents seem to have become creative, modernized, and, sometimes, westernized in naming their children.

I have a feeling that, while 70 percent or more of my mother’s generation would have names with “ko” at the end, the proportion has dropped among my peers. I wrote down all my Japanese friends’, ex-classmates’, co-workers’, and acquaintances’ names that I could remember. Following are the names. (Some are repeats.) Test to see if the proportion has dropped for this generation.

Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko, Hiroko, Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko, Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko, Maki, Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko, Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka, Sayoko, Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko, Yasuko, Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki, Yukiko, Yuko, Yuko.

"Phillip’s Wish," by Suzanne Osorio

My nephew likes to play

Chasing the girls makes his day.

He asked his mother

If it is okay

To get his ear pierced.

She said, “No way!”

To poke a hole through your ear,

Is not what I want for you, dear.

He argued his point quite well,

Says even my macho pal, Mel,

Has gotten this done.

It’s all just for fun.

C’mon please, mom, please, what the hell.

Again Phillip complained to his mother,

Saying half his friends (including their brothers)

Are piercing their ears

And they have no fears

He wants to be like the others.

She said, “I think it’s much less.

We must do a hypothesis test.

And if you are right,

I won’t put up a fight.

But, if not, then my case will rest.”

We proceeded to call fifty guys

To see whose prediction would fly.

Nineteen of the fifty

Said piercing was nifty

And earrings they’d occasionally buy.

Then there’s the other thirty-one,

Who said they’d never have this done.

So now this poem’s finished.

Will his hopes be diminished,

Or will my nephew have his fun?

  • \(H_{0}: p = 0.50\)
  • \(H_{a}: p < 0.50\)
  • Let \(P′ =\) the proportion of friends that has a pierced ear.
  • –1.70
  • \(p\text{-value} = 0.0448\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05. (However, they are very close.)
  • Conclusion: There is sufficient evidence to support the claim that less than 50% of his friends have pierced ears.
  • Confidence Interval: \((0.245, 0.515)\). The “plus-4s” confidence interval is \((0.259, 0.519)\).
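
The numbers in this solution can be traced step by step. The sketch below assumes 19 "yes" answers out of 50 and a null proportion of 0.50; the helper names `wald_interval` and `plus_four_interval` are ours. The "plus-4s" interval is computed by adding two successes and two failures to the sample before applying the usual normal-approximation formula.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def plus_four_interval(successes, n, confidence=0.95):
    """Same interval after adding two successes and two failures."""
    return wald_interval(successes + 2, n + 4, confidence)

n, pierced, p0 = 50, 19, 0.50
# The test statistic uses the null proportion p0 in the standard error.
z = (pierced / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # left-tailed test: H_a is p < 0.50

print(round(z, 2), round(p_value, 4))
print([round(x, 3) for x in wald_interval(pierced, n)])
print([round(x, 3) for x in plus_four_interval(pierced, n)])
```

Note that the test statistic uses the hypothesized proportion in its standard error, whereas the intervals use the sample proportion. This is why an interval can contain 0.50 even though the one-sided test rejects at the 5% level.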

"The Craven," by Mark Salangsang

Once upon a morning dreary

In stats class I was weak and weary.

Pondering over last night’s homework

Whose answers were now on the board

This I did and nothing more.

While I nodded nearly napping

Suddenly, there came a tapping.

As someone gently rapping,

Rapping my head as I snore.

Quoth the teacher, “Sleep no more.”

“In every class you fall asleep,”

The teacher said, his voice was deep.

“So a tally I’ve begun to keep

Of every class you nap and snore.

The percentage being forty-four.”

“My dear teacher I must confess,

While sleeping is what I do best.

The percentage, I think, must be less,

A percentage less than forty-four.”

This I said and nothing more.

“We’ll see,” he said and walked away,

And fifty classes from that day

He counted till the month of May

The classes in which I napped and snored.

The number he found was twenty-four.

At a significance level of 0.05,

Please tell me am I still alive?

Or did my grade just take a dive

Plunging down beneath the floor?

Upon thee I hereby implore.

Toastmasters International cites a report by Gallup Poll that 40% of Americans fear public speaking. A student believes that less than 40% of students at her school fear public speaking. She randomly surveys 361 schoolmates and finds that 135 report they fear public speaking. Conduct a hypothesis test to determine if the percent at her school is less than 40%.

  • \(H_{0}: p = 0.40\)
  • \(H_{a}: p < 0.40\)
  • Let \(P′ =\) the proportion of schoolmates who fear public speaking.
  • –1.01
  • \(p\text{-value} = 0.1563\)
  • Conclusion: There is insufficient evidence to support the claim that less than 40% of students at the school fear public speaking.
  • Confidence Interval: \((0.3241, 0.4240)\). The “plus-4s” confidence interval is \((0.3257, 0.4250)\).

Sixty-eight percent of online courses taught at community colleges nationwide were taught by full-time faculty. To test if 68% also represents California’s percent for full-time faculty teaching the online classes, Long Beach City College (LBCC) in California, was randomly selected for comparison. In the same year, 34 of the 44 online courses LBCC offered were taught by full-time faculty. Conduct a hypothesis test to determine if 68% represents California. NOTE: For more accurate results, use more California community colleges and this past year's data.

According to an article in Bloomberg Businessweek , New York City's most recent adult smoking rate is 14%. Suppose that a survey is conducted to determine this year’s rate. Nine out of 70 randomly chosen N.Y. City residents reply that they smoke. Conduct a hypothesis test to determine if the rate is still 14% or if it has decreased.

  • \(H_{0}: p = 0.14\)
  • \(H_{a}: p < 0.14\)
  • Let \(P′ =\) the proportion of NYC residents that smoke.
  • –0.2756
  • \(p\text{-value} = 0.3914\)
  • At the 5% significance level, there is insufficient evidence to conclude that the proportion of NYC residents who smoke is less than 0.14.
  • Confidence Interval: \((0.0502, 0.2070)\). The “plus-4s” confidence interval (see chapter 8) is \((0.0676, 0.2297)\).

The mean age of De Anza College students in a previous term was 26.6 years old. An instructor thinks the mean age for online students is older than 26.6. She randomly surveys 56 online students and finds that the sample mean is 29.4 with a standard deviation of 2.1. Conduct a hypothesis test.

Registered nurses earned an average annual salary of $69,110. For that same year, a survey was conducted of 41 California registered nurses to determine if the annual salary is higher than $69,110 for California nurses. The sample average was $71,121 with a sample standard deviation of $7,489. Conduct a hypothesis test.

  • \(H_{0}: \mu = 69,110\)
  • \(H_{a}: \mu > 69,110\)
  • Let \(\bar{X} =\) the mean salary in dollars for California registered nurses.
  • \(t = 1.719\)
  • \(p\text{-value}: 0.0466\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean salary of California registered nurses exceeds $69,110.
  • \(($68,757, $73,485)\)
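
A \(p\text{-value}\) from Student's \(t\)-distribution is normally read from software or a calculator. For readers who want to see where the number comes from, the sketch below integrates the \(t\) density numerically with Simpson's rule, using only the Python standard library. The functions `t_pdf` and `t_upper_tail` are our own illustrative helpers, and in practice a statistics library would be the usual choice.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    upper = 0.5 - area            # P(T > |t|), by symmetry about zero
    return upper if t >= 0 else 1 - upper

# Summary figures from the nurses exercise above.
n, x_bar, s, mu0 = 41, 71121, 7489, 69110
t_nurses = (x_bar - mu0) / (s / math.sqrt(n))
p_nurses = t_upper_tail(t_nurses, df=n - 1)  # right-tailed test
print(round(t_nurses, 3), round(p_nurses, 4))
```

The result lies just under 0.05, which is why the conclusion rejects the null hypothesis at the 5% level but would not do so at the 1% level.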

La Leche League International reports that the mean age of weaning a child from breastfeeding is age four to five worldwide. In America, most nursing mothers wean their children much earlier. Suppose a random survey is conducted of 21 U.S. mothers who recently weaned their children. The mean weaning age was nine months (3/4 year) with a standard deviation of 4 months. Conduct a hypothesis test to determine if the mean weaning age in the U.S. is less than four years old.

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin?

After conducting the test, your decision and conclusion are

  • Reject \(H_{0}\): There is sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Reject \(H_{0}\): There is sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing.

At a 1% level of significance, an appropriate conclusion is:

  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is more than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is at least 20%.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test.

At a significance level of \(\alpha = 0.05\), what is the correct conclusion?

  • There is enough evidence to conclude that the mean number of hours is more than 4.75
  • There is enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.75

Hypothesis testing: For the following ten exercises, answer each question.

State the null and alternate hypothesis.

State the \(p\text{-value}\).

State \(\alpha\).

What is your decision?

Write a conclusion.

Answer any other questions asked in the problem.

According to the Center for Disease Control website, in 2011 at least 18% of high school students have smoked a cigarette. An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (a medium sized–approximately 1,200 students–small city demographic) to determine if the local high school’s percentage was lower. One hundred fifty students were chosen at random and surveyed. Of the 150 students surveyed, 82 have smoked. Use a significance level of 0.05 and using appropriate statistical evidence, conduct a hypothesis test and state the conclusions.

A recent survey in the N.Y. Times Almanac indicated that 48.8% of families own stock. A broker wanted to determine if this survey could be valid. He surveyed a random sample of 250 families and found that 142 owned some type of stock. At the 0.05 significance level, can the survey be considered to be accurate?

  • \(H_{0}: p = 0.488\) \(H_{a}: p \neq 0.488\)
  • \(p\text{-value} = 0.0114\)
  • \(\alpha = 0.05\)
  • Reject the null hypothesis.
  • At the 5% level of significance, there is enough evidence to conclude that the proportion of families that own stocks is not 48.8%.
  • The survey does not appear to be accurate.
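
For a two-tailed test such as this one, the \(p\text{-value}\) is twice the area beyond the absolute value of the test statistic. A minimal sketch, assuming 142 stock-owning families out of 250 and the null value 0.488 from the exercise; the variable names are ours.

```python
import math
from statistics import NormalDist

n, owners, p0, alpha = 250, 142, 0.488, 0.05

p_hat = owners / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed: double the area beyond |z| under the standard normal curve.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(round(z, 3), round(p_value, 4), reject_null)
```

Rejecting the null hypothesis here means the sample is inconsistent with a population proportion of 48.8%.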

Driver error can be listed as the cause of approximately 54% of all fatal auto accidents, according to the American Automobile Association. Thirty randomly selected fatal accidents are examined, and it is determined that 14 were caused by driver error. Using \(\alpha = 0.05\), is the AAA proportion accurate?

The US Department of Energy reported that 51.7% of homes were heated by natural gas. A random sample of 221 homes in Kentucky found that 115 were heated by natural gas. Does the evidence support the claim for Kentucky at the \(\alpha = 0.05\) level? Are the results applicable across the country? Why?

  • \(H_{0}: p = 0.517\) \(H_{a}: p \neq 0.517\)
  • \(p\text{-value} = 0.9203\).
  • \(\alpha = 0.05\).
  • Do not reject the null hypothesis.
  • At the 5% significance level, there is not enough evidence to conclude that the proportion of homes in Kentucky that are heated by natural gas differs from 0.517.
  • However, we cannot generalize this result to the entire nation. First, the sample’s population is only the state of Kentucky. Second, it is reasonable to assume that homes in the extreme north and south will have extreme high usage and low usage, respectively. We would need to expand our sample base to include these possibilities if we wanted to generalize this claim to the entire nation.

For Americans using library services, the American Library Association claims that at most 67% of patrons borrow books. The library director in Owensboro, Kentucky feels this is not true, so she asked a local college statistics class to conduct a survey. The class randomly selected 100 patrons and found that 82 borrowed books. Did the class demonstrate that the percentage was higher in Owensboro, KY? Use \(\alpha = 0.01\) level of significance. What is the possible proportion of patrons that do borrow books from the Owensboro Library?

The Weather Underground reported that the mean amount of summer rainfall for the northeastern US is at least 11.52 inches. Ten cities in the northeast are randomly selected and the mean rainfall amount is calculated to be 7.42 inches with a standard deviation of 1.3 inches. At the \(\alpha = 0.05\) level, can it be concluded that the mean rainfall was below the reported average? What if \(\alpha = 0.01\)? Assume the amount of summer rainfall follows a normal distribution.

  • \(H_{0}: \mu \geq 11.52\) \(H_{a}: \mu < 11.52\)
  • \(p\text{-value} = 0.000002\) which is almost 0.
  • At the 5% significance level, there is enough evidence to conclude that the mean amount of summer rain in the northeastern US is less than 11.52 inches.
  • We would make the same conclusion if alpha was 1% because the \(p\text{-value}\) is almost 0.

A survey in the N.Y. Times Almanac finds the mean commute time (one way) is 25.4 minutes for the 15 largest US cities. The Austin, TX chamber of commerce feels that Austin’s commute time is less and wants to publicize this fact. The mean for 25 randomly selected commuters is 22.1 minutes with a standard deviation of 5.3 minutes. At the \(\alpha = 0.10\) level, is the Austin, TX commute significantly less than the mean commute time for the 15 largest US cities?

A report by the Gallup Poll found that a woman visits her doctor, on average, at most 5.8 times each year. A random sample of 20 women results in these yearly visit totals:

3; 2; 1; 3; 7; 2; 9; 4; 6; 6; 8; 0; 5; 6; 4; 2; 1; 3; 4; 1

At the \(\alpha = 0.05\) level, can it be concluded that the population mean is higher than 5.8 visits per year?

  • \(H_{0}: \mu \leq 5.8\) \(H_{a}: \mu > 5.8\)
  • \(p\text{-value} = 0.9987\)
  • At the 5% level of significance, there is not enough evidence to conclude that a woman visits her doctor, on average, more than 5.8 times a year.
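
When the raw observations are given, the sample mean and standard deviation come straight from the data. The sketch below uses the 20 yearly visit totals listed in the exercise and Python's `statistics` module; the variable names are ours. It shows why the \(p\text{-value}\) is so close to 1: the sample mean lies well below 5.8, the opposite direction from the alternative hypothesis.

```python
import math
import statistics

# Yearly visit totals for the 20 sampled women, as listed in the exercise.
visits = [3, 2, 1, 3, 7, 2, 9, 4, 6, 6, 8, 0, 5, 6, 4, 2, 1, 3, 4, 1]

n = len(visits)
x_bar = statistics.mean(visits)
s = statistics.stdev(visits)  # sample standard deviation (n - 1 denominator)

# Test statistic for H_0: mu <= 5.8 against H_a: mu > 5.8.
t_visits = (x_bar - 5.8) / (s / math.sqrt(n))
print(n, x_bar, round(s, 4), round(t_visits, 3))
```

A strongly negative statistic in a right-tailed test leaves almost all of the distribution to its right, so the null hypothesis is not rejected.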

According to the N.Y. Times Almanac the mean family size in the U.S. is 3.18. A sample of a college math class resulted in the following family sizes:

5; 4; 5; 4; 4; 3; 6; 4; 3; 3; 5; 5; 6; 3; 3; 2; 7; 4; 5; 2; 2; 2; 3; 2

At \(\alpha = 0.05\) level, is the class’ mean family size greater than the national average? Does the Almanac result remain valid? Why?

The student academic group on a college campus claims that freshman students study at least 2.5 hours per day, on average. One Introduction to Statistics class was skeptical. The class took a random sample of 30 freshman students and found a mean study time of 137 minutes with a standard deviation of 45 minutes. At α = 0.01 level, is the student academic group’s claim correct?

  • \(H_{0}: \mu \geq 150\) \(H_{a}: \mu < 150\)
  • \(p\text{-value} = 0.0622\)
  • \(\alpha = 0.01\)
  • At the 1% significance level, there is not enough evidence to conclude that freshmen students study less than 2.5 hours per day, on average.
  • The student academic group’s claim appears to be correct.

9.7: Hypothesis Testing of a Single Mean and Single Proportion

T-test for two Means – Unknown Population Standard Deviations

Instructions: Use this T-Test Calculator for Two Independent Means to conduct a t-test for two population means (\(\mu_1\) and \(\mu_2\)) with unknown population standard deviations. This test applies when you have two independent samples and the population standard deviations \(\sigma_1\) and \(\sigma_2\) are not known. Please select the null and alternative hypotheses, type the significance level, the sample means, the sample standard deviations, and the sample sizes, and the results of the t-test for two independent samples will be displayed for you:

The T-test for Two Independent Samples

More about the t-test for two means so you can better interpret the output presented above: A t-test for two means with unknown population variances and two independent samples is a hypothesis test that attempts to make a claim about the population means (\(\mu_1\) and \(\mu_2\)).

More specifically, a t-test uses sample information to assess how plausible it is for the population means \(\mu_1\) and \(\mu_2\) to be equal. The test has two non-overlapping hypotheses, the null and the alternative hypothesis.

The null hypothesis is a statement about the population means, specifically the assumption of no effect, and the alternative hypothesis is the complementary hypothesis to the null hypothesis.

Properties of the two sample t-test

The main properties of a two sample t-test for two population means are:

  • Depending on our knowledge about the "no effect" situation, the t-test can be two-tailed, left-tailed or right-tailed
  • The main principle of hypothesis testing is that the null hypothesis is rejected if the test statistic obtained is sufficiently unlikely under the assumption that the null hypothesis is true
  • The p-value is the probability of obtaining sample results as extreme or more extreme than the sample results obtained, under the assumption that the null hypothesis is true
  • In a hypothesis test there are two types of errors. A Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we fail to reject a false null hypothesis

How do you compute the t-statistic for the t test for two independent samples?

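The formula for the t-statistic for two population means (with two independent samples and unknown population variances) shows how to calculate a t-test from the sample means and standard deviations. It depends on whether the population variances are assumed to be equal or not. Here \(\bar{X}_1\) and \(\bar{X}_2\) are the sample means, \(s_1\) and \(s_2\) the sample standard deviations, and \(n_1\) and \(n_2\) the sample sizes. If the population variances are assumed to be unequal, then the formula is:

\(t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\)

On the other hand, if the population variances are assumed to be equal, then the formula is:

\(t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}\)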

Normally, the way of knowing whether the population variances must be assumed to be equal or unequal is by using an F-test for equality of variances.

With the above t-statistic, we can compute the corresponding p-value, which allows us to assess whether or not there is a statistically significant difference between two means.
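
The two versions of the two-sample t-statistic can be sketched in a few lines. The unequal-variance version (often called Welch's test) divides the difference in sample means by \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\), while the equal-variance version pools the two sample variances first. The function names and the summary statistics below are hypothetical, chosen only for illustration, and the degrees-of-freedom formula in `welch_t` is the standard Welch-Satterthwaite approximation rather than anything stated on this page.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and approximate df when variances are NOT assumed equal."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and df when the population variances are assumed equal."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical summary statistics (not from any exercise on this page).
print(welch_t(10, 2, 16, 8, 2, 16))
print(pooled_t(10, 2, 16, 8, 2, 16))
```

With equal sample sizes and equal sample standard deviations, as in this made-up example, the two versions give the same statistic and the same degrees of freedom. They diverge as the sample variances or sizes become more unequal.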

Why is it called t-test for independent samples?

This is because the samples are not related to each other: the outcomes from one sample are unrelated to the outcomes from the other sample. If the samples are related (for example, you are comparing the answers of husbands and wives, or identical twins), you should use a t-test for paired samples instead.

What if the population standard deviations are known?

The main purpose of this calculator is to compare two population means when sigma is unknown for both populations. If the population standard deviations are known, you should use a z-test for two means instead.

COMMENTS

  1. 8.3: Hypothesis Test Examples for Means with Unknown Standard Deviation

    p-value = P(x¯ > 67) = 0.0396 p -value = P ( x ¯ > 67) = 0.0396 where the sample mean and sample standard deviation are calculated as 67 and 3.1972 from the data. Interpretation of the p-value: If the null hypothesis is true, then there is a 0.0396 probability (3.96%) that the sample mean is 65 or more. Figure 8.3.11 8.3. 11.

  2. 8.4: Hypothesis Test on a Single Standard Deviation

    A test of a single standard deviation assumes that the underlying distribution is normal. The null and alternative hypotheses are stated in terms of the population standard deviation (or population variance). The test statistic is: χ2 = (n − 1)s2 σ2 (8.4.1) (8.4.1) χ 2 = ( n − 1) s 2 σ 2. where:

  3. Hypothesis Test for a Mean

    where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size. Since we have a two-tailed test , the P-value is the probability that the t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77.

  4. Hypothesis Testing for the Mean

    Table 8.3: One-sided hypothesis testing for the mean: H0: μ ≤ μ0, H1: μ > μ0. Note that the tests mentioned in Table 8.3 remain valid if we replace the null hypothesis by μ = μ0. The reason for this is that in choosing the threshold c, we assumed the worst case scenario, i.e, μ = μ0 .

  5. T Test Overview: How to Use & Examples

    We'll use a two-sample t test to evaluate if the difference between the two group means is statistically significant. The t test output is below. In the output, you can see that the treatment group (Sample 1) has a mean of 109 while the control group's (Sample 2) average is 100. The p-value for the difference between the groups is 0.112.

  6. 8.3: Hypothesis Test for One Mean

    Then type in the hypothesized mean (μ 0), sample standard deviation, sample mean, sample size (or list name (list1), and Freq: 1), arrow over to the \(\neq\), <, > and select the sign that is the same as the problem's alternative hypothesis statement then press the [ENTER] key to calculate. The calculator returns the t-test statistic and p ...

  7. How to Find the P value: Process and Calculations

    For this example, assume we're tasked with determining whether a sample mean is different from a hypothesized value. We're given the sample statistics below and need to find the p value. Mean: 330.6; Standard deviation: 154.2; Sample size: 25; Null hypothesis value: 260; Let's work through the step-by-step process of how to calculate a p ...

  8. t-test Calculator

    This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing! Enter your T-score and the number of degrees of freedom. If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and ...

  9. 8.6 Hypothesis Tests for a Population Mean with Known Population

    If no significance level is given, a common standard is to use a significance level of 5%. ... This is a test on a population mean where the population standard deviation is known ([latex]\sigma=0.8[/latex]). ... The hypothesis test for a population mean is a well established process:

  10. Hypothesis testing and p-values (video)

    This is the mean. If I did 1 standard deviation, 2 standard deviations, 3 standard deviations-- that's in the positive direction. Actually let me draw it a little bit different than that. This wasn't a nicely drawn bell curve, but I'll do 1 standard deviation, 2 standard deviation, and then 3 standard deviations in the positive direction.

  11. Hypothesis Testing for Means & Proportions

    We then determine the appropriate test statistic (Step 2) for the hypothesis test. The formula for the test statistic is given below. Test Statistic for Testing H₀: p = p₀, if min(np₀, n(1 − p₀)) > 5. The formula above is appropriate for large samples, defined when the smaller of np₀ and n(1 − p₀) is at least 5.

  12. 8.6: Hypothesis Test of a Single Population Mean with Examples

    Step 1: State your hypotheses about the population mean. Step 2: Summarize the data. State a significance level. State and check conditions required for the procedure. Find or identify the sample size, n, the sample mean, \(\bar{x}\), and the sample standard deviation, s.

  13. Section 10.4: Hypothesis Tests for a Population Standard Deviation

    Performing a Hypothesis Test Regarding σ. Step 1: State the null and alternative hypotheses. Step 2: Decide on a level of significance, α. Step 3: Compute the test statistic. Step 4: Determine the P-value. Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.

  14. 8.7 Hypothesis Tests for a Population Mean with Unknown Population

    The p-value for a hypothesis test on a population mean is the area in the tail(s) of the distribution of the sample mean. When the population standard deviation is unknown, use the t-distribution to find the p-value. If the p-value is the area in the left tail: Use the t.dist function to find the p-value. In the t.dist(t-score, degrees of freedom, logic operator) function:

  15. 8.3: Hypothesis Testing of Single Mean

    Thus the test statistic is T = (x̄ − μ₀) / (s/√n) and has the Student t-distribution with n − 1 = 5 − 1 = 4 degrees of freedom. Step 3. From the data we compute x̄ = 169 and s = 10.39. Inserting these values into the formula for the test statistic gives.

  16. Hypothesis Testing

    Example problem: A sample of 200 people has a mean age of 21 with a population standard deviation (σ) of 5. Test the hypothesis that the population mean is 18.9 at α = 0.05. Step 1: State the null hypothesis. In this case, the null hypothesis is that the population mean is 18.9, so we write: H₀: μ = 18.9. Step 2: State the alternative ...
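
    The example problem above can be sketched as a one-sample z test, since the population standard deviation is given. The snippet is cut off before the alternative hypothesis is stated, so the two-sided alternative used here (μ ≠ 18.9) is an assumption, as are the variable names:

```python
import math
from statistics import NormalDist

# Values quoted in the example problem.
n, xbar, sigma, mu0, alpha = 200, 21, 5, 18.9, 0.05

# z statistic: distance of the sample mean from the null value in standard errors.
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-sided p value (assumed alternative: mu != 18.9).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = p_value < alpha
print(f"z = {z:.3f}, p = {p_value:.2e}, reject H0: {reject}")
```

    With a z statistic this far from zero, the decision would be the same under a one-sided alternative in the direction of the sample mean.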

  17. T test calculator

    A t test compares the means of two groups. There are several types of two sample t tests and this calculator focuses on the three most common: unpaired, Welch's, and paired t tests. Directions for using the calculator are listed below, along with more information about two sample t tests and help on which is appropriate for your analysis. NOTE: This is not the same as a one sample t test; for ...

  18. Z Test: Uses, Formula & Examples

    Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ. A Z test is a form of inferential statistics. It uses samples to draw conclusions about populations.

  19. 8.2: Large Sample Tests for a Population Mean

    Regardless of the mean amount dispensed, the standard deviation of the amount dispensed always has value \(0.22\) ounce. A quality control engineer routinely selects \(30\) jars from the assembly line to check the amounts filled. On one occasion, the sample mean is \(\bar{x}=8.2\) ounces and the sample standard deviation is \(s=0.25\) ounce.

  20. How to perform two-sample t-tests in R by inputting sample statistics

    Let's say we have the statistics given below.

        gender  mean      sd         n
        f       1.666667  0.5773503  3
        m       4.500000  0.5773503  4

    How do you perform a two-sample t-test (to see if there is a significant difference between the means of men and women in some variable) using statistics like this rather than actual data?
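
    The question above asks about R; as a language-neutral illustration, here is a minimal Python sketch of the pooled-variance (equal-variance) two-sample t statistic computed from summary statistics alone. The function name pooled_t is hypothetical, and the equal-variance assumption is a choice made for this sketch (the two groups happen to have the same standard deviation), not something the question specifies:

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Summary statistics from the question: group f, then group m.
t_stat, df = pooled_t(1.666667, 0.5773503, 3, 4.5, 0.5773503, 4)
print(f"t = {t_stat:.3f}, df = {df}")
```

    The p value would then come from a t distribution with df degrees of freedom.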

  21. how to test a hypothesis when given population mean and standard

    The null hypothesis is that the mean weight is 49.3 lbs, with 14 lbs. standard deviation. The question is, what is the probability that you observed a mean of 51.5 lbs. in a sample of 196 children, if the true mean and standard deviation followed the null hypothesis?
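
    The probability asked about above can be sketched with the normal distribution, treating the stated 14 lbs as the known population standard deviation. Reading the question as "a sample mean at least as large as 51.5" (an upper-tail probability) is an assumption of this sketch:

```python
import math
from statistics import NormalDist

# Values quoted in the question.
mu0, sigma, n, xbar = 49.3, 14, 196, 51.5

# Standard error of the mean under the null hypothesis.
se = sigma / math.sqrt(n)

# z score of the observed sample mean.
z = (xbar - mu0) / se

# Upper-tail probability: P(sample mean >= 51.5 | null is true).
p_upper = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P(mean >= {xbar}) = {p_upper:.4f}")
```

    A two-sided version would double this tail probability.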

  22. Population and sample standard deviation review

    Here's how to calculate sample standard deviation: Step 1: Calculate the mean of the data; this is x̄ in the formula. Step 2: Subtract the mean from each data point. These differences are called deviations. Data points below the mean will have negative deviations, and data points above the mean will have positive deviations.

  23. 9.E: Hypothesis Testing with One Sample (Exercises)

    The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are: \(H_{0}: \bar{x} = 4.5, H_{a}: \bar{x} > 4.5\) ... State the Type I and Type II errors in complete sentences given the following statements. The mean number of years Americans work before retiring is 34 ...

  24. T-test for two Means

    Instructions: Use this T-Test Calculator for Two Independent Means to conduct a t-test for two population means (μ₁ and μ₂) with unknown population standard deviations. This test applies when you have two independent samples and the population standard deviations σ₁ and σ₂ are not known.