Statology

Statistics Made Easy

Two-Tailed Hypothesis Tests: 3 Example Problems

In statistics, we use hypothesis tests to determine whether some claim about a population parameter is true or not.

Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms:

H 0 (Null Hypothesis): Population parameter = ≤, ≥ some value

H A (Alternative Hypothesis): Population parameter <, >, ≠ some value

There are two types of hypothesis tests:

  • One-tailed test : Alternative hypothesis contains either < or > sign
  • Two-tailed test : Alternative hypothesis contains the ≠ sign

In a two-tailed test , the alternative hypothesis always contains the not equal ( ≠ ) sign.

This indicates that we’re testing whether or not some effect exists, regardless of whether it’s a positive or negative effect.

Check out the following example problems to gain a better understanding of two-tailed tests.

Example 1: Factory Widgets

Suppose it’s assumed that the average weight of a certain widget produced at a factory is 20 grams. However, one engineer believes that a new method produces widgets that weigh less than 20 grams.

To test this, he can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ = 20 grams
  • H A (Alternative Hypothesis): μ ≠ 20 grams

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The engineer believes that the new method will influence widget weight, but doesn’t specify whether it will cause average weight to increase or decrease.

To test this, he uses the new method to produce 20 widgets and obtains the following information:

  • n = 20 widgets
  • x = 19.8 grams
  • s = 3.1 grams

Plugging these values into the One Sample t-test Calculator , we obtain the following results:

  • t-test statistic: -0.288525
  • two-tailed p-value: 0.776

Since the p-value is not less than .05, the engineer fails to reject the null hypothesis.

He does not have sufficient evidence to say that the true mean weight of widgets produced by the new method is different than 20 grams.

Example 2: Plant Growth

Suppose a standard fertilizer has been shown to cause a species of plants to grow by an average of 10 inches. However, one botanist believes a new fertilizer causes this species of plants to grow by an average amount different than 10 inches.

To test this, she can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ = 10 inches
  • H A (Alternative Hypothesis): μ ≠ 10 inches

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The botanist believes that the new fertilizer will influence plant growth, but doesn’t specify whether it will cause average growth to increase or decrease.

To test this claim, she applies the new fertilizer to a simple random sample of 15 plants and obtains the following information:

  • n = 15 plants
  • x = 11.4 inches
  • s = 2.5 inches
  • t-test statistic: 2.1689
  • two-tailed p-value: 0.0478

Since the p-value is less than .05, the botanist rejects the null hypothesis.

She has sufficient evidence to conclude that the new fertilizer causes an average growth that is different than 10 inches.

Example 3: Studying Method

A professor believes that a certain studying technique will influence the mean score that her students receive on a certain exam, but she’s unsure if it will increase or decrease the mean score, which is currently 82.

To test this, she lets each student use the studying technique for one month leading up to the exam and then administers the same exam to each of the students.

She then performs a hypothesis test using the following hypotheses:

  • H 0 : μ = 82
  • H A : μ ≠ 82

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The professor believes that the studying technique will influence the mean exam score, but doesn’t specify whether it will cause the mean score to increase or decrease.

To test this claim, the professor has 25 students use the new studying method and then take the exam. He collects the following data on the exam scores for this sample of students:

  • t-test statistic: 3.6586
  • two-tailed p-value: 0.0012

Since the p-value is less than .05, the professor rejects the null hypothesis.

She has sufficient evidence to conclude that the new studying method produces exam scores with an average score that is different than 82.

Additional Resources

The following tutorials provide additional information about hypothesis testing:

Introduction to Hypothesis Testing What is a Directional Hypothesis? When Do You Reject the Null Hypothesis?

' src=

Published by Zach

Leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

  • Search Search Please fill out this field.

What Is a Two-Tailed Test?

Understanding a two-tailed test, special considerations, two-tailed vs. one-tailed test.

  • Two-Tailed Test FAQs
  • Corporate Finance
  • Financial Analysis

What Is a Two-Tailed Test? Definition and Example

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.

two tailed hypothesis test example

Investopedia / Joules Garcia

A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance . If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.

Key Takeaways

  • In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values.
  • It is used in null-hypothesis testing and testing for statistical significance.
  • If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
  • By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.

A basic concept of inferential statistics is hypothesis testing , which determines whether a claim is true or not given a population parameter. A hypothesis test that is designed to show whether the mean of a sample is significantly greater than and significantly less than the mean of a population is referred to as a two-tailed test. The two-tailed test gets its name from testing the area under both tails of a normal distribution , although the test can be used in other non-normal distributions.

A two-tailed test is designed to examine both sides of a specified data range as designated by the probability distribution involved. The probability distribution should represent the likelihood of a specified outcome based on predetermined standards. This requires the setting of a limit designating the highest (or upper) and lowest (or lower) accepted variable values included within the range. Any data point that exists above the upper limit or below the lower limit is considered out of the acceptance range and in an area referred to as the rejection range.

There is no inherent standard about the number of data points that must exist within the acceptance range. In instances where precision is required, such as in the creation of pharmaceutical drugs, a rejection rate of 0.001% or less may be instituted. In instances where precision is less critical, such as the number of food items in a product bag, a rejection rate of 5% may be appropriate.

A two-tailed test can also be used practically during certain production activities in a firm, such as with the production and packaging of candy at a particular facility. If the production facility designates 50 candies per bag as its goal, with an acceptable distribution of 45 to 55 candies, any bag found with an amount below 45 or above 55 is considered within the rejection range.

To confirm the packaging mechanisms are properly calibrated to meet the expected output, random sampling may be taken to confirm accuracy. A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.

For the packaging mechanisms to be considered accurate, an average of 50 candies per bag with an appropriate distribution is desired. Additionally, the number of bags that fall within the rejection range needs to fall within the probability distribution limit considered acceptable as an error rate. Here, the null hypothesis would be that the mean is 50 while the alternate hypothesis would be that it is not 50.

If, after conducting the two-tailed test, the z-score falls in the rejection region, meaning that the deviation is too far from the desired mean, then adjustments to the facility or associated equipment may be required to correct the error. Regular use of two-tailed testing methods can help ensure production stays within limits over the long term.

Be careful to note if a statistical test is one- or two-tailed as this will greatly influence a model's interpretation.

When a hypothesis test is set up to show that the sample mean would be only higher than the population mean, this is referred to as a  one-tailed test . A formulation of this hypothesis would be, for example, that "the returns on an investment fund would be  at least  x%." One-tailed tests could also be set up to show that the sample mean could be only less than the population mean. The key difference from a two-tailed test is that in a two-tailed test, the sample mean could be different from the population mean by being  either  higher or lower than it.

If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test.

A two-tailed test, on the other hand, is designed to examine both sides of a specified data range to test whether a sample is greater than or less than the range of values.

Example of a Two-Tailed Test

As a hypothetical example, imagine that a new  stockbroker , named XYZ, claims that their brokerage fees are lower than that of your current stockbroker, ABC) Data available from an independent research firm indicates that the mean and standard deviation of all ABC broker clients are $18 and $6, respectively.

A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can any inference be made about the difference in the average brokerage bill between ABC and XYZ broker?

  • H 0 : Null Hypothesis: mean = 18
  • H 1 : Alternative Hypothesis: mean <> 18 (This is what we want to prove.)
  • Rejection region: Z <= - Z 2.5  and Z>=Z 2.5  (assuming 5% significance level, split 2.5 each on either side).
  • Z = (sample mean – mean) / (std-dev / sqrt (no. of samples)) = (18.75 – 18) / (6/(sqrt(100)) = 1.25

This calculated Z value falls between the two limits defined by: - Z 2.5  = -1.96 and Z 2.5  = 1.96.

This concludes that there is insufficient evidence to infer that there is any difference between the rates of your existing broker and the new broker. Therefore, the null hypothesis cannot be rejected. Alternatively, the p-value = P(Z< -1.25)+P(Z >1.25) = 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 or 5%, leads to the same conclusion.

How Is a Two-Tailed Test Designed?

A two-tailed test is designed to determine whether a claim is true or not given a population parameter. It examines both sides of a specified data range as designated by the probability distribution involved. As such, the probability distribution should represent the likelihood of a specified outcome based on predetermined standards.

What Is the Difference Between a Two-Tailed and One-Tailed Test?

A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than  or  significantly less than the mean of a population. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, on the other hand, is set up to show only one test; that the sample mean would be higher than the population mean, or, in a separate test, that the sample mean would be lower than the population mean.

What Is a Z-score?

A Z-score numerically describes a value's relationship to the mean of a group of values and is measured in terms of the number of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score whereas Z-scores of 1.0 and -1.0 would indicate values one standard deviation above or below the mean. In most large data sets, 99% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations above and below the mean.

San Jose State University. " 6: Introduction to Null Hypothesis Significance Testing ."

two tailed hypothesis test example

  • Terms of Service
  • Editorial Policy
  • Privacy Policy
  • Your Privacy Choices

Two-Tailed Hypothesis Tests: 3 Example Problems

In statistics, we use hypothesis tests to determine whether some claim about a population parameter is true or not.

Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms:

H 0 (Null Hypothesis): Population parameter = ≤, ≥ some value

H A (Alternative Hypothesis): Population parameter , ≠ some value

There are two types of hypothesis tests:

  • One-tailed test : Alternative hypothesis contains either or > sign
  • Two-tailed test : Alternative hypothesis contains the ≠ sign

In a two-tailed test , the alternative hypothesis always contains the not equal ( ≠ ) sign.

This indicates that we’re testing whether or not some effect exists, regardless of whether it’s a positive or negative effect.

Check out the following example problems to gain a better understanding of two-tailed tests.

Example 1: Factory Widgets

Suppose it’s assumed that the average weight of a certain widget produced at a factory is 20 grams. However, one engineer believes that a new method produces widgets that weigh less than 20 grams.

To test this, he can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ = 20 grams
  • H A (Alternative Hypothesis): μ ≠ 20 grams

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The engineer believes that the new method will influence widget weight, but doesn’t specify whether it will cause average weight to increase or decrease.

To test this, he uses the new method to produce 20 widgets and obtains the following information:

  • n = 20 widgets
  • x = 19.8 grams
  • s = 3.1 grams

Plugging these values into the One Sample t-test Calculator , we obtain the following results:

  • t-test statistic: -0.288525
  • two-tailed p-value: 0.776

Since the p-value is not less than .05, the engineer fails to reject the null hypothesis.

He does not have sufficient evidence to say that the true mean weight of widgets produced by the new method is different than 20 grams.

Example 2: Plant Growth

Suppose a standard fertilizer has been shown to cause a species of plants to grow by an average of 10 inches. However, one botanist believes a new fertilizer causes this species of plants to grow by an average amount different than 10 inches.

To test this, she can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ = 10 inches
  • H A (Alternative Hypothesis): μ ≠ 10 inches

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The botanist believes that the new fertilizer will influence plant growth, but doesn’t specify whether it will cause average growth to increase or decrease.

To test this claim, she applies the new fertilizer to a simple random sample of 15 plants and obtains the following information:

  • n = 15 plants
  • x = 11.4 inches
  • s = 2.5 inches
  • t-test statistic: 2.1689
  • two-tailed p-value: 0.0478

Since the p-value is less than .05, the botanist rejects the null hypothesis.

She has sufficient evidence to conclude that the new fertilizer causes an average growth that is different than 10 inches.

Example 3: Studying Method

A professor believes that a certain studying technique will influence the mean score that her students receive on a certain exam, but she’s unsure if it will increase or decrease the mean score, which is currently 82.

To test this, she lets each student use the studying technique for one month leading up to the exam and then administers the same exam to each of the students.

She then performs a hypothesis test using the following hypotheses:

  • H 0 : μ = 82
  • H A : μ ≠ 82

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The professor believes that the studying technique will influence the mean exam score, but doesn’t specify whether it will cause the mean score to increase or decrease.

To test this claim, the professor has 25 students use the new studying method and then take the exam. He collects the following data on the exam scores for this sample of students:

  • t-test statistic: 3.6586
  • two-tailed p-value: 0.0012

Since the p-value is less than .05, the professor rejects the null hypothesis.

She has sufficient evidence to conclude that the new studying method produces exam scores with an average score that is different than 82.

Additional Resources

The following tutorials provide additional information about hypothesis testing:

Introduction to Hypothesis Testing What is a Directional Hypothesis? When Do You Reject the Null Hypothesis?

Statistics vs. Probability: What’s the Difference?

One sample z-test calculator, related posts, how to normalize data between -1 and 1, vba: how to check if string contains another..., how to interpret f-values in a two-way anova, how to create a vector of ones in..., how to determine if a probability distribution is..., what is a symmetric histogram (definition & examples), how to find the mode of a histogram..., how to find quartiles in even and odd..., how to calculate sxy in statistics (with example), how to calculate sxx in statistics (with example).

two tailed hypothesis test example

Hypothesis Testing for Means & Proportions

  •   1  
  • |   2  
  • |   3  
  • |   4  
  • |   5  
  • |   6  
  • |   7  
  • |   8  
  • |   9  
  • |   10  

On This Page sidebar

Hypothesis Testing: Upper-, Lower, and Two Tailed Tests

Type i and type ii errors.

Learn More sidebar

All Modules

More Resources sidebar

Z score Table

t score Table

The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.  

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information.   An example of a test statistic is the Z statistic computed as follows:

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Standard normal distribution with lower tail at -1.645 and alpha=0.05

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < 1.645.

Standard normal distribution with two tails

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined about can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .  

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191 H 1 : μ > 191                 α =0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05.   Reject H 0 if Z > 1.645.

We now substitute the sample data into the formula for the test statistic identified in Step 2.  

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at a =0.05, to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.96, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.                  

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, and our test tells us to reject H 0 , then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.

Lightbulb icon signifying an important idea

 The most common reason for a Type II error is a small sample size.

return to top | previous page | next page

Content ©2017. All Rights Reserved. Date last modified: November 6, 2017. Wayne W. LaMorte, MD, PhD, MPH

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

11.4: One- and Two-Tailed Tests

  • Last updated
  • Save as PDF
  • Page ID 2148

  • Rice University

Learning Objectives

  • Define Type I and Type II errors
  • Interpret significant and non-significant differences
  • Explain why the null hypothesis should not be accepted when the effect is not significant

In the James Bond case study, Mr. Bond was given \(16\) trials on which he judged whether a martini had been shaken or stirred. He was correct on \(13\) of the trials. From the binomial distribution, we know that the probability of being correct \(13\) or more times out of \(16\) if one is only guessing is \(0.0106\). Figure \(\PageIndex{1}\) shows a graph of the binomial distribution. The red bars show the values greater than or equal to \(13\). As you can see in the figure, the probabilities are calculated for the upper tail of the distribution. A probability calculated in only one tail of the distribution is called a "one-tailed probability."

Binomial Calculator

A slightly different question can be asked of the data: "What is the probability of getting a result as extreme or more extreme than the one observed?" Since the chance expectation is \(8/16\), a result of \(3/16\) is equally as extreme as \(13/16\). Thus, to calculate this probability, we would consider both tails of the distribution. Since the binomial distribution is symmetric when \(\pi =0.5\), this probability is exactly double the probability of \(0.0106\) computed previously. Therefore, \(p = 0.0212\). A probability calculated in both tails of a distribution is called a "two-tailed probability" (see Figure \(\PageIndex{2}\)).

Should the one-tailed or the two-tailed probability be used to assess Mr. Bond's performance? That depends on the way the question is posed. If we are asking whether Mr. Bond can tell the difference between shaken or stirred martinis, then we would conclude he could if he performed either much better than chance or much worse than chance. If he performed much worse than chance, we would conclude that he can tell the difference, but he does not know which is which. Therefore, since we are going to reject the null hypothesis if Mr. Bond does either very well or very poorly, we will use a two-tailed probability.

On the other hand, if our question is whether Mr. Bond is better than chance at determining whether a martini is shaken or stirred, we would use a one-tailed probability. What would the one-tailed probability be if Mr. Bond were correct on only \(3\) of the \(16\) trials? Since the one-tailed probability is the probability of the right-hand tail, it would be the probability of getting \(3\) or more correct out of \(16\). This is a very high probability and the null hypothesis would not be rejected.

The null hypothesis for the two-tailed test is \(\pi =0.5\). By contrast, the null hypothesis for the one-tailed test is \(\pi \leq 0.5\). Accordingly, we reject the two-tailed hypothesis if the sample proportion deviates greatly from \(0.5\) in either direction. The one-tailed hypothesis is rejected only if the sample proportion is much greater than \(0.5\). The alternative hypothesis in the two-tailed test is \(\pi \neq 0.5\). In the one-tailed test it is \(\pi > 0.5\).

You should always decide whether you are going to use a one-tailed or a two-tailed probability before looking at the data. Statistical tests that compute one-tailed probabilities are called one-tailed tests; those that compute two-tailed probabilities are called two-tailed tests. Two-tailed tests are much more common than one-tailed tests in scientific research because an outcome signifying that something other than chance is operating is usually worth noting. One-tailed tests are appropriate when it is not important to distinguish between no effect and an effect in the unexpected direction. For example, consider an experiment designed to test the efficacy of a treatment for the common cold. The researcher would only be interested in whether the treatment was better than a placebo control. It would not be worth distinguishing between the case in which the treatment was worse than a placebo and the case in which it was the same because in both cases the drug would be worthless.

Some have argued that a one-tailed test is justified whenever the researcher predicts the direction of an effect. The problem with this argument is that if the effect comes out strongly in the non-predicted direction, the researcher is not justified in concluding that the effect is not zero. Since this is unrealistic, one-tailed tests are usually viewed skeptically if justified on this basis alone.

If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

To log in and use all the features of Khan Academy, please enable JavaScript in your browser.

Statistics and probability

Course: statistics and probability   >   unit 12.

  • Hypothesis testing and p-values

One-tailed and two-tailed tests

  • Z-statistics vs. T-statistics
  • Small sample hypothesis test
  • Large sample proportion hypothesis testing

Want to join the conversation?

  • Upvote Button navigates to signup page
  • Downvote Button navigates to signup page
  • Flag Button navigates to signup page

Good Answer

Video transcript

two tailed hypothesis test example

  • The Open University
  • Guest user / Sign out
  • Study with The Open University

My OpenLearn Profile

Personalise your OpenLearn profile, save your favourite content and get recognition for your learning

About this free course

Become an ou student, download this course, share this free course.

Data analysis: hypothesis testing

Start this free course now. Just create an account and sign in. Enrol and complete the course for a free statement of participation or digital badge if available.

4.2 Two-tailed tests

Hypotheses that have an equal (=) or not equal (≠) supposition (sign) in the statement are called non-directional hypotheses . In non-directional hypotheses, the researcher is interested in whether there is a statistically significant difference or relationship between two or more variables, but does not have any specific expectation about which group or variable will be higher or lower. For example, a non-directional hypothesis might be: ‘There is a difference in the preference for brand X between male and female consumers.’ In this hypothesis, the researcher is interested in whether there is a statistically significant difference in the preference for brand X between male and female consumers, but does not have a specific prediction about which gender will have a higher preference. The researcher may conduct a survey or experiment to collect data on the brand preference of male and female consumers and then use statistical analysis to determine whether there is a significant difference between the two groups.

Non-directional hypotheses are also known as two-tailed hypotheses. The term ‘two-tailed’ comes from the fact that the statistical test used to evaluate the hypothesis is based on the assumption that the difference or relationship could occur in either direction, resulting in two ‘tails’ in the probability distribution. Using the coffee foam example (from Activity 1), you have the following set of hypotheses:

H 0 : µ = 1cm foam

H a : µ ≠ 1cm foam

In this case, the researcher can reject the null hypothesis for the mean value that is either ‘much higher’ or ‘much lower’ than 1 cm foam. This is called a two-tailed test because the rejection region includes outcomes from both the upper and lower tails of the sample distribution when determining a decision rule. To give an illustration, if you set alpha level (α) equal to 0.05, that would give you a 95% confidence level. Then, you would reject the null hypothesis for obtained values of z 1.96 (you will look at how to calculate z-scores later in the course).

This can be plotted on a graph as shown in Figure 7.

A two-tailed test shown in a symmetrical graph reminiscent of a bell

A symmetrical graph reminiscent of a bell. The x-axis is labelled ‘z-score’ and the y-axis is labelled ‘probability density’. The x-axis increases in increments of 1 from -2 to 2.

The top of the bell-shaped curve is labelled ‘Foam height = 1cm’. The graph circles the rejection regions of the null hypothesis on both sides of the bell curve. Within these circles are two areas shaded orange: beneath the curve from -2 downwards which is labelled z 1.96 and α = 0.025.

In a two-tailed hypothesis test, the null hypothesis assumes that there is no significant difference or relationship between the two groups or variables, and the alternative hypothesis suggests that there is a significant difference or relationship, but does not specify the direction of the difference or relationship.

When performing a two-tailed test, you need to determine the level of significance, which is denoted by alpha (α). The value of alpha, in this case, is 0.05. To perform a two-tailed test at a significance level of 0.05, you need to divide alpha by 2, giving a significance level of 0.025 for each distribution tail (0.05/2 = 0.025). This is done because the two-tailed test is looking for significance in either tail of the distribution. If the calculated test statistic falls in the rejection region of either tail of the distribution, then the null hypothesis is rejected and the alternative hypothesis is accepted. In this case, the researcher can conclude that there is a significant difference or relationship between the two groups or variables.

Assuming that the population follows a normal distribution, the tail located below the critical value of z = –1.96 (in a later section, you will discuss how this value was determined) and the tail above the critical value of z = +1.96 each represent a proportion of 0.025. These tails are referred to as the lower and upper tails, respectively, and they correspond to the extreme values of the distribution that are far from the central part of the bell curve. These critical values are used in a two-tailed hypothesis test to determine whether to reject or fail to reject the null hypothesis. The null hypothesis represents the default assumption that there is no significant difference between the observed data and what would be expected under a specific condition.

If the calculated test statistic falls within the critical values, then the null hypothesis cannot be rejected at the 0.05 level of significance. However, if the calculated test statistic falls outside the critical values (orange-coloured areas in Figure 7), then the null hypothesis can be rejected in favour of the alternative hypothesis, suggesting that there is evidence of a significant difference between the observed data and what would be expected under the specified condition.

Previous

two tailed hypothesis test example

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

S.3.3 hypothesis testing examples.

  • Example: Right-Tailed Test
  • Example: Left-Tailed Test
  • Example: Two-Tailed Test

Brinell Hardness Scores

An engineer measured the Brinell hardness of 25 pieces of ductile iron that were subcritically annealed. The resulting data were:

The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses:

H 0 : μ = 170 H A : μ > 170

The engineer entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

Descriptive Statistics

$\mu$: mean of Brinelli

Null hypothesis    H₀: $\mu$ = 170 Alternative hypothesis    H₁: $\mu$ > 170

The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52 with a standard deviation of 10.31. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 10.31 by the square root of n = 25, is 2.06). The test statistic t * is 1.22, and the P -value is 0.117.

If the engineer set his significance level α at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were greater than 1.7109 (determined using statistical software or a t -table):

t distribution graph for df = 24 and a right tailed test of .05 significance level

Since the engineer's test statistic, t * = 1.22, is not greater than 1.7109, the engineer fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

If the engineer used the P -value approach to conduct his hypothesis test, he would determine the area under a t n - 1 = t 24 curve and to the right of the test statistic t * = 1.22:

t distribution graph of right tailed test showing the p-value of 0117 for a t-value of 1.22

In the output above, Minitab reports that the P -value is 0.117. Since the P -value, 0.117, is greater than \(\alpha\) = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.

Note that the engineer obtains the same scientific conclusion regardless of the approach used. This will always be the case.

Height of Sunflowers

A biologist was interested in determining whether sunflower seedlings treated with an extract from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights:

The biologist's hypotheses are:

H 0 : μ = 15.7 H A : μ < 15.7

The biologist entered her data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. She obtained the following output:

$\mu$: mean of Height

Null hypothesis    H₀: $\mu$ = 15.7 Alternative hypothesis    H₁: $\mu$ < 15.7

The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a standard deviation of 2.544. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 13.664 by the square root of n = 33, is 0.443). The test statistic t * is -4.60, and the P -value, 0.000, is to three decimal places.

Minitab Note. Minitab will always report P -values to only 3 decimal places. If Minitab reports the P -value as 0.000, it really means that the P -value is 0.000....something. Throughout this course (and your future research!), when you see that Minitab reports the P -value as 0.000, you should report the P -value as being "< 0.001."

If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t * were less than -1.6939 (determined using statistical software or a t -table):s-3-3

Since the biologist's test statistic, t * = -4.60, is less than -1.6939, the biologist rejects the null hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the α = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

If the biologist used the P -value approach to conduct her hypothesis test, she would determine the area under a t n - 1 = t 32 curve and to the left of the test statistic t * = -4.60:

t-distribution for left tailed test with significance level of 0.05 shown in left tail

In the output above, Minitab reports that the P -value is 0.000, which we take to mean < 0.001. Since the P -value is less than 0.001, it is clearly less than \(\alpha\) = 0.05, and the biologist rejects the null hypothesis. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.

t-distribution graph for left tailed test with a t-value of -4.60 and left tail area of 0.000

Note again that the biologist obtains the same scientific conclusion regardless of the approach used. This will always be the case.

Gum Thickness

A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:

The quality control specialist's hypotheses are:

H 0 : μ = 7.5 H A : μ ≠ 7.5

The quality control specialist entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:

$\mu$: mean of Thickness

Null hypothesis    H₀: $\mu$ = 7.5 Alternative hypothesis    H₁: $\mu \ne$ 7.5

The output tells us that the average thickness of the n = 10 pieces of gums was 7.55 one-hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325). The test statistic t * is 1.54, and the P -value is 0.158.

If the quality control specialist sets his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were less than -2.2616 or greater than 2.2616 (determined using statistical software or a t -table):

t-distribution graph of two tails with a significance level of .05 and t values of -2.2616 and 2.2616

Since the quality control specialist's test statistic, t * = 1.54, is not less than -2.2616 nor greater than 2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.

If the quality control specialist used the P -value approach to conduct his hypothesis test, he would determine the area under a t n - 1 = t 9 curve, to the right of 1.54 and to the left of -1.54:

t-distribution graph for a two tailed test with t values of -1.54 and 1.54, the corresponding p-values are 0.0789732 on both tails

In the output above, Minitab reports that the P -value is 0.158. Since the P -value, 0.158, is greater than \(\alpha\) = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum differs from 7.5 one-hundredths of an inch.

Note that the quality control specialist obtains the same scientific conclusion regardless of the approach used. This will always be the case.

In our review of hypothesis tests, we have focused on just one particular hypothesis test, namely that concerning the population mean \(\mu\). The important thing to recognize is that the topics discussed here — the general idea of hypothesis tests, errors in hypothesis testing, the critical value approach, and the P -value approach — generally extend to all of the hypothesis tests you will encounter.

  • Study Guides
  • One- and Two-Tailed Tests
  • Method of Statistical Inference
  • Types of Statistics
  • Steps in the Process
  • Making Predictions
  • Comparing Results
  • Probability
  • Quiz: Introduction to Statistics
  • What Are Statistics?
  • Quiz: Bar Chart
  • Quiz: Pie Chart
  • Introduction to Graphic Displays
  • Quiz: Dot Plot
  • Quiz: Introduction to Graphic Displays
  • Frequency Histogram
  • Relative Frequency Histogram
  • Quiz: Relative Frequency Histogram
  • Frequency Polygon
  • Quiz: Frequency Polygon
  • Frequency Distribution
  • Stem-and-Leaf
  • Box Plot (Box-and-Whiskers)
  • Quiz: Box Plot (Box-and-Whiskers)
  • Scatter Plot
  • Measures of Central Tendency
  • Quiz: Measures of Central Tendency
  • Measures of Variability
  • Quiz: Measures of Variability
  • Measurement Scales
  • Quiz: Introduction to Numerical Measures
  • Classic Theory
  • Relative Frequency Theory
  • Probability of Simple Events
  • Quiz: Probability of Simple Events
  • Independent Events
  • Dependent Events
  • Introduction to Probability
  • Quiz: Introduction to Probability
  • Probability of Joint Occurrences
  • Quiz: Probability of Joint Occurrences
  • Non-Mutually-Exclusive Outcomes
  • Quiz: Non-Mutually-Exclusive Outcomes
  • Double-Counting
  • Conditional Probability
  • Quiz: Conditional Probability
  • Probability Distributions
  • Quiz: Probability Distributions
  • The Binomial
  • Quiz: The Binomial
  • Quiz: Sampling Distributions
  • Random and Systematic Error
  • Central Limit Theorem
  • Quiz: Central Limit Theorem
  • Populations, Samples, Parameters, and Statistics
  • Properties of the Normal Curve
  • Quiz: Populations, Samples, Parameters, and Statistics
  • Sampling Distributions
  • Quiz: Properties of the Normal Curve
  • Normal Approximation to the Binomial
  • Quiz: Normal Approximation to the Binomial
  • Quiz: Stating Hypotheses
  • The Test Statistic
  • Quiz: The Test Statistic
  • Quiz: One- and Two-Tailed Tests
  • Type I and II Errors
  • Quiz: Type I and II Errors
  • Stating Hypotheses
  • Significance
  • Quiz: Significance
  • Point Estimates and Confidence Intervals
  • Quiz: Point Estimates and Confidence Intervals
  • Estimating a Difference Score
  • Quiz: Estimating a Difference Score
  • Univariate Tests: An Overview
  • Quiz: Univariate Tests: An Overview
  • One-Sample z-test
  • Quiz: One-Sample z-test
  • One-Sample t-test
  • Quiz: One-Sample t-test
  • Two-Sample z-test for Comparing Two Means
  • Quiz: Introduction to Univariate Inferential Tests
  • Quiz: Two-Sample z-test for Comparing Two Means
  • Two Sample t test for Comparing Two Means
  • Quiz: Two-Sample t-test for Comparing Two Means
  • Paired Difference t-test
  • Quiz: Paired Difference t-test
  • Test for a Single Population Proportion
  • Quiz: Test for a Single Population Proportion
  • Test for Comparing Two Proportions
  • Quiz: Test for Comparing Two Proportions
  • Quiz: Simple Linear Regression
  • Chi-Square (X2)
  • Quiz: Chi-Square (X2)
  • Correlation
  • Quiz: Correlation
  • Simple Linear Regression
  • Common Mistakes
  • Statistics Tables
  • Quiz: Cumulative Review A
  • Quiz: Cumulative Review B
  • Statistics Quizzes

In the previous example, you tested a research hypothesis that predicted not only that the sample mean would be different from the population mean but that it would be different in a specific direction—it would be lower. This test is called a directional or one‐tailed test because the region of rejection is entirely within one tail of the distribution.

Some hypotheses predict only that one value will be different from another, without additionally predicting which will be higher. The test of such a hypothesis is nondirectional or two‐tailed because an extreme test statistic in either tail of the distribution (positive or negative) will lead to the rejection of the null hypothesis of no difference.

Suppose that you suspect that a particular class's performance on a proficiency test is not representative of those people who have taken the test. The national mean score on the test is 74.

The research hypothesis is:

The mean score of the class on the test is not 74.

Or in notation: H a : μ ≠ 74

The null hypothesis is:

The mean score of the class on the test is 74.

In notation: H 0 : μ = 74

As in the last example, you decide to use a 5 percent probability level for the test. Both tests have a region of rejection, then, of 5 percent, or 0.05. In this example, however, the rejection region must be split between both tails of the distribution—0.025 in the upper tail and 0.025 in the lower tail—because your hypothesis specifies only a difference, not a direction, as shown in Figure 1(a). You will reject the null hypotheses of no difference if the class sample mean is either much higher or much lower than the population mean of 74. In the previous example, only a sample mean much lower than the population mean would have led to the rejection of the null hypothesis.

Figure 1.Comparison of (a) a two‐tailed test and (b) a one‐tailed test, at the same probability level (95 percent).

two tailed hypothesis test example

The decision of whether to use a one‐ or a two‐tailed test is important because a test statistic that falls in the region of rejection in a one‐tailed test may not do so in a two‐tailed test, even though both tests use the same probability level. Suppose the class sample mean in your example was 77, and its corresponding z ‐score was computed to be 1.80. Table 2 in "Statistics Tables" shows the critical z ‐scores for a probability of 0.025 in either tail to be –1.96 and 1.96. In order to reject the null hypothesis, the test statistic must be either smaller than –1.96 or greater than 1.96. It is not, so you cannot reject the null hypothesis. Refer to Figure 1(a).

Suppose, however, you had a reason to expect that the class would perform better on the proficiency test than the population, and you did a one‐tailed test instead. For this test, the rejection region of 0.05 would be entirely within the upper tail. The critical z ‐value for a probability of 0.05 in the upper tail is 1.65. (Remember that Table 2 in "Statistics Tables" gives areas of the curve below z ; so you look up the z ‐value for a probability of 0.95.) Your computed test statistic of z = 1.80 exceeds the critical value and falls in the region of rejection, so you reject the null hypothesis and say that your suspicion that the class was better than the population was supported. See Figure 1(b).

In practice, you should use a one‐tailed test only when you have good reason to expect that the difference will be in a particular direction. A two‐tailed test is more conservative than a one‐tailed test because a two‐tailed test takes a more extreme test statistic to reject the null hypothesis.

Previous Quiz: The Test Statistic

Next Quiz: One- and Two-Tailed Tests

  • Online Quizzes for CliffsNotes Statistics QuickReview, 2nd Edition

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

An Introduction to t Tests | Definitions, Formula and Examples

Published on January 31, 2020 by Rebecca Bevans . Revised on June 22, 2023.

A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.

  • The null hypothesis ( H 0 ) is that the true difference between these group means is zero.
  • The alternate hypothesis ( H a ) is that the true difference is different from zero.

Table of contents

When to use a t test, what type of t test should i use, performing a t test, interpreting test results, presenting the results of a t test, other interesting articles, frequently asked questions about t tests.

A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an   ANOVA test  or a post-hoc test.

The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. The t test assumes your data:

  • are independent
  • are (approximately) normally distributed
  • have a similar amount of variance within each group being compared (a.k.a. homogeneity of variance)

If your data do not fit these assumptions, you can try a nonparametric alternative to the t test, such as the Wilcoxon Signed-Rank test for data with unequal variances .

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

When choosing a t test, you will need to consider two things: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction.

What type of t-test should I use

One-sample, two-sample, or paired t test?

  • If the groups come from a single population (e.g., measuring before and after an experimental treatment), perform a paired t test . This is a within-subjects design .
  • If the groups come from two different populations (e.g., two different species, or people from two separate cities), perform a two-sample t test (a.k.a. independent t test ). This is a between-subjects design .
  • If there is one group being compared against a standard value (e.g., comparing the acidity of a liquid to a neutral pH of 7), perform a one-sample t test .

One-tailed or two-tailed t test?

  • If you only care whether the two populations are different from one another, perform a two-tailed t test .
  • If you want to know whether one population mean is greater than or less than the other, perform a one-tailed t test.
  • Your observations come from two separate populations (separate species), so you perform a two-sample t test.
  • You don’t care about the direction of the difference, only whether there is a difference, so you choose to use a two-tailed t test.

The t test estimates the true difference between two group means using the ratio of the difference in group means over the pooled standard error of both groups. You can calculate it manually using a formula, or use statistical analysis software.

T test formula

The formula for the two-sample t test (a.k.a. the Student’s t-test) is shown below.

\begin{equation*}t=\dfrac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{(s^2(\frac{1}{n_{1}}+\frac{1}{n_{2}}))}}}\end{equation*}

In this formula, t is the t value, x 1 and x 2 are the means of the two groups being compared, s 2 is the pooled standard error of the two groups, and n 1 and n 2 are the number of observations in each of the groups.

A larger t value shows that the difference between group means is greater than the pooled standard error, indicating a more significant difference between the groups.

You can compare your calculated t value against the values in a critical value chart (e.g., Student’s t table) to determine whether your t value is greater than what would be expected by chance. If so, you can reject the null hypothesis and conclude that the two groups are in fact different.

T test function in statistical software

Most statistical software (R, SPSS, etc.) includes a t test function. This built-in function will take your raw data and calculate the t value. It will then compare it to the critical value, and calculate a p -value . This way you can quickly see whether your groups are statistically different.

In your comparison of flower petal lengths, you decide to perform your t test using R. The code looks like this:

Download the data set to practice by yourself.

Sample data set

If you perform the t test for your flower hypothesis in R, you will receive the following output:

T-test output in R

The output provides:

  • An explanation of what is being compared, called data in the output table.
  • The t value : -33.719. Note that it’s negative; this is fine! In most cases, we only care about the absolute value of the difference, or the distance from 0. It doesn’t matter which direction.
  • The degrees of freedom : 30.196. Degrees of freedom is related to your sample size, and shows how many ‘free’ data points are available in your test for making comparisons. The greater the degrees of freedom, the better your statistical test will work.
  • The p value : 2.2e-16 (i.e. 2.2 with 15 zeros in front). This describes the probability that you would see a t value as large as this one by chance.
  • A statement of the alternative hypothesis ( H a ). In this test, the H a is that the difference is not 0.
  • The 95% confidence interval . This is the range of numbers within which the true difference in means will be 95% of the time. This can be changed from 95% if you want a larger or smaller interval, but 95% is very commonly used.
  • The mean petal length for each group.

When reporting your t test results, the most important values to include are the t value , the p value , and the degrees of freedom for the test. These will communicate to your audience whether the difference between the two groups is statistically significant (a.k.a. that it is unlikely to have happened by chance).

You can also include the summary statistics for the groups being compared, namely the mean and standard deviation . In R, the code for calculating the mean and the standard deviation from the data looks like this:

flower.data %>% group_by(Species) %>% summarize(mean_length = mean(Petal.Length), sd_length = sd(Petal.Length))

In our example, you would report the results like this:

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Chi square test of independence
  • Statistical power
  • Descriptive statistics
  • Degrees of freedom
  • Pearson correlation
  • Null hypothesis

Methodology

  • Double-blind study
  • Case-control study
  • Research ethics
  • Data collection
  • Hypothesis testing
  • Structured interviews

Research bias

  • Hawthorne effect
  • Unconscious bias
  • Recall bias
  • Halo effect
  • Self-serving bias
  • Information bias

A t-test is a statistical test that compares the means of two samples . It is used in hypothesis testing , with a null hypothesis that the difference in group means is zero and an alternate hypothesis that the difference in group means is different from zero.

A t-test measures the difference in group means divided by the pooled standard error of the two group means.

In this way, it calculates a number (the t-value) illustrating the magnitude of the difference between the two group means being compared, and estimates the likelihood that this difference exists purely by chance (p-value).

Your choice of t-test depends on whether you are studying one group or two groups, and whether you care about the direction of the difference in group means.

If you are studying one group, use a paired t-test to compare the group mean over time or after an intervention, or use a one-sample t-test to compare the group mean to a standard value. If you are studying two groups, use a two-sample t-test .

If you want to know only whether a difference exists, use a two-tailed test . If you want to know if one group mean is greater or less than the other, use a left-tailed or right-tailed one-tailed test .

A one-sample t-test is used to compare a single population to a standard value (for example, to determine whether the average lifespan of a specific town is different from the country average).

A paired t-test is used to compare a single population before and after some experimental intervention or at two different points in time (for example, measuring student performance on a test before and after being taught the material).

A t-test should not be used to measure differences among more than two groups, because the error structure for a t-test will underestimate the actual error when many groups are being compared.

If you want to compare the means of several groups at once, it’s best to use another statistical test such as ANOVA or a post-hoc test.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). An Introduction to t Tests | Definitions, Formula and Examples. Scribbr. Retrieved April 2, 2024, from https://www.scribbr.com/statistics/t-test/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, hypothesis testing | a step-by-step guide with easy examples, test statistics | definition, interpretation, and examples, what is your plagiarism score.

Statistics Tutorial

Descriptive statistics, inferential statistics, stat reference, statistics - hypothesis testing a proportion (two tailed).

A population proportion is the share of a population that belongs to a particular category .

Hypothesis tests are used to check a claim about the size of that population proportion.

Hypothesis Testing a Proportion

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic

For example:

  • Population : Nobel Prize winners
  • Category : Women

And we want to check the claim:

"The share of Nobel Prize winners that are women is not 50%"

By taking a sample of 100 randomly selected Nobel Prize winners we could find that:

10 out of 100 Nobel Prize winners in the sample were women

The sample proportion is then: \(\displaystyle \frac{10}{100} = 0.1\), or 10%.

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for calculating a confidence interval for a proportion are:

  • The sample is randomly selected
  • Being in the category
  • Not being in the category
  • 5 members in the category
  • 5 members not in the category

In our example, we randomly selected 10 people that were women.

The rest were not women, so there are 90 in the other category.

The conditions are fulfilled in this case.

Note: It is possible to do a hypothesis test without having 5 of each category. But special adjustments need to be made.

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was:

In this case, the parameter is the proportion of Nobel Prize winners that are women (\(p\)).

The null and alternative hypothesis are then:

Null hypothesis : 50% of Nobel Prize winners were women.

Alternative hypothesis : The share of Nobel Prize winners that are women is not 50%

Which can be expressed with symbols as:

\(H_{0}\): \(p = 0.50 \)

\(H_{1}\): \(p \neq 0.50 \)

This is a ' two-tailed ' test, because the alternative hypothesis claims that the proportion is different (larger or smaller) than in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.

Advertisement

3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is a percentage probability of accidentally making the wrong conclusion.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when we reject a null hypothesis:

We expect to reject a true null hypothesis 5 out of 100 times.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population proportion is:

\(\displaystyle \frac{\hat{p} - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \)

\(\hat{p}-p\) is the difference between the sample proportion (\(\hat{p}\)) and the claimed population proportion (\(p\)).

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population proportion (\(p\)) was \( 0.50 \)

The sample size (\(n\)) was \(100\)

So the test statistic (TS) is then:

\(\displaystyle \frac{0.1-0.5}{\sqrt{0.5(1-0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.5(0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.25}} \cdot \sqrt{100} = \frac{-0.4}{0.5} \cdot 10 = \underline{-8}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic for a proportion.

With R use the built-in math functions to calculate the test statistic for a proportion.

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic and with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population proportion test, the critical value (CV) is a Z-value from a standard normal distribution .

This critical Z-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the standard normal distribution.

Because the claim is that the population proportion is different from 50%, the rejection region is split into both the left and right tail:

Choosing a significance level (\(\alpha\)) of 0.01, or 1%, we can find the critical Z-value from a Z-table , or with a programming language function:

Note: Because this is a two-tailed test the tail area (\(\alpha\)) needs to be split in half (divided by 2).

With Python use the Scipy Stats library norm.ppf() function find the Z-value for an \(\alpha\)/2 = 0.005 in the left tail.

With R use the built-in qnorm() function to find the Z-value for an \(\alpha\) = 0.005 in the left tail.

Using either method we can find that the critical Z-value in the left tail is \(\approx \underline{-2.5758}\)

Since a normal distribution i symmetric, we know that the critical Z-value in the right tail will be the same number, only positive: \(\underline{2.5758}\)

For a two-tailed test we need to check if the test statistic (TS) is smaller than the negative critical value (-CV), or bigger than the positive critical value (CV).

If the test statistic is smaller than the negative critical value, the test statistic is in the rejection region .

If the test statistic is bigger than the positive critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{-8}\) and the critical value was \(\approx \underline{-2.5758}\)

Here is an illustration of this test in a graph:

Since the test statistic was smaller than the negative critical value we reject the null hypothesis.

This means that the sample data supports the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 1% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{-8} \)

For a population proportion test, the test statistic is a Z-Value from a standard normal distribution .

Because this is a two-tailed test, we need to find the P-value of a Z-value smaller than -8 and multiply it by 2 .

We can find the P-value using a Z-table , or with a programming language function:

With Python use the Scipy Stats library norm.cdf() function find the P-value of a Z-value smaller than -8 for a two tailed test:

With R use the built-in pnorm() function find the P-value of a Z-value smaller than -8 for a two tailed test:

Using either method we can find that the P-value is \(\approx \underline{1.25 \cdot 10^{-15}}\) or \(0.00000000000000125\)

This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.000000000000125%, to reject the null hypothesis.

This P-value is smaller than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is rejected at all of these significance levels.

The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 10%, 5%, and 1% significance level .

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a two-tailed tailed hypothesis test for a proportion.

Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from than 0.50.

With R use the built-in prop.test() function find the P-value for a left tailed hypothesis test for a proportion.

Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.

Note: The conf.level in the R code is the reverse of the significance level.

Here, the significance level is 0.01, or 1%, so the conf.level is 1-0.01 = 0.99, or 99%.

Left-Tailed and Two-Tailed Tests

This was an example of a two tailed test, where the alternative hypothesis claimed that parameter is different from the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

  • Right-Tailed Test
  • Left-Tailed Test

Get Certified

COLOR PICKER

colorpicker

Report Error

If you want to report an error, or if you want to make a suggestion, do not hesitate to send us an e-mail:

[email protected]

Top Tutorials

Top references, top examples, get certified.

IMAGES

  1. Hypothesis Testing: Upper, Lower, and Two Tailed Tests

    two tailed hypothesis test example

  2. Two Tailed Test Tutorial

    two tailed hypothesis test example

  3. What Is a Two-Tailed Test? Definition and Example / STATISTICAL TABLES

    two tailed hypothesis test example

  4. PPT

    two tailed hypothesis test example

  5. One Sample T Test

    two tailed hypothesis test example

  6. Hypothesis Testing Example Two Sample t-Test

    two tailed hypothesis test example

VIDEO

  1. 1 tailed and 2 tailed Hypothesis

  2. Statistics Two tailed Hypothesis example

  3. Two-Tailed Hypothesis Test for the Population Mean

  4. Two Tailed Test of Hypothesis about Population Mean Book store Example

  5. single tailed hypothesis test explained

  6. Relation of a Two-Tailed Hypothesis Test to a Confidence Interval, InvT in the TI-84

COMMENTS

  1. Two-Tailed Hypothesis Tests: 3 Example Problems

    To test this, she can perform a one-tailed hypothesis test with the following null and alternative hypotheses: H 0 (Null Hypothesis): μ = 10 inches; H A (Alternative Hypothesis): μ ≠ 10 inches; This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal "≠" sign. The botanist believes ...

  2. One-Tailed and Two-Tailed Hypothesis Tests Explained

    Two-tailed hypothesis tests are also known as nondirectional and two-sided tests because you can test for effects in both directions. When you perform a two-tailed test, you split the significance level percentage between both tails of the distribution. In the example below, I use an alpha of 5% and the distribution has two shaded regions of 2. ...

  3. What Is a Two-Tailed Test? Definition and Example

    Two-Tailed Test: A two-tailed test is a statistical test in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values ...

  4. Two Tailed Test: Definition, Examples

    This video explains the difference between one and two tailed tests: For example, let's say you were running a z test with an alpha level of 5% (0.05). In a one tailed test, the entire 5% would be in a single tail. But with a two tailed test, that 5% is split between the two tails, giving you 2.5% (0.025) in each tail.

  5. Two-Tailed Hypothesis Tests: 3 Example Problems

    To test this, she can perform a one-tailed hypothesis test with the following null and alternative hypotheses: H 0 (Null Hypothesis): μ = 10 inches; H A (Alternative Hypothesis): μ ≠ 10 inches; This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal "≠" sign. The botanist believes ...

  6. Hypothesis Testing: Upper-, Lower, and Two Tailed Tests

    The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value. For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645. The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05.

  7. Hypothesis Testing

    So let's perform the step -1 of hypothesis testing which is: Specify the Null (H0) and Alternate (H1) hypothesis. Null hypothesis (H0): The null hypothesis here is what currently stated to be true about the population. In our case it will be the average height of students in the batch is 100. H0 : μ = 100.

  8. Hypothesis Testing

    Hypothesis testing example Based on the type of data you collected, you perform a one-tailed t-test to test whether men are in fact taller than women. This test gives you: an estimate of the difference in average height between the two groups.

  9. Two-Tailed Test in Statistics

    A two-tailed hypothesis test example: A machine is used to fill bags with coffee, and each bag is 1 kg. A randomly selected sample of 30 bags has a mean weight of 1.01 kg with a standard deviation ...

  10. 11.4: One- and Two-Tailed Tests

    The one-tailed hypothesis is rejected only if the sample proportion is much greater than \(0.5\). The alternative hypothesis in the two-tailed test is \(\pi \neq 0.5\). In the one-tailed test it is \(\pi > 0.5\). You should always decide whether you are going to use a one-tailed or a two-tailed probability before looking at the data.

  11. One-tailed and two-tailed tests (video)

    To decide if a one-tailed test can be used, one has to have some extra information about the experiment to know the direction from the mean (H1: drug lowers the response time). If the direction of the effect is unknown, a two tailed test has to be used, and the H1 must be stated in a way where the direction of the effect is left uncertain (H1 ...

  12. Hypothesis testing: One-tailed and two-tailed tests

    At this point, you might use a statistical test, like unpaired or 2-sample t-test, to see if there's a significant difference between the two groups' means. Typically, an unpaired t-test starts with two hypotheses. The first hypothesis is called the null hypothesis, and it basically says there's no difference in the means of the two groups.

  13. One- and two-tailed tests

    In coin flipping, the null hypothesis is a sequence of Bernoulli trials with probability 0.5, yielding a random variable X which is 1 for heads and 0 for tails, and a common test statistic is the sample mean (of the number of heads) ¯. If testing for whether the coin is biased towards heads, a one-tailed test would be used - only large numbers of heads would be significant.

  14. Statistics

    With R use built-in math and statistics functions find the P-value for a two tailed hypothesis test for a mean. Here, the sample size is 30, the sample mean is 62.1, the sample standard deviation is 13.46, and the test is for a mean different from 60. ... This was an example of a left tailed test, where the alternative hypothesis claimed that ...

  15. Data analysis: hypothesis testing: 4.2 Two-tailed tests

    The term 'two-tailed' comes from the fact that the statistical test used to evaluate the hypothesis is based on the assumption that the difference or relationship could occur in either direction, resulting in two 'tails' in the probability distribution. Using the coffee foam example (from Activity 1), you have the following set of ...

  16. S.3.2 Hypothesis Testing (P-Value Approach)

    Two-Tailed. In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t* instead of equaling -2.5.The P-value for conducting the two-tailed test H 0: μ = 3 versus H A: μ ≠ 3 is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean ...

  17. S.3.3 Hypothesis Testing Examples

    If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t* were less than -1.6939 (determined using statistical software or a t-table):s-3-3. Since the biologist's test statistic, t* = -4.60, is less than -1.6939, the biologist rejects the null hypothesis.

  18. One- and Two-Tailed Tests

    A two‐tailed test is more conservative than a one‐tailed test because a two‐tailed test takes a more extreme test statistic to reject the null hypothesis. Quiz: One- and Two-Tailed Tests. Access quality crowd-sourced study materials tagged to courses at universities all over the world and get homework help from our tutors when you need it.

  19. PDF Two-tailed hypothesis test example

    Two-tailed hypothesis test example Problem: A premium golf ball production line must produce all of its balls to 1.615 ounces in order to get the top rating (and therefore the top dollar). Samples are drawn hourly and checked. If the production line gets out of sync with a statistical significance of more than 1%, it must be shut down and repaired.

  20. An Introduction to t Tests

    Revised on June 22, 2023. A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another. t test example.

  21. Statistics

    With Python use the scipy and math libraries to calculate the P-value for a two-tailed tailed hypothesis test for a proportion. Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from than 0.50. ... Left-Tailed and Two-Tailed Tests. This was an example of a two tailed test, ...