Statistics Made Easy
Two-Tailed Hypothesis Tests: 3 Example Problems
In statistics, we use hypothesis tests to determine whether some claim about a population parameter is true or not.
Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms:
H 0 (Null Hypothesis): Population parameter =, ≤, or ≥ some value
H A (Alternative Hypothesis): Population parameter <, >, or ≠ some value
There are two types of hypothesis tests:
- One-tailed test : Alternative hypothesis contains either < or > sign
- Two-tailed test : Alternative hypothesis contains the ≠ sign
In a two-tailed test , the alternative hypothesis always contains the not equal ( ≠ ) sign.
This indicates that we’re testing whether or not some effect exists, regardless of whether it’s a positive or negative effect.
Check out the following example problems to gain a better understanding of two-tailed tests.
Example 1: Factory Widgets
Suppose it’s assumed that the average weight of a certain widget produced at a factory is 20 grams. However, one engineer believes that a new method produces widgets whose average weight is different from 20 grams.
To test this, he can perform a two-tailed hypothesis test with the following null and alternative hypotheses:
- H 0 (Null Hypothesis): μ = 20 grams
- H A (Alternative Hypothesis): μ ≠ 20 grams
This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The engineer believes that the new method will influence widget weight, but doesn’t specify whether it will cause average weight to increase or decrease.
To test this, he uses the new method to produce 20 widgets and obtains the following information:
- n = 20 widgets
- x = 19.8 grams
- s = 3.1 grams
Plugging these values into the One Sample t-test Calculator , we obtain the following results:
- t-test statistic: -0.288525
- two-tailed p-value: 0.776
Since the p-value is not less than .05, the engineer fails to reject the null hypothesis.
He does not have sufficient evidence to say that the true mean weight of widgets produced by the new method is different than 20 grams.
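The calculator's t statistic can be reproduced by hand from the summary statistics. Below is a minimal Python sketch; the two-tailed p-value would then come from a t table (or a function such as scipy.stats.t.sf) with n − 1 = 19 degrees of freedom:

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: how many standard errors the
    sample mean falls from the hypothesized mean."""
    return (xbar - mu0) / (s / sqrt(n))

# Example 1: n = 20 widgets, sample mean 19.8 g, sample sd 3.1 g, H0: mu = 20
t = t_statistic(19.8, 20, 3.1, 20)
print(round(t, 4))  # -0.2885, matching the calculator output above
```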
Example 2: Plant Growth
Suppose a standard fertilizer has been shown to cause a species of plants to grow by an average of 10 inches. However, one botanist believes a new fertilizer causes this species of plants to grow by an average amount different than 10 inches.
To test this, she can perform a two-tailed hypothesis test with the following null and alternative hypotheses:
- H 0 (Null Hypothesis): μ = 10 inches
- H A (Alternative Hypothesis): μ ≠ 10 inches
This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The botanist believes that the new fertilizer will influence plant growth, but doesn’t specify whether it will cause average growth to increase or decrease.
To test this claim, she applies the new fertilizer to a simple random sample of 15 plants and obtains the following information:
- n = 15 plants
- x = 11.4 inches
- s = 2.5 inches
- t-test statistic: 2.1689
- two-tailed p-value: 0.0478
Since the p-value is less than .05, the botanist rejects the null hypothesis.
She has sufficient evidence to conclude that the new fertilizer causes an average growth that is different than 10 inches.
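As a check on the botanist's numbers, the t statistic can be computed directly and compared with the standard t-table critical value for 14 degrees of freedom (about 2.145 for a two-tailed test at the 5% level):

```python
from math import sqrt

# Example 2 summary statistics
n, xbar, s, mu0 = 15, 11.4, 2.5, 10
t = (xbar - mu0) / (s / sqrt(n))

# two-tailed critical value for alpha = 0.05 with 14 df (from a t table)
t_crit = 2.145
reject = abs(t) > t_crit
print(round(t, 4), reject)  # 2.1689 True -> reject H0, as above
```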
Example 3: Studying Method
A professor believes that a certain studying technique will influence the mean score that her students receive on a certain exam, but she’s unsure if it will increase or decrease the mean score, which is currently 82.
To test this, she lets each student use the studying technique for one month leading up to the exam and then administers the same exam to each of the students.
She then performs a hypothesis test using the following hypotheses:
- H 0 : μ = 82
- H A : μ ≠ 82
This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The professor believes that the studying technique will influence the mean exam score, but doesn’t specify whether it will cause the mean score to increase or decrease.
To test this claim, the professor has 25 students use the new studying method and then take the exam. She collects the following data on the exam scores for this sample of students:
- t-test statistic: 3.6586
- two-tailed p-value: 0.0012
Since the p-value is less than .05, the professor rejects the null hypothesis.
She has sufficient evidence to conclude that the new studying method produces exam scores with an average score that is different than 82.
Additional Resources
The following tutorials provide additional information about hypothesis testing:
- Introduction to Hypothesis Testing
- What is a Directional Hypothesis?
- When Do You Reject the Null Hypothesis?
Published by Zach
What Is a Two-Tailed Test? Definition and Example
Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.
A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance . If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
Key Takeaways
- In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values.
- It is used in null-hypothesis testing and testing for statistical significance.
- If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
- By convention, two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut off at 2.5%.
A basic concept of inferential statistics is hypothesis testing , which determines whether a claim about a population parameter is true or not. A hypothesis test designed to show whether the mean of a sample is significantly greater than or significantly less than the mean of a population is referred to as a two-tailed test. The two-tailed test gets its name from testing the area under both tails of a normal distribution , although the test can also be used with non-normal distributions.
A two-tailed test is designed to examine both sides of a specified data range as designated by the probability distribution involved. The probability distribution should represent the likelihood of a specified outcome based on predetermined standards. This requires the setting of a limit designating the highest (or upper) and lowest (or lower) accepted variable values included within the range. Any data point that exists above the upper limit or below the lower limit is considered out of the acceptance range and in an area referred to as the rejection range.
There is no inherent standard about the number of data points that must exist within the acceptance range. In instances where precision is required, such as in the creation of pharmaceutical drugs, a rejection rate of 0.001% or less may be instituted. In instances where precision is less critical, such as the number of food items in a product bag, a rejection rate of 5% may be appropriate.
A two-tailed test can also be used practically during certain production activities in a firm, such as with the production and packaging of candy at a particular facility. If the production facility designates 50 candies per bag as its goal, with an acceptable distribution of 45 to 55 candies, any bag found with an amount below 45 or above 55 is considered within the rejection range.
To confirm the packaging mechanisms are properly calibrated to meet the expected output, random sampling may be taken to confirm accuracy. A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.
For the packaging mechanisms to be considered accurate, an average of 50 candies per bag with an appropriate distribution is desired. Additionally, the number of bags that fall within the rejection range needs to fall within the probability distribution limit considered acceptable as an error rate. Here, the null hypothesis would be that the mean is 50 while the alternate hypothesis would be that it is not 50.
If, after conducting the two-tailed test, the z-score falls in the rejection region, meaning that the deviation is too far from the desired mean, then adjustments to the facility or associated equipment may be required to correct the error. Regular use of two-tailed testing methods can help ensure production stays within limits over the long term.
Be careful to note if a statistical test is one- or two-tailed as this will greatly influence a model's interpretation.
When a hypothesis test is set up to show that the sample mean would be only higher than the population mean, this is referred to as a one-tailed test . A formulation of this hypothesis would be, for example, that "the returns on an investment fund would be at least x%." One-tailed tests could also be set up to show that the sample mean could be only less than the population mean. The key difference from a two-tailed test is that in a two-tailed test, the sample mean could be different from the population mean by being either higher or lower than it.
If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test.
A two-tailed test, on the other hand, is designed to examine both sides of a specified data range to test whether a sample is greater than or less than the range of values.
Example of a Two-Tailed Test
As a hypothetical example, imagine that a new stockbroker , named XYZ, claims that their brokerage fees are lower than those of your current stockbroker, ABC. Data available from an independent research firm indicates that the mean and standard deviation of all ABC broker clients' bills are $18 and $6, respectively.
A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can any inference be made about the difference in the average brokerage bill between ABC and XYZ broker?
- H 0 : Null Hypothesis: mean = 18
- H 1 : Alternative Hypothesis: mean ≠ 18 (This is what we want to prove.)
- Rejection region: Z ≤ −Z 0.025 or Z ≥ Z 0.025 (assuming a 5% significance level, split 2.5% on each side).
- Z = (sample mean − mean) / (std-dev / sqrt(no. of samples)) = (18.75 − 18) / (6 / sqrt(100)) = 1.25
This calculated Z value falls between the two limits defined by −Z 0.025 = −1.96 and Z 0.025 = 1.96.
This concludes that there is insufficient evidence to infer that there is any difference between the rates of your existing broker and the new broker. Therefore, the null hypothesis cannot be rejected. Alternatively, the p-value = P(Z< -1.25)+P(Z >1.25) = 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 or 5%, leads to the same conclusion.
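The z and p values in this example can be verified with Python's standard library; the small difference from the 0.2112 quoted above comes from the table value 0.1056 being a rounded figure:

```python
from statistics import NormalDist

# broker example: sample mean 18.75, hypothesized mean 18, sd 6, n = 100
xbar, mu0, sigma, n = 18.75, 18.0, 6.0, 100
z = (xbar - mu0) / (sigma / n ** 0.5)       # standard error = 6/10 = 0.6
p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value
print(round(z, 4), round(p, 4))             # 1.25 0.2113 -> cannot reject H0
```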
How Is a Two-Tailed Test Designed?
A two-tailed test is designed to determine whether a claim is true or not given a population parameter. It examines both sides of a specified data range as designated by the probability distribution involved. As such, the probability distribution should represent the likelihood of a specified outcome based on predetermined standards.
What Is the Difference Between a Two-Tailed and One-Tailed Test?
A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than or significantly less than the mean of a population. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, by contrast, is set up to show only one of these outcomes: that the sample mean would be higher than the population mean or, in a separate test, that it would be lower.
What Is a Z-score?
A Z-score numerically describes a value's relationship to the mean of a group of values and is measured in terms of the number of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score whereas Z-scores of 1.0 and -1.0 would indicate values one standard deviation above or below the mean. In most large data sets, 99% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations above and below the mean.
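A Z-score is just a rescaled distance from the mean, which a couple of lines of Python make concrete (the 75/70/5 numbers are made up for illustration; note that for a normal distribution the share of values within three standard deviations is closer to 99.7%):

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    # number of standard deviations x lies from the mean
    return (x - mean) / sd

print(z_score(75, 70, 5))  # 1.0: one standard deviation above the mean

# share of a normal population within three sds of the mean
print(round(2 * NormalDist().cdf(3) - 1, 4))  # 0.9973
```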
Hypothesis Testing for Means & Proportions
Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests
The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.
- Step 1. Set up hypotheses and select the level of significance α.
H 0 : Null hypothesis (no change, no difference);
H 1 : Research hypothesis (investigator's belief); α =0.05
- Step 2. Select the appropriate test statistic.
The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic, computed (for a test about a mean) as

Z = (x̄ − μ 0 ) / (s / √n)

where x̄ is the sample mean, μ 0 is the mean specified by the null hypothesis, s is the sample standard deviation, and n is the sample size.
When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.
- Step 3. Set up decision rule.
The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.
- The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value. In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
- The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.
- The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value. For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.
The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.
Rejection Region for Upper-Tailed Z Test (H 1 : μ > μ 0 ) with α =0.05
The decision rule is: Reject H 0 if Z > 1.645.
Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05
The decision rule is: Reject H 0 if Z < -1.645.
Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05
The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.
The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."
Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."
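The Z critical values in those tables need not be memorized; they can be recovered from the inverse normal CDF, here using Python's standard library:

```python
from statistics import NormalDist

alpha = 0.05
upper = NormalDist().inv_cdf(1 - alpha)      # upper-tailed: reject if Z > 1.645
lower = NormalDist().inv_cdf(alpha)          # lower-tailed: reject if Z < -1.645
two   = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed: reject if |Z| > 1.960
print(round(upper, 3), round(lower, 3), round(two, 3))  # 1.645 -1.645 1.96
```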
- Step 4. Compute the test statistic.
Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.
- Step 5. Conclusion.
The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).
If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .
Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following rule: if p < α, then reject H 0 .
- Step 1. Set up hypotheses and determine level of significance
H 0 : μ = 191 H 1 : μ > 191 α =0.05
The research hypothesis is that weights have increased, and therefore an upper tailed test is used.
- Step 2. Select the appropriate test statistic.
Because the sample size is large (n > 30), the appropriate test statistic is

Z = (x̄ − μ 0 ) / (s / √n)
- Step 3. Set up decision rule.
In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05. Reject H 0 if Z > 1.645.
We now substitute the sample data into the formula for the test statistic identified in Step 2.
We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α =0.05 to show that the mean weight of men in 2006 is more than 191 pounds.

Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance at which we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance.

Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010, the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α at which we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value, which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
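The bracketing argument above can be confirmed exactly: the upper-tailed p-value for Z = 2.38 is a one-liner with Python's standard library:

```python
from statistics import NormalDist

# exact upper-tailed p-value for the observed Z = 2.38
p = 1 - NormalDist().cdf(2.38)
print(round(p, 4))  # 0.0087, indeed between 0.005 and 0.010
```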
Type I and Type II Errors

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not).

When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).
Table - Conclusions in Test of Hypothesis

                     Do Not Reject H 0      Reject H 0
H 0 is True          Correct decision       Type I error
H 0 is False         Type II error          Correct decision
In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, and our test tells us to reject H 0 , then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).
When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.
The most common reason for a Type II error is a small sample size.
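The claim that α = P(Type I error) can be illustrated by simulation: draw many samples from a population in which H 0 is actually true and count how often a two-tailed Z test (wrongly) rejects. This is a sketch with made-up settings (known σ = 1, true mean 0), not any particular study:

```python
import random
from statistics import NormalDist

random.seed(1)
crit = NormalDist().inv_cdf(0.975)   # 1.96 for a two-tailed test at alpha = 0.05

n, trials, rejections = 30, 20000, 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # H0: mu = 0 is TRUE here
    xbar = sum(sample) / n
    z = xbar / (1 / n ** 0.5)                        # known sigma = 1
    if abs(z) > crit:
        rejections += 1                              # a Type I error

print(rejections / trials)  # close to alpha = 0.05
```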
Content ©2017. All Rights Reserved. Date last modified: November 6, 2017. Wayne W. LaMorte, MD, PhD, MPH
11.4: One- and Two-Tailed Tests
- Rice University
Learning Objectives
- Define Type I and Type II errors
- Interpret significant and non-significant differences
- Explain why the null hypothesis should not be accepted when the effect is not significant
In the James Bond case study, Mr. Bond was given \(16\) trials on which he judged whether a martini had been shaken or stirred. He was correct on \(13\) of the trials. From the binomial distribution, we know that the probability of being correct \(13\) or more times out of \(16\) if one is only guessing is \(0.0106\). Figure \(\PageIndex{1}\) shows a graph of the binomial distribution. The red bars show the values greater than or equal to \(13\). As you can see in the figure, the probabilities are calculated for the upper tail of the distribution. A probability calculated in only one tail of the distribution is called a "one-tailed probability."
Binomial Calculator
A slightly different question can be asked of the data: "What is the probability of getting a result as extreme or more extreme than the one observed?" Since the chance expectation is \(8/16\), a result of \(3/16\) is equally as extreme as \(13/16\). Thus, to calculate this probability, we would consider both tails of the distribution. Since the binomial distribution is symmetric when \(\pi =0.5\), this probability is exactly double the probability of \(0.0106\) computed previously. Therefore, \(p = 0.0212\). A probability calculated in both tails of a distribution is called a "two-tailed probability" (see Figure \(\PageIndex{2}\)).
Should the one-tailed or the two-tailed probability be used to assess Mr. Bond's performance? That depends on the way the question is posed. If we are asking whether Mr. Bond can tell the difference between shaken or stirred martinis, then we would conclude he could if he performed either much better than chance or much worse than chance. If he performed much worse than chance, we would conclude that he can tell the difference, but he does not know which is which. Therefore, since we are going to reject the null hypothesis if Mr. Bond does either very well or very poorly, we will use a two-tailed probability.
On the other hand, if our question is whether Mr. Bond is better than chance at determining whether a martini is shaken or stirred, we would use a one-tailed probability. What would the one-tailed probability be if Mr. Bond were correct on only \(3\) of the \(16\) trials? Since the one-tailed probability is the probability of the right-hand tail, it would be the probability of getting \(3\) or more correct out of \(16\). This is a very high probability and the null hypothesis would not be rejected.
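Both tail probabilities in the Bond example follow directly from the binomial distribution (doubling the exact value gives 0.0213; the 0.0212 in the text comes from doubling the rounded 0.0106):

```python
from math import comb

n = 16  # trials; under H0 each guess is correct with probability 0.5

def prob_at_least(k):
    # P(X >= k) for X ~ Binomial(16, 0.5)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

one_tailed = prob_at_least(13)   # 13 or more correct out of 16
two_tailed = 2 * one_tailed      # valid doubling: symmetric when pi = 0.5
print(round(one_tailed, 4), round(two_tailed, 4))  # 0.0106 0.0213
print(round(prob_at_least(3), 4))  # 0.9979: 3 or more correct, very likely
```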
The null hypothesis for the two-tailed test is \(\pi =0.5\). By contrast, the null hypothesis for the one-tailed test is \(\pi \leq 0.5\). Accordingly, we reject the two-tailed hypothesis if the sample proportion deviates greatly from \(0.5\) in either direction. The one-tailed hypothesis is rejected only if the sample proportion is much greater than \(0.5\). The alternative hypothesis in the two-tailed test is \(\pi \neq 0.5\). In the one-tailed test it is \(\pi > 0.5\).
You should always decide whether you are going to use a one-tailed or a two-tailed probability before looking at the data. Statistical tests that compute one-tailed probabilities are called one-tailed tests; those that compute two-tailed probabilities are called two-tailed tests. Two-tailed tests are much more common than one-tailed tests in scientific research because an outcome signifying that something other than chance is operating is usually worth noting. One-tailed tests are appropriate when it is not important to distinguish between no effect and an effect in the unexpected direction. For example, consider an experiment designed to test the efficacy of a treatment for the common cold. The researcher would only be interested in whether the treatment was better than a placebo control. It would not be worth distinguishing between the case in which the treatment was worse than a placebo and the case in which it was the same because in both cases the drug would be worthless.
Some have argued that a one-tailed test is justified whenever the researcher predicts the direction of an effect. The problem with this argument is that if the effect comes out strongly in the non-predicted direction, the researcher is not justified in concluding that the effect is not zero. Since this is unrealistic, one-tailed tests are usually viewed skeptically if justified on this basis alone.
4.2 Two-tailed tests
Hypotheses that have an equal (=) or not equal (≠) supposition (sign) in the statement are called non-directional hypotheses . In non-directional hypotheses, the researcher is interested in whether there is a statistically significant difference or relationship between two or more variables, but does not have any specific expectation about which group or variable will be higher or lower. For example, a non-directional hypothesis might be: ‘There is a difference in the preference for brand X between male and female consumers.’ In this hypothesis, the researcher is interested in whether there is a statistically significant difference in the preference for brand X between male and female consumers, but does not have a specific prediction about which gender will have a higher preference. The researcher may conduct a survey or experiment to collect data on the brand preference of male and female consumers and then use statistical analysis to determine whether there is a significant difference between the two groups.
Non-directional hypotheses are also known as two-tailed hypotheses. The term ‘two-tailed’ comes from the fact that the statistical test used to evaluate the hypothesis is based on the assumption that the difference or relationship could occur in either direction, resulting in two ‘tails’ in the probability distribution. Using the coffee foam example (from Activity 1), you have the following set of hypotheses:
H 0 : µ = 1cm foam
H a : µ ≠ 1cm foam
In this case, the researcher can reject the null hypothesis for a mean value that is either ‘much higher’ or ‘much lower’ than 1 cm foam. This is called a two-tailed test because the rejection region includes outcomes from both the upper and lower tails of the sample distribution when determining a decision rule. To give an illustration, if you set the alpha level (α) equal to 0.05, that would give you a 95% confidence level. Then, you would reject the null hypothesis for obtained values of z < −1.96 or z > +1.96 (you will look at how to calculate z-scores later in the course).
This can be plotted on a graph as shown in Figure 7.
Figure 7: A bell-shaped standard normal curve with ‘z-score’ on the x-axis (from −2 to 2) and ‘probability density’ on the y-axis; the peak is labelled ‘Foam height = 1cm’. The null-hypothesis rejection regions are shaded in both tails: the area below z = −1.96 and the area above z = +1.96, each with α = 0.025.
In a two-tailed hypothesis test, the null hypothesis assumes that there is no significant difference or relationship between the two groups or variables, and the alternative hypothesis suggests that there is a significant difference or relationship, but does not specify the direction of the difference or relationship.
When performing a two-tailed test, you need to determine the level of significance, which is denoted by alpha (α). The value of alpha, in this case, is 0.05. To perform a two-tailed test at a significance level of 0.05, you need to divide alpha by 2, giving a significance level of 0.025 for each distribution tail (0.05/2 = 0.025). This is done because the two-tailed test is looking for significance in either tail of the distribution. If the calculated test statistic falls in the rejection region of either tail of the distribution, then the null hypothesis is rejected and the alternative hypothesis is accepted. In this case, the researcher can conclude that there is a significant difference or relationship between the two groups or variables.
Assuming that the population follows a normal distribution, the tail located below the critical value of z = –1.96 (a later section discusses how this value was determined) and the tail above the critical value of z = +1.96 each represent a proportion of 0.025. These tails are referred to as the lower and upper tails, respectively, and they correspond to the extreme values of the distribution that are far from the central part of the bell curve. These critical values are used in a two-tailed hypothesis test to determine whether to reject or fail to reject the null hypothesis. The null hypothesis represents the default assumption that there is no significant difference between the observed data and what would be expected under a specific condition.
If the calculated test statistic falls within the critical values, then the null hypothesis cannot be rejected at the 0.05 level of significance. However, if the calculated test statistic falls outside the critical values (orange-coloured areas in Figure 7), then the null hypothesis can be rejected in favour of the alternative hypothesis, suggesting that there is evidence of a significant difference between the observed data and what would be expected under the specified condition.
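As a sketch of the decision rule described above, Python's standard library can reproduce the ±1.96 critical values from α = 0.05 (this code, including the helper `reject_null`, is our illustration and not part of the course):

```python
from statistics import NormalDist

alpha = 0.05
# Split alpha across both tails: 0.025 in each
z_lower = NormalDist().inv_cdf(alpha / 2)      # lower critical value
z_upper = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value
print(round(z_lower, 2), round(z_upper, 2))    # -1.96 1.96

def reject_null(z, alpha=0.05):
    """Two-tailed decision rule: reject H0 if z falls beyond either critical value."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
```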
S.3.3 Hypothesis Testing Examples
- Example: Right-Tailed Test
- Example: Left-Tailed Test
- Example: Two-Tailed Test
Brinell Hardness Scores
An engineer measured the Brinell hardness of 25 pieces of ductile iron that were subcritically annealed. The resulting data were:
The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses:
H 0 : μ = 170
H A : μ > 170
The engineer entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:
Descriptive Statistics
$\mu$: mean of Brinelli
Null hypothesis H₀: $\mu$ = 170 Alternative hypothesis H₁: $\mu$ > 170
The output tells us that the average Brinell hardness of the n = 25 pieces of ductile iron was 172.52 with a standard deviation of 10.31. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 10.31 by the square root of n = 25, is 2.06). The test statistic t * is 1.22, and the P -value is 0.117.
If the engineer set his significance level α at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were greater than 1.7109 (determined using statistical software or a t -table):
Since the engineer's test statistic, t * = 1.22, is not greater than 1.7109, the engineer fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.
If the engineer used the P -value approach to conduct his hypothesis test, he would determine the area under a \(t_{n-1} = t_{24}\) curve and to the right of the test statistic t * = 1.22:
In the output above, Minitab reports that the P -value is 0.117. Since the P -value, 0.117, is greater than \(\alpha\) = 0.05, the engineer fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.
Note that the engineer obtains the same scientific conclusion regardless of the approach used. This will always be the case.
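The engineer's numbers can be reproduced from the summary statistics alone. The text uses Minitab; the following Python sketch is just an illustrative re-calculation:

```python
from math import sqrt

# Summary statistics from the Brinell hardness example
n, xbar, s = 25, 172.52, 10.31
mu0 = 170  # hypothesized mean under H0

se = s / sqrt(n)             # standard error of the mean
t_star = (xbar - mu0) / se
print(f"{se:.2f}")           # 2.06
print(f"{t_star:.2f}")       # 1.22

# Right-tailed critical value approach: t_{0.05, 24} = 1.7109 (from a t-table)
print(t_star > 1.7109)       # False -> fail to reject H0
```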
Height of Sunflowers
A biologist was interested in determining whether sunflower seedlings treated with an extract from Vinca minor roots resulted in a lower average height of sunflower seedlings than the standard height of 15.7 cm. The biologist treated a random sample of n = 33 seedlings with the extract and subsequently obtained the following heights:
The biologist's hypotheses are:
H 0 : μ = 15.7
H A : μ < 15.7
The biologist entered her data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. She obtained the following output:
$\mu$: mean of Height
Null hypothesis H₀: $\mu$ = 15.7 Alternative hypothesis H₁: $\mu$ < 15.7
The output tells us that the average height of the n = 33 sunflower seedlings was 13.664 with a standard deviation of 2.544. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 2.544 by the square root of n = 33, is 0.443). The test statistic t * is -4.60, and the P -value, reported to three decimal places, is 0.000.
Minitab Note. Minitab will always report P -values to only 3 decimal places. If Minitab reports the P -value as 0.000, it really means that the P -value is 0.000....something. Throughout this course (and your future research!), when you see that Minitab reports the P -value as 0.000, you should report the P -value as being "< 0.001."
If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t * were less than -1.6939 (determined using statistical software or a t -table):
Since the biologist's test statistic, t * = -4.60, is less than -1.6939, the biologist rejects the null hypothesis. That is, the test statistic falls in the "critical region." There is sufficient evidence, at the α = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.
If the biologist used the P -value approach to conduct her hypothesis test, she would determine the area under a \(t_{n-1} = t_{32}\) curve and to the left of the test statistic t * = -4.60:
In the output above, Minitab reports that the P -value is 0.000, which we take to mean < 0.001. Since the P -value is less than 0.001, it is clearly less than \(\alpha\) = 0.05, and the biologist rejects the null hypothesis. There is sufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean height of all such sunflower seedlings is less than 15.7 cm.
Note again that the biologist obtains the same scientific conclusion regardless of the approach used. This will always be the case.
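The biologist's test statistic can likewise be re-computed from the summary statistics; an illustrative Python sketch (not the Minitab output from the text):

```python
from math import sqrt

# Summary statistics from the sunflower seedling example
n, xbar, s = 33, 13.664, 2.544
mu0 = 15.7  # hypothesized mean under H0

t_star = (xbar - mu0) / (s / sqrt(n))
print(f"{t_star:.2f}")       # -4.60

# Left-tailed critical value approach: -t_{0.05, 32} = -1.6939 (from a t-table)
print(t_star < -1.6939)      # True -> reject H0
```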
Gum Thickness
A manufacturer claims that the thickness of the spearmint gum it produces is 7.5 one-hundredths of an inch. A quality control specialist regularly checks this claim. On one production run, he took a random sample of n = 10 pieces of gum and measured their thickness. He obtained:
The quality control specialist's hypotheses are:
H 0 : μ = 7.5
H A : μ ≠ 7.5
The quality control specialist entered his data into Minitab and requested that the "one-sample t -test" be conducted for the above hypotheses. He obtained the following output:
$\mu$: mean of Thickness
Null hypothesis H₀: $\mu$ = 7.5 Alternative hypothesis H₁: $\mu \ne$ 7.5
The output tells us that the average thickness of the n = 10 pieces of gums was 7.55 one-hundredths of an inch with a standard deviation of 0.1027. (The standard error of the mean "SE Mean", calculated by dividing the standard deviation 0.1027 by the square root of n = 10, is 0.0325). The test statistic t * is 1.54, and the P -value is 0.158.
If the quality control specialist set his significance level \(\alpha\) at 0.05 and used the critical value approach to conduct his hypothesis test, he would reject the null hypothesis if his test statistic t * were less than -2.2616 or greater than 2.2616 (determined using statistical software or a t -table):
Since the quality control specialist's test statistic, t * = 1.54, is not less than -2.2616 nor greater than 2.2616, the quality control specialist fails to reject the null hypothesis. That is, the test statistic does not fall in the "critical region." There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all of the manufacturer's spearmint gum differs from 7.5 one-hundredths of an inch.
If the quality control specialist used the P -value approach to conduct his hypothesis test, he would determine the area under a \(t_{n-1} = t_{9}\) curve, to the right of 1.54 and to the left of -1.54:
In the output above, Minitab reports that the P -value is 0.158. Since the P -value, 0.158, is greater than \(\alpha\) = 0.05, the quality control specialist fails to reject the null hypothesis. There is insufficient evidence, at the \(\alpha\) = 0.05 level, to conclude that the mean thickness of all pieces of spearmint gum differs from 7.5 one-hundredths of an inch.
Note that the quality control specialist obtains the same scientific conclusion regardless of the approach used. This will always be the case.
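The same re-calculation for the two-tailed gum example, again as an illustrative Python sketch rather than the Minitab session from the text:

```python
from math import sqrt

# Summary statistics from the gum thickness example
n, xbar, s = 10, 7.55, 0.1027
mu0 = 7.5  # claimed thickness under H0

t_star = (xbar - mu0) / (s / sqrt(n))
print(f"{t_star:.2f}")          # 1.54

# Two-tailed critical value approach: +/- t_{0.025, 9} = +/- 2.2616 (from a t-table)
print(abs(t_star) > 2.2616)     # False -> fail to reject H0
```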
In our review of hypothesis tests, we have focused on just one particular hypothesis test, namely that concerning the population mean \(\mu\). The important thing to recognize is that the topics discussed here — the general idea of hypothesis tests, errors in hypothesis testing, the critical value approach, and the P -value approach — generally extend to all of the hypothesis tests you will encounter.
One- and Two-Tailed Tests
In the previous example, you tested a research hypothesis that predicted not only that the sample mean would be different from the population mean but that it would be different in a specific direction—it would be lower. This test is called a directional or one‐tailed test because the region of rejection is entirely within one tail of the distribution.
Some hypotheses predict only that one value will be different from another, without additionally predicting which will be higher. The test of such a hypothesis is nondirectional or two‐tailed because an extreme test statistic in either tail of the distribution (positive or negative) will lead to the rejection of the null hypothesis of no difference.
Suppose that you suspect that a particular class's performance on a proficiency test is not representative of those people who have taken the test. The national mean score on the test is 74.
The research hypothesis is:
The mean score of the class on the test is not 74.
Or in notation: H a : μ ≠ 74
The null hypothesis is:
The mean score of the class on the test is 74.
In notation: H 0 : μ = 74
As in the last example, you decide to use a 5 percent probability level for the test. Both tests have a region of rejection, then, of 5 percent, or 0.05. In this example, however, the rejection region must be split between both tails of the distribution—0.025 in the upper tail and 0.025 in the lower tail—because your hypothesis specifies only a difference, not a direction, as shown in Figure 1(a). You will reject the null hypotheses of no difference if the class sample mean is either much higher or much lower than the population mean of 74. In the previous example, only a sample mean much lower than the population mean would have led to the rejection of the null hypothesis.
Figure 1. Comparison of (a) a two‐tailed test and (b) a one‐tailed test, at the same probability level (95 percent).
The decision of whether to use a one‐ or a two‐tailed test is important because a test statistic that falls in the region of rejection in a one‐tailed test may not do so in a two‐tailed test, even though both tests use the same probability level. Suppose the class sample mean in your example was 77, and its corresponding z ‐score was computed to be 1.80. Table 2 in "Statistics Tables" shows the critical z ‐scores for a probability of 0.025 in either tail to be –1.96 and 1.96. In order to reject the null hypothesis, the test statistic must be either smaller than –1.96 or greater than 1.96. It is not, so you cannot reject the null hypothesis. Refer to Figure 1(a).
Suppose, however, you had a reason to expect that the class would perform better on the proficiency test than the population, and you did a one‐tailed test instead. For this test, the rejection region of 0.05 would be entirely within the upper tail. The critical z ‐value for a probability of 0.05 in the upper tail is 1.65. (Remember that Table 2 in "Statistics Tables" gives areas of the curve below z ; so you look up the z ‐value for a probability of 0.95.) Your computed test statistic of z = 1.80 exceeds the critical value and falls in the region of rejection, so you reject the null hypothesis and say that your suspicion that the class was better than the population was supported. See Figure 1(b).
In practice, you should use a one‐tailed test only when you have good reason to expect that the difference will be in a particular direction. A two‐tailed test is more conservative than a one‐tailed test because a two‐tailed test takes a more extreme test statistic to reject the null hypothesis.
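The comparison above can be sketched in Python, using the standard normal distribution to recover the critical values (exact values rather than the rounded table entries):

```python
from statistics import NormalDist

z = 1.80      # computed z-score for the class mean of 77 (from the text)
alpha = 0.05

# Two-tailed test: alpha is split between the tails, 0.025 in each
two_tailed_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
print(abs(z) > two_tailed_crit)   # False -> cannot reject H0

# One-tailed test: all of alpha (0.05) in the upper tail
one_tailed_crit = NormalDist().inv_cdf(1 - alpha)      # ~1.645 (the table rounds to 1.65)
print(z > one_tailed_crit)        # True -> reject H0
```

The same test statistic rejects the null hypothesis in the one-tailed test but not in the two-tailed test, which is exactly the point made in the text.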
An Introduction to t Tests | Definitions, Formula and Examples
Published on January 31, 2020 by Rebecca Bevans . Revised on June 22, 2023.
A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
- The null hypothesis ( H 0 ) is that the true difference between these group means is zero.
- The alternate hypothesis ( H a ) is that the true difference is different from zero.
Table of contents
- When to use a t test
- What type of t test should I use?
- Performing a t test
- Interpreting test results
- Presenting the results of a t test
- Frequently asked questions about t tests
A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test.
The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. The t test assumes your data:
- are independent
- are (approximately) normally distributed
- have a similar amount of variance within each group being compared (a.k.a. homogeneity of variance)
If your data do not fit these assumptions, you can try a nonparametric alternative to the t test, such as the Wilcoxon signed-rank test for paired data or the Mann-Whitney U test for two independent samples.
When choosing a t test, you will need to consider two things: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction.
One-sample, two-sample, or paired t test?
- If the groups come from a single population (e.g., measuring before and after an experimental treatment), perform a paired t test . This is a within-subjects design .
- If the groups come from two different populations (e.g., two different species, or people from two separate cities), perform a two-sample t test (a.k.a. independent t test ). This is a between-subjects design .
- If there is one group being compared against a standard value (e.g., comparing the acidity of a liquid to a neutral pH of 7), perform a one-sample t test .
One-tailed or two-tailed t test?
- If you only care whether the two populations are different from one another, perform a two-tailed t test .
- If you want to know whether one population mean is greater than or less than the other, perform a one-tailed t test.
- Your observations come from two separate populations (separate species), so you perform a two-sample t test.
- You don’t care about the direction of the difference, only whether there is a difference, so you choose to use a two-tailed t test.
The t test estimates the true difference between two group means using the ratio of the difference in group means over the pooled standard error of both groups. You can calculate it manually using a formula, or use statistical analysis software.
T test formula
The formula for the two-sample t test (a.k.a. the Student’s t-test) is shown below.
In this formula, t is the t value, x 1 and x 2 are the means of the two groups being compared, s 2 is the pooled standard error of the two groups, and n 1 and n 2 are the number of observations in each of the groups.
A larger t value shows that the difference between group means is greater than the pooled standard error, indicating a more significant difference between the groups.
You can compare your calculated t value against the values in a critical value chart (e.g., Student’s t table) to determine whether your t value is greater than what would be expected by chance. If so, you can reject the null hypothesis and conclude that the two groups are in fact different.
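As an illustration of the formula, here is a Python sketch of the pooled two-sample t statistic; the toy samples are hypothetical, not the flower data from the article:

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x1, x2):
    """Student's two-sample t statistic with a pooled variance estimate
    (assumes equal variances in the two groups, as the formula above does)."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances
    s2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical toy samples
a = [4.1, 3.9, 4.3, 4.0, 4.2]
b = [3.2, 3.5, 3.1, 3.4, 3.3]
print(f"{two_sample_t(a, b):.2f}")  # 8.00
```

A large t value like this one would then be compared against a critical value from a Student's t table with n1 + n2 − 2 degrees of freedom.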
T test function in statistical software
Most statistical software (R, SPSS, etc.) includes a t test function. This built-in function will take your raw data and calculate the t value. It will then compare it to the critical value, and calculate a p -value . This way you can quickly see whether your groups are statistically different.
In your comparison of flower petal lengths, you decide to perform your t test using R.
If you perform the t test for your flower hypothesis in R, you will receive the following output:
The output provides:
- An explanation of what is being compared, called data in the output table.
- The t value : -33.719. Note that it’s negative; this is fine! In most cases, we only care about the absolute value of the difference, or the distance from 0. It doesn’t matter which direction.
- The degrees of freedom : 30.196. Degrees of freedom is related to your sample size, and shows how many ‘free’ data points are available in your test for making comparisons. The greater the degrees of freedom, the better your statistical test will work.
- The p value : 2.2e-16 (i.e., 0.00000000000000022, a 2.2 preceded by 15 zeros after the decimal point). This describes the probability that you would see a t value as extreme as this one by chance.
- A statement of the alternative hypothesis ( H a ). In this test, the H a is that the difference is not 0.
- The 95% confidence interval . This is the range of numbers within which the true difference in means will be 95% of the time. This can be changed from 95% if you want a larger or smaller interval, but 95% is very commonly used.
- The mean petal length for each group.
When reporting your t test results, the most important values to include are the t value , the p value , and the degrees of freedom for the test. These will communicate to your audience whether the difference between the two groups is statistically significant (a.k.a. that it is unlikely to have happened by chance).
You can also include the summary statistics for the groups being compared, namely the mean and standard deviation . In R, the code for calculating the mean and the standard deviation from the data looks like this:
flower.data %>% group_by(Species) %>% summarize(mean_length = mean(Petal.Length), sd_length = sd(Petal.Length))
A t-test is a statistical test that compares the means of two samples . It is used in hypothesis testing , with a null hypothesis that the difference in group means is zero and an alternate hypothesis that the difference in group means is different from zero.
A t-test measures the difference in group means divided by the pooled standard error of the two group means.
In this way, it calculates a number (the t-value) illustrating the magnitude of the difference between the two group means being compared, and estimates the likelihood that this difference exists purely by chance (p-value).
Your choice of t-test depends on whether you are studying one group or two groups, and whether you care about the direction of the difference in group means.
If you are studying one group, use a paired t-test to compare the group mean over time or after an intervention, or use a one-sample t-test to compare the group mean to a standard value. If you are studying two groups, use a two-sample t-test .
If you want to know only whether a difference exists, use a two-tailed test . If you want to know if one group mean is greater or less than the other, use a left-tailed or right-tailed one-tailed test .
A one-sample t-test is used to compare a single population to a standard value (for example, to determine whether the average lifespan of a specific town is different from the country average).
A paired t-test is used to compare a single population before and after some experimental intervention or at two different points in time (for example, measuring student performance on a test before and after being taught the material).
A t-test should not be used to measure differences among more than two groups, because the error structure for a t-test will underestimate the actual error when many groups are being compared.
If you want to compare the means of several groups at once, it’s best to use another statistical test such as ANOVA or a post-hoc test.
Statistics - Hypothesis Testing a Proportion (Two Tailed)
A population proportion is the share of a population that belongs to a particular category .
Hypothesis tests are used to check a claim about the size of that population proportion.
Hypothesis Testing a Proportion
The following steps are used for a hypothesis test:
- Check the conditions
- Define the claims
- Decide the significance level
- Calculate the test statistic
- Conclude
For example:
- Population : Nobel Prize winners
- Category : Women
And we want to check the claim:
"The share of Nobel Prize winners that are women is not 50%"
By taking a sample of 100 randomly selected Nobel Prize winners we could find that:
10 out of 100 Nobel Prize winners in the sample were women
The sample proportion is then: \(\displaystyle \frac{10}{100} = 0.1\), or 10%.
From this sample data we check the claim with the steps below.
1. Checking the Conditions
The conditions for doing a hypothesis test for a proportion are:
- The sample is randomly selected
- There are only two options: being in the category, or not being in the category
- The sample needs at least 5 members in the category, and at least 5 members not in the category
In our example, the sample was randomly selected and contained 10 women, so there are 10 members in the category.
The other 90 winners were not women, so there are 90 members not in the category.
The conditions are fulfilled in this case.
Note: It is possible to do a hypothesis test without having 5 of each category. But special adjustments need to be made.
2. Defining the Claims
We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.
The claim was:
"The share of Nobel Prize winners that are women is not 50%"
In this case, the parameter is the proportion of Nobel Prize winners that are women (\(p\)).
The null and alternative hypotheses are then:
Null hypothesis : 50% of Nobel Prize winners were women.
Alternative hypothesis : The share of Nobel Prize winners that are women is not 50%
Which can be expressed with symbols as:
\(H_{0}\): \(p = 0.50 \)
\(H_{1}\): \(p \neq 0.50 \)
This is a ' two-tailed ' test, because the alternative hypothesis claims that the proportion is different (larger or smaller) than in the null hypothesis.
If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.
3. Deciding the Significance Level
The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.
The significance level is a percentage probability of accidentally making the wrong conclusion.
Typical significance levels are:
- \(\alpha = 0.1\) (10%)
- \(\alpha = 0.05\) (5%)
- \(\alpha = 0.01\) (1%)
A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.
There is no "correct" significance level - it only states the uncertainty of the conclusion.
Note: A 5% significance level means that when we reject a null hypothesis:
We expect to reject a true null hypothesis 5 out of 100 times.
4. Calculating the Test Statistic
The test statistic is used to decide the outcome of the hypothesis test.
The test statistic is a standardized value calculated from the sample.
The formula for the test statistic (TS) of a population proportion is:
\(\displaystyle \frac{\hat{p} - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \)
\(\hat{p}-p\) is the difference between the sample proportion (\(\hat{p}\)) and the claimed population proportion (\(p\)).
\(n\) is the sample size.
In our example:
The claimed (\(H_{0}\)) population proportion (\(p\)) was \( 0.50 \)
The sample size (\(n\)) was \(100\)
So the test statistic (TS) is then:
\(\displaystyle \frac{0.1-0.5}{\sqrt{0.5(1-0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.5(0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.25}} \cdot \sqrt{100} = \frac{-0.4}{0.5} \cdot 10 = \underline{-8}\)
You can also calculate the test statistic using programming language functions:
With Python use the scipy and math libraries to calculate the test statistic for a proportion.
With R use the built-in math functions to calculate the test statistic for a proportion.
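The code samples themselves did not survive extraction here; a minimal Python sketch of the test-statistic calculation, using only the standard math module, could look like this:

```python
import math

p_hat = 10 / 100  # sample proportion (10 women out of 100 winners)
p0 = 0.50         # claimed population proportion under H0
n = 100           # sample size

# TS = (p_hat - p0) / sqrt(p0 * (1 - p0)) * sqrt(n)
test_statistic = (p_hat - p0) / math.sqrt(p0 * (1 - p0)) * math.sqrt(n)
print(round(test_statistic, 4))  # -8.0
```

This matches the manual calculation above: a test statistic of -8.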
5. Concluding
There are two main approaches for making the conclusion of a hypothesis test:
- The critical value approach compares the test statistic with the critical value of the significance level.
- The P-value approach compares the P-value of the test statistic with the significance level.
Note: The two approaches are only different in how they present the conclusion.
The Critical Value Approach
For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).
For a population proportion test, the critical value (CV) is a Z-value from a standard normal distribution .
This critical Z-value (CV) defines the rejection region for the test.
The rejection region is an area of probability in the tails of the standard normal distribution.
Because the claim is that the population proportion is different from 50%, the rejection region is split into both the left and right tail:
Choosing a significance level (\(\alpha\)) of 0.01, or 1%, we can find the critical Z-value from a Z-table , or with a programming language function:
Note: Because this is a two-tailed test the tail area (\(\alpha\)) needs to be split in half (divided by 2).
With Python use the Scipy Stats library norm.ppf() function to find the Z-value for an \(\alpha\)/2 = 0.005 in the left tail.
With R use the built-in qnorm() function to find the Z-value for an \(\alpha\)/2 = 0.005 in the left tail.
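The Python version, as a sketch using the norm.ppf() function the text names:

```python
from scipy.stats import norm

alpha = 0.01
# Two-tailed test: split alpha in half, then find the left-tail Z-value
critical_value = norm.ppf(alpha / 2)
print(round(critical_value, 4))  # -2.5758
```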
Using either method we can find that the critical Z-value in the left tail is \(\approx \underline{-2.5758}\)
Since a normal distribution is symmetric, we know that the critical Z-value in the right tail will be the same number, only positive: \(\underline{2.5758}\)
For a two-tailed test we need to check if the test statistic (TS) is smaller than the negative critical value (-CV), or bigger than the positive critical value (CV).
If the test statistic is smaller than the negative critical value, the test statistic is in the rejection region .
If the test statistic is bigger than the positive critical value, the test statistic is in the rejection region .
When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).
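The two-tailed decision rule above can be sketched in Python (values taken from the worked example):

```python
# Values from the worked example above
test_statistic = -8.0
critical_value = 2.5758  # positive critical Z-value for alpha = 0.01, two-tailed

# Reject H0 if the test statistic falls in either tail's rejection region
reject_h0 = (test_statistic < -critical_value) or (test_statistic > critical_value)
print(reject_h0)  # True
```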
Here, the test statistic (TS) was \(\approx \underline{-8}\) and the critical value was \(\approx \underline{-2.5758}\)
Here is an illustration of this test in a graph:
Since the test statistic was smaller than the negative critical value we reject the null hypothesis.
This means that the sample data supports the alternative hypothesis.
And we can summarize the conclusion stating:
The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 1% significance level .
The P-Value Approach
For the P-value approach we need to find the P-value of the test statistic (TS).
If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).
The test statistic was found to be \( \approx \underline{-8} \)
For a population proportion test, the test statistic is a Z-Value from a standard normal distribution .
Because this is a two-tailed test, we need to find the P-value of a Z-value smaller than -8 and multiply it by 2 .
We can find the P-value using a Z-table , or with a programming language function:
With Python use the Scipy Stats library norm.cdf() function to find the P-value of a Z-value smaller than -8 for a two tailed test:
With R use the built-in pnorm() function to find the P-value of a Z-value smaller than -8 for a two tailed test:
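A sketch of the Python version, using the norm.cdf() function the text names:

```python
from scipy.stats import norm

test_statistic = -8.0

# Two-tailed: take the probability below -8 and double it to cover both tails
p_value = 2 * norm.cdf(test_statistic)
print(p_value)  # on the order of 1e-15
```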
Using either method we can find that the P-value is \(\approx \underline{1.25 \cdot 10^{-15}}\) or \(0.00000000000000125\)
This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.000000000000125%, to reject the null hypothesis.
This P-value is smaller than any of the common significance levels (10%, 5%, 1%).
So the null hypothesis is rejected at all of these significance levels.
The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 10%, 5%, and 1% significance level .
Calculating a P-Value for a Hypothesis Test with Programming
Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.
Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.
The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.
With Python use the scipy and math libraries to calculate the P-value for a two-tailed hypothesis test for a proportion.
Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.
With R use the built-in prop.test() function to find the P-value for a two-tailed hypothesis test for a proportion.
Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.
Note: The conf.level in the R code is the reverse of the significance level.
Here, the significance level is 0.01, or 1%, so the conf.level is 1-0.01 = 0.99, or 99%.
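The Python version described above could be sketched as follows; note that R's prop.test() applies a chi-squared test with a continuity correction by default, so its P-value may differ slightly from this Z-based calculation:

```python
import math
from scipy.stats import norm

n = 100           # sample size
occurrences = 10  # sample members in the category (women)
p0 = 0.50         # claimed proportion under H0

p_hat = occurrences / n
test_statistic = (p_hat - p0) / math.sqrt(p0 * (1 - p0)) * math.sqrt(n)

# Two-tailed P-value: double the tail area beyond |TS|
p_value = 2 * norm.cdf(-abs(test_statistic))
print(p_value < 0.01)  # True: reject H0 at the 1% significance level
```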
Right-Tailed and Left-Tailed Tests
This was an example of a two tailed test, where the alternative hypothesis claimed that the parameter is different from the null hypothesis claim.
You can check out an equivalent step-by-step guide for other types here:
- Right-Tailed Test
- Left-Tailed Test