Statology

Statistics Made Easy

One-Tailed Hypothesis Tests: 3 Example Problems

In statistics, we use hypothesis tests to determine whether some claim about a population parameter is true or not.

Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms:

H 0 (Null Hypothesis): Population parameter =, ≤, or ≥ some value

H A (Alternative Hypothesis): Population parameter <, >, or ≠ some value

There are two types of hypothesis tests:

  • Two-tailed test : Alternative hypothesis contains the ≠ sign
  • One-tailed test : Alternative hypothesis contains either < or > sign

In a one-tailed test , the alternative hypothesis contains either the less than (“<”) or the greater than (“>”) sign. This indicates that we’re testing for an effect in one specific direction, either positive or negative.

Check out the following example problems to gain a better understanding of one-tailed tests.

Example 1: Factory Widgets

Suppose it’s assumed that the average weight of a certain widget produced at a factory is 20 grams. However, one engineer believes that a new method produces widgets that weigh less than 20 grams.

To test this, he can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ ≥ 20 grams
  • H A (Alternative Hypothesis): μ < 20 grams

Note : We can tell this is a one-tailed test because the alternative hypothesis contains the less than ( < ) sign. Specifically, we would call this a left-tailed test because we’re testing if some population parameter is less than a specific value.

To test this, he uses the new method to produce 20 widgets and obtains the following information:

  • n = 20 widgets
  • x = 19.8 grams
  • s = 3.1 grams

Plugging these values into the One Sample t-test Calculator , we obtain the following results:

  • t-test statistic: -0.288525
  • one-tailed p-value: 0.388

Since the p-value is not less than .05, the engineer fails to reject the null hypothesis.

He does not have sufficient evidence to say that the true mean weight of widgets produced by the new method is less than 20 grams.
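
As a quick check on the calculator's output, here is a minimal R sketch (not part of the original example) that reproduces the left-tailed test statistic and p-value from the summary statistics above:

# One-sample, left-tailed t-test computed from summary statistics
xbar <- 19.8; mu0 <- 20; s <- 3.1; n <- 20
t_stat <- (xbar - mu0) / (s / sqrt(n))   # about -0.2885
pt(t_stat, df = n - 1)                   # lower-tail p-value, about 0.388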

Example 2: Plant Growth

Suppose a standard fertilizer has been shown to cause a species of plants to grow by an average of 10 inches. However, one botanist believes a new fertilizer can cause this species of plants to grow by an average of greater than 10 inches.

To test this, she can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ ≤ 10 inches
  • H A (Alternative Hypothesis): μ > 10 inches

Note : We can tell this is a one-tailed test because the alternative hypothesis contains the greater than ( > ) sign. Specifically, we would call this a right-tailed test because we’re testing if some population parameter is greater than a specific value.

To test this claim, she applies the new fertilizer to a simple random sample of 15 plants and obtains the following information:

  • n = 15 plants
  • x = 11.4 inches
  • s = 2.5 inches
  • t-test statistic: 2.1689
  • one-tailed p-value: 0.0239

Since the p-value is less than .05, the botanist rejects the null hypothesis.

She has sufficient evidence to conclude that the new fertilizer causes an average increase of greater than 10 inches.
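
The same kind of hand check works here; a minimal R sketch for the right-tailed version, using the summary statistics above:

xbar <- 11.4; mu0 <- 10; s <- 2.5; n <- 15
t_stat <- (xbar - mu0) / (s / sqrt(n))       # about 2.1689
pt(t_stat, df = n - 1, lower.tail = FALSE)   # upper-tail p-value, about 0.0239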

Example 3: Studying Method

A professor currently teaches students to use a studying method that results in an average exam score of 82. However, he believes a new studying method can produce exam scores with an average value greater than 82.

To test this, he can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ ≤ 82
  • H A (Alternative Hypothesis): μ > 82

To test this claim, the professor has 25 students use the new studying method and then take the exam. He collects the following data on the exam scores for this sample of students:

  • t-test statistic: 3.6586
  • one-tailed p-value: 0.0006

Since the p-value is less than .05, the professor rejects the null hypothesis.

He has sufficient evidence to conclude that the new studying method produces exam scores with an average score greater than 82.
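
Only the test statistic is reported for this example, but with n = 25 students the one-tailed p-value follows directly from it; a one-line R check:

pt(3.6586, df = 25 - 1, lower.tail = FALSE)   # upper-tail p-value with 24 degrees of freedom, about 0.0006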

Additional Resources

The following tutorials provide additional information about hypothesis testing:

  • Introduction to Hypothesis Testing
  • What is a Directional Hypothesis?
  • When Do You Reject the Null Hypothesis?


Published by Zach

One-Tailed Test Explained: Definition and Example


A one-tailed test is a statistical test in which the critical area of a distribution is one-sided, so that it lies either above or below a certain value, but not both. If the sample statistic falls into the one-sided critical area, the null hypothesis is rejected in favor of the alternative hypothesis.

Financial analysts use the one-tailed test to test an investment or portfolio hypothesis.

Key Takeaways

  • A one-tailed test is a statistical hypothesis test set up to show that the sample mean would be higher or lower than the population mean, but not both.
  • When using a one-tailed test, the analyst is testing for the possibility of the relationship in one direction of interest and completely disregarding the possibility of a relationship in another direction.
  • Before running a one-tailed test, the analyst must set up a null and alternative hypothesis and choose a significance level against which the resulting probability value (p-value) will be compared.

A basic concept in inferential statistics is hypothesis testing . Hypothesis testing is run to determine whether a claim about a population parameter is true or not. A test that is conducted to show whether the mean of the sample is significantly greater than or significantly less than the mean of a population is considered a two-tailed test . When the testing is set up to show that the sample mean would be higher or lower than the population mean, it is referred to as a one-tailed test. The one-tailed test gets its name from testing the area under one of the tails (sides) of a normal distribution , although the test can also be used with other, non-normal distributions.

Before the one-tailed test can be performed, null and alternative hypotheses must be established. A null hypothesis is a claim that the researcher hopes to reject. An alternative hypothesis is the claim supported by rejecting the null hypothesis.

A one-tailed test is also known as a directional hypothesis or directional test.

Example of the One-Tailed Test

Let's say an analyst wants to prove that a portfolio manager outperformed the S&P 500 index in a year in which the index returned 16.91%. They may set up the null (H 0 ) and alternative (H a ) hypotheses as:

H 0 : μ ≤ 16.91

H a : μ > 16.91

The null hypothesis is the measurement that the analyst hopes to reject. The alternative hypothesis is the claim made by the analyst that the portfolio manager performed better than the S&P 500. If the outcome of the one-tailed test results in rejecting the null, the alternative hypothesis will be supported. On the other hand, if the outcome of the test fails to reject the null, the analyst may carry out further analysis and investigation into the portfolio manager’s performance.

The region of rejection is on only one side of the sampling distribution in a one-tailed test. To determine how the portfolio’s return on investment compares to the market index, the analyst must run an upper-tailed significance test in which extreme values fall in the upper tail (right side) of the normal distribution curve. The one-tailed test conducted in the upper or right tail area of the curve will show the analyst how much higher the portfolio return is than the index return and whether the difference is significant.

1%, 5% or 10%

The most common significance levels (α) used in a one-tailed test.

Determining Significance in a One-Tailed Test

To determine how significant the difference in returns is, a significance level must be specified. The significance level, denoted α (alpha), is the probability of incorrectly concluding that the null hypothesis is false when it is in fact true. The significance level used in a one-tailed test is typically 1%, 5%, or 10%, although any other probability can be used at the discretion of the analyst or statistician. The p-value is calculated under the assumption that the null hypothesis is true. The lower the p-value , the stronger the evidence against the null hypothesis.

If the resulting p-value is less than 5%, the difference between the portfolio return and the index return is statistically significant, and the null hypothesis is rejected. Following our example above, if the p-value = 0.03, or 3%, the result is significant at the 5% level, so the analyst rejects H 0 and supports the claim that the portfolio manager outperformed the index. For a symmetric test statistic, the p-value calculated in only one tail of the distribution is half the two-tailed p-value that would be obtained from the same data.

When using a one-tailed test, the analyst is testing for the possibility of the relationship in one direction of interest and completely disregarding the possibility of a relationship in another direction. Using our example above, the analyst is interested in whether a portfolio’s return is greater than the market’s. In this case, they do not need to statistically account for a situation in which the portfolio manager underperformed the S&P 500 index. For this reason, a one-tailed test is only appropriate when it is not important to test the outcome at the other end of a distribution.
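
As an illustration only, a minimal R sketch of such an upper-tailed test; the return figures below are invented for this sketch and are not from the source:

returns <- c(18.2, 15.4, 20.1, 17.8, 19.3, 16.9, 21.5, 18.8)   # hypothetical annual returns (%)
t.test(returns, mu = 16.91, alternative = "greater")           # H0: mu <= 16.91 versus Ha: mu > 16.91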

How Do You Determine If It Is a One-Tailed or Two-Tailed Test?

A one-tailed test looks for an increase or decrease in a parameter. A two-tailed test looks for change, which could be a decrease or an increase.

What Is a One-Tailed T Test Used for?

A one-tailed t-test checks for an effect or relationship in one specified direction and disregards the possibility of an effect in the opposite direction.

When Should a Two-Tailed Test Be Used?

You would use a two-tailed test when you want to test your hypothesis in both directions.

University of Southern California. " FAQ: What Are the Differences Between One-Tailed and Two-Tailed Tests? "

Institute for Digital Research and Education

FAQ: What are the differences between one-tailed and two-tailed tests?

When you conduct a test of statistical significance, whether it is from a correlation, an ANOVA, a regression or some other kind of test, you are given a p-value somewhere in the output.  If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses. Two of these correspond to one-tailed tests and one corresponds to a two-tailed test.  However, the p-value presented is (almost always) for a two-tailed test.  But how do you choose which test?  Is the p-value appropriate for your test? And, if it is not, how can you calculate the correct p-value for your test given the p-value in your output?  

What is a two-tailed test?

First let’s start with the meaning of a two-tailed test. If you are using a significance level of 0.05, a two-tailed test allots half of your alpha to testing the statistical significance in one direction and half of your alpha to testing statistical significance in the other direction. This means that .025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x . A two-tailed test will test both if the mean is significantly greater than x and if the mean is significantly less than x . The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.

What is a one-tailed test?

Next, let’s discuss the meaning of a one-tailed test.  If you are using a significance level of .05, a one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest.  This means that .05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction.  Let’s return to our example comparing the mean of a sample to a given value x using a t-test.  Our null hypothesis is that the mean is equal to x . A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x , but not both. Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% of its probability distribution or bottom 5% of its probability distribution, resulting in a p-value less than 0.05.  The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. A discussion of when this is an appropriate option follows.   
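
The split of alpha across the tails is easy to see from the critical values themselves; a short R sketch (the df of 30 is an arbitrary example, not taken from the text):

alpha <- 0.05
qnorm(1 - alpha/2)         # two-tailed critical z: 1.96 (alpha/2 in each tail)
qnorm(1 - alpha)           # one-tailed critical z: 1.645 (all of alpha in one tail)
qt(1 - alpha/2, df = 30)   # same idea for a t statistic, about 2.04
qt(1 - alpha, df = 30)     # about 1.70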

When is a one-tailed test appropriate?

Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction.  Imagine you have developed a new drug that you believe is an improvement over an existing drug.  You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug.  The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test.

So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical, then you can proceed with a one-tailed test. For example, imagine again that you have developed a new drug. It is cheaper than the existing drug and, you believe, no less effective. In testing this drug, you are only interested in testing whether it is less effective than the existing drug. You do not care if it is significantly more effective. You only wish to show that it is not less effective. In this scenario, a one-tailed test would be appropriate.

When is a one-tailed test NOT appropriate?

Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate.  Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was.  Using statistical tests inappropriately can lead to invalid results that are not replicable and highly questionable–a steep price to pay for a significance star in your results table!   

Deriving a one-tailed test from two-tailed output

The default among statistical packages performing tests is to report two-tailed p-values.  Because the most commonly used test statistic distributions (standard normal, Student’s t) are symmetric about zero, most one-tailed p-values can be derived from the two-tailed p-values.   

Below, we have the output from a two-sample t-test in Stata.  The test is comparing the mean male score to the mean female score.  The null hypothesis is that the difference in means is zero.  The two-sided alternative is that the difference in means is not zero.  There are two one-sided alternatives that one could opt to test instead: that the male score is higher than the female score (diff  > 0) or that the female score is higher than the male score (diff < 0).  In this instance, Stata presents results for all three alternatives.  Under the headings Ha: diff < 0 and Ha: diff > 0 are the results for the one-tailed tests. In the middle, under the heading Ha: diff != 0 (which means that the difference is not equal to 0), are the results for the two-tailed test. 

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198

                 Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0              Ha: diff != 0              Ha: diff > 0
       t =  -3.7341              t =  -3.7341               t =  -3.7341
   P < t =   0.0001          P > |t| =   0.0002           P > t =   0.9999

Note that the test statistic, -3.7341, is the same for all of these tests. The two-tailed p-value is P > |t|. This can be rewritten as P(>3.7341) + P(< -3.7341). Because the t-distribution is symmetric about zero, these two probabilities are equal: P > |t| = 2 * P(< -3.7341). Thus, we can see that the two-tailed p-value is twice the one-tailed p-value for the alternative hypothesis that (diff < 0). The other one-tailed alternative hypothesis has a p-value of P(>-3.7341) = 1-(P<-3.7341) = 1-0.0001 = 0.9999. So, depending on the direction of the one-tailed hypothesis, its p-value is either 0.5*(two-tailed p-value) or 1-0.5*(two-tailed p-value) if the test statistic is symmetrically distributed about zero.
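
The same three p-values can be recovered in R from the reported t statistic and degrees of freedom; a brief sketch:

t_obs <- -3.7341; df <- 198
pt(t_obs, df)                      # Ha: diff < 0, about 0.0001
2 * pt(-abs(t_obs), df)            # Ha: diff != 0, about 0.0002
pt(t_obs, df, lower.tail = FALSE)  # Ha: diff > 0, about 0.9999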

In this example, the two-tailed p-value suggests rejecting the null hypothesis of no difference. Had we opted for the one-tailed test of (diff > 0), we would fail to reject the null because of our choice of tails. 

The output below is from a regression analysis in Stata.  Unlike the example above, only the two-sided p-values are presented in this output.

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   46.58
       Model |  7363.62077     2  3681.81039           Prob > F      =  0.0000
    Residual |  15572.5742   197  79.0486001           R-squared     =  0.3210
-------------+------------------------------           Adj R-squared =  0.3142
       Total |   22936.195   199  115.257261           Root MSE      =  8.8909

------------------------------------------------------------------------------
       socst |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     science |   .2191144   .0820323     2.67   0.008     .0573403    .3808885
        math |   .4778911   .0866945     5.51   0.000     .3069228    .6488594
       _cons |   15.88534   3.850786     4.13   0.000     8.291287    23.47939
------------------------------------------------------------------------------

For each regression coefficient, the tested null hypothesis is that the coefficient is equal to zero.  Thus, the one-tailed alternatives are that the coefficient is greater than zero and that the coefficient is less than zero. To get the p-value for the one-tailed test of the variable science having a coefficient greater than zero, you would divide the .008 by 2, yielding .004 because the effect is going in the predicted direction. This is P(>2.67). If you had made your prediction in the other direction (the opposite direction of the model effect), the p-value would have been 1 – .004 = .996.  This is P(<2.67). For all three p-values, the test statistic is 2.67. 
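
Likewise, the one-tailed p-values for the science coefficient can be derived in R from the reported t statistic and its degrees of freedom; a brief sketch:

t_science <- 2.67; df <- 197
2 * pt(-abs(t_science), df)            # two-sided p-value, about 0.008
pt(t_science, df, lower.tail = FALSE)  # one-sided p-value in the predicted (positive) direction, about 0.004
pt(t_science, df)                      # one-sided p-value in the opposite direction, about 0.996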


Data analysis: hypothesis testing


4.3 One-sided tests

As well as non-directional hypotheses, you will also encounter hypotheses that contain a less than or equal to (≤) or greater than (>) sign in their statements (as you saw in Activity 3). This is called a directional hypothesis. A directional hypothesis is a type of research hypothesis that predicts the direction of the relationship or difference between two variables. Essentially, it specifies the anticipated outcome of a study prior to the collection of data.

For example, a directional hypothesis might propose that a marketing campaign will increase product sales, predicting the direction of the relationship (i.e. the marketing campaign will lead to an increase in product sales). In contrast, a non-directional hypothesis simply states that there is a relationship between two variables without specifying the direction of that relationship, such as: ‘There is a relationship between the marketing campaign and product sales.’

Directional hypotheses are often preferred in scientific research because they provide a more precise and focused prediction than non-directional hypotheses. In business management, a directional hypothesis can also be a useful tool. For example, a company may use a directional hypothesis to design a study that examines the effectiveness of a marketing campaign in enhancing sales. This approach provides a clearer understanding of the impact of the campaign and enables the company to make more informed decisions about future marketing strategies.

A one-tailed test is a statistical test employed to evaluate a directional hypothesis, which predicts the direction of the difference or association between two variables. Its objective is to ascertain if the data supports the anticipated direction.

To illustrate, consider the hypotheses from Activity 3:

H 0 : µ ≤ 15 hours of studies

H a : µ > 15 hours of studies

The null hypothesis (H 0 ) posits that the population mean (µ) is less than or equal to 15 hours of studies, while the alternative hypothesis (H a ) predicts that the population mean is greater than 15 hours of studies.

To conduct a one-tailed test, a critical value must be established to determine whether the null hypothesis should be rejected or retained. Typically, a significance level (α) is set for this purpose. For instance, assuming α = 0.05, the z-score for a one-tailed test with α = 0.05 in a normal distribution is 1.645. Consequently, the null hypothesis would be rejected if the z-score exceeds 1.645. In other words, only the upper tail region of the distribution forms the rejection region for this one-tailed test. Additionally, a different z-score is used because, in contrast to a two-tailed test, the alpha level does not need to be divided by two. In a normal distribution, the area in the tail above z = +1.645 represents 0.05 of the distribution, and this portion lies far from the centre of the bell curve at 0. Consequently, the null hypothesis would be rejected if the z-score exceeds 1.645 (as depicted in Figure 8).

Figure 8: A one-tailed (right-tailed) test shown on a bell-shaped curve of probability density against z-score. The peak is labelled ‘Hours of study = 15 hours’, and the rejection region in the upper tail is shaded and labelled z > 1.645, α = 0.05.

In summary, a one-tailed test is used to assess a directional hypothesis in which the direction of the difference or association between two variables is predicted. The critical value for a one-tailed test is determined by the selected significance level (α), and the test is conducted to ascertain whether the data supports the predicted direction.
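
A minimal R sketch of this decision rule, using invented sample figures (40 students, sample mean of 16.2 hours, standard deviation of 4 hours) purely for illustration:

alpha <- 0.05
z_crit <- qnorm(1 - alpha)          # 1.645, the upper-tail critical value
z <- (16.2 - 15) / (4 / sqrt(40))   # about 1.90 for these invented figures
z > z_crit                          # TRUE, so H0 (mu <= 15 hours) would be rejected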

In addition, the one-tailed test is not limited to a single direction (greater than) but can also be employed in the opposite direction (less than). An example can be used to illustrate this type of hypothesis testing. Consider a situation where the management team believes that the average amount spent by customers during their visits to a department store is £65. However, the service manager observes that customers spend less than that amount during their visits. In this case, you can formulate the following set of hypotheses:

H 0 : µ ≥ £65

H a : µ < £65

To test this directional hypothesis, a one-tailed test must be conducted. The alternative hypothesis states that µ is lower than the value specified in the null hypothesis, so the rejection region lies in the lower tail of the normal distribution. More specifically, at α = 0.05 the rejection region of the one-tailed test is the part of the lower tail where the z-score is below -1.645; any test statistic falling in this region leads to rejection of the null hypothesis. The graph in Figure 9 illustrates this.

Figure 9: A one-tailed (left-tailed) test shown on a bell-shaped curve. The peak is labelled ‘Customer spending = £65’, and the rejection region in the lower tail is shaded and labelled z < -1.645, α = 0.05.
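
A matching R sketch for the lower-tailed case, again with invented sample figures (50 customers, sample mean of £62, standard deviation of £12):

alpha <- 0.05
z_crit <- qnorm(alpha)             # -1.645, the lower-tail critical value
z <- (62 - 65) / (12 / sqrt(50))   # about -1.77 for these invented figures
z < z_crit                         # TRUE, so H0 (mu >= £65) would be rejected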

In conclusion, the one-tailed test is not restricted to a specific direction and can be used in either direction, depending on the research question and the hypothesis being tested. The test is used to determine if the data supports a directional hypothesis, and a critical value is established based on the significance level chosen for the test.


Module 7 - Comparing Continuous Outcomes

One Sample t-test


Investigators wanted to determine whether children who were born very prematurely have poorer cognitive ability than children born at full term. To investigate, they enrolled 100 children who had been born prematurely and measured their IQ in order to compare the sample mean to a normative IQ of 100. In essence, the normative IQ serves as a historical or external comparison value (μ 0 ). In the sample of 100 children who had been born prematurely, the data for IQ were as follows: X̄ = 95.8, SD = 17.5. The investigators wish to use a one-tailed test of significance.

A research hypothesis that states that two groups differ without specifying direction, i.e., which is greater, is a two-tailed hypothesis . In contrast, a research hypothesis that specifies direction, e.g., that the mean in a sample group will be less than the mean in a historic comparison group, is a one-tailed hypothesis . Similarly, a hypothesis that a mean in a sample group will be greater than the mean in a historic comparison group is also a one-tailed hypothesis .

Suppose we conduct a test of significance to compare two groups (call them A and B) using large samples. The test statistic could be either a t score or a Z score, depending on the test we choose, but if the sample is very large the t or Z scores will be similar. Suppose also that we have specified an "alpha" level of 0.05, i.e., an error rate of 5% for concluding that the groups differ when they really don't. In other words, we will use p≤0.05 as the criterion for statistical significance.

Two-tailed test: The first figure below shows that with a two-tailed test in which we acknowledge that one group's mean could be either above or below the other, the alpha error rate has to be split into the upper and lower tails, i.e., with half of our alpha (0.025) in each tail. Therefore, we need to achieve a test statistic that is either less than -1.96 or greater than +1.96.

[Figure: a two-tailed test on the standard normal curve, with half of the 5% alpha (0.025) in each tail beyond -1.96 and +1.96.]

One-tailed test (lower tail): In the middle figure the hypothesis is that group A has a mean less than group B, perhaps because it is unreasonable to think the mean IQ in group A would be greater than that in group B. If so, all of the 5% alpha is in the lower tail, and we only need to achieve a test statistic less than -1.645 to achieve “statistical significance.”

One-tailed test (upper tail): The third image shows a one-tailed test in which the hypothesis is that group A has a mean value greater than that of group B, so all of the alpha is in the upper tail, meaning that we need a test statistic greater than +1.645 to achieve statistical significance.

Clearly, the two-tailed test is more conservative because, regardless of direction, the test statistic has to be more than 1.96 units away from the null. The vast majority of tests that are reported are two-tailed tests. However, there are occasionally situations in which a one-tailed test hypothesis can be justified.

A one-tailed test could be justified in the study examining whether children who had been born very prematurely have lower IQ scores than children who had a normal, full term gestation, since there is no reason to believe that those born prematurely would have higher IQs.

First, we set up the hypotheses:

Null hypothesis: H 0 : μ prem = 100 (i.e., there is no difference or association). Children born prematurely have mean IQs that are not different from those of the general population (μ = 100).

Alternative hypothesis (i.e., the research hypothesis): H A : μ prem < 100. Children born prematurely have lower mean IQ than the general population (μ < 100).

The t statistic = (95.8 - 100) / (17.5/√100) = -2.4. We can look up the corresponding p-value with df=99, or we can use R to compute the probability.

> pt(-2.4,99)
[1] 0.009132283


Since p=0.009, we reject the null hypothesis and accept the alternative hypothesis. We conclude that children born prematurely have lower mean IQ scores than children in the general population who had a full term gestation.
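
For reference, the whole calculation can be written as a short R sketch using the summary statistics given above:

xbar <- 95.8; mu0 <- 100; s <- 17.5; n <- 100
se <- s / sqrt(n)             # 1.75
t_stat <- (xbar - mu0) / se   # -2.4
pt(t_stat, df = n - 1)        # one-tailed p-value, about 0.009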

What if we had performed a two-tailed test of hypothesis on the same data?

The null hypothesis would be the same, but the alternative hypothesis would be:

Alternative hypothesis (research hypothesis): H A : μ prem ≠ 100. Children born prematurely have IQ scores that are different (either lower or higher) from children from the general population who had full term gestation.

The calculation of the t statistic would also be the same, but the p-value will be different.

> 2*pt(-2.4,99)
[1] 0.01826457


The probability is the area under the t distribution (df = 99, essentially the same as the standard normal for a sample this large) that is either lower than -2.40 or greater than +2.40. So the probability is 0.009 + 0.009 = 0.018, i.e., a probability of 0.009 in both the lower and upper tails. We still conclude that children born prematurely have a significantly lower mean IQ than the general population (mean=95.8, s=17.5, p=0.018).

When performing two-tailed tests, direction is not specified in the hypothesis, but one should report the direction in any report, publication, or presentation, e.g., "Children who had been born prematurely had lower mean IQ scores than in the general population."

In the example for IQ tests above we were given the mean IQ and standard deviation for the children who had been born prematurely. However, suppose we were given an Excel spreadsheet with the raw data listed in a column for "iq"?

In Excel we could save this data as a .CSV file using the "Save as" function. We could then import the data in the .CSV file into R and analyze the data as follows:

[Note that R defaults to performing the more conservative two-tailed test unless a one-tailed test is specified as we will describe below.]

> t.test(iq,mu=100)

        One Sample t-test

data:  iq
t = -2.3801, df = 99, p-value = 0.01922
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 92.35365 99.30635
sample estimates:
mean of x
    95.83

In order to perform a one-tailed test, you need to specify the alternative hypothesis. For example:

> t.test(iq,mu=100, alternative="less")

> t.test(iq,mu=100, alternative="greater")
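
The raw iq data are not reproduced here, but the relationship between the one-sided and two-sided results can be sketched in R directly from the values reported above:

t_obs <- -2.3801                        # reported t statistic, df = 99
pt(t_obs, df = 99)                      # p-value for alternative = "less", about 0.0096
pt(t_obs, df = 99, lower.tail = FALSE)  # p-value for alternative = "greater", about 0.9904
0.01922 / 2                             # half the two-sided p-value matches the "less" result here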



Statistics Notes: One and two sided tests of significance

J M Bland, Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE; Medical Statistics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX

In some comparisons - for example, between two means or two proportions - there is a choice between two sided or one sided tests of significance (all comparisons of three or more groups are two sided).

* This is the eighth in a series of occasional notes on medical statistics.

When we use a test of significance to compare two groups we usually start with the null hypothesis that there is no difference between the populations from which the data come. If this hypothesis is not true the alternative hypothesis must be true - that there is a difference. Since the null hypothesis specifies no direction for the difference, and neither does the alternative hypothesis, we have a two sided test. In a one sided test the alternative hypothesis does specify a direction - for example, that an active treatment is better than a placebo. This is sometimes justified by saying that we are not interested in the possibility that the active treatment is worse than no treatment. This possibility is still part of the test; it is part of the null hypothesis, which now states that the difference in the population is zero or in favour of the placebo.

A one sided test is sometimes appropriate. Luthra et al investigated the effects of laparoscopy and hydrotubation on the fertility of women presenting at an infertility clinic. 1 After some months laparoscopy was carried out on those who had still not conceived. These women were then observed for several further months and some of these women also conceived. The conception rate in the period before laparoscopy was compared with that afterwards. The less fertile a woman is the longer it is likely to take her to conceive. Hence, the women who had the laparoscopy should have a lower conception rate (by an unknown amount) than the larger group who entered the study, because the more fertile women had conceived before their turn for laparoscopy came. To see whether laparoscopy increased fertility, Luthra et al tested the null hypothesis that the conception rate after laparoscopy was less than or equal to that before. The alternative hypothesis was that the conception rate after laparoscopy was higher than that before. A two sided test was inappropriate because if the laparoscopy had no effect on fertility the conception rate after laparoscopy was expected to be lower.

One sided tests are not often used, and sometimes they are not justified. Consider the following example. Twenty five patients with breast cancer were given radiotherapy treatment of 50 Gy in fractions of 2 Gy over 5 weeks. 2 Lung function was measured initially, at one week, at three months, and at one year. The aim of the study was to see whether lung function was lowered following radiotherapy. Some of the results are shown in the table, the forced vital capacity being compared between the initial and each subsequent visit using one sided tests. The direction of the one sided tests was not specified, but it may appear reasonable to test the alternative hypothesis that forced vital capacity decreases after radiotherapy, as there is no reason to suppose that damage to the lungs would increase it. The null hypothesis is that forced vital capacity does not change or increases. If the forced vital capacity increases, this is consistent with the null hypothesis, and the more it increases the more consistent the data are with the null hypothesis. Because the differences are not all in the same direction, at least one P value should be greater than 0.5. What has been done here is to test the null hypothesis that forced vital capacity does not change or decreases from visit 1 to visit 2 (one week), and to test the null hypothesis that it does not change or increases from visit 1 to visit 3 (three months) or visit 4 (one year). These authors seem to have carried out one sided tests in both directions for each visit and then taken the smaller probability. If there is no difference in the population the probability of getting a significant difference by this approach is 10%, not 5% as it should be. The chance of a spurious significant difference is doubled. Two sided tests should be used, which would give probabilities of 0.26, 0.064, and 0.38, and no significant differences.
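
The doubling of the error rate is easy to verify. A small R sketch (not from the original note) showing that taking the smaller of the two one-sided p-values comes out "significant" about 10% of the time when the null hypothesis is true:

2 * pnorm(-1.645)                           # P(|Z| > 1.645) under the null, about 0.10
set.seed(1)
z <- rnorm(1e5)                             # simulated test statistics under the null
mean(pmin(pnorm(z), 1 - pnorm(z)) < 0.05)   # proportion declared significant, about 0.10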

In general a one sided test is appropriate when a large difference in one direction would lead to the same action as no difference at all. Expectation of a difference in a particular direction is not adequate justification. In medicine, things do not always work out as expected, and researchers may be surprised by their results. For example, Galloe et al found that oral magnesium significantly increased the risk of cardiac events, rather than decreasing it as they had hoped. 3 If a new treatment kills a lot of patients we should not simply abandon it; we should ask why this happened.

Two sided tests should be used unless there is a very good reason for doing otherwise. If one sided tests are to be used the direction of the test must be specified in advance. One sided tests should never be used simply as a device to make a conventionally non-significant difference significant.

  • Galloe AM ,
  • Rasmussen HS ,
  • Jorgensen LN ,
  • Balslov S ,
  • Graudal N ,


STM1001 Topic 5: Hypothesis Testing

3.2 One-sided vs two-sided tests

In the previous section, we mentioned that the test we had carried out was a two-sided test . One-sided tests are also possible. Consider the following cases:

  • \(H_0:\mu = 5\;\;\text{versus}\;\;H_1:\mu \neq 5\)
  • \(H_0:\mu = 5\;\;\text{versus}\;\;H_1:\mu > 5\)
  • \(H_0:\mu = 5\;\;\text{versus}\;\;H_1:\mu < 5\)

Examples 2 and 3 above are referred to as 'one-sided tests' because they are only testing for extreme values in one direction. Consider the below figure, which shows the critical values (CV) required for each test:

  • The two-sided test can also be referred to as a two-tailed test , because as we can see above, there are two tails of the distribution curve for which we are interested in extreme values. Because the combined shaded area must equal \(\alpha = 0.05\) , in this case, we have an area of \(\alpha / 2 = 0.05 / 2 = 0.025\) at each tail. This resulted in critical values of -1.99 and 1.99 respectively. For a two-sided test , we have: \[p\text{-value} = 2 \times P(T\geq |t|) \text{ for } T\sim t_{\text{df}}\]
  • The one-sided test (right-tailed) can also be referred to as a right-tailed test , because as we can see above, it is the right tail of the distribution curve for which we are interested in extreme values. Because the total shaded area must equal \(\alpha = 0.05\) , in this case, we simply have an area of \(\alpha = 0.05\) in the right tail. This resulted in a critical value of 1.67. For a one-sided test (right-tailed) , we have: \[p\text{-value} = P(T\geq t) \text{ for } T\sim t_{\text{df}}\]
  • The one-sided test (left-tailed) can also be referred to as a left-tailed test , because as we can see above, it is the left tail of the distribution curve for which we are interested in extreme values. Because the total shaded area must equal \(\alpha = 0.05\) , in this case, we simply have an area of \(\alpha = 0.05\) in the left tail. This resulted in a critical value of -1.67. For a one-sided test (left-tailed) , we have: \[p\text{-value} = P(T\leq t) \text{ for } T\sim t_{\text{df}}\] (A short R sketch reproducing these critical values follows this list.)
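
As a sketch only: the degrees of freedom are not restated in this excerpt, but with an assumed df of 70 the quoted critical values are roughly reproduced in R:

alpha <- 0.05
df <- 70                          # assumed df, not stated in this excerpt
qt(c(alpha/2, 1 - alpha/2), df)   # two-sided critical values, about -1.99 and 1.99
qt(1 - alpha, df)                 # right-tailed critical value, about 1.67
qt(alpha, df)                     # left-tailed critical value, about -1.67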

Two-sided tests are often preferred in practice because they are unbiased in terms of the predicted direction of the results. However, in this subject we will get practice using both two-sided and one-sided tests.

  • Suppose a sample of university students were asked the question, "In hours, what was your phone screen time yesterday?". Consider the following research question: Is the average daily phone screen time of university students greater than 4 hours? Choose the correct null and alternative hypotheses from the options below: \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • Suppose a sample of university students were asked the question, "In hours, what was your phone screen time yesterday?". Consider the following research question: Is the average daily phone screen time of university students different from 4 hours? Choose the correct null and alternative hypotheses from the options below: \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • Suppose a sample of university students were asked the question, "In hours, what was your phone screen time yesterday?". Consider the following research question: Is the average daily phone screen time of university students less than 4 hours? Choose the correct null and alternative hypotheses from the options below: \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\)
  • \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\)
  • \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)

JMP | Statistical Discovery.™ From SAS.


The One-Sample t -Test

What is the one-sample t -test?

The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is different from a specific value.

When can I use the test?

You can use the test for continuous data. Your data should be a random sample from a normal population.

What if my data isn’t nearly normally distributed?

If your sample sizes are very small, you might not be able to test for normality. You might need to rely on your understanding of the data. When you cannot safely assume normality, you can perform a nonparametric test that doesn’t assume normality.

Using the one-sample t -test

See how to perform a one-sample t -test using statistical software.

  • Download JMP to follow along using the sample data included with the software.
  • To see more JMP tutorials, visit the JMP Learning Library .

The sections below discuss what we need for the test, checking our data, performing the test, understanding test results and statistical details.

What do we need?

For the one-sample t -test, we need one variable.

We also have an idea, or hypothesis, that the mean of the population has some value. Here are two examples:

  • A hospital has a random sample of cholesterol measurements for men. These patients were seen for issues other than cholesterol. They were not taking any medications for high cholesterol. The hospital wants to know if the unknown mean cholesterol for patients is different from a goal level of 200 mg.
  • We measure the grams of protein for a sample of energy bars. The label claims that the bars have 20 grams of protein. We want to know if the labels are correct or not.

One-sample t -test assumptions

For a valid test, we need data values that are:

  • Independent (values are not related to one another).
  • Continuous.
  • Obtained via a simple random sample from the population.

Also, the population is assumed to be normally distributed .

One-sample t -test example

Imagine we have collected a random sample of 31 energy bars from a number of different stores to represent the population of energy bars available to the general consumer. The labels on the bars claim that each bar contains 20 grams of protein.

Table 1: Grams of protein in random sample of energy bars

If you look at the table above, you see that some bars have less than 20 grams of protein. Other bars have more. You might think that the data support the idea that the labels are correct. Others might disagree. The statistical test provides a sound method to make a decision, so that everyone makes the same decision on the same set of data values. 

Checking the data

Let’s start by answering: Is the t -test an appropriate method to test that the energy bars have 20 grams of protein ? The list below checks the requirements for the test.

  • The data values are independent. The grams of protein in one energy bar do not depend on the grams in any other energy bar. An example of dependent values would be if you collected energy bars from a single production lot. A sample from a single lot is representative of that lot, not energy bars in general.
  • The data values are grams of protein. The measurements are continuous.
  • We assume the energy bars are a simple random sample from the population of energy bars available to the general consumer (i.e., a mix of lots of bars).
  • We assume the population from which we are collecting our sample is normally distributed, and for large samples, we can check this assumption.

We decide that the t -test is an appropriate method.

Before jumping into analysis, we should take a quick look at the data. The figure below shows a histogram and summary statistics for the energy bars.

Histogram and summary statistics for the grams of protein in energy bars

From a quick look at the histogram, we see that there are no unusual points, or outliers . The data look roughly bell-shaped, so our assumption of a normal distribution seems reasonable.

From a quick look at the statistics, we see that the average is 21.40, above 20. Does this  average from our sample of 31 bars invalidate the label's claim of 20 grams of protein for the unknown entire population mean? Or not?

How to perform the one-sample t -test

For the t -test calculations we need the mean, standard deviation and sample size. These are shown in the summary statistics section of Figure 1 above.

We round the statistics to two decimal places. Software will show more decimal places, and use them in calculations. (Note that Table 1 shows only two decimal places; the actual data used to calculate the summary statistics has more.)

We start by finding the difference between the sample mean and 20:

$ 21.40-20\ =\ 1.40$

Next, we calculate the standard error for the mean. The calculation is:

Standard Error for the mean = $ \frac{s}{\sqrt{n}}= \frac{2.54}{\sqrt{31}}=0.456 $

This matches the value in Figure 1 above.

We now have the pieces for our test statistic. We calculate our test statistic as:

$ t =  \frac{\text{Difference}}{\text{Standard Error}}= \frac{1.40}{0.456}=3.07 $

To make our decision, we compare the test statistic to a value from the t- distribution. This activity involves four steps.

  • We calculate a test statistic. Our test statistic is 3.07.
  • We decide on the risk we are willing to take for declaring a difference when there is not a difference. For the energy bar data, we decide that we are willing to take a 5% risk of saying that the unknown population mean is different from 20 when in fact it is not. In statistics-speak, we set α = 0.05. In practice, setting your risk level (α) should be made before collecting the data.

We find the value from the t- distribution based on our decision. For a t -test, we need the degrees of freedom to find this value. The degrees of freedom are based on the sample size. For the energy bar data:

degrees of freedom = $ n - 1 = 31 - 1 = 30 $

The critical value of t with α = 0.05 and 30 degrees of freedom is +/- 2.042. Most statistics books have look-up tables for the distribution. You can also find tables online. The most likely situation is that you will use software and will not use printed tables.

We compare the value of our statistic (3.07) to the t value. Since 3.07 > 2.042, we reject the null hypothesis that the mean grams of protein is equal to 20. We make a practical conclusion that the labels are incorrect, and the population mean grams of protein is greater than 20.
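
The same hand calculation can be sketched in R from the rounded summary statistics (the software's p-value of 0.0046, shown later, uses the unrounded data):

xbar <- 21.40; s <- 2.54; n <- 31; mu0 <- 20
se <- s / sqrt(n)                 # about 0.456
t_stat <- (xbar - mu0) / se       # about 3.07
qt(0.975, df = n - 1)             # critical value, about 2.042
2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value, about 0.0045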

Statistical details

Let’s look at the energy bar data and the 1-sample t -test using statistical terms.

Our null hypothesis is that the underlying population mean is equal to 20. The null hypothesis is written as:

$ H_o:  \mathrm{\mu} = 20 $

The alternative hypothesis is that the underlying population mean is not equal to 20. The labels claiming 20 grams of protein would be incorrect. This is written as:

$ H_a:  \mathrm{\mu} ≠ 20 $

This is a two-sided test. We are testing if the population mean is different from 20 grams in either direction. If we can reject the null hypothesis that the mean is equal to 20 grams, then we make a practical conclusion that the labels for the bars are incorrect. If we cannot reject the null hypothesis, then we make a practical conclusion that the labels for the bars may be correct.

We calculate the average for the sample and then calculate the difference with the population mean, mu:

$  \overline{x} - \mathrm{\mu} $

We calculate the standard error as:

$ \frac{s}{ \sqrt{n}} $

The formula shows the sample standard deviation as s and the sample size as n .  

The test statistic uses the formula shown below:

$  \dfrac{\overline{x} - \mathrm{\mu}} {s / \sqrt{n}} $

We compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the energy bar data as an example, we set α = 0.05. The degrees of freedom ( df ) are based on the sample size and are calculated as:

$ df = n - 1 = 31 - 1 = 30 $

Statisticians write the t value with α = 0.05 and 30 degrees of freedom as:

$ t_{0.05,30} $

The t value for a two-sided test with α = 0.05 and 30 degrees of freedom is +/- 2.042. There are two possible results from our comparison:

  • The test statistic is less extreme than the critical  t  values; in other words, the test statistic is not less than -2.042, or is not greater than +2.042. You fail to reject the null hypothesis that the mean is equal to the specified value. In our example, you would be unable to conclude that the label for the protein bars should be changed.
  • The test statistic is more extreme than the critical  t  values; in other words, the test statistic is less than -2.042, or is greater than +2.042. You reject the null hypothesis that the mean is equal to the specified value. In our example, you conclude that either the label should be updated or the production process should be improved to produce, on average, bars with 20 grams of protein.

Testing for normality

The normality assumption is more important for small sample sizes than for larger sample sizes.

Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the energy bar data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for the data, and supports our decision.

Normal quantile plot for energy bar data

You can also perform a formal test for normality using software. The figure below shows results of testing for normality with JMP software. We cannot reject the hypothesis of a normal distribution. 

Testing for normality using JMP software

We can go ahead with the assumption that the energy bar data is normally distributed.

What if my data are not from a Normal distribution?

If your sample size is very small, it is hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the energy bar data, the company knows that the underlying distribution of grams of protein is normally distributed. Even for a very small sample, the company would likely go ahead with the t -test and assume normality.

What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test for normality is rejected? In this situation, you can use a nonparametric test. Nonparametric  analyses do not depend on an assumption that the data values are from a specific distribution. For the one-sample t ­-test, the one possible nonparametric test is the Wilcoxon Signed Rank test. 
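
A minimal R sketch of this nonparametric alternative; the data vector here is simulated as a stand-in, since the raw values from Table 1 are not reproduced in this excerpt:

set.seed(42)
protein <- rnorm(31, mean = 21.40, sd = 2.54)   # simulated stand-in for the 31 energy bars
wilcox.test(protein, mu = 20)                   # Wilcoxon signed rank test against a center of 20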

Understanding p-values

Using a visual, you can check to see if your test statistic is more extreme than a specified value in the distribution. The figure below shows a t- distribution with 30 degrees of freedom.

t-distribution with 30 degrees of freedom and α = 0.05

Since our test is two-sided and we set α = 0.05, the figure shows that the value of 2.042 “cuts off” 5% of the data in the tails combined.

The next figure shows our results. You can see the test statistic falls above the specified critical value. It is far enough “out in the tail” to reject the hypothesis that the mean is equal to 20.

Our results displayed in a t-distribution with 30 degrees of freedom

Putting it all together with Software

You are likely to use software to perform a t -test. The figure below shows results for the 1-sample t -test for the energy bar data from JMP software.  

One-sample t-test results for energy bar data using JMP software

The software shows the null hypothesis value of 20 and the average and standard deviation from the data. The test statistic is 3.07. This matches the calculations above.

The software shows results for a two-sided test and for one-sided tests. We want the two-sided test. Our null hypothesis is that the mean grams of protein is equal to 20. Our alternative hypothesis is that the mean grams of protein is not equal to 20. The software shows a p- value of 0.0046 for the two-sided test. This p- value is the probability of observing a sample mean as far from 20 as 21.4, or even farther, when the underlying population mean is actually 20. A p -value of 0.0046 corresponds to about 46 chances in 10,000, so we feel confident in rejecting the null hypothesis that the population mean is equal to 20.


Statistics LibreTexts

10.6: One-Sided Tests


Danielle Navarro, University of New South Wales

When we introduced the theory of null hypothesis tests, we mentioned that there are some situations when it’s appropriate to specify a one-sided test (see the earlier section on one-sided tests). So far, all of the t-tests have been two-sided tests (as is the default in SPSS and many other statistics packages). For instance, when we specified a one-sample t-test for the grades in Dr. Zeppo’s class, the null hypothesis was that the true mean was 67.5%. The alternative hypothesis was that the true mean was greater than or less than 67.5%. Suppose we were only interested in finding out if the true mean is greater than 67.5%, and have no interest whatsoever in testing to find out if the true mean is lower than 67.5%. If so, our null hypothesis would be that the true mean is 67.5% or less, and the alternative hypothesis would be that the true mean is greater than 67.5%. Newer versions of SPSS solve this issue by simply reporting both the one-sided and two-sided p-values:

SPSS output reporting both the one-sided and two-sided p-values for the one-sample t-test

Notice that although the t-statistic and degrees of freedom are not different, the p-value is. This is because the one-sided test has a different rejection region from the two-sided test. If you’ve forgotten why this is and what it means, you may find it helpful to read back over the earlier chapter on hypothesis testing, and the section on one-sided tests in particular.

So that’s how to do a one-sided one-sample t-test. However, all versions of the t-test can be one-sided. For an independent samples t-test, you could have a one-sided test if you’re only interested in testing to see if group A has higher scores than group B, but have no interest in finding out if group B has higher scores than group A. Let’s suppose that, for Dr. Harpo’s class, you wanted to see if Anastasia’s students had higher grades than Bernadette’s. The independentSamplesTTest() function lets you do this by specifying the one.sided argument. However, this time around you need to specify the name of the group that you’re expecting to have the higher score. In our case, we’d write one.sided = "Anastasia" . So the command would be:
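The command itself does not appear in this excerpt. Based on the lsr package, which provides independentSamplesTTest(), it would look roughly like the sketch below; the column names grade and tutor and the data frame name harpo are assumptions rather than something shown here.

```r
library(lsr)

# One-sided independent samples t-test: the alternative hypothesis is that
# Anastasia's group has the higher mean grade.
independentSamplesTTest(
  formula   = grade ~ tutor,   # outcome ~ grouping variable (assumed names)
  data      = harpo,           # assumed name of the data frame
  var.equal = TRUE,            # classic Student t-test rather than Welch
  one.sided = "Anastasia"      # the group expected to have the larger mean
)
```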

Again, the output changes in a predictable way. The definition of the null and alternative hypotheses has changed, the p-value has changed, and it now reports a one-sided confidence interval rather than a two-sided one.

What about the paired samples t-test? Suppose we wanted to test the hypothesis that grades go up from test 1 to test 2 in Dr Zeppo’s class, and are not prepared to consider the idea that the grades go down. Again, we can use the one.sided argument to specify the one-sided test, and it works the same way it does for the independent samples t-test. You need to specify the name of the group whose scores are expected to be larger under the alternative hypothesis. If your data are in wide form, as they are in the chico data frame, you’d use this command:
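The command is likewise missing from this excerpt; a plausible reconstruction with the lsr package, assuming the chico data frame stores the two scores in columns named grade_test1 and grade_test2:

```r
library(lsr)

# One-sided paired samples t-test on wide-form data: the alternative
# hypothesis is that the test 2 grades are higher than the test 1 grades.
pairedSamplesTTest(
  formula   = ~ grade_test2 + grade_test1,  # the two paired columns (assumed names)
  data      = chico,
  one.sided = "grade_test2"                 # the variable expected to be larger
)
```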

Yet again, the output changes in a predictable way. The hypotheses have changed, the p-value has changed, and the confidence interval is now one-sided. If your data are in long form, as they are in the chico2 data frame, it still works the same way. Either of the following commands would work,
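A sketch of the two long-form equivalents, assuming chico2 has columns named id, time, and grade:

```r
library(lsr)

# Long-form version: the id variable tells the function which rows are paired.
pairedSamplesTTest(
  formula   = grade ~ time,
  data      = chico2,
  id        = "id",
  one.sided = "test2"     # the level of time expected to have the larger mean
)

# The pairing variable can also be written directly into the formula.
pairedSamplesTTest(
  formula   = grade ~ time + (id),
  data      = chico2,
  one.sided = "test2"
)
```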

and would produce the same answer as the output shown above.

Bayesian and frequentist evidence in one-sided hypothesis testing

Elías Moreno and Carmen Martínez

TEST 31, 278–297 (2022). https://doi.org/10.1007/s11749-021-00778-8

In one-sided testing, Bayesians and frequentists differ on whether or not there is a discrepancy between the inference based on the posterior model probability and that based on the p value. We add some arguments to this debate, analyzing the discrepancy for moderate and large sample sizes. For small and moderate sample sizes, the discrepancy is measured by the probability of disagreement. Examples of the discrepancy on some basic sampling models indicate the somewhat unexpected result that the probability of disagreement is larger when sampling from models in the alternative hypothesis that are not located at the boundary of the hypotheses. For large sample sizes, we prove that Bayesian one-sided testing is, under mild conditions, consistent, a property that is not shared by the frequentist procedure. Further, the rate of convergence is \(O(e^{nA})\), where A is a constant that depends on the model from which we are sampling. Consistency is also proved for an extension to multiple hypotheses.




Do you get more food when you order in-person at Chipotle?

April 7, 2024

Inspired by this Reddit post , we will conduct a hypothesis test to determine if there is a difference in the weight of Chipotle orders between in-person and online orders. The data was originally collected by Zackary Smigel , and a cleaned copy can be found in data/chipotle.csv .

Throughout the application exercise we will use the infer package, which is part of tidymodels, to conduct our permutation tests.


The variable we will use in this analysis is weight, which records the total weight of the meal in grams.

We wish to test the claim that the difference in weight between in-person and online orders is due to something other than chance.


  • Your turn: Write out the correct null and alternative hypotheses in terms of the difference in means between in-person and online orders. Do this in both words and in proper notation.

Null hypothesis: TODO

\[H_0: \mu_{\text{online}} - \mu_{\text{in-person}} = TODO\]

Alternative hypothesis: The difference in means between in-person and online Chipotle orders is not \(0\) .

\[H_A: \mu_{\text{online}} - \mu_{\text{in-person}} TODO\]

Observed data

Our goal is to use the collected data to calculate the probability of observing a sample statistic at least as extreme as the one observed in our data if in fact the null hypothesis is true.

  • Demo: Calculate and report the sample statistic below using proper notation.
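A minimal infer sketch of this step. The weight column is named in the exercise; the order-method column name (order, with levels "Online" and "In-person") is an assumption and may differ in data/chipotle.csv.

```r
library(tidyverse)
library(tidymodels)  # loads the infer package

chipotle <- read_csv("data/chipotle.csv")

# Observed difference in mean order weight (grams), online minus in-person.
obs_diff <- chipotle |>
  specify(weight ~ order) |>
  calculate(stat = "diff in means", order = c("Online", "In-person"))

obs_diff
```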

The null distribution

Let’s use permutation-based methods to conduct the hypothesis test specified above.

We’ll start by generating the null distribution.

  • Demo: Generate the null distribution.
  • Your turn: Take a look at null_dist . What does each element in this distribution represent?

Add response here.
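One way to generate the null distribution with infer, continuing from the earlier sketch (same column-name assumptions):

```r
set.seed(1234)

# Permute the order labels to simulate a world in which order method and
# weight are independent, then recompute the difference in means each time.
null_dist <- chipotle |>
  specify(weight ~ order) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("Online", "In-person"))
```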

Question: Before you visualize the distribution of null_dist – at what value would you expect this distribution to be centered? Why?

Demo: Create an appropriate visualization for your null distribution. Does the center of the distribution match what you guessed in the previous question?
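A sketch of the visualization, continuing from the null_dist object above:

```r
# Histogram of the permutation null distribution. It should be centered
# near 0, because the null hypothesis is "no difference in mean weight".
visualize(null_dist) +
  labs(x = "Difference in mean weight (Online - In-person, grams)")
```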


  • Demo: Now, add a vertical red line on your null distribution that represents your sample statistic.
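Continuing the sketch, the observed statistic can be overlaid with a ggplot2 layer:

```r
# Same plot with the observed difference in means marked by a red line.
visualize(null_dist) +
  geom_vline(xintercept = obs_diff$stat, color = "red")
```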


Question: Based on the position of this line, does your observed sample difference in means appear to be an unusual observation under the assumption of the null hypothesis?

Above, we eyeballed how likely/unlikely our observed mean is. Now, let’s actually quantify it using a p-value.

Question: What is a p-value?

Guesstimate the p-value

  • Demo: Visualize the p-value.
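A sketch using infer's shade_p_value(), which shades the regions of the null distribution at least as extreme as the observed statistic in both directions:

```r
# Shade the two-sided p-value region of the null distribution.
visualize(null_dist) +
  shade_p_value(obs_stat = obs_diff, direction = "two-sided")
```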


Your turn: What is your guesstimate of the p-value?

Calculate the p-value

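The corresponding calculation with infer, continuing from the objects above:

```r
# Proportion of permuted statistics at least as extreme (in either
# direction) as the observed difference in means.
null_dist |>
  get_p_value(obs_stat = obs_diff, direction = "two-sided")
```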

Your turn: What is the conclusion of the hypothesis test based on the p-value you calculated? Make sure to frame it in context of the data and the research question. Use a significance level of 5% to make your conclusion.

Demo: Interpret the p-value in context of the data and the research question.

Reframe as a linear regression model

While we originally evaluated the null/alternative hypotheses as a difference in means, we could also frame this as a regression problem where the outcome of interest (weight of the order) is a continuous variable. Framing it this way allows us to include additional explanatory variables in our model which may account for some of the variation in weight.

Single explanatory variable

Demo: Let’s reevaluate the original hypotheses using a linear regression model. Notice the similarities and differences in the code compared to a difference in means, and that the obtained p-value should be nearly identical to the results from the difference in means test.
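A sketch of the same test framed as a regression with infer's fit() verb, under the same column-name assumptions; the permutation p-value for the order coefficient should be close to the difference-in-means result.

```r
# Observed model: weight regressed on order method.
observed_fit <- chipotle |>
  specify(weight ~ order) |>
  fit()

# Null distribution of the coefficients under permutation.
null_fits <- chipotle |>
  specify(weight ~ order) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()

# Permutation p-value for each model term.
get_p_value(null_fits, obs_stat = observed_fit, direction = "two-sided")
```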


Multiple explanatory variables

Demo: Now let’s also account for additional variables that likely influence the weight of the order.

  • Protein type ( meat )
  • Type of meal ( meal_type ) - burrito or bowl
  • Store ( store ) - at which Chipotle location the order was placed
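A sketch extending the model to the variables listed above, still treating the order column name as an assumption; permuting only the order labels keeps the relationship between the other explanatory variables and weight intact.

```r
# Observed fit with the additional explanatory variables.
observed_fit_full <- chipotle |>
  specify(weight ~ order + meat + meal_type + store) |>
  fit()

# Null distribution: permute only the order labels.
null_fits_full <- chipotle |>
  specify(weight ~ order + meat + meal_type + store) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute", variables = order) |>
  fit()

get_p_value(null_fits_full, obs_stat = observed_fit_full, direction = "two-sided")
```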


Your turn: Interpret the p-value for the order method in context of the data and the research question.

Compare to CLT-based method

Demo: Let’s compare the p-value obtained from the permutation test to the p-value derived using the Central Limit Theorem (CLT).
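For the two-group comparison, a theory-based counterpart is the two-sample t-test; a sketch with infer's t_test(), under the same column-name assumptions:

```r
# CLT-based (theory) two-sample t-test of the same hypotheses.
chipotle |>
  t_test(
    response    = weight,
    explanatory = order,
    order       = c("Online", "In-person"),
    alternative = "two-sided"
  )
```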

Your turn: What is the p-value obtained from the CLT-based method? How does it compare to the p-value obtained from the permutation test?

