Statology

Statistics Made Easy

Two-Tailed Hypothesis Tests: 3 Example Problems

In statistics, we use hypothesis tests to determine whether some claim about a population parameter is true or not.

Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms:

H 0 (Null Hypothesis): Population parameter = ≤, ≥ some value

H A (Alternative Hypothesis): Population parameter <, >, ≠ some value

There are two types of hypothesis tests:

  • One-tailed test : Alternative hypothesis contains either < or > sign
  • Two-tailed test : Alternative hypothesis contains the ≠ sign

In a two-tailed test , the alternative hypothesis always contains the not equal ( ≠ ) sign.

This indicates that we’re testing whether or not some effect exists, regardless of whether it’s a positive or negative effect.

Check out the following example problems to gain a better understanding of two-tailed tests.

Example 1: Factory Widgets

Suppose it’s assumed that the average weight of a certain widget produced at a factory is 20 grams. However, one engineer believes that a new method produces widgets that weigh less than 20 grams.

To test this, he can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ = 20 grams
  • H A (Alternative Hypothesis): μ ≠ 20 grams

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The engineer believes that the new method will influence widget weight, but doesn’t specify whether it will cause average weight to increase or decrease.

To test this, he uses the new method to produce 20 widgets and obtains the following information:

  • n = 20 widgets
  • x = 19.8 grams
  • s = 3.1 grams

Plugging these values into the One Sample t-test Calculator , we obtain the following results:

  • t-test statistic: -0.288525
  • two-tailed p-value: 0.776

Since the p-value is not less than .05, the engineer fails to reject the null hypothesis.

He does not have sufficient evidence to say that the true mean weight of widgets produced by the new method is different than 20 grams.

Example 2: Plant Growth

Suppose a standard fertilizer has been shown to cause a species of plants to grow by an average of 10 inches. However, one botanist believes a new fertilizer causes this species of plants to grow by an average amount different than 10 inches.

To test this, she can perform a one-tailed hypothesis test with the following null and alternative hypotheses:

  • H 0 (Null Hypothesis): μ = 10 inches
  • H A (Alternative Hypothesis): μ ≠ 10 inches

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The botanist believes that the new fertilizer will influence plant growth, but doesn’t specify whether it will cause average growth to increase or decrease.

To test this claim, she applies the new fertilizer to a simple random sample of 15 plants and obtains the following information:

  • n = 15 plants
  • x = 11.4 inches
  • s = 2.5 inches
  • t-test statistic: 2.1689
  • two-tailed p-value: 0.0478

Since the p-value is less than .05, the botanist rejects the null hypothesis.

She has sufficient evidence to conclude that the new fertilizer causes an average growth that is different than 10 inches.

Example 3: Studying Method

A professor believes that a certain studying technique will influence the mean score that her students receive on a certain exam, but she’s unsure if it will increase or decrease the mean score, which is currently 82.

To test this, she lets each student use the studying technique for one month leading up to the exam and then administers the same exam to each of the students.

She then performs a hypothesis test using the following hypotheses:

  • H 0 : μ = 82
  • H A : μ ≠ 82

This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal “≠” sign. The professor believes that the studying technique will influence the mean exam score, but doesn’t specify whether it will cause the mean score to increase or decrease.

To test this claim, the professor has 25 students use the new studying method and then take the exam. He collects the following data on the exam scores for this sample of students:

  • t-test statistic: 3.6586
  • two-tailed p-value: 0.0012

Since the p-value is less than .05, the professor rejects the null hypothesis.

She has sufficient evidence to conclude that the new studying method produces exam scores with an average score that is different than 82.

Additional Resources

The following tutorials provide additional information about hypothesis testing:

Introduction to Hypothesis Testing What is a Directional Hypothesis? When Do You Reject the Null Hypothesis?

' src=

Published by Zach

Leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

  • Search Search Please fill out this field.

What Is a Two-Tailed Test?

Understanding a two-tailed test, special considerations, two-tailed vs. one-tailed test.

  • Two-Tailed Test FAQs
  • Corporate Finance
  • Financial Analysis

What Is a Two-Tailed Test? Definition and Example

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.

hypothesis test two sided

Investopedia / Joules Garcia

A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance . If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.

Key Takeaways

  • In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values.
  • It is used in null-hypothesis testing and testing for statistical significance.
  • If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
  • By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.

A basic concept of inferential statistics is hypothesis testing , which determines whether a claim is true or not given a population parameter. A hypothesis test that is designed to show whether the mean of a sample is significantly greater than and significantly less than the mean of a population is referred to as a two-tailed test. The two-tailed test gets its name from testing the area under both tails of a normal distribution , although the test can be used in other non-normal distributions.

A two-tailed test is designed to examine both sides of a specified data range as designated by the probability distribution involved. The probability distribution should represent the likelihood of a specified outcome based on predetermined standards. This requires the setting of a limit designating the highest (or upper) and lowest (or lower) accepted variable values included within the range. Any data point that exists above the upper limit or below the lower limit is considered out of the acceptance range and in an area referred to as the rejection range.

There is no inherent standard about the number of data points that must exist within the acceptance range. In instances where precision is required, such as in the creation of pharmaceutical drugs, a rejection rate of 0.001% or less may be instituted. In instances where precision is less critical, such as the number of food items in a product bag, a rejection rate of 5% may be appropriate.

A two-tailed test can also be used practically during certain production activities in a firm, such as with the production and packaging of candy at a particular facility. If the production facility designates 50 candies per bag as its goal, with an acceptable distribution of 45 to 55 candies, any bag found with an amount below 45 or above 55 is considered within the rejection range.

To confirm the packaging mechanisms are properly calibrated to meet the expected output, random sampling may be taken to confirm accuracy. A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.

For the packaging mechanisms to be considered accurate, an average of 50 candies per bag with an appropriate distribution is desired. Additionally, the number of bags that fall within the rejection range needs to fall within the probability distribution limit considered acceptable as an error rate. Here, the null hypothesis would be that the mean is 50 while the alternate hypothesis would be that it is not 50.

If, after conducting the two-tailed test, the z-score falls in the rejection region, meaning that the deviation is too far from the desired mean, then adjustments to the facility or associated equipment may be required to correct the error. Regular use of two-tailed testing methods can help ensure production stays within limits over the long term.

Be careful to note if a statistical test is one- or two-tailed as this will greatly influence a model's interpretation.

When a hypothesis test is set up to show that the sample mean would be only higher than the population mean, this is referred to as a  one-tailed test . A formulation of this hypothesis would be, for example, that "the returns on an investment fund would be  at least  x%." One-tailed tests could also be set up to show that the sample mean could be only less than the population mean. The key difference from a two-tailed test is that in a two-tailed test, the sample mean could be different from the population mean by being  either  higher or lower than it.

If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test.

A two-tailed test, on the other hand, is designed to examine both sides of a specified data range to test whether a sample is greater than or less than the range of values.

Example of a Two-Tailed Test

As a hypothetical example, imagine that a new  stockbroker , named XYZ, claims that their brokerage fees are lower than that of your current stockbroker, ABC) Data available from an independent research firm indicates that the mean and standard deviation of all ABC broker clients are $18 and $6, respectively.

A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can any inference be made about the difference in the average brokerage bill between ABC and XYZ broker?

  • H 0 : Null Hypothesis: mean = 18
  • H 1 : Alternative Hypothesis: mean <> 18 (This is what we want to prove.)
  • Rejection region: Z <= - Z 2.5  and Z>=Z 2.5  (assuming 5% significance level, split 2.5 each on either side).
  • Z = (sample mean – mean) / (std-dev / sqrt (no. of samples)) = (18.75 – 18) / (6/(sqrt(100)) = 1.25

This calculated Z value falls between the two limits defined by: - Z 2.5  = -1.96 and Z 2.5  = 1.96.

This concludes that there is insufficient evidence to infer that there is any difference between the rates of your existing broker and the new broker. Therefore, the null hypothesis cannot be rejected. Alternatively, the p-value = P(Z< -1.25)+P(Z >1.25) = 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 or 5%, leads to the same conclusion.

How Is a Two-Tailed Test Designed?

A two-tailed test is designed to determine whether a claim is true or not given a population parameter. It examines both sides of a specified data range as designated by the probability distribution involved. As such, the probability distribution should represent the likelihood of a specified outcome based on predetermined standards.

What Is the Difference Between a Two-Tailed and One-Tailed Test?

A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than  or  significantly less than the mean of a population. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, on the other hand, is set up to show only one test; that the sample mean would be higher than the population mean, or, in a separate test, that the sample mean would be lower than the population mean.

What Is a Z-score?

A Z-score numerically describes a value's relationship to the mean of a group of values and is measured in terms of the number of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score whereas Z-scores of 1.0 and -1.0 would indicate values one standard deviation above or below the mean. In most large data sets, 99% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations above and below the mean.

San Jose State University. " 6: Introduction to Null Hypothesis Significance Testing ."

hypothesis test two sided

  • Terms of Service
  • Editorial Policy
  • Privacy Policy
  • Your Privacy Choices
  • Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Institute for Digital Research and Education

FAQ: What are the differences between one-tailed and two-tailed tests?

When you conduct a test of statistical significance, whether it is from a correlation, an ANOVA, a regression or some other kind of test, you are given a p-value somewhere in the output.  If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses. Two of these correspond to one-tailed tests and one corresponds to a two-tailed test.  However, the p-value presented is (almost always) for a two-tailed test.  But how do you choose which test?  Is the p-value appropriate for your test? And, if it is not, how can you calculate the correct p-value for your test given the p-value in your output?  

What is a two-tailed test?

First let’s start with the meaning of a two-tailed test.  If you are using a significance level of 0.05, a two-tailed test allots half of your alpha to testing the statistical significance in one direction and half of your alpha to testing statistical significance in the other direction.  This means that .025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions.  For example, we may wish to compare the mean of a sample to a given value x using a t-test.  Our null hypothesis is that the mean is equal to x . A two-tailed test will test both if the mean is significantly greater than x and if the mean significantly less than x . The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.     

What is a one-tailed test?

Next, let’s discuss the meaning of a one-tailed test.  If you are using a significance level of .05, a one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest.  This means that .05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction.  Let’s return to our example comparing the mean of a sample to a given value x using a t-test.  Our null hypothesis is that the mean is equal to x . A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x , but not both. Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% of its probability distribution or bottom 5% of its probability distribution, resulting in a p-value less than 0.05.  The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. A discussion of when this is an appropriate option follows.   

When is a one-tailed test appropriate?

Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction.  Imagine you have developed a new drug that you believe is an improvement over an existing drug.  You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug.  The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test.

So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical, then you can proceed with a one-tailed test. For example, imagine again that you have developed a new drug. It is cheaper than the existing drug and, you believe, no less effective.  In testing this drug, you are only interested in testing if it less effective than the existing drug.  You do not care if it is significantly more effective.  You only wish to show that it is not less effective. In this scenario, a one-tailed test would be appropriate. 

When is a one-tailed test NOT appropriate?

Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate.  Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was.  Using statistical tests inappropriately can lead to invalid results that are not replicable and highly questionable–a steep price to pay for a significance star in your results table!   

Deriving a one-tailed test from two-tailed output

The default among statistical packages performing tests is to report two-tailed p-values.  Because the most commonly used test statistic distributions (standard normal, Student’s t) are symmetric about zero, most one-tailed p-values can be derived from the two-tailed p-values.   

Below, we have the output from a two-sample t-test in Stata.  The test is comparing the mean male score to the mean female score.  The null hypothesis is that the difference in means is zero.  The two-sided alternative is that the difference in means is not zero.  There are two one-sided alternatives that one could opt to test instead: that the male score is higher than the female score (diff  > 0) or that the female score is higher than the male score (diff < 0).  In this instance, Stata presents results for all three alternatives.  Under the headings Ha: diff < 0 and Ha: diff > 0 are the results for the one-tailed tests. In the middle, under the heading Ha: diff != 0 (which means that the difference is not equal to 0), are the results for the two-tailed test. 

Two-sample t test with equal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- male | 91 50.12088 1.080274 10.30516 47.97473 52.26703 female | 109 54.99083 .7790686 8.133715 53.44658 56.53507 ---------+-------------------------------------------------------------------- combined | 200 52.775 .6702372 9.478586 51.45332 54.09668 ---------+-------------------------------------------------------------------- diff | -4.869947 1.304191 -7.441835 -2.298059 ------------------------------------------------------------------------------ Degrees of freedom: 198 Ho: mean(male) - mean(female) = diff = 0 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 t = -3.7341 t = -3.7341 t = -3.7341 P < t = 0.0001 P > |t| = 0.0002 P > t = 0.9999

Note that the test statistic, -3.7341, is the same for all of these tests.  The two-tailed p-value is P > |t|. This can be rewritten as P(>3.7341) + P(< -3.7341).  Because the t-distribution is symmetric about zero, these two probabilities are equal: P > |t| = 2 *  P(< -3.7341).  Thus, we can see that the two-tailed p-value is twice the one-tailed p-value for the alternative hypothesis that (diff < 0).  The other one-tailed alternative hypothesis has a p-value of P(>-3.7341) = 1-(P<-3.7341) = 1-0.0001 = 0.9999.   So, depending on the direction of the one-tailed hypothesis, its p-value is either 0.5*(two-tailed p-value) or 1-0.5*(two-tailed p-value) if the test statistic symmetrically distributed about zero. 

In this example, the two-tailed p-value suggests rejecting the null hypothesis of no difference. Had we opted for the one-tailed test of (diff > 0), we would fail to reject the null because of our choice of tails. 

The output below is from a regression analysis in Stata.  Unlike the example above, only the two-sided p-values are presented in this output.

Source | SS df MS Number of obs = 200 -------------+------------------------------ F( 2, 197) = 46.58 Model | 7363.62077 2 3681.81039 Prob > F = 0.0000 Residual | 15572.5742 197 79.0486001 R-squared = 0.3210 -------------+------------------------------ Adj R-squared = 0.3142 Total | 22936.195 199 115.257261 Root MSE = 8.8909 ------------------------------------------------------------------------------ socst | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- science | .2191144 .0820323 2.67 0.008 .0573403 .3808885 math | .4778911 .0866945 5.51 0.000 .3069228 .6488594 _cons | 15.88534 3.850786 4.13 0.000 8.291287 23.47939 ------------------------------------------------------------------------------

For each regression coefficient, the tested null hypothesis is that the coefficient is equal to zero.  Thus, the one-tailed alternatives are that the coefficient is greater than zero and that the coefficient is less than zero. To get the p-value for the one-tailed test of the variable science having a coefficient greater than zero, you would divide the .008 by 2, yielding .004 because the effect is going in the predicted direction. This is P(>2.67). If you had made your prediction in the other direction (the opposite direction of the model effect), the p-value would have been 1 – .004 = .996.  This is P(<2.67). For all three p-values, the test statistic is 2.67. 

Your Name (required)

Your Email (must be a valid email for us to receive the report!)

Comment/Error Report (required)

How to cite this page

  • © 2021 UC REGENTS

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis and alternate hypothesis (H o ) and (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

  • Academic style
  • Vague sentences
  • Style consistency

See an example

hypothesis test two sided

For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

Prevent plagiarism. Run a free check.

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved April 9, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, understanding p values | definition and examples, what is your plagiarism score.

hypothesis test two sided

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

S.3.2 hypothesis testing (p-value approach).

The P -value approach involves determining "likely" or "unlikely" by determining the probability — assuming the null hypothesis was true — of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P -value is small, say less than (or equal to) \(\alpha\), then it is "unlikely." And, if the P -value is large, say more than \(\alpha\), then it is "likely."

If the P -value is less than (or equal to) \(\alpha\), then the null hypothesis is rejected in favor of the alternative hypothesis. And, if the P -value is greater than \(\alpha\), then the null hypothesis is not rejected.

Specifically, the four steps involved in using the P -value approach to conducting any hypothesis test are:

  • Specify the null and alternative hypotheses.
  • Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. Again, to conduct the hypothesis test for the population mean μ , we use the t -statistic \(t^*=\frac{\bar{x}-\mu}{s/\sqrt{n}}\) which follows a t -distribution with n - 1 degrees of freedom.
  • Using the known distribution of the test statistic, calculate the P -value : "If the null hypothesis is true, what is the probability that we'd observe a more extreme test statistic in the direction of the alternative hypothesis than we did?" (Note how this question is equivalent to the question answered in criminal trials: "If the defendant is innocent, what is the chance that we'd observe such extreme criminal evidence?")
  • Set the significance level, \(\alpha\), the probability of making a Type I error to be small — 0.01, 0.05, or 0.10. Compare the P -value to \(\alpha\). If the P -value is less than (or equal to) \(\alpha\), reject the null hypothesis in favor of the alternative hypothesis. If the P -value is greater than \(\alpha\), do not reject the null hypothesis.

Example S.3.2.1

Mean gpa section  .

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t * equaling 2.5. Since n = 15, our test statistic t * has n - 1 = 14 degrees of freedom. Also, suppose we set our significance level α at 0.05 so that we have only a 5% chance of making a Type I error.

Right Tailed

The P -value for conducting the right-tailed test H 0 : μ = 3 versus H A : μ > 3 is the probability that we would observe a test statistic greater than t * = 2.5 if the population mean \(\mu\) really were 3. Recall that probability equals the area under the probability curve. The P -value is therefore the area under a t n - 1 = t 14 curve and to the right of the test statistic t * = 2.5. It can be shown using statistical software that the P -value is 0.0127. The graph depicts this visually.

t-distrbution graph showing the right tail beyond a t value of 2.5

The P -value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t * in the direction of H A if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P -value, 0.0127, is less than \(\alpha\) = 0.05, we reject the null hypothesis H 0 : μ = 3 in favor of the alternative hypothesis H A : μ > 3.

Note that we would not reject H 0 : μ = 3 in favor of H A : μ > 3 if we lowered our willingness to make a Type I error to \(\alpha\) = 0.01 instead, as the P -value, 0.0127, is then greater than \(\alpha\) = 0.01.

Left Tailed

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t * instead of equaling -2.5. The P -value for conducting the left-tailed test H 0 : μ = 3 versus H A : μ < 3 is the probability that we would observe a test statistic less than t * = -2.5 if the population mean μ really were 3. The P -value is therefore the area under a t n - 1 = t 14 curve and to the left of the test statistic t* = -2.5. It can be shown using statistical software that the P -value is 0.0127. The graph depicts this visually.

t distribution graph showing left tail below t value of -2.5

The P -value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t * in the direction of H A if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P -value, 0.0127, is less than α = 0.05, we reject the null hypothesis H 0 : μ = 3 in favor of the alternative hypothesis H A : μ < 3.

Note that we would not reject H 0 : μ = 3 in favor of H A : μ < 3 if we lowered our willingness to make a Type I error to α = 0.01 instead, as the P -value, 0.0127, is then greater than \(\alpha\) = 0.01.

In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t * instead of equaling -2.5. The P -value for conducting the two-tailed test H 0 : μ = 3 versus H A : μ ≠ 3 is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean μ really was 3. That is, the two-tailed test requires taking into account the possibility that the test statistic could fall into either tail (hence the name "two-tailed" test). The P -value is, therefore, the area under a t n - 1 = t 14 curve to the left of -2.5 and to the right of 2.5. It can be shown using statistical software that the P -value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually.

t-distribution graph of two tailed probability for t values of -2.5 and 2.5

Note that the P -value for a two-tailed test is always two times the P -value for either of the one-tailed tests. The P -value, 0.0254, tells us it is "unlikely" that we would observe such an extreme test statistic t * in the direction of H A if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P -value, 0.0254, is less than α = 0.05, we reject the null hypothesis H 0 : μ = 3 in favor of the alternative hypothesis H A : μ ≠ 3.

Note that we would not reject H 0 : μ = 3 in favor of H A : μ ≠ 3 if we lowered our willingness to make a Type I error to α = 0.01 instead, as the P -value, 0.0254, is then greater than \(\alpha\) = 0.01.

Now that we have reviewed the critical value and P -value approach procedures for each of the three possible hypotheses, let's look at three new examples — one of a right-tailed test, one of a left-tailed test, and one of a two-tailed test.

The good news is that, whenever possible, we will take advantage of the test statistics and P -values reported in statistical software, such as Minitab, to conduct our hypothesis tests in this course.

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • Indian J Crit Care Med
  • v.23(Suppl 3); 2019 Sep

An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors

Priya ranganathan.

1 Department of Anesthesiology, Critical Care and Pain, Tata Memorial Hospital, Mumbai, Maharashtra, India

2 Department of Surgical Oncology, Tata Memorial Centre, Mumbai, Maharashtra, India

The second article in this series on biostatistics covers the concepts of sample, population, research hypotheses and statistical errors.

How to cite this article

Ranganathan P, Pramesh CS. An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors. Indian J Crit Care Med 2019;23(Suppl 3):S230–S231.

Two papers quoted in this issue of the Indian Journal of Critical Care Medicine report. The results of studies aim to prove that a new intervention is better than (superior to) an existing treatment. In the ABLE study, the investigators wanted to show that transfusion of fresh red blood cells would be superior to standard-issue red cells in reducing 90-day mortality in ICU patients. 1 The PROPPR study was designed to prove that transfusion of a lower ratio of plasma and platelets to red cells would be superior to a higher ratio in decreasing 24-hour and 30-day mortality in critically ill patients. 2 These studies are known as superiority studies (as opposed to noninferiority or equivalence studies which will be discussed in a subsequent article).

SAMPLE VERSUS POPULATION

A sample represents a group of participants selected from the entire population. Since studies cannot be carried out on entire populations, researchers choose samples, which are representative of the population. This is similar to walking into a grocery store and examining a few grains of rice or wheat before purchasing an entire bag; we assume that the few grains that we select (the sample) are representative of the entire sack of grains (the population).

The results of the study are then extrapolated to generate inferences about the population. We do this using a process known as hypothesis testing. This means that the results of the study may not always be identical to the results we would expect to find in the population; i.e., there is the possibility that the study results may be erroneous.

HYPOTHESIS TESTING

A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the “alternate” hypothesis, and the opposite is called the “null” hypothesis; every study has a null hypothesis and an alternate hypothesis. For superiority studies, the alternate hypothesis states that one treatment (usually the new or experimental treatment) is superior to the other; the null hypothesis states that there is no difference between the treatments (the treatments are equal). For example, in the ABLE study, we start by stating the null hypothesis—there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs. We then state the alternate hypothesis—There is a difference between groups receiving fresh RBCs and standard-issue RBCs. It is important to note that we have stated that the groups are different, without specifying which group will be better than the other. This is known as a two-tailed hypothesis and it allows us to test for superiority on either side (using a two-sided test). This is because, when we start a study, we are not 100% certain that the new treatment can only be better than the standard treatment—it could be worse, and if it is so, the study should pick it up as well. One tailed hypothesis and one-sided statistical testing is done for non-inferiority studies, which will be discussed in a subsequent paper in this series.

STATISTICAL ERRORS

There are two possibilities to consider when interpreting the results of a superiority study. The first possibility is that there is truly no difference between the treatments but the study finds that they are different. This is called a Type-1 error or false-positive error or alpha error. This means falsely rejecting the null hypothesis.

The second possibility is that there is a difference between the treatments and the study does not pick up this difference. This is called a Type 2 error or false-negative error or beta error. This means falsely accepting the null hypothesis.

The power of the study is the ability to detect a difference between groups and is the converse of the beta error; i.e., power = 1-beta error. Alpha and beta errors are finalized when the protocol is written and form the basis for sample size calculation for the study. In an ideal world, we would not like any error in the results of our study; however, we would need to do the study in the entire population (infinite sample size) to be able to get a 0% alpha and beta error. These two errors enable us to do studies with realistic sample sizes, with the compromise that there is a small possibility that the results may not always reflect the truth. The basis for this will be discussed in a subsequent paper in this series dealing with sample size calculation.

Conventionally, type 1 or alpha error is set at 5%. This means, that at the end of the study, if there is a difference between groups, we want to be 95% certain that this is a true difference and allow only a 5% probability that this difference has occurred by chance (false positive). Type 2 or beta error is usually set between 10% and 20%; therefore, the power of the study is 90% or 80%. This means that if there is a difference between groups, we want to be 80% (or 90%) certain that the study will detect that difference. For example, in the ABLE study, sample size was calculated with a type 1 error of 5% (two-sided) and power of 90% (type 2 error of 10%) (1).

Table 1 gives a summary of the two types of statistical errors with an example

Statistical errors

In the next article in this series, we will look at the meaning and interpretation of ‘ p ’ value and confidence intervals for hypothesis testing.

Source of support: Nil

Conflict of interest: None

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

K12 LibreTexts

9.3: One and Two-Sided Tests of Significance

  • Last updated
  • Save as PDF
  • Page ID 5785

How many tails are there in the test for the claim: “Less than 5% of high school students are familiar with the tails of a distribution?”

The “tails” of a test are the values outside of the critical values . In other words, the tails are the ends of the distribution, and they begin at the greatest or least value included in the alternative hypothesis (the critical values).

In the graph below, the tails are in red and the rest of the distribution is in green. The critical values of the test in the image are -8 and 28, as these are the dividers between values supporting the alternative and null hypothesis. The area in red can also be seen as the rejection region , since an observed value in this region indicates that the null hypothesis be rejected.

Screen Shot 2020-07-16 at 10.35.28 PM.png

CK-12 Foundation, Larame Spence - CK-12 Foundation - CC BY-NC-SA

There are three types of test from a “tails” standpoint:

  • A left-tailed test only has a tail on the left side of the graph:

Screen Shot 2020-07-16 at 10.42.25 PM.png

  • A right-tailed test only has a tail on the right side of the graph:

Screen Shot 2020-07-16 at 10.47.57 PM.png

  • A two-tailed test has tails on both ends of the graph. This is a test where the null hypothesis is a claim of a specific value. For example: H 0 :X=5

Screen Shot 2020-07-16 at 10.48.27 PM.png

Determining the Null Hypothesis and Kind of Test

1. A researcher claims that black horses are, on average, more than 30 lbs heavier than white horses, which average 1100 lbs. What is the null hypothesis, and what kind of test is this?

The null hypothesis would be notated H 0 :μ≤1130 lbs

This is a right-tailed test, since the tail of the graph would be on the right. Recognize that values above 1130 would indicate that the null hypothesis be rejected, and the red area represents the rejection region .

2. A package of gum claims that the flavor lasts more than 39 minutes. What would be the null hypothesis of a test to determine the validity of the claim? What sort of test is this?

The null hypothesis would by notated as H 0 :μ≤39.

This is a right-tailed test, since the rejection region would consist of values greater than 39.

3. An ice pack claims to stay cold between 35 and 65 minutes. What would be the null hypothesis of a test to determine the validity of the claim? What sort of test would it be?

The null hypothesis would be H 0 :35>μ∪μ>65.

This is a two-tailed test, since the null hypothesis contains the values above and below a given range.

Earlier Problem Revisited

How many tails are there in the test for the claim: “Less than 5% of high school students are familiar with hypothesis tails?”

This is a left-tailed test, since the null hypothesis is H0:X≥5% meaning that the rejection region contains only values below a given value.

For each question: a) State the null and alternative hypothesis for a test to determine the validity of the claim, and b) Identify left, right, or two-tailed test type.

The weatherman claims an average of 2" - 2.5" of snow throughout the country.

a) H 1 :2′′<μ<2.5′′ and H 0 :2′′>μ∪μ>2.5′′

b) This is a two-tailed test, since the rejection region is at the ends of the graph.

The average Siberian Husky weighs more than 35 pounds.

a) H 1 :μ>35 lbs and H 0 :35 lbs≥μ

b) This is a right-tailed test, since the rejection region is located on the right end of the graph.

The United Parcel Service claims less then 5-day shipping times for ground packages.

a) H 1 :μ<5 days and H 0 :μ≥5 days

b) This is a left-tailed test, since the ‘reject’ values are on the left end of the graph.

  • The average miniature horse is less than 34 inches at the shoulder. Tiny Equines claims that their miniature horses are an average of 3 inches less than the breed average.
  • Speedy Solutions is a delivery company that claims all deliveries average less than 72 hours.
  • Terrific Textiles claims that 45% to 55% of t-shirts sold in CO are red.
  • Gas Hater Car Sales claims that 70% or more of the cars on the lot average more than 40 mpg.
  • The 2011 average service time for Fast Fatz Burgers was 56 seconds. The store manager claims that the 2012 average is more than 5 seconds faster.
  • Colorful Cavity Causers sells multi-colored candies by the bag. Each bag is claimed to average between 5.5 and 6.5 oz.
  • Aaron works for an electronics company, quality-testing batteries. The company claims a minimum of 5 hrs battery life.
  • The lessons in this Probability and Statistics text average 13 practice problems each.
  • The outside temperature in Maui averages between 72 and 83 degrees year-round.
  • 72% of elementary school teachers are female.

Screen Shot 2020-07-16 at 10.59.36 PM.png

Review (Answers)

To view the Review answers, open this PDF file and look for section 10.3.

  • Business Essentials
  • Leadership & Management
  • Credential of Leadership, Impact, and Management in Business (CLIMB)
  • Entrepreneurship & Innovation
  • *New* Digital Transformation
  • Finance & Accounting
  • Business in Society
  • For Organizations
  • Support Portal
  • Media Coverage
  • Founding Donors
  • Leadership Team

hypothesis test two sided

  • Harvard Business School →
  • HBS Online →
  • Business Insights →

Business Insights

Harvard Business School Online's Business Insights Blog provides the career insights you need to achieve your goals and gain confidence in your business skills.

  • Career Development
  • Communication
  • Decision-Making
  • Earning Your MBA
  • Negotiation
  • News & Events
  • Productivity
  • Staff Spotlight
  • Student Profiles
  • Work-Life Balance
  • Alternative Investments
  • Business Analytics
  • Business Strategy
  • Business and Climate Change
  • Design Thinking and Innovation
  • Digital Marketing Strategy
  • Disruptive Strategy
  • Economics for Managers
  • Entrepreneurship Essentials
  • Financial Accounting
  • Global Business
  • Launching Tech Ventures
  • Leadership Principles
  • Leadership, Ethics, and Corporate Accountability
  • Leading with Finance
  • Management Essentials
  • Negotiation Mastery
  • Organizational Leadership
  • Power and Influence for Positive Impact
  • Strategy Execution
  • Sustainable Business Strategy
  • Sustainable Investing
  • Winning with Digital Platforms

A Beginner’s Guide to Hypothesis Testing in Business

Business professionals performing hypothesis testing

  • 30 Mar 2021

Becoming a more data-driven decision-maker can bring several benefits to your organization, enabling you to identify new opportunities to pursue and threats to abate. Rather than allowing subjective thinking to guide your business strategy, backing your decisions with data can empower your company to become more innovative and, ultimately, profitable.

If you’re new to data-driven decision-making, you might be wondering how data translates into business strategy. The answer lies in generating a hypothesis and verifying or rejecting it based on what various forms of data tell you.

Below is a look at hypothesis testing and the role it plays in helping businesses become more data-driven.

Access your free e-book today.

What Is Hypothesis Testing?

To understand what hypothesis testing is, it’s important first to understand what a hypothesis is.

A hypothesis or hypothesis statement seeks to explain why something has happened, or what might happen, under certain conditions. It can also be used to understand how different variables relate to each other. Hypotheses are often written as if-then statements; for example, “If this happens, then this will happen.”

Hypothesis testing , then, is a statistical means of testing an assumption stated in a hypothesis. While the specific methodology leveraged depends on the nature of the hypothesis and data available, hypothesis testing typically uses sample data to extrapolate insights about a larger population.

Hypothesis Testing in Business

When it comes to data-driven decision-making, there’s a certain amount of risk that can mislead a professional. This could be due to flawed thinking or observations, incomplete or inaccurate data , or the presence of unknown variables. The danger in this is that, if major strategic decisions are made based on flawed insights, it can lead to wasted resources, missed opportunities, and catastrophic outcomes.

The real value of hypothesis testing in business is that it allows professionals to test their theories and assumptions before putting them into action. This essentially allows an organization to verify its analysis is correct before committing resources to implement a broader strategy.

As one example, consider a company that wishes to launch a new marketing campaign to revitalize sales during a slow period. Doing so could be an incredibly expensive endeavor, depending on the campaign’s size and complexity. The company, therefore, may wish to test the campaign on a smaller scale to understand how it will perform.

In this example, the hypothesis that’s being tested would fall along the lines of: “If the company launches a new marketing campaign, then it will translate into an increase in sales.” It may even be possible to quantify how much of a lift in sales the company expects to see from the effort. Pending the results of the pilot campaign, the business would then know whether it makes sense to roll it out more broadly.

Related: 9 Fundamental Data Science Skills for Business Professionals

Key Considerations for Hypothesis Testing

1. alternative hypothesis and null hypothesis.

In hypothesis testing, the hypothesis that’s being tested is known as the alternative hypothesis . Often, it’s expressed as a correlation or statistical relationship between variables. The null hypothesis , on the other hand, is a statement that’s meant to show there’s no statistical relationship between the variables being tested. It’s typically the exact opposite of whatever is stated in the alternative hypothesis.

For example, consider a company’s leadership team that historically and reliably sees $12 million in monthly revenue. They want to understand if reducing the price of their services will attract more customers and, in turn, increase revenue.

In this case, the alternative hypothesis may take the form of a statement such as: “If we reduce the price of our flagship service by five percent, then we’ll see an increase in sales and realize revenues greater than $12 million in the next month.”

The null hypothesis, on the other hand, would indicate that revenues wouldn’t increase from the base of $12 million, or might even decrease.

Check out the video below about the difference between an alternative and a null hypothesis, and subscribe to our YouTube channel for more explainer content.

2. Significance Level and P-Value

Statistically speaking, if you were to run the same scenario 100 times, you’d likely receive somewhat different results each time. If you were to plot these results in a distribution plot, you’d see the most likely outcome is at the tallest point in the graph, with less likely outcomes falling to the right and left of that point.

distribution plot graph

With this in mind, imagine you’ve completed your hypothesis test and have your results, which indicate there may be a correlation between the variables you were testing. To understand your results' significance, you’ll need to identify a p-value for the test, which helps note how confident you are in the test results.

In statistics, the p-value depicts the probability that, assuming the null hypothesis is correct, you might still observe results that are at least as extreme as the results of your hypothesis test. The smaller the p-value, the more likely the alternative hypothesis is correct, and the greater the significance of your results.

3. One-Sided vs. Two-Sided Testing

When it’s time to test your hypothesis, it’s important to leverage the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests , or one-tailed and two-tailed tests, respectively.

Typically, you’d leverage a one-sided test when you have a strong conviction about the direction of change you expect to see due to your hypothesis test. You’d leverage a two-sided test when you’re less confident in the direction of change.

Business Analytics | Become a data-driven leader | Learn More

4. Sampling

To perform hypothesis testing in the first place, you need to collect a sample of data to be analyzed. Depending on the question you’re seeking to answer or investigate, you might collect samples through surveys, observational studies, or experiments.

A survey involves asking a series of questions to a random population sample and recording self-reported responses.

Observational studies involve a researcher observing a sample population and collecting data as it occurs naturally, without intervention.

Finally, an experiment involves dividing a sample into multiple groups, one of which acts as the control group. For each non-control group, the variable being studied is manipulated to determine how the data collected differs from that of the control group.

A Beginner's Guide to Data and Analytics | Access Your Free E-Book | Download Now

Learn How to Perform Hypothesis Testing

Hypothesis testing is a complex process involving different moving pieces that can allow an organization to effectively leverage its data and inform strategic decisions.

If you’re interested in better understanding hypothesis testing and the role it can play within your organization, one option is to complete a course that focuses on the process. Doing so can lay the statistical and analytical foundation you need to succeed.

Do you want to learn more about hypothesis testing? Explore Business Analytics —one of our online business essentials courses —and download our Beginner’s Guide to Data & Analytics .

hypothesis test two sided

About the Author

STM1001 Topic 5: Hypothesis Testing

3.2 one-sided vs two-sided tests.

In the previous section, we mentioned that the test we had carried out was a two-sided test . One-sided tests are also possible. Consider the following cases:

  • \(H_0:\mu = 5\;\;\text{versus}\;\;H_1:\mu \neq 5\)
  • \(H_0:\mu = 5\;\;\text{versus}\;\;H_1:\mu > 5\)
  • \(H_0:\mu = 5\;\;\text{versus}\;\;H_1:\mu < 5\)

Examples 2 and 3 above are referred to as 'one-sided tests' because they are only testing for extreme values in one direction. Consider the below figure, which shows the critical values (CV) required for each test:

  • The two-sided test can also be referred to as a two-tailed test , because as we can see above, there are two tails of the distribution curve for which we are interested in extreme values. Because the combined area of the shaded area must equal \(\alpha = 0.05\) , in this case, we have an area of \(\alpha / 2 = 0.5 / 2 = 0.025\) at each tail. This resulted in critical values of -1.99 and 1.99 respectively. For a two-sided test , we have: \[p\text{-value} = 2 \times P(T\geq |t|) \text{ for } T\sim t_{\text{df}}\]
  • The one-sided test (right-tailed) can also be referred to as a right-tailed test , because as we can see above, it is the right-tail of the distribution curve for which we are interested in extreme values. Because the total area of the shaded area must equal \(\alpha = 0.05\) , in this case, we simply have an area of \(\alpha = 0.5\) in the right tail. This resulted in a critical value of 1.67. For a one-sided test (right-tailed) , we have: \[p\text{-value} = P(T\geq t) \text{ for } T\sim t_{\text{df}}\]
  • The one-sided test (left-tailed) can also be referred to as a left-tailed test , because as we can see above, it is the left-tail of the distribution curve for which we are interested in extreme values. Because the total area of the shaded area must equal \(\alpha = 0.05\) , in this case, we simply have an area of \(\alpha = 0.5\) in the left tail. This resulted in a critical value of -1.67. For a one-sided test (left-tailed) , we have: \[p\text{-value} = P(T\leq t) \text{ for } T\sim t_{\text{df}}\]

Two-sided tests are often preferred in practice because they are unbiased in terms of the predicted direction of the results. However, in this subject we will get practice using both two-sided and one-sided tests.

  • Suppose a sample of university students were asked the question, "In hours, what was your phone screen time yesterday?". Consider the following research question: Is the average daily phone screen time of university students greater than 4 hours? Choose the correct null and alternative hypotheses from the options below: \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • Suppose a sample of university students were asked the question, "In hours, what was your phone screen time yesterday?". Consider the following research question: Is the average daily phone screen time of university students different from 4 hours? Choose the correct null and alternative hypotheses from the options below: \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • Suppose a sample of university students were asked the question, "In hours, what was your phone screen time yesterday?". Consider the following research question: Is the average daily phone screen time of university students less than 4 hours? Choose the correct null and alternative hypotheses from the options below: \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\) \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu > 4\)
  • \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu \neq 4\)
  • \(H_0:\mu = 4\;\;\text{versus}\;\;H_1:\mu < 4\)
  • Instructor View

One-sided vs. two-sided tests, and data snooping

Last updated on 2024-03-12 | Edit this page

  • Why is it important to define the hypothesis before running the test?
  • What is the difference between one-sided and two-sided tests?
  • Explain the difference between one-sided and two-sided tests
  • Raise awareness of data snooping / HARKing
  • Introduce further arguments in the binom.test function

A one-sided test

Binomial null distribution with one-sided significance indicated.

What you’ve seen in the last episode was a one-sided test , which means we looked at only at one side of the distribution. In this example, we observed 9 out of 100 persons with disease, then we asked: What is the probability under the null, to observe at least 9 persons with that disease. And we rejected all the outcomes where this probability was lower than 5%. The alternative hypothesis that we then take on is that the prevalence is larger than 4%. But we could just as well have looked in the other direction and have asked: What is the probability of seeing at most the observed number of diseased under the null, and then in case of rejecting the null, we’d have accept the alternative hypothesis that the prevalence is below 4%.

Let me remind you how we initially phrased the research question and the alternative hypothesis: We wanted to test whether the prevalence is different from 4%. Well, different can be smaller or larger. So, to be fair, this is what we should actually do:

Binomial null distribution with two-sided significance indicated.

But what exactly was wrong with a one-sided test? An observation of 9 is clearly higher than the expected 4, so no need to test on the other side, right? Unfortunately: no.

Excursion: Data snooping / HARKing

What we did is called HARKing , which stands for “hypothesis after results known”, or, also data snooping. The problem is, we decided on the direction to look at after having seen at the data, and this messes with the significance level \(\alpha\) . The 5% is just not true anymore, because we spent all the 5% on one side, and we cheated by looking into the data to decide which side we want to look at. But if the null were true, then there would be a (roughly) 50:50 chance that the test group gives you an outcome that is higher or lower than the expected value. So we’d have to double the alpha, and in reality it would be around 10%. This means, assuming the null was true, we actually had an about 10% chance of falsely rejecting it.

Above it says that there is a ~50% chance of the observation being below 4, and that a one-sided test after looking into the data had a significance level of ~0.1. The numbers are approximate, because a binomial distribution has discrete numbers. This means that

  • there’s not actually an outcome where the probability of seeing an outcome as high as this or higher is exactly 5%.
  • it’s actually more likely to see an outcome below 4 ( \(p=0.43\) ), than seeing an outcome above ( \(p=0.37\) ). The numbers don’t add up to 1, because there is also the option of observing exactly 4 ( \(p=0.2\) ).

One-sided and two-sided tests in R

Many tests in R have the argument alternative , which allows you to choose the alternative hypothesis to be two.sided , greater , or less . If you don’t specify this argument, it defaults to two.sided . So it turns out, in the exercise above, you did the right thing with:

A one-sided test would be:

It’s a little bit confusing that both tests give a similar p-value here, which is again due to the distribution’s discreteness. Using different numbers will show how the p-value for the same observation is lower if you choose a one-sided test:

Challenge: one-sided test

In which of these cases would a one-sided test be OK?

  • The trial was conducted to find out whether the disease prevalence is increased due to the precondition. In case of a significant outcome, persons with the preconditions would be monitored more carefully by their doctors.
  • You want to find out whether new-born children already resemble their father. Study participants look at a photo of a baby, and two photos of adult men, one of which is the father. They have to guess which of them. The hypothesis is that new-borns resemble their fathers and the father is thus guessed >50% of the cases. There is no reason to believe that children resemble their fathers less than they resemble randomly chosen men.
  • When looking at the data (only 1 out of 100), it becomes clear that the prevalence is certainly not increased by the precondition. It might even decrease the risk. We thus test for that.

Show me the solution

In the first and second scenario, one-sided tests could be used. There are different opinions to how sparsely they should be used. For example, Whitlock and Schluter gave the second scenario as a potential use-case. They believe that a one-sided test should only be used if “the null hypothesis are inconceivable for any reason other than chance”. They would likely argue against using a one-sided test in the first scenario, because what if the disease prevalence is decreased due to the precondition? Wouldn’t we find this interesting as well?

Hypothesis Tests in R

This tutorial covers basic hypothesis testing in R.

  • Normality tests
  • Shapiro-Wilk normality test
  • Kolmogorov-Smirnov test
  • Comparing central tendencies: Tests with continuous / discrete data
  • One-sample t-test : Normally-distributed sample vs. expected mean
  • Two-sample t-test : Two normally-distributed samples
  • Wilcoxen rank sum : Two non-normally-distributed samples
  • Weighted two-sample t-test : Two continuous samples with weights
  • Comparing proportions: Tests with categorical data
  • Chi-squared goodness of fit test : Sampled frequencies of categorical values vs. expected frequencies
  • Chi-squared independence test : Two sampled frequencies of categorical values
  • Weighted chi-squared independence test : Two weighted sampled frequencies of categorical values
  • Comparing multiple groups: Tests with categorical and continuous / discrete data
  • Analysis of Variation (ANOVA) : Normally-distributed samples in groups defined by categorical variable(s)
  • Kruskal-Wallace One-Way Analysis of Variance : Nonparametric test of the significance of differences between two or more groups

Hypothesis Testing

Science is "knowledge or a system of knowledge covering general truths or the operation of general laws especially as obtained and tested through scientific method" (Merriam-Webster 2022) .

The idealized world of the scientific method is question-driven , with the collection and analysis of data determined by the formulation of research questions and the testing of hypotheses. Hypotheses are tentative assumptions about what the answers to your research questions may be.

  • Formulate questions: How can I understand some phenomenon?
  • Literature review: What does existing research say about my questions?
  • Formulate hypotheses: What do I think the answers to my questions will be?
  • Collect data: What data can I gather to test my hypothesis?
  • Test hypotheses: Does the data support my hypothesis?
  • Communicate results: Who else needs to know about this?
  • Formulate questions: Frame missing knowledge about a phenomenon as research question(s).
  • Literature review: A literature review is an investigation of what existing research says about the phenomenon you are studying. A thorough literature review is essential to identify gaps in existing knowledge you can fill, and to avoid unnecessarily duplicating existing research.
  • Formulate hypotheses: Develop possible answers to your research questions.
  • Collect data: Acquire data that supports or refutes the hypothesis.
  • Test hypotheses: Run tools to determine if the data corroborates the hypothesis.
  • Communicate results: Share your findings with the broader community that might find them useful.

While the process of knowledge production is, in practice, often more iterative than this waterfall model, the testing of hypotheses is usually a fundamental element of scientific endeavors involving quantitative data.

hypothesis test two sided

The Problem of Induction

The scientific method looks to the past or present to build a model that can be used to infer what will happen in the future. General knowledge asserts that given a particular set of conditions, a particular outcome will or is likely to occur.

The problem of induction is that we cannot be 100% certain that what we are assuming is a general principle is not, in fact, specific to the particular set of conditions when we made our empirical observations. We cannot prove that that such principles will hold true under future conditions or different locations that we have not yet experienced (Vickers 2014) .

The problem of induction is often associated with the 18th-century British philosopher David Hume . This problem is especially vexing in the study of human beings, where behaviors are a function of complex social interactions that vary over both space and time.

hypothesis test two sided

Falsification

One way of addressing the problem of induction was proposed by the 20th-century Viennese philosopher Karl Popper .

Rather than try to prove a hypothesis is true, which we cannot do because we cannot know all possible situations that will arise in the future, we should instead concentrate on falsification , where we try to find situations where a hypothesis is false. While you cannot prove your hypothesis will always be true, you only need to find one situation where the hypothesis is false to demonstrate that the hypothesis can be false (Popper 1962) .

If a hypothesis is not demonstrated to be false by a particular test, we have corroborated that hypothesis. While corroboration does not "prove" anything with 100% certainty, by subjecting a hypothesis to multiple tests that fail to demonstrate that it is false, we can have increasing confidence that our hypothesis reflects reality.

hypothesis test two sided

Null and Alternative Hypotheses

In scientific inquiry, we are often concerned with whether a factor we are considering (such as taking a specific drug) results in a specific effect (such as reduced recovery time).

To evaluate whether a factor results in an effect, we will perform an experiment and / or gather data. For example, in a clinical drug trial, half of the test subjects will be given the drug, and half will be given a placebo (something that appears to be the drug but is actually a neutral substance).

hypothesis test two sided

Because the data we gather will usually only be a portion (sample) of total possible people or places that could be affected (population), there is a possibility that the sample is unrepresentative of the population. We use a statistical test that considers that uncertainty when assessing whether an effect is associated with a factor.

  • Statistical testing begins with an alternative hypothesis (H 1 ) that states that the factor we are considering results in a particular effect. The alternative hypothesis is based on the research question and the type of statistical test being used.
  • Because of the problem of induction , we cannot prove our alternative hypothesis. However, under the concept of falsification , we can evaluate the data to see if there is a significant probability that our data falsifies our alternative hypothesis (Wilkinson 2012) .
  • The null hypothesis (H 0 ) states that the factor has no effect. The null hypothesis is the opposite of the alternative hypothesis. The null hypothesis is what we are testing when we perform a hypothesis test.

hypothesis test two sided

The output of a statistical test like the t-test is a p -value. A p -value is the probability that any effects we see in the sampled data are the result of random sampling error (chance).

  • If a p -value is greater than the significance level (0.05 for 5% significance) we fail to reject the null hypothesis since there is a significant possibility that our results falsify our alternative hypothesis.
  • If a p -value is lower than the significance level (0.05 for 5% significance) we reject the null hypothesis and have corroborated (provided evidence for) our alternative hypothesis.

The calculation and interpretation of the p -value goes back to the central limit theorem , which states that random sampling error has a normal distribution.

hypothesis test two sided

Using our example of a clinical drug trial, if the mean recovery times for the two groups are close enough together that there is a significant possibility ( p > 0.05) that the recovery times are the same (falsification), we fail to reject the null hypothesis.

hypothesis test two sided

However, if the mean recovery times for the two groups are far enough apart that the probability they are the same is under the level of significance ( p < 0.05), we reject the null hypothesis and have corroborated our alternative hypothesis.

hypothesis test two sided

Significance means that an effect is "probably caused by something other than mere chance" (Merriam-Webster 2022) .

  • The significance level (α) is the threshold for significance and, by convention, is usually 5%, 10%, or 1%, which corresponds to 95% confidence, 90% confidence, or 99% confidence, respectively.
  • A factor is considered statistically significant if the probability that the effect we see in the data is a result of random sampling error (the p -value) is below the chosen significance level.
  • A statistical test is used to evaluate whether a factor being considered is statistically significant (Gallo 2016) .

Type I vs. Type II Errors

Although we are making a binary choice between rejecting and failing to reject the null hypothesis, because we are using sampled data, there is always the possibility that the choice we have made is an error.

There are two types of errors that can occur in hypothesis testing.

  • Type I error (false positive) occurs when a low p -value causes us to reject the null hypothesis, but the factor does not actually result in the effect.
  • Type II error (false negative) occurs when a high p -value causes us to fail to reject the null hypothesis, but the factor does actually result in the effect.

The numbering of the errors reflects the predisposition of the scientific method to be fundamentally skeptical . Accepting a fact about the world as true when it is not true is considered worse than rejecting a fact about the world that actually is true.

hypothesis test two sided

Statistical Significance vs. Importance

When we fail to reject the null hypothesis, we have found information that is commonly called statistically significant . But there are multiple challenges with this terminology.

First, statistical significance is distinct from importance (NIST 2012) . For example, if sampled data reveals a statistically significant difference in cancer rates, that does not mean that the increased risk is important enough to justify expensive mitigation measures. All statistical results require critical interpretation within the context of the phenomenon being observed. People with different values and incentives can have different interpretations of whether statistically significant results are important.

Second, the use of 95% probability for defining confidence intervals is an arbitrary convention. This creates a good vs. bad binary that suggests a "finality and certitude that are rarely justified." Alternative approaches like Beyesian statistics that express results as probabilities can offer more nuanced ways of dealing with complexity and uncertainty (Clayton 2022) .

Science vs. Non-science

Not all ideas can be falsified, and Popper uses the distinction between falsifiable and non-falsifiable ideas to make a distinction between science and non-science. In order for an idea to be science it must be an idea that can be demonstrated to be false.

While Popper asserts there is still value in ideas that are not falsifiable, such ideas are not science in his conception of what science is. Such non-science ideas often involve questions of subjective values or unseen forces that are complex, amorphous, or difficult to objectively observe.

Example Data

As example data, this tutorial will use a table of anonymized individual responses from the CDC's Behavioral Risk Factor Surveillance System . The BRFSS is a "system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services" (CDC 2019) .

A CSV file with the selected variables used in this tutorial is available here and can be imported into R with read.csv() .

Guidance on how to download and process this data directly from the CDC website is available here...

Variable Types

The publicly-available BRFSS data contains a wide variety of discrete, ordinal, and categorical variables. Variables often contain special codes for non-responsiveness or missing (NA) values. Examples of how to clean these variables are given here...

The BRFSS has a codebook that gives the survey questions associated with each variable, and the way that responses are encoded in the variable values.

hypothesis test two sided

Normality Tests

Tests are commonly divided into two groups depending on whether they are built on the assumption that the continuous variable has a normal distribution.

  • Parametric tests presume a normal distribution.
  • Non-parametric tests can work with normal and non-normal distributions.

The distinction between parametric and non-parametric techniques is especially important when working with small numbers of samples (less than 40 or so) from a larger population.

The normality tests given below do not work with large numbers of values, but with many statistical techniques, violations of normality assumptions do not cause major problems when large sample sizes are used. (Ghasemi and Sahediasi 2012) .

The Shapiro-Wilk Normality Test

  • Data: A continuous or discrete sampled variable
  • R Function: shapiro.test()
  • Null hypothesis (H 0 ): The population distribution from which the sample is drawn is not normal
  • History: Samuel Sanford Shapiro and Martin Wilk (1965)

This is an example with random values from a normal distribution.

This is an example with random values from a uniform (non-normal) distribution.

The Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov is a more-generalized test than the Shapiro-Wilks test that can be used to test whether a sample is drawn from any type of distribution.

  • Data: A continuous or discrete sampled variable and a reference probability distribution
  • R Function: ks.test()
  • Null hypothesis (H 0 ): The population distribution from which the sample is drawn does not match the reference distribution
  • History: Andrey Kolmogorov (1933) and Nikolai Smirnov (1948)
  • pearson.test() The Pearson Chi-square Normality Test from the nortest library. Lower p-values (closer to 0) means to reject the reject the null hypothesis that the distribution IS normal.

Modality Tests of Samples

Comparing two central tendencies: tests with continuous / discrete data, one sample t-test (two-sided).

The one-sample t-test tests the significance of the difference between the mean of a sample and an expected mean.

  • Data: A continuous or discrete sampled variable and a single expected mean (μ)
  • Parametric (normal distributions)
  • R Function: t.test()
  • Null hypothesis (H 0 ): The means of the sampled distribution matches the expected mean.
  • History: William Sealy Gosset (1908)

t = ( Χ - μ) / (σ̂ / √ n )

  • t : The value of t used to find the p-value
  • Χ : The sample mean
  • μ: The population mean
  • σ̂: The estimate of the standard deviation of the population (usually the stdev of the sample
  • n : The sample size

T-tests should only be used when the population is at least 20 times larger than its respective sample. If the sample size is too large, the low p-value makes the insignificant look significant. .

For example, we test a hypothesis that the mean weight in IL in 2020 is different than the 2005 continental mean weight.

Walpole et al. (2012) estimated that the average adult weight in North America in 2005 was 178 pounds. We could presume that Illinois is a comparatively normal North American state that would follow the trend of both increased age and increased weight (CDC 2021) .

hypothesis test two sided

The low p-value leads us to reject the null hypothesis and corroborate our alternative hypothesis that mean weight changed between 2005 and 2020 in Illinois.

One Sample T-Test (One-Sided)

Because we were expecting an increase, we can modify our hypothesis that the mean weight in 2020 is higher than the continental weight in 2005. We can perform a one-sided t-test using the alternative="greater" parameter.

The low p-value leads us to again reject the null hypothesis and corroborate our alternative hypothesis that mean weight in 2020 is higher than the continental weight in 2005.

Note that this does not clearly evaluate whether weight increased specifically in Illinois, or, if it did, whether that was caused by an aging population or decreasingly healthy diets. Hypotheses based on such questions would require more detailed analysis of individual data.

Although we can see that the mean cancer incidence rate is higher for counties near nuclear plants, there is the possiblity that the difference in means happened by accident and the nuclear plants have nothing to do with those higher rates.

The t-test allows us to test a hypothesis. Note that a t-test does not "prove" or "disprove" anything. It only gives the probability that the differences we see between two areas happened by chance. It also does not evaluate whether there are other problems with the data, such as a third variable, or inaccurate cancer incidence rate estimates.

hypothesis test two sided

Note that this does not prove that nuclear power plants present a higher cancer risk to their neighbors. It simply says that the slightly higher risk is probably not due to chance alone. But there are a wide variety of other other related or unrelated social, environmental, or economic factors that could contribute to this difference.

Box-and-Whisker Chart

One visualization commonly used when comparing distributions (collections of numbers) is a box-and-whisker chart. The boxes show the range of values in the middle 25% to 50% to 75% of the distribution and the whiskers show the extreme high and low values.

hypothesis test two sided

Although Google Sheets does not provide the capability to create box-and-whisker charts, Google Sheets does have candlestick charts , which are similar to box-and-whisker charts, and which are normally used to display the range of stock price changes over a period of time.

This video shows how to create a candlestick chart comparing the distributions of cancer incidence rates. The QUARTILE() function gets the values that divide the distribution into four equally-sized parts. This shows that while the range of incidence rates in the non-nuclear counties are wider, the bulk of the rates are below the rates in nuclear counties, giving a visual demonstration of the numeric output of our t-test.

While categorical data can often be reduced to dichotomous data and used with proportions tests or t-tests, there are situations where you are sampling data that falls into more than two categories and you would like to make hypothesis tests about those categories. This tutorial describes a group of tests that can be used with that type of data.

Two-Sample T-Test

When comparing means of values from two different groups in your sample, a two-sample t-test is in order.

The two-sample t-test tests the significance of the difference between the means of two different samples.

  • Two normally-distributed, continuous or discrete sampled variables, OR
  • A normally-distributed continuous or sampled variable and a parallel dichotomous variable indicating what group each of the values in the first variable belong to
  • Null hypothesis (H 0 ): The means of the two sampled distributions are equal.

For example, given the low incomes and delicious foods prevalent in Mississippi, we might presume that average weight in Mississippi would be higher than in Illinois.

hypothesis test two sided

We test a hypothesis that the mean weight in IL in 2020 is less than the 2020 mean weight in Mississippi.

The low p-value leads us to reject the null hypothesis and corroborate our alternative hypothesis that mean weight in Illinois is less than in Mississippi.

While the difference in means is statistically significant, it is small (182 vs. 187), which should lead to caution in interpretation that you avoid using your analysis simply to reinforce unhelpful stigmatization.

Wilcoxen Rank Sum Test (Mann-Whitney U-Test)

The Wilcoxen rank sum test tests the significance of the difference between the means of two different samples. This is a non-parametric alternative to the t-test.

  • Data: Two continuous sampled variables
  • Non-parametric (normal or non-normal distributions)
  • R Function: wilcox.test()
  • Null hypothesis (H 0 ): For randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X.
  • History: Frank Wilcoxon (1945) and Henry Mann and Donald Whitney (1947)

The test is is implemented with the wilcox.test() function.

  • When the test is performed on one sample in comparison to an expected value around which the distribution is symmetrical (μ), the test is known as a Mann-Whitney U test .
  • When the test is performed to compare two samples, the test is known as a Wilcoxon rank sum test .

For this example, we will use AVEDRNK3: During the past 30 days, on the days when you drank, about how many drinks did you drink on the average?

  • 1 - 76: Number of drinks
  • 77: Don’t know/Not sure
  • 99: Refused
  • NA: Not asked or Missing

The histogram clearly shows this to be a non-normal distribution.

hypothesis test two sided

Continuing the comparison of Illinois and Mississippi from above, we might presume that with all that warm weather and excellent food in Mississippi, they might be inclined to drink more. The means of average number of drinks per month seem to suggest that Mississippians do drink more than Illinoians.

We can test use wilcox.test() to test a hypothesis that the average amount of drinking in Illinois is different than in Mississippi. Like the t-test, the alternative can be specified as two-sided or one-sided, and for this example we will test whether the sampled Illinois value is indeed less than the Mississippi value.

The low p-value leads us to reject the null hypothesis and corroborates our hypothesis that average drinking is lower in Illinois than in Mississippi. As before, this tells us nothing about why this is the case.

Weighted Two-Sample T-Test

The downloadable BRFSS data is raw, anonymized survey data that is biased by uneven geographic coverage of survey administration (noncoverage) and lack of responsiveness from some segments of the population (nonresponse). The X_LLCPWT field (landline, cellphone weighting) is a weighting factor added by the CDC that can be assigned to each response to compensate for these biases.

The wtd.t.test() function from the weights library has a weights parameter that can be used to include a weighting factor as part of the t-test.

Comparing Proportions: Tests with Categorical Data

Chi-squared goodness of fit.

  • Tests the significance of the difference between sampled frequencies of different values and expected frequencies of those values
  • Data: A categorical sampled variable and a table of expected frequencies for each of the categories
  • R Function: chisq.test()
  • Null hypothesis (H 0 ): The relative proportions of categories in one variable are different from the expected proportions
  • History: Karl Pearson (1900)
  • Example Question: Are the voting preferences of voters in my district significantly different from the current national polls?

For example, we test a hypothesis that smoking rates changed between 2000 and 2020.

In 2000, the estimated rate of adult smoking in Illinois was 22.3% (Illinois Department of Public Health 2004) .

The variable we will use is SMOKDAY2: Do you now smoke cigarettes every day, some days, or not at all?

  • 1: Current smoker - now smokes every day
  • 2: Current smoker - now smokes some days
  • 3: Not at all
  • 7: Don't know
  • NA: Not asked or missing - NA is used for people who have never smoked

We subset only yes/no responses in Illinois and convert into a dummy variable (yes = 1, no = 0).

The listing of the table as percentages indicates that smoking rates were halved between 2000 and 2020, but since this is sampled data, we need to run a chi-squared test to make sure the difference can't be explained by the randomness of sampling.

In this case, the very low p-value leads us to reject the null hypothesis and corroborates the alternative hypothesis that smoking rates changed between 2000 and 2020.

Chi-Squared Contingency Analysis / Test of Independence

  • Tests the significance of the difference between frequencies between two different groups
  • Data: Two categorical sampled variables
  • Null hypothesis (H 0 ): The relative proportions of one variable are independent of the second variable.

We can also compare categorical proportions between two sets of sampled categorical variables.

The chi-squared test can is used to determine if two categorical variables are independent. What is passed as the parameter is a contingency table created with the table() function that cross-classifies the number of rows that are in the categories specified by the two categorical variables.

The null hypothesis with this test is that the two categories are independent. The alternative hypothesis is that there is some dependency between the two categories.

For this example, we can compare the three categories of smokers (daily = 1, occasionally = 2, never = 3) across the two categories of states (Illinois and Mississippi).

hypothesis test two sided

The low p-value leads us to reject the null hypotheses that the categories are independent and corroborates our hypotheses that smoking behaviors in the two states are indeed different.

p-value = 1.516e-09

Weighted Chi-Squared Contingency Analysis

As with the weighted t-test above, the weights library contains the wtd.chi.sq() function for incorporating weighting into chi-squared contingency analysis.

As above, the even lower p-value leads us to again reject the null hypothesis that smoking behaviors are independent in the two states.

Suppose that the Macrander campaign would like to know how partisan this election is. If people are largely choosing to vote along party lines, the campaign will seek to get their base voters out to the polls. If people are splitting their ticket, the campaign may focus their efforts more broadly.

In the example below, the Macrander campaign took a small poll of 30 people asking who they wished to vote for AND what party they most strongly affiliate with.

The output of table() shows fairly strong relationship between party affiliation and candidates. Democrats tend to vote for Macrander, while Republicans tend to vote for Stewart, while independents all vote for Miller.

This is reflected in the very low p-value from the chi-squared test. This indicates that there is a very low probability that the two categories are independent. Therefore we reject the null hypothesis.

In contrast, suppose that the poll results had showed there were a number of people crossing party lines to vote for candidates outside their party. The simulated data below uses the runif() function to randomly choose 50 party names.

The contingency table() shows no clear relationship between party affiliation and candidate. This is validated quantitatively by the chi-squared test. The fairly high p-value of 0.4018 indicates a 40% chance that the two categories are independent. Therefore, we fail to reject the null hypothesis and the campaign should focus their efforts on the broader electorate.

The warning message given by the chisq.test() function indicates that the sample size is too small to make an accurate analysis. The simulate.p.value = T parameter adds Monte Carlo simulation to the test to improve the estimation and get rid of the warning message. However, the best way to get rid of this message is to get a larger sample.

Comparing Categorical and Continuous Variables

Analysis of variation (anova).

Analysis of Variance (ANOVA) is a test that you can use when you have a categorical variable and a continuous variable. It is a test that considers variability between means for different categories as well as the variability of observations within groups.

There are a wide variety of different extensions of ANOVA that deal with covariance (ANCOVA), multiple variables (MANOVA), and both of those together (MANCOVA). These techniques can become quite complicated and also assume that the values in the continuous variables have a normal distribution.

  • Data: One or more categorical (independent) variables and one continuous (dependent) sampled variable
  • R Function: aov()
  • Null hypothesis (H 0 ): There is no difference in means of the groups defined by each level of the categorical (independent) variable
  • History: Ronald Fisher (1921)
  • Example Question: Do low-, middle- and high-income people vary in the amount of time they spend watching TV?

As an example, we look at the continuous weight variable (WEIGHT2) split into groups by the eight income categories in INCOME2: Is your annual household income from all sources?

  • 1: Less than $10,000
  • 2: $10,000 to less than $15,000
  • 3: $15,000 to less than $20,000
  • 4: $20,000 to less than $25,000
  • 5: $25,000 to less than $35,000
  • 6: $35,000 to less than $50,000
  • 7: $50,000 to less than $75,000)
  • 8: $75,000 or more

The barplot() of means does show variation among groups, although there is no clear linear relationship between income and weight.

hypothesis test two sided

To test whether this variation could be explained by randomness in the sample, we run the ANOVA test.

The low p-value leads us to reject the null hypothesis that there is no difference in the means of the different groups, and corroborates the alternative hypothesis that mean weights differ based on income group.

However, it gives us no clear model for describing that relationship and offers no insights into why income would affect weight, especially in such a nonlinear manner.

Suppose you are performing research into obesity in your city. You take a sample of 30 people in three different neighborhoods (90 people total), collecting information on health and lifestyle. Two variables you collect are height and weight so you can calculate body mass index . Although this index can be misleading for some populations (notably very athletic people), ordinary sedentary people can be classified according to BMI:

Average BMI in the US from 2007-2010 was around 28.6 and rising, standard deviation of around 5 .

You would like to know if there is a difference in BMI between different neighborhoods so you can know whether to target specific neighborhoods or make broader city-wide efforts. Since you have more than two groups, you cannot use a t-test().

Kruskal-Wallace One-Way Analysis of Variance

A somewhat simpler test is the Kruskal-Wallace test which is a nonparametric analogue to ANOVA for testing the significance of differences between two or more groups.

  • R Function: kruskal.test()
  • Null hypothesis (H 0 ): The samples come from the same distribution.
  • History: William Kruskal and W. Allen Wallis (1952)

For this example, we will investigate whether mean weight varies between the three major US urban states: New York, Illinois, and California.

hypothesis test two sided

To test whether this variation could be explained by randomness in the sample, we run the Kruskal-Wallace test.

The low p-value leads us to reject the null hypothesis that the samples come from the same distribution. This corroborates the alternative hypothesis that mean weights differ based on state.

A convienent way of visualizing a comparison between continuous and categorical data is with a box plot , which shows the distribution of a continuous variable across different groups:

hypothesis test two sided

A percentile is the level at which a given percentage of the values in the distribution are below: the 5th percentile means that five percent of the numbers are below that value.

The quartiles divide the distribution into four parts. 25% of the numbers are below the first quartile. 75% are below the third quartile. 50% are below the second quartile, making it the median.

Box plots can be used with both sampled data and population data.

The first parameter to the box plot is a formula: the continuous variable as a function of (the tilde) the second variable. A data= parameter can be added if you are using variables in a data frame.

The chi-squared test can be used to determine if two categorical variables are independent of each other.

  • FOR INSTRUCTOR
  • FOR INSTRUCTORS

8.4.3 Hypothesis Testing for the Mean

$\quad$ $H_0$: $\mu=\mu_0$, $\quad$ $H_1$: $\mu \neq \mu_0$.

$\quad$ $H_0$: $\mu \leq \mu_0$, $\quad$ $H_1$: $\mu > \mu_0$.

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$.

Two-sided Tests for the Mean:

Therefore, we can suggest the following test. Choose a threshold, and call it $c$. If $|W| \leq c$, accept $H_0$, and if $|W|>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have

  • As discussed above, we let \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} Note that, assuming $H_0$, $W \sim N(0,1)$. We will choose a threshold, $c$. If $|W| \leq c$, we accept $H_0$, and if $|W|>c$, accept $H_1$. To choose $c$, we let \begin{align} P(|W| > c \; | \; H_0) =\alpha. \end{align} Since the standard normal PDF is symmetric around $0$, we have \begin{align} P(|W| > c \; | \; H_0) = 2 P(W>c | \; H_0). \end{align} Thus, we conclude $P(W>c | \; H_0)=\frac{\alpha}{2}$. Therefore, \begin{align} c=z_{\frac{\alpha}{2}}. \end{align} Therefore, we accept $H_0$ if \begin{align} \left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \leq z_{\frac{\alpha}{2}}, \end{align} and reject it otherwise.
  • We have \begin{align} \beta (\mu) &=P(\textrm{type II error}) = P(\textrm{accept }H_0 \; | \; \mu) \\ &= P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right). \end{align} If $X_i \sim N(\mu,\sigma^2)$, then $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$. Thus, \begin{align} \beta (\mu)&=P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right)\\ &=P\left(\mu_0- z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \overline{X} \leq \mu_0+ z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)\\ &=\Phi\left(z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right)-\Phi\left(-z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right). \end{align}
  • Let $S^2$ be the sample variance for this random sample. Then, the random variable $W$ defined as \begin{equation} W(X_1,X_2, \cdots, X_n)=\frac{\overline{X}-\mu_0}{S / \sqrt{n}} \end{equation} has a $t$-distribution with $n-1$ degrees of freedom, i.e., $W \sim T(n-1)$. Thus, we can repeat the analysis of Example 8.24 here. The only difference is that we need to replace $\sigma$ by $S$ and $z_{\frac{\alpha}{2}}$ by $t_{\frac{\alpha}{2},n-1}$. Therefore, we accept $H_0$ if \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}, \end{align} and reject it otherwise. Let us look at a numerical example of this case.

$\quad$ $H_0$: $\mu=170$, $\quad$ $H_1$: $\mu \neq 170$.

  • Let's first compute the sample mean and the sample standard deviation. The sample mean is \begin{align}%\label{} \overline{X}&=\frac{X_1+X_2+X_3+X_4+X_5+X_6+X_7+X_8+X_9}{9}\\ &=165.8 \end{align} The sample variance is given by \begin{align}%\label{} {S}^2=\frac{1}{9-1} \sum_{k=1}^9 (X_k-\overline{X})^2&=68.01 \end{align} The sample standard deviation is given by \begin{align}%\label{} S&= \sqrt{S^2}=8.25 \end{align} The following MATLAB code can be used to obtain these values: x=[176.2,157.9,160.1,180.9,165.1,167.2,162.9,155.7,166.2]; m=mean(x); v=var(x); s=std(x); Now, our test statistic is \begin{align} W(X_1,X_2, \cdots, X_9)&=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}\\ &=\frac{165.8-170}{8.25 / 3}=-1.52 \end{align} Thus, $|W|=1.52$. Also, we have \begin{align} t_{\frac{\alpha}{2},n-1} = t_{0.025,8} \approx 2.31 \end{align} The above value can be obtained in MATLAB using the command $\mathtt{tinv(0.975,8)}$. Thus, we conclude \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}. \end{align} Therefore, we accept $H_0$. In other words, we do not have enough evidence to conclude that the average height in the city is different from the average height in the country.

Let us summarize what we have obtained for the two-sided test for the mean.

One-sided Tests for the Mean:

  • As before, we define the test statistic as \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} If $H_0$ is true (i.e., $\mu \leq \mu_0$), we expect $\overline{X}$ (and thus $W$) to be relatively small, while if $H_1$ is true, we expect $\overline{X}$ (and thus $W$) to be larger. This suggests the following test: Choose a threshold, and call it $c$. If $W \leq c$, accept $H_0$, and if $W>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) &= P(\textrm{Reject }H_0 \; | \; H_0) \\ &= P(W > c \; | \; \mu \leq \mu_0) \leq \alpha. \end{align} Here, the probability of type I error depends on $\mu$. More specifically, for any $\mu \leq \mu_0$, we can write \begin{align} P(\textrm{type I error} \; | \; \mu) &= P(\textrm{Reject }H_0 \; | \; \mu) \\ &= P(W > c \; | \; \mu)\\ &=P \left(\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}+\frac{\mu-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c+\frac{\mu_0-\mu}{\sigma / \sqrt{n}} \; | \; \mu\right)\\ &\leq P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c \; | \; \mu\right) \quad (\textrm{ since }\mu \leq \mu_0)\\ &=1-\Phi(c) \quad \big(\textrm{ since given }\mu, \frac{\overline{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \big). \end{align} Thus, we can choose $\alpha=1-\Phi(c)$, which results in \begin{align} c=z_{\alpha}. \end{align} Therefore, we accept $H_0$ if \begin{align} \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \leq z_{\alpha}, \end{align} and reject it otherwise.

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$,

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

10.29: Hypothesis Test for a Difference in Two Population Means (1 of 2)

  • Last updated
  • Save as PDF
  • Page ID 14167

Learning Objectives

  • Under appropriate conditions, conduct a hypothesis test about a difference between two population means. State a conclusion in context.

Using the Hypothesis Test for a Difference in Two Population Means

The general steps of this hypothesis test are the same as always. As expected, the details of the conditions for use of the test and the test statistic are unique to this test (but similar in many ways to what we have seen before.)

Step 1: Determine the hypotheses.

The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H 0 , is again a statement of “no effect” or “no difference.”

  • H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2

The alternative hypothesis, H a , can be any one of the following.

  • H a : μ 1 – μ 2 < 0, which is the same as H a : μ 1 < μ 2
  • H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2
  • H a : μ 1 – μ 2 ≠ 0, which is the same as H a : μ 1 ≠ μ 2

Step 2: Collect the data.

As usual, how we collect the data determines whether we can use it in the inference procedure. We have our usual two requirements for data collection.

  • Samples must be random to remove or minimize bias.
  • Samples must be representative of the populations in question.

We use this hypothesis test when the data meets the following conditions.

  • The two random samples are independent .
  • The variable is normally distributed in both populations . If this variable is not known, samples of more than 30 will have a difference in sample means that can be modeled adequately by the t-distribution. As we discussed in “Hypothesis Test for a Population Mean,” t-procedures are robust even when the variable is not normally distributed in the population. If checking normality in the populations is impossible, then we look at the distribution in the samples. If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure. (Note: This is the same condition we used for the one-sample t-test in “Hypothesis Test for a Population Mean.”)

Step 3: Assess the evidence.

If the conditions are met, then we calculate the t-test statistic. The t-test statistic has a familiar form.

Since the null hypothesis assumes there is no difference in the population means, the expression (μ 1 – μ 2 ) is always zero.

As we learned in “Estimating a Population Mean,” the t-distribution depends on the degrees of freedom (df) . In the one-sample and matched-pair cases df = n – 1. For the two-sample t-test, determining the correct df is based on a complicated formula that we do not cover in this course. We will either give the df or use technology to find the df . With the t-test statistic and the degrees of freedom, we can use the appropriate t-model to find the P-value, just as we did in “Hypothesis Test for a Population Mean.” We can even use the same simulation.

Step 4: State a conclusion.

To state a conclusion, we follow what we have done with other hypothesis tests. We compare our P-value to a stated level of significance.

  • If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
  • If the P-value > α, we fail to reject the null hypothesis. We do not have enough evidence to support the alternative hypothesis.

As always, we state our conclusion in context, usually by referring to the alternative hypothesis.

“Context and Calories”

Does the company you keep impact what you eat? This example comes from an article titled “Impact of Group Settings and Gender on Meals Purchased by College Students” (Allen-O’Donnell, M., T. C. Nowak, K. A. Snyder, and M. D. Cottingham, Journal of Applied Social Psychology 49(9), 2011, onlinelibrary.wiley.com/doi/10.1111/j.1559-1816.2011.00804.x/full) . In this study, researchers examined this issue in the context of gender-related theories in their field. For our purposes, we look at this research more narrowly.

Step 1: Stating the hypotheses.

In the article, the authors make the following hypothesis. “The attempt to appear feminine will be empirically demonstrated by the purchase of fewer calories by women in mixed-gender groups than by women in same-gender groups.” We translate this into a simpler and narrower research question: Do women purchase fewer calories when they eat with men compared to when they eat with women?

Here the two populations are “women eating with women” (population 1) and “women eating with men” (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance.

The null hypothesis is always H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2 .

The alternative hypothesis H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2 .

Here μ 1 represents the mean number of calories ordered by women when they were eating with other women, and μ 2 represents the mean number of calories ordered by women when they were eating with men.

Note: It does not matter which population we label as 1 or 2, but once we decide, we have to stay consistent throughout the hypothesis test. Since we expect the number of calories to be greater for the women eating with other women, the difference is positive if “women eating with women” is population 1. If you prefer to work with positive numbers, choose the group with the larger expected mean as population 1. This is a good general tip.

Step 2: Collect Data.

As usual, there are two major things to keep in mind when considering the collection of data.

  • Samples need to be representative of the population in question.
  • Samples need to be random in order to remove or minimize bias.

Representative Samples?

The researchers state their hypothesis in terms of “women.” We did the same. But the researchers gathered data by watching people eat at the HUB Rock Café II on the campus of Indiana University of Pennsylvania during the Spring semester of 2006. Almost all of the women in the data set were white undergraduates between the ages of 18 and 24, so there are some definite limitations on the scope of this study. These limitations will affect our conclusion (and the specific definition of the population means in our hypotheses.)

Random Samples?

The observations were collected on February 13, 2006, through February 22, 2006, between 11 a.m. and 7 p.m. We can see that the researchers included both lunch and dinner. They also made observations on all days of the week to ensure that weekly customer patterns did not confound their findings. The authors state that “since the time period for observations and the place where [they] observed students were limited, the sample was a convenience sample.” Despite these limitations, the researchers conducted inference procedures with the data, and the results were published in a reputable journal. We will also conduct inference with this data, but we also include a discussion of the limitations of the study with our conclusion. The authors did this, also.

Do the data met the conditions for use of a t-test?

The researchers reported the following sample statistics.

  • In a sample of 45 women dining with other women, the average number of calories ordered was 850, and the standard deviation was 252.
  • In a sample of 27 women dining with men, the average number of calories ordered was 719, and the standard deviation was 322.

One of the samples has fewer than 30 women. We need to make sure the distribution of calories in this sample is not heavily skewed and has no outliers, but we do not have access to a spreadsheet of the actual data. Since the researchers conducted a t-test with this data, we will assume that the conditions are met. This includes the assumption that the samples are independent.

As noted previously, the researchers reported the following sample statistics.

To compute the t-test statistic, make sure sample 1 corresponds to population 1. Here our population 1 is “women eating with other women.” So x 1 = 850, s 1 = 252, n 1 =45, and so on.

Using technology, we determined that the degrees of freedom are about 45 for this data. To find the P-value, we use our familiar simulation of the t-distribution. Since the alternative hypothesis is a “greater than” statement, we look for the area to the right of T = 1.81. The P-value is 0.0385.

The green area to the left of the t value = 0.9615. The blue area to the right of the T value = 0.0385.

Generic Conclusion

The hypotheses for this test are H 0 : μ 1 – μ 2 = 0 and H a : μ 1 – μ 2 > 0. Since the P-value is less than the significance level (0.0385 < 0.05), we reject H 0 and accept H a .

Conclusion in context

At Indiana University of Pennsylvania, the mean number of calories ordered by undergraduate women eating with other women is greater than the mean number of calories ordered by undergraduate women eating with men (P-value = 0.0385).

Comment about Conclusions

In the conclusion above, we did not generalize the findings to all women. Since the samples included only undergraduate women at one university, we included this information in our conclusion. But our conclusion is a cautious statement of the findings. The authors see the results more broadly in the context of theories in the field of social psychology. In the context of these theories, they write, “Our findings support the assertion that meal size is a tool for influencing the impressions of others. For traditional-age, predominantly White college women, diminished meal size appears to be an attempt to assert femininity in groups that include men.” This viewpoint is echoed in the following summary of the study for the general public on National Public Radio (npr.org).

  • Both men and women appear to choose larger portions when they eat with women, and both men and women choose smaller portions when they eat in the company of men, according to new research published in the Journal of Applied Social Psychology . The study, conducted among a sample of 127 college students, suggests that both men and women are influenced by unconscious scripts about how to behave in each other’s company. And these scripts change the way men and women eat when they eat together and when they eat apart.

Should we be concerned that the findings of this study are generalized in this way? Perhaps. But the authors of the article address this concern by including the following disclaimer with their findings: “While the results of our research are suggestive, they should be replicated with larger, representative samples. Studies should be done not only with primarily White, middle-class college students, but also with students who differ in terms of race/ethnicity, social class, age, sexual orientation, and so forth.” This is an example of good statistical practice. It is often very difficult to select truly random samples from the populations of interest. Researchers therefore discuss the limitations of their sampling design when they discuss their conclusions.

In the following activities, you will have the opportunity to practice parts of the hypothesis test for a difference in two population means. On the next page, the activities focus on the entire process and also incorporate technology.

National Health and Nutrition Survey

https://assessments.lumenlearning.co...sessments/3705

https://assessments.lumenlearning.co...sessments/3782

https://assessments.lumenlearning.co...sessments/3706

Contributors and Attributions

  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution

IMAGES

  1. PPT

    hypothesis test two sided

  2. Two-Sided Hypothesis Test for a Proportion Part 2 (TIU Math Dept)

    hypothesis test two sided

  3. hypothesis test formula statistics

    hypothesis test two sided

  4. Hypothesis Testing In Trading

    hypothesis test two sided

  5. Two-Sided Hypothesis Test for a Proportion (TIU Math Dept)

    hypothesis test two sided

  6. Hypothesis testing tutorial using p value method

    hypothesis test two sided

VIDEO

  1. Hypothesis Test About a Population Variance

  2. Large Sample Hypothesis Tests Sample Size

  3. 5 4 A Hypothesis Test for if Two Proportions are Equal

  4. Hypothesis Test for Two Sample Proportion

  5. Two Sample Hypothesis Z Test

  6. Hypothesis test(Two sample means using Excel ||Ep-22|| ft.Nirmal Bajracharya

COMMENTS

  1. One-Tailed and Two-Tailed Hypothesis Tests Explained

    With a two-tailed hypothesis test, you'll obtain a two-sided confidence interval. The confidence interval tells us that the population mean is likely to fall between 3.372 and 4.828. ... 05, double that to 0.10 and then subtract from 1 (1 - 0.10 = 0.90). 90% is the confidence level you want to use for a two-sided test. After obtaining the ...

  2. Two-Tailed Hypothesis Tests: 3 Example Problems

    To test this, he can perform a one-tailed hypothesis test with the following null and alternative hypotheses: H 0 (Null Hypothesis): μ = 20 grams; H A (Alternative Hypothesis): μ ≠ 20 grams; This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal "≠" sign. The engineer believes that ...

  3. What Is a Two-Tailed Test? Definition and Example

    Two-Tailed Test: A two-tailed test is a statistical test in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values ...

  4. 3.1: The Fundamentals of Hypothesis Testing

    This tests whether the population parameter is equal to, versus not equal to, some specific value. Ho: μ = 12 vs. H1: μ ≠ 12. The critical region is divided equally into the two tails and the critical values are ± values that define the rejection zones. Figure 3.1.1: The rejection zone for a two-sided hypothesis test.

  5. 4.4: Hypothesis Testing

    Two-sided hypothesis testing with p-values. We now consider how to compute a p-value for a two-sided test. In one-sided tests, we shade the single tail in the direction of the alternative hypothesis. For example, when the alternative had the form \(\mu\) > 7, then the p-value was represented by the upper tail (Figure 4.16). When the alternative ...

  6. FAQ: What are the differences between one-tailed and two-tailed tests?

    The null hypothesis is that the difference in means is zero. The two-sided alternative is that the difference in means is not zero. There are two one-sided alternatives that one could opt to test instead: that the male score is higher than the female score (diff > 0) or that the female score is higher than the male score (diff < 0).

  7. Hypothesis testing: One-tailed and two-tailed tests

    The alternate hypothesis for a two-sided t-test would simply state that the mean blood pressure for the medication group is different than the placebo group, but it wouldn't specify if medication would raise or lower the mean blood pressure. Typically, researchers choose to use two-sided t-tests, since they usually don't know how a ...

  8. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  9. One-tailed and two-tailed tests (video)

    Also note that the two hypothesis have to be mutually exhaustive. In other words the null hypothesis should be greater than or equal to the default mean of 1.2 seconds and not just simply equal to. Thanks! ... what can be said about the 2 sided p-value for testing the null hypothesis of no change in cholesterol levels, if on average after three ...

  10. S.3.2 Hypothesis Testing (P-Value Approach)

    The P -value is, therefore, the area under a tn - 1 = t14 curve to the left of -2.5 and to the right of 2.5. It can be shown using statistical software that the P -value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually. Note that the P -value for a two-tailed test is always two times the P -value for either of the one-tailed tests.

  11. An Introduction to Statistics: Understanding Hypothesis Testing and

    Ranganathan P, Pramesh CS. An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors. Indian J Crit Care Med 2019;23(Suppl 3):S230-S231. ... This is known as a two-tailed hypothesis and it allows us to test for superiority on either side (using a two-sided test). This is because, when we start a study, we are not ...

  12. A Complete Guide to Hypothesis Testing

    from scipy import stats # one sample t-test (two-sided) stats.ttest_1samp(group1, null_mean) For a two sample t-test (where the samples are independent), H₀ is "the mean of group A is equal to the mean of group B" vs. the alternative hypothesis Ha that "the mean of group A is different/greater/less than the mean of group B". In the ...

  13. One Tailed and Two Tailed Tests, Critical Values ...

    This statistics video tutorial explains when you should use a one tailed test vs a two tailed test when solving problems associated with hypothesis testing. ...

  14. How Hypothesis Tests Work: Significance Levels (Alpha) and P values

    Don't mix and match one-sided and two-sided test results. Confidence levels are determine by the significance level, not p-values. So, what you need is a two-sided 95% CI (1-alpha). You could then state the results are statistically significant and you have 95% confidence that the population effect is between X and Y. If you want a lower ...

  15. Lab 2: Intro to Hypothesis Testing

    When performing a two-sided permutation test, we calculate both one-sided \(P\)-values, multiply the smaller by 2, and if necessary round down to 1.0. Two-sided \(P\)-values are the default in statistical practice: you should perform a two-sided test unless there is a clear reason to pick a one-sided alternative hypothesis.

  16. 9.3: One and Two-Sided Tests of Significance

    two-tailed test. A two-tailed test has tails on both ends of the normal distribution curve. The null hypothesis will be in the form Ho = 5, for example. This page titled 9.3: One and Two-Sided Tests of Significance is shared under a CK-12 license and was authored, remixed, and/or curated by CK-12 Foundation via source content that was edited to ...

  17. A Beginner's Guide to Hypothesis Testing in Business

    3. One-Sided vs. Two-Sided Testing. When it's time to test your hypothesis, it's important to leverage the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests, or one-tailed and two-tailed tests, respectively. Typically, you'd leverage a one-sided test when you have a strong conviction ...

  18. Hypothesis Testing and Confidence Intervals

    These two approaches always agree. The relationship between the confidence level and the significance level for a hypothesis test is as follows: Confidence level = 1 - Significance level (alpha) For example, if your significance level is 0.05, the equivalent confidence level is 95%.

  19. 3.2 One-sided vs two-sided tests

    This resulted in a critical value of -1.67. For a one-sided test (left-tailed), we have: p-value =P (T ≤ t) for T ∼ tdf p -value = P ( T ≤ t) for T ∼ t df. Two-sided tests are often preferred in practice because they are unbiased in terms of the predicted direction of the results. However, in this subject we will get practice using both ...

  20. Hypothesis testing in biology: One-sided vs. two-sided tests, and data

    Many tests in R have the argument alternative, which allows you to choose the alternative hypothesis to be two.sided, greater, or less. If you don't specify this argument, it defaults to two.sided . So it turns out, in the exercise above, you did the right thing with: R. binom.test(9,100,p=0.04)

  21. Hypothesis Tests in R

    One-sample Kolmogorov-Smirnov test data: y D = 0.54413, p-value = 7.47e-09 alternative hypothesis: two-sided Comparing Two Central Tendencies: Tests with Continuous / Discrete Data One Sample T-Test (Two-Sided) The one-sample t-test tests the significance of the difference between the mean of a sample and an expected mean.

  22. Hypothesis Testing for the Mean

    In this case, the null hypothesis is a simple hypothesis and the alternative hypothesis is a two-sided hypothesis (i.e., it includes both $\mu \lt \mu_0$ and $\mu>\mu_0$). We call this hypothesis test a two-sided test. The second and the third cases are one-sided tests. More specifically, the second case is

  23. PDF Microsoft PowerPoint

    healthy population. Let's call hypothesis H 1. Here H 1 is one‐sided • If we asked: cancer is not equal to healthy H 1 would be a two‐ sidedhypothesis • Assume we have a sample of 100 cancer patients with sample mean 48 mRNAs/cell and standard deviation σ=10 mRNA/cell

  24. 10.29: Hypothesis Test for a Difference in Two Population Means (1 of 2)

    Here the two populations are "women eating with women" (population 1) and "women eating with men" (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance. The null hypothesis is always H 0: μ 1 - μ 2 = 0, which is the same as H 0: μ 1 = μ 2.

  25. PDF If testing a 2-sided hypothesis, use a 2-sided test! → for null

    If testing a 2-sided hypothesis, use a 2-sided test! Morals of the sidedness (or tail) tale: + A single, 1-sided test is fine if one has prior information and makes *a* 1-sided hypothesis. + For all other cases, use *a* 2-sided test. + A pair of 1-sided tests with FPR = α is equivalent to one 2-sided test with FPR = 2α, i.e.,