Statology

Statistics Made Easy

Two Sample t-test: Definition, Formula, and Example

A two sample t-test is used to determine whether or not two population means are equal.

This tutorial explains the following:

  • The motivation for performing a two sample t-test.
  • The formula to perform a two sample t-test.
  • The assumptions that should be met to perform a two sample t-test.
  • An example of how to perform a two sample t-test.

Two Sample t-test: Motivation

Suppose we want to know whether or not the mean weight between two different species of turtles is equal. Since there are thousands of turtles in each population, it would be too time-consuming and costly to go around and weigh each individual turtle.

Instead, we might take a simple random sample of 15 turtles from each population and use the mean weight in each sample to determine if the mean weight is equal between the two populations:

Two sample t-test example

However, it’s virtually guaranteed that the mean weight between the two samples will be at least a little different. The question is whether or not this difference is statistically significant. Fortunately, a two sample t-test allows us to answer this question.

Two Sample t-test: Formula

A two-sample t-test always uses the following null hypothesis:

  • H₀: μ₁ = μ₂ (the two population means are equal)

The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  • H₁ (two-tailed): μ₁ ≠ μ₂ (the two population means are not equal)
  • H₁ (left-tailed): μ₁ < μ₂ (population 1 mean is less than population 2 mean)
  • H₁ (right-tailed): μ₁ > μ₂ (population 1 mean is greater than population 2 mean)

We use the following formula to calculate the test statistic t:

Test statistic: t = (x̄₁ − x̄₂) / (sp × √(1/n₁ + 1/n₂))

where x̄₁ and x̄₂ are the sample means, n₁ and n₂ are the sample sizes, and sp is the pooled standard deviation, calculated as:

sp = √[ ((n₁−1)s₁² + (n₂−1)s₂²) / (n₁+n₂−2) ]

where s₁² and s₂² are the sample variances.

If the p-value that corresponds to the test statistic t with (n₁+n₂−2) degrees of freedom is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01), then you can reject the null hypothesis.
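The formula above translates directly into code. Here is a minimal Python sketch (the function names are our own) that computes the pooled standard deviation and the test statistic from summary statistics:

```python
import math

def pooled_sd(s1, n1, s2, n2):
    # Pooled standard deviation: square root of the weighted
    # average of the two sample variances
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def two_sample_t(x1, s1, n1, x2, s2, n2):
    # Test statistic: t = (x1 - x2) / (sp * sqrt(1/n1 + 1/n2))
    sp = pooled_sd(s1, n1, s2, n2)
    t = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df
```

With the turtle numbers from the example below (means 300 and 305, standard deviations 18.5 and 16.7, sizes 40 and 38), this returns t ≈ −1.2508 with 76 degrees of freedom.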

Two Sample t-test: Assumptions

For the results of a two sample t-test to be valid, the following assumptions should be met:

  • The observations in one sample should be independent of the observations in the other sample.
  • The data should be approximately normally distributed.
  • The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test.
  • The data in both samples was obtained using a random sampling method.

Two Sample t-test: Example

Suppose we want to know whether or not the mean weight between two different species of turtles is equal. To test this, we will perform a two sample t-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose we collect a random sample of turtles from each population with the following information:

  • Sample size n₁ = 40
  • Sample mean weight x̄₁ = 300
  • Sample standard deviation s₁ = 18.5
  • Sample size n₂ = 38
  • Sample mean weight x̄₂ = 305
  • Sample standard deviation s₂ = 16.7

Step 2: Define the hypotheses.

We will perform the two sample t-test with the following hypotheses:

  • H₀: μ₁ = μ₂ (the two population means are equal)
  • H₁: μ₁ ≠ μ₂ (the two population means are not equal)

Step 3: Calculate the test statistic  t .

First, we will calculate the pooled standard deviation s p :

sp = √[ ((40−1)18.5² + (38−1)16.7²) / (40+38−2) ] = 17.647

Next, we will calculate the test statistic  t :

t = (x̄₁ − x̄₂) / (sp × √(1/n₁ + 1/n₂)) = (300 − 305) / (17.647 × √(1/40 + 1/38)) = −1.2508

Step 4: Calculate the p-value of the test statistic  t .

According to the T Score to P Value Calculator, the p-value associated with t = −1.2508 and degrees of freedom = n₁+n₂−2 = 40+38−2 = 76 is 0.21484.

Step 5: Draw a conclusion.

Since this p-value is not less than our significance level α = 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mean weight of turtles between these two populations is different.
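If you have SciPy available, the whole example can be checked in one call from the summary statistics alone; `scipy.stats.ttest_ind_from_stats` performs exactly this pooled-variance test:

```python
from scipy.stats import ttest_ind_from_stats

# Turtle example: summary statistics for the two samples
res = ttest_ind_from_stats(mean1=300, std1=18.5, nobs1=40,
                           mean2=305, std2=16.7, nobs2=38,
                           equal_var=True)  # pooled-variance (Student's) t-test

print(res.statistic, res.pvalue)  # t ≈ -1.2508, p ≈ 0.2148
```

Since p ≈ 0.2148 > 0.05, the software agrees with the hand calculation above.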

Note:  You can also perform this entire two sample t-test by simply using the Two Sample t-test Calculator .

Additional Resources

The following tutorials explain how to perform a two-sample t-test using different statistical programs:

  • How to Perform a Two Sample t-test in Excel
  • How to Perform a Two Sample t-test in SPSS
  • How to Perform a Two Sample t-test in Stata
  • How to Perform a Two Sample t-test in R
  • How to Perform a Two Sample t-test in Python
  • How to Perform a Two Sample t-test on a TI-84 Calculator


Published by Zach


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H₀) and alternate hypothesis (Hₐ or H₁).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Other interesting articles
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H₀) and alternate (Hₐ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H₀: Men are, on average, not taller than women.
  • Hₐ: Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data. A t test of the height example above, for instance, would give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
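As a sketch of what such a test returns, here is a two-sample t test in Python on made-up height data (the numbers are invented for illustration, not real measurements):

```python
from scipy import stats

# Hypothetical height samples in cm; invented for illustration only
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 170, 168, 172, 166, 169, 171, 167]

result = stats.ttest_ind(men, women)

# Estimate of the difference in average height between the groups
estimate = sum(men) / len(men) - sum(women) / len(women)

print(estimate, result.pvalue)
```

The test returns both pieces listed above: the estimated difference and a p-value for the null hypothesis of no difference.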

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


JMP | Statistical Discovery.™ From SAS.

Statistics Knowledge Portal

A free online introduction to statistics

The Two-Sample t -Test

What is the two-sample t-test?

The two-sample t -test (also known as the independent samples t -test) is a method used to test whether the unknown population means of two groups are equal or not.

Is this the same as an A/B test?

Yes, a two-sample t -test is used to analyze the results from A/B tests.

When can I use the test?

You can use the test when your data values are independent, are randomly sampled from two normal populations and the two independent groups have equal variances.

What if I have more than two groups?

Use a multiple comparison method. Analysis of variance (ANOVA) is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.

What if the variances for my two groups are not equal?

You can still use the two-sample t-test. You use a different estimate of the standard deviation.

What if my data isn’t nearly normally distributed?

If your sample sizes are very small, you might not be able to test for normality. You might need to rely on your understanding of the data. When you cannot safely assume normality, you can perform a nonparametric test that doesn’t assume normality.

See how to perform a two-sample t -test using statistical software

  • Download JMP to follow along using the sample data included with the software.
  • To see more JMP tutorials, visit the JMP Learning Library .

Using the two-sample t -test

The sections below discuss what is needed to perform the test, checking our data, how to perform the test and statistical details.

What do we need?

For the two-sample t -test, we need two variables. One variable defines the two groups. The second variable is the measurement of interest.

We also have an idea, or hypothesis, that the means of the underlying populations for the two groups are different. Here are a couple of examples:

  • We have students who speak English as their first language and students who do not. All students take a reading test. Our two groups are the native English speakers and the non-native speakers. Our measurements are the test scores. Our idea is that the mean test scores for the underlying populations of native and non-native English speakers are not the same. We want to know if the mean score for the population of native English speakers is different from that of people who learned English as a second language.
  • We measure the grams of protein in two different brands of energy bars. Our two groups are the two brands. Our measurement is the grams of protein for each energy bar. Our idea is that the mean grams of protein for the underlying populations for the two brands may be different. We want to know if we have evidence that the mean grams of protein for the two brands of energy bars is different or not.

Two-sample t -test assumptions

To conduct a valid test:

  • Data values must be independent. Measurements for one observation do not affect measurements for any other observation.
  • Data in each group must be obtained via a random sample from the population.
  • Data in each group are normally distributed .
  • Data values are continuous.
  • The variances for the two independent groups are equal.

For very small groups of data, it can be hard to test these requirements. Below, we'll discuss how to check the requirements using software and what to do when a requirement isn’t met.

Two-sample t -test example

One way to measure a person’s fitness is to measure their body fat percentage. Average body fat percentages vary by age, but according to some guidelines, the normal range for men is 15-20% body fat, and the normal range for women is 20-25% body fat.

Our sample data is from a group of men and women who did workouts at a gym three times a week for a year. Then, their trainer measured the body fat. The table below shows the data.

Table 1: Body fat percentage data grouped by gender

You can clearly see some overlap in the body fat measurements for the men and women in our sample, but also some differences. Just by looking at the data, it's hard to draw any solid conclusions about whether the underlying populations of men and women at the gym have the same mean body fat. That is the value of statistical tests – they provide a common, statistically valid way to make decisions, so that everyone makes the same decision on the same set of data values.

Checking the data

Let’s start by answering: Is the two-sample t -test an appropriate method to evaluate the difference in body fat between men and women?

  • The data values are independent. The body fat for any one person does not depend on the body fat for another person.
  • We assume the people measured represent a simple random sample from the population of members of the gym.
  • We assume the data are normally distributed, and we can check this assumption.
  • The data values are body fat measurements. The measurements are continuous.
  • We assume the variances for men and women are equal, and we can check this assumption.

Before jumping into analysis, we should always take a quick look at the data. The figure below shows histograms and summary statistics for the men and women.

Histogram and summary statistics for the body fat data

The two histograms are on the same scale. From a quick look, we can see that there are no very unusual points, or outliers . The data look roughly bell-shaped, so our initial idea of a normal distribution seems reasonable.

Examining the summary statistics, we see that the standard deviations are similar. This supports the idea of equal variances. We can also check this using a test for variances.

Based on these observations, the two-sample t -test appears to be an appropriate method to test for a difference in means.

How to perform the two-sample t -test

For each group, we need the average, standard deviation and sample size. These are shown in the table below.

Table 2: Average, standard deviation and sample size statistics grouped by gender

Group    Average    Standard deviation    Sample size
Women    22.29      5.32                  10
Men      14.95      6.84                  13

Without doing any testing, we can see that the averages for men and women in our samples are not the same. But how different are they? Are the averages “close enough” for us to conclude that mean body fat is the same for the larger population of men and women at the gym? Or are the averages too different for us to make this conclusion?

We'll further explain the principles underlying the two sample t -test in the statistical details section below, but let's first proceed through the steps from beginning to end. We start by calculating our test statistic. This calculation begins with finding the difference between the two averages:

$ 22.29 - 14.95 = 7.34 $

This difference in our samples estimates the difference between the population means for the two groups.

Next, we calculate the pooled standard deviation. This builds a combined estimate of the overall standard deviation. The estimate adjusts for different group sizes. First, we calculate the pooled variance:

$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)} {n_1 + n_2 - 2} $

$ s_p^2 = \frac{((10 - 1)5.32^2) + ((13 - 1)6.84^2)}{(10 + 13 - 2)} $

$ = \frac{(9\times28.30) + (12\times46.82)}{21} $

$ = \frac{(254.7 + 561.85)}{21} $

$ =\frac{816.55}{21} = 38.88 $

Next, we take the square root of the pooled variance to get the pooled standard deviation. This is:

$ \sqrt{38.88} = 6.24 $

We now have all the pieces for our test statistic. We have the difference of the averages, the pooled standard deviation and the sample sizes.  We calculate our test statistic as follows:

$ t = \frac{\text{difference of group averages}}{\text{standard error of difference}} = \frac{7.34}{(6.24\times \sqrt{(1/10 + 1/13)})} = \frac{7.34}{2.62} = 2.80 $

To evaluate the difference between the means in order to make a decision about our gym programs, we compare the test statistic to a theoretical value from the t- distribution. This activity involves four steps:

  • We decide on the risk we are willing to take for declaring a significant difference. For the body fat data, we decide that we are willing to take a 5% risk of saying that the unknown population means for men and women are not equal when they really are. In statistics-speak, the significance level, denoted by α, is set to 0.05. It is a good practice to make this decision before collecting the data and before calculating test statistics.
  • We calculate a test statistic. Our test statistic is 2.80.
  • We find the theoretical value from the t- distribution based on our null hypothesis which states that the means for men and women are equal. Most statistics books have look-up tables for the t- distribution. You can also find tables online. The most likely situation is that you will use software and will not use printed tables. To find this value, we need the significance level (α = 0.05) and the degrees of freedom . The degrees of freedom ( df ) are based on the sample sizes of the two groups. For the body fat data, this is: $ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $ The t value with α = 0.05 and 21 degrees of freedom is 2.080.
  • We compare the value of our statistic (2.80) to the t value. Since 2.80 > 2.080, we reject the null hypothesis that the mean body fat for men and women are equal, and conclude that we have evidence body fat in the population is different between men and women.
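The four steps above can be reproduced in a few lines of Python. This sketch uses SciPy only for the t-distribution look-up and assumes the summary statistics quoted in the text:

```python
import math
from scipy.stats import t as t_dist

# Summary statistics from the body fat example
n1, x1, s1 = 10, 22.29, 5.32   # group 1
n2, x2, s2 = 13, 14.95, 6.84   # group 2

# Pooled variance and pooled standard deviation
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
sp = math.sqrt(sp2)

# Test statistic and critical value at alpha = 0.05, two-sided
t_stat = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
t_crit = t_dist.ppf(0.975, df)  # upper 2.5% point of t with 21 df

print(round(t_stat, 2), round(t_crit, 3))  # ≈ 2.80 and ≈ 2.080
```

Since the test statistic (≈ 2.80) exceeds the critical value (≈ 2.080), the code reaches the same conclusion as the steps above: reject the null hypothesis.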

Statistical details

Let’s look at the body fat data and the two-sample t -test using statistical terms.

Our null hypothesis is that the underlying population means are the same. The null hypothesis is written as:

$ H_0: \mu_1 = \mu_2 $

The alternative hypothesis is that the means are not equal. This is written as:

$ H_a: \mu_1 \neq \mu_2 $

We calculate the average for each group, and then calculate the difference between the two averages. This is written as:

$\overline{x_1} -  \overline{x_2} $

We calculate the pooled standard deviation. This assumes that the underlying population variances are equal. The pooled variance formula is written as:

$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $

The formula shows the sample size for the first group as n₁ and the second group as n₂. The standard deviations for the two groups are s₁ and s₂. This estimate allows the two groups to have different numbers of observations. The pooled standard deviation is the square root of the variance and is written as s_p.

What if your sample sizes for the two groups are the same? In this situation, the pooled estimate of variance is simply the average of the variances for the two groups:

$ s_p^2 = \frac{(s_1^2 + s_2^2)}{2} $

The test statistic is calculated as:

$ t = \frac{(\overline{x_1} -\overline{x_2})}{s_p\sqrt{1/n_1 + 1/n_2}} $

The numerator of the test statistic is the difference between the two group averages. It estimates the difference between the two unknown population means. The denominator is an estimate of the standard error of the difference between the two unknown population means. 

Technical Detail: For a single mean, the standard error is $ s/\sqrt{n} $  . The formula above extends this idea to two groups that use a pooled estimate for s (standard deviation), and that can have different group sizes.

We then compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the body fat data as an example, we set α = 0.05. The degrees of freedom ( df ) are based on the group sizes and are calculated as:

$ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $

The formula shows the sample size for the first group as n 1 and the second group as n 2 .  Statisticians write the t value with α = 0.05 and 21 degrees of freedom as:

$ t_{0.05,21} $

The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are two possible results from our comparison:

  • The test statistic is lower than the t value. You fail to reject the hypothesis of equal means. You conclude that the data support the assumption that the men and women have the same average body fat.
  • The test statistic is higher than the t value. You reject the hypothesis of equal means. You do not conclude that men and women have the same average body fat.

t -Test with unequal variances

When the variances for the two groups are not equal, we cannot use the pooled estimate of standard deviation. Instead, we take the standard error for each group separately. The test statistic is:

$ t = \frac{ (\overline{x_1} -  \overline{x_2})}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $

The numerator of the test statistic is the same. It is the difference between the averages of the two groups. The denominator is an estimate of the overall standard error of the difference between means. It is based on the separate standard error for each group.

The degrees of freedom calculation for the t value is more complex with unequal variances than equal variances and is usually left up to statistical software packages. The key point to remember is that if you cannot use the pooled estimate of standard deviation, then you cannot use the simple formula for the degrees of freedom.
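The unequal-variance calculation, including the Welch-Satterthwaite approximation for the degrees of freedom that software computes for you, can be sketched like this (the group statistics are the body fat values from above):

```python
import math

# Body fat summary statistics from the example above
n1, x1, s1 = 10, 22.29, 5.32
n2, x2, s2 = 13, 14.95, 6.84

v1, v2 = s1**2 / n1, s2**2 / n2   # per-group variance of the mean

# Welch test statistic: separate standard errors, no pooling
t_stat = (x1 - x2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(round(t_stat, 2), round(df, 2))  # ≈ 2.89 and ≈ 20.99
```

The degrees of freedom come out to about 20.99, matching the 20.9888 that JMP reports for this data in the software section below.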

Testing for normality

The normality assumption is more important when the two groups have small sample sizes than for larger sample sizes.

Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the body fat data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for men and women, and supports our decision.

 Normal quantile plot of the body fat measurements for men and women

You can also perform a formal test for normality using software. The figure above shows results of testing for normality with JMP software. We test each group separately. Both the test for men and the test for women show that we cannot reject the hypothesis of a normal distribution. We can go ahead with the assumption that the body fat data for men and for women are normally distributed.

Testing for unequal variances

Testing for unequal variances is complex. We won’t show the calculations in detail, but will show the results from JMP software. The figure below shows results of a test for unequal variances for the body fat data.

Test for unequal variances for the body fat data

Without diving into details of the different types of tests for unequal variances, we will use the F test. Before testing, we decide to accept a 10% risk of concluding the variances are equal when they are not. This means we have set α = 0.10.

Like most statistical software, JMP shows the p -value for a test. This is the likelihood of finding a more extreme value for the test statistic than the one observed. It’s difficult to calculate by hand. For the figure above, with the F test statistic of 1.654, the p- value is 0.4561. This is larger than our α value: 0.4561 > 0.10. We fail to reject the hypothesis of equal variances. In practical terms, we can go ahead with the two-sample t -test with the assumption of equal variances for the two groups.
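The F statistic here is simply the ratio of the larger sample variance to the smaller one. With SciPy's F distribution you can recover a p-value close to the one JMP reports (small differences come from rounding the standard deviations):

```python
from scipy.stats import f

# Body fat sample standard deviations and sizes
s1, n1 = 5.32, 10   # group with the smaller variance
s2, n2 = 6.84, 13   # group with the larger variance

# F statistic: larger sample variance over smaller sample variance
F = s2**2 / s1**2

# Two-sided p-value: double the upper-tail probability,
# with dfn = n2 - 1 and dfd = n1 - 1
p = 2 * f.sf(F, n2 - 1, n1 - 1)

print(round(F, 3), round(p, 3))  # F ≈ 1.653, p ≈ 0.456
```

Since p ≈ 0.456 > 0.10, this reproduces JMP's conclusion: fail to reject the hypothesis of equal variances.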

Understanding p-values

Using a visual, you can check to see if your test statistic is a more extreme value in the distribution. The figure below shows a t- distribution with 21 degrees of freedom.

t-distribution with 21 degrees of freedom and α = .05

Since our test is two-sided and we have set α = .05, the figure shows that the value of 2.080 “cuts off” 2.5% of the data in each of the two tails. Only 5% of the data overall is further out in the tails than 2.080. Because our test statistic of 2.80 is beyond the cut-off point, we reject the null hypothesis of equal means.

Putting it all together with software

The figure below shows results for the two-sample t -test for the body fat data from JMP software.

Results for the two-sample t-test from JMP software

The results for the two-sample t -test that assumes equal variances are the same as our calculations earlier. The test statistic is 2.79996. The software shows results for a two-sided test and for one-sided tests. The two-sided test is what we want (Prob > |t|). Our null hypothesis is that the mean body fat for men and women is equal. Our alternative hypothesis is that the mean body fat is not equal. The one-sided tests are for one-sided alternative hypotheses – for example, for a null hypothesis that mean body fat for men is less than that for women.

We can reject the hypothesis of equal mean body fat for the two groups and conclude that we have evidence body fat differs in the population between men and women. The software shows a p -value of 0.0107. We decided on a 5% risk of concluding the mean body fat for men and women are different, when they are not. It is important to make this decision before doing the statistical test.

The figure also shows the results for the t- test that does not assume equal variances. This test does not use the pooled estimate of the standard deviation. As was mentioned above, this test also has a complex formula for degrees of freedom. You can see that the degrees of freedom are 20.9888. The software shows a p- value of 0.0086. Again, with our decision of a 5% risk, we can reject the null hypothesis of equal mean body fat for men and women.

Other topics

If you have more than two independent groups, you cannot use the two-sample t- test. You should use a multiple comparison   method. ANOVA, or analysis of variance, is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.

What if my data are not from normal distributions?

If your sample size is very small, it might be hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the body fat data, the trainer knows that the underlying distribution of body fat is normally distributed. Even for a very small sample, the trainer would likely go ahead with the t -test and assume normality.

What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test for normality is rejected? In this situation, you can use nonparametric analyses. These types of analyses do not depend on an assumption that the data values are from a specific distribution. For the two-sample t ­-test, the Wilcoxon rank sum test is a nonparametric test that could be used.
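In SciPy, the Wilcoxon rank sum test is available as `scipy.stats.ranksums` (the closely related Mann-Whitney U test is `scipy.stats.mannwhitneyu`). A minimal sketch with made-up, non-normal-looking data:

```python
from scipy.stats import ranksums

# Hypothetical skewed measurements; invented for illustration only
group_a = [1.2, 0.8, 3.5, 0.9, 1.1, 4.2, 0.7, 1.0]
group_b = [2.8, 3.9, 4.4, 3.1, 5.0, 2.7, 3.6, 4.8]

stat, p = ranksums(group_a, group_b)
print(stat, p)  # a negative statistic means group_a tends to rank lower
```

The test compares ranks rather than means, so it makes no assumption that the data come from normal distributions.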

Calcworkshop

Two Sample T Test Defined w/ 7 Step-by-Step Examples!

Last Updated: October 9, 2020

Did you know that the two sample t test is used to test for a difference between two population means?


Jenn, Founder Calcworkshop ® , 15+ Years Experience (Licensed & Certified Teacher)

It’s true!

Now, there are 3 ways to calculate the difference between means, as listed below:

  • If the population standard deviation is known (z-test)
  • Independent samples with an unknown standard deviation (two-sample t-test)
      • pooled variances
      • un-pooled variances
  • Matched pairs

Let’s find out more!

So how do we compare the mean of some quantitative variables for two different populations?

If our parameters of interest are the population means , then the best approach is to take random samples from both populations and compare their sample means as noted on the Engineering Statistics Handbook .

In other words, we analyze the difference between two sample means to understand the average difference between the two populations. And as always, the larger the sample size, the more accurate our inferences will be.

Just like we saw with one-sample means, we will either employ a z-test or a t-test, depending on whether the population standard deviation is known or unknown.

However, there is another component we must consider if we have independent random samples where the population standard deviation is unknown: do we pool our variances?

When we found the difference of population proportions, we automatically pooled our variances. However, with the difference of population means, we will have to check. We do this by finding an F-statistic .

If this F-statistic is less than or equal to the critical number, then we will pool our variances. Otherwise, we will not pool.

Please note that it is infrequent to have two independent samples with equal, or almost equal, variances; therefore, the formula for un-pooled variances is more readily accepted in most high school statistics courses.

But it is an important skill to learn and understand, so we will be working through several examples of when we need to pool variances and when we do not.
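A sketch of this pool-or-not check; the variances, sample sizes, and significance level below are hypothetical:

```python
# Pool-or-not check: compare the ratio of sample variances to an F
# critical value; pool the variances only when the ratio is not extreme.
from scipy.stats import f

s1_sq, n1 = 4.38, 16   # larger sample variance and its sample size (made up)
s2_sq, n2 = 2.15, 14   # smaller sample variance and its sample size (made up)

f_stat = s1_sq / s2_sq                  # larger variance goes in the numerator
f_crit = f.ppf(0.975, n1 - 1, n2 - 1)   # upper critical value at alpha = 0.05

pool = f_stat <= f_crit                 # True: F-statistic is below the cutoff
```

Because the F-statistic is below the critical value here, we would pool the variances, just as described above.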

Worked Example

For example, imagine the college provost at one school said their students study more, on average, than those at the neighboring school.

However, the provost at the nearby school believed the study time was the same and wants to clear up the controversy.

So, independent random samples were taken from both schools, with the results stated below. And at a 5% significance level, the following significance test is conducted.


Two Sample T Test Pooled Example

Notice that we pooled our variances because our F-statistic yielded a value less than our critical value. The interpretation of our results is as follows:

  • Since the p-value is greater than our significance level, we fail to reject the null hypothesis.
  • And conclude that the students at both schools, on average, study the same amount.
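A minimal pooled-variance version of this test in SciPy, with made-up study-hour samples standing in for the two schools (the actual data appeared in the figure):

```python
# Pooled two-sample t-test: equal_var=True tells SciPy to use the
# pooled estimate of the common standard deviation.
from scipy import stats

# Hypothetical weekly study hours for each school (illustration only)
school_a = [12, 15, 11, 14, 13, 16, 12, 15]
school_b = [13, 14, 12, 15, 14, 13, 15, 12]

t_stat, p_value = stats.ttest_ind(school_a, school_b, equal_var=True)

if p_value <= 0.05:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
```

With these made-up samples the decision matches the example's conclusion: the p-value exceeds the significance level, so we fail to reject the null hypothesis.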

Matched Pairs Test

But what do we do if the populations we wish to compare are not different but the same?

Meaning, the same experimental units are measured under two different conditions, so any difference between means is due to the conditions and not to differences between the experimental units in the study.

When this happens, we have what is called a Matched Pairs T Test .

The great thing about a paired t test is that it becomes a one-sample t-test on the differences.

And then we will calculate the sample mean and sample standard deviation of these difference values (dividing that standard deviation by the square root of the sample size gives the standard error used in the test statistic).


Matched Pairs T Test Formula
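A sketch demonstrating the claim above, that the paired test is just a one-sample t-test on the differences; the before/after values are hypothetical:

```python
# Matched pairs t-test, two equivalent ways: ttest_rel on the paired
# samples, and ttest_1samp on the per-pair differences.
from scipy import stats

# Hypothetical before/after measurements on the same subjects
before = [8.2, 7.9, 9.1, 8.5, 7.7, 8.8]
after  = [7.8, 7.5, 8.9, 8.1, 7.6, 8.2]

t_stat, p_value = stats.ttest_rel(before, after)

# Same test, computed explicitly as a one-sample t-test on differences
diffs = [b - a for b, a in zip(before, after)]
t_check, p_check = stats.ttest_1samp(diffs, 0.0)
```

Both calls return identical results, which is exactly why the matched pairs test "becomes a one-sample t-test on the differences."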

What is important to remember with any of these tests, whether it be a z-test or a two-sample t-test, is that our conclusions will be drawn the same way as in a one-sample test.

For example, once we find out the test statistic, we then determine our p-value, and if our p-value is less than or equal to our significance level, we will reject our null hypothesis.


One Sample Flow Chart


Two Sample Flow Chart

As the flow charts above demonstrate, our first step is to decide what type of test we are conducting. Is the standard deviation known? Do we have a one-sample test, a two-sample test, or a matched-pairs test?

Then, once we have identified the test we are using, our procedure is as follows:

  • Calculate the test statistic
  • Determine our p-value
  • If our p-value is less than or equal to our significance level, reject the null hypothesis
  • Otherwise, fail to reject the null hypothesis
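The procedure above can be sketched as a small helper working from summary statistics; this uses the un-pooled statistic with a conservative df of min(n₁, n₂) − 1, and all numbers in the usage line are made up:

```python
# The four-step procedure: test statistic -> p-value -> decision.
import math
from scipy import stats

def two_sample_t(x1, s1, n1, x2, s2, n2, alpha=0.05):
    # Step 1: un-pooled two-sample t statistic
    t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    # Step 2: two-tailed p-value, conservative df = min(n1, n2) - 1
    df = min(n1, n2) - 1
    p = 2 * stats.t.sf(abs(t), df)
    # Steps 3-4: decision rule against the significance level
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return t, p, decision

# Hypothetical summary statistics (mean, std dev, n) for two groups
t, p, decision = two_sample_t(13.5, 1.2, 20, 12.9, 1.4, 22)
```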

Together, we will work through various examples of all different hypothesis tests for the difference in population means, so we become comfortable with each formula and know why and how to use them effectively.

Two Sample T Test – Lesson & Examples (Video)

1 hr 22 min

  • Introduction to Video: Two Sample Hypothesis Test for Population Means
  • 00:00:37 – How to write a two sample hypothesis test when population standard deviation is known? (Example#1)
  • 00:16:35 – Construct a two sample hypothesis test when population standard deviation is known (Example #2)
  • 00:26:01 – What is a Two-Sample t-test? Pooled variances or non-pooled variances?
  • 00:28:31 – Use a two sample t-test with un-pooled variances (Example #3)
  • 00:37:48 – Create a two sample t-test and confidence interval with pooled variances (Example #4)
  • 00:51:23 – Construct a two-sample t-test (Example #5)
  • 00:59:47 – Matched Pair one sample t-test (Example #6)
  • 01:09:38 – Use a match paired hypothesis test and provide a confidence interval for difference of means (Example #7)
  • Practice Problems with Step-by-Step Solutions
  • Chapter Tests with Video Solutions



5.5 - Hypothesis Testing for Two-Sample Proportions

We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test follows the same steps as for one group.

These notes go into a little bit of math and formulas to help demonstrate the logic behind hypothesis testing for two groups. If this starts to get a little confusing, just skim over it for a general understanding! Remember, we can rely on the software to do the calculations for us, but it is good to have a basic understanding of the logic!

We will use the sampling distribution of \(\hat{p}_1-\hat{p}_2\) as we did for the confidence interval.

For a test for two proportions, we are interested in the difference between two groups. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be:

\(H_0\colon p_1-p_2=0\)

Another way to look at it is \(H_0\colon p_1=p_2\). This is worth stopping to think about. Remember, in hypothesis testing, we assume the null hypothesis is true. In this case, it means that \(p_1\) and \(p_2\) are equal. Under this assumption, then \(\hat{p}_1\) and \(\hat{p}_2\) are both estimating the same proportion. Think of this proportion as \(p^*\).

Therefore, the sampling distribution of both proportions, \(\hat{p}_1\) and \(\hat{p}_2\), will, under certain conditions, be approximately normal centered around \(p^*\), with standard error \(\sqrt{\dfrac{p^*(1-p^*)}{n_i}}\), for \(i=1, 2\).

We take this into account by finding an estimate for this \(p^*\) using the two-sample proportions. We can calculate an estimate of \(p^*\) using the following formula:

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\)

This value is the total number in the desired categories \((x_1+x_2)\) from both samples over the total number of sampling units in the combined sample \((n_1+n_2)\).

Putting everything together, if we assume \(p_1=p_2\), then the sampling distribution of \(\hat{p}_1-\hat{p}_2\) will be approximately normal with mean 0 and standard error of \(\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}\), under certain conditions.

\(z^*=\dfrac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...will follow a standard normal distribution.

Finally, we can develop our hypothesis test for \(p_1-p_2\).

Hypothesis Testing for Two-Sample Proportions

Conditions :

\(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) are all greater than five

Test Statistic:

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...where \(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\).

The critical values, p-values, and decisions will all follow the same steps as those from a hypothesis test for a one-sample proportion.
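A sketch of this test statistic computed directly from the formulas above, with hypothetical counts (x successes out of n sampling units in each group):

```python
# Two-proportion z-test using the pooled estimate p-hat-star,
# exactly as defined in the formulas above.
import math
from scipy.stats import norm

# Hypothetical counts: successes x_i out of n_i in each group
x1, n1 = 56, 100
x2, n2 = 40, 80

p1_hat, p2_hat = x1 / n1, x2 / n2
p_star = (x1 + x2) / (n1 + n2)            # pooled estimate of the common p

z = (p1_hat - p2_hat) / math.sqrt(
    p_star * (1 - p_star) * (1 / n1 + 1 / n2))

p_value = 2 * norm.sf(abs(z))             # two-tailed p-value
```

As the notes say, the decision step then works exactly like a one-sample proportion test: compare the p-value to the chosen significance level.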


Inference for Comparing 2 Population Means (HT for 2 Means, independent samples)

More of the good stuff! We will need to know how to label the null and alternative hypothesis, calculate the test statistic, and then reach our conclusion using the critical value method or the p-value method.

The Test Statistic for a Test of 2 Means from Independent Samples:

[latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}}[/latex]

What the different symbols mean:

[latex]n_1[/latex] is the sample size for the first group

[latex]n_2[/latex] is the sample size for the second group

[latex]df[/latex], the degrees of freedom, is the smaller of [latex]n_1 - 1[/latex] and [latex]n_2 - 1[/latex]

[latex]\mu_1[/latex] is the population mean from the first group

[latex]\mu_2[/latex] is the population mean from the second group

[latex]\bar{x_1}[/latex] is the sample mean for the first group

[latex]\bar{x_2}[/latex] is the sample mean for the second group

[latex]s_1[/latex] is the sample standard deviation for the first group

[latex]s_2[/latex] is the sample standard deviation for the second group

[latex]\alpha[/latex] is the significance level , usually given within the problem, or if not given, we assume it to be 5% or 0.05

Assumptions when conducting a Test for 2 Means from Independent Samples:

  • We do not know the population standard deviations, and we do not assume they are equal
  • The two samples or groups are independent
  • Both samples are simple random samples
  • Both populations are Normally distributed OR both samples are large ([latex]n_1 > 30[/latex] and [latex]n_2 > 30[/latex])

Steps to conduct the Test for 2 Means from Independent Samples:

  • Identify all the symbols listed above (all the stuff that will go into the formulas). This includes [latex]n_1[/latex] and [latex]n_2[/latex], [latex]df[/latex], [latex]\mu_1[/latex] and [latex]\mu_2[/latex], [latex]\bar{x_1}[/latex] and [latex]\bar{x_2}[/latex], [latex]s_1[/latex] and [latex]s_2[/latex], and [latex]\alpha[/latex]
  • Identify the null and alternative hypotheses
  • Calculate the test statistic, [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}}[/latex]
  • Find the critical value(s) OR the p-value OR both
  • Apply the Decision Rule
  • Write up a conclusion for the test

Example 1: Study on the effectiveness of stents for stroke patients [1]

In this study, researchers randomly assigned stroke patients to two groups: one received the current standard care (control) and the other received a stent surgery in addition to the standard care (stent treatment). If the stents work, the treatment group should have a lower average disability score. Do the results give convincing statistical evidence that the stent treatment reduces the average disability from stroke?

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the patients with stent treatment and patients receiving the standard care), so we will conduct a Test of 2 Means.

  • [latex]n_1 = 98[/latex] is the sample size for the first group
  • [latex]n_2 = 93[/latex] is the sample size for the second group
  • [latex]df[/latex], the degrees of freedom, is the smaller of [latex]98 - 1 = 97[/latex] and [latex]93 - 1 = 92[/latex], so [latex]df = 92[/latex]
  • [latex]\bar{x_1} = 2.26[/latex] is the sample mean for the first group
  • [latex]\bar{x_2} = 3.23[/latex] is the sample mean for the second group
  • [latex]s_1 = 1.78[/latex] is the sample standard deviation for the first group
  • [latex]s_2 = 1.78[/latex] is the sample standard deviation for the second group
  • [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
  • One additional assumption we extend from the null hypothesis is that [latex]\mu_1 - \mu_2 = 0[/latex]; this means that in our formula, those variables cancel out
  • [latex]H_{0}: \mu_1 = \mu_2[/latex]
  • [latex]H_{A}: \mu_1 < \mu_2[/latex]
  • [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}} = \displaystyle \frac{(2.26 - 3.23) - 0}{\sqrt{\displaystyle \frac{1.78^2}{98} + \displaystyle \frac{1.78^2}{93}}} = -3.76[/latex]
  • StatDisk : We can conduct this test using StatDisk. The nice thing about StatDisk is that it will also compute the test statistic. From the main menu above we click on Analysis, Hypothesis Testing, and then Mean Two Independent Samples. From there enter the 0.05 significance, along with the specific values as outlined in the picture below in Step 2. Notice the alternative hypothesis is the [latex]<[/latex] option. Enter the sample size, mean, and standard deviation for each group, and make sure that unequal variances is selected. Now we click on Evaluate. If you check the values, the test statistic is reported in the Step 3 display, as well as the P-Value of 0.00011.
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is smaller or equal to the alpha level, we have enough evidence for our claim, otherwise we do not. Here, [latex]p-value = 0.00011[/latex], which is definitely smaller than [latex]\alpha = 0.05[/latex], so we have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value  of [latex]0.00011[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing statistical evidence that the stent treatment reduces the average disability from stroke.
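As a cross-check (not part of the original write-up), Example 1's numbers can be reproduced from the summary statistics using the unequal-variance form that StatDisk applied:

```python
# Reproducing Example 1: unequal-variance (Welch) two-sample t-test
# from the summary statistics given in the text.
import math
from scipy import stats

x1, s1, n1 = 2.26, 1.78, 98    # stent treatment group
x2, s2, n2 = 3.23, 1.78, 93    # control group

v1, v2 = s1**2 / n1, s2**2 / n2
t = (x1 - x2) / math.sqrt(v1 + v2)

# Welch–Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

p = stats.t.cdf(t, df)          # left-tailed: H_A is mu1 < mu2
```

This recovers the test statistic of −3.76 and a p-value matching StatDisk's 0.00011.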

Example 2: Home Run Distances

In 1998, Sammy Sosa and Mark McGwire (2 players in Major League Baseball) were on pace to set a new home run record. At the end of the season McGwire ended up with 70 home runs, and Sosa ended up with 66. The home run distances were recorded and compared (sometimes a player’s home run distance is used to measure their “power”). Do the results give convincing statistical evidence that the home run distances are different from each other? Who would you say “hit the ball farther” in this comparison?

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the home run distances), so we will conduct a Test of 2 Means.

  • [latex]n_1 = 70[/latex] is the sample size for the first group
  • [latex]n_2 = 66[/latex] is the sample size for the second group
  • [latex]df[/latex], the degrees of freedom, is the smaller of [latex]70 - 1 = 69[/latex] and [latex]66 - 1 = 65[/latex], so [latex]df = 65[/latex]
  • [latex]\bar{x_1} = 418.5[/latex] is the sample mean for the first group
  • [latex]\bar{x_2} = 404.8[/latex] is the sample mean for the second group
  • [latex]s_1 = 45.5[/latex] is the sample standard deviation for the first group
  • [latex]s_2 = 35.7[/latex] is the sample standard deviation for the second group
  • [latex]H_{0}: \mu_1 = \mu_2[/latex]
  • [latex]H_{A}: \mu_1 \neq \mu_2[/latex]
  • [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}} = \displaystyle \frac{(418.5 - 404.8) - 0}{\sqrt{\displaystyle \frac{45.5^2}{70} + \displaystyle \frac{35.7^2}{65}}} = 1.95[/latex]
  • StatDisk : We can conduct this test using StatDisk. The nice thing about StatDisk is that it will also compute the test statistic. From the main menu above we click on Analysis, Hypothesis Testing, and then Mean Two Independent Samples. From there enter the 0.05 significance, along with the specific values as outlined in the picture below in Step 2. Notice the alternative hypothesis is the [latex]\neq[/latex] option. Enter the sample size, mean, and standard deviation for each group, and make sure that unequal variances is selected. Now we click on Evaluate. If you check the values, the test statistic is reported in the Step 3 display, as well as the P-Value of 0.05221.
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is smaller or equal to the alpha level, we have enough evidence for our claim, otherwise we do not. Here, [latex]p-value = 0.05221[/latex], which is larger than [latex]\alpha = 0.05[/latex], so we do not have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value  of [latex]0.05221[/latex] is larger than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we fail to reject [latex]H_{0}[/latex]. We do not have convincing statistical evidence that the home run distances are different.
  • Follow-up commentary: But what does this mean? There actually was a difference, right? If we take McGwire’s average and subtract Sosa’s average we get a difference of 13.7. What this result indicates is that the difference is not statistically significant; it could be due more to random chance than something meaningful. Other factors, such as sample size, could also be a determining factor (with a larger sample size, the difference may have been more meaningful).
  • Adapted from the Skew The Script curriculum ( skewthescript.org ), licensed under CC BY-NC-Sa 4.0 ↵

Basic Statistics Copyright © by Allyn Leon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book


Statistics LibreTexts

10.E: Hypothesis Testing with Two Samples (Exercises)


These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.


10.2: Two Population Means with Unknown Standard Deviations

Use the following information to answer the next 15 exercises: Indicate if the hypothesis test is for

  • independent group means, population standard deviations, and/or variances known
  • independent group means, population standard deviations, and/or variances unknown
  • matched or paired samples
  • single mean
  • two proportions
  • single proportion

Exercise 10.2.3

It is believed that 70% of males pass their drivers test in the first attempt, while 65% of females pass the test in the first attempt. Of interest is whether the proportions are in fact equal.

Exercise 10.2.4

A new laundry detergent is tested on consumers. Of interest is the proportion of consumers who prefer the new brand over the leading competitor. A study is done to test this.

Exercise 10.2.5

A new windshield treatment claims to repel water more effectively. Ten windshields are tested by simulating rain without the new treatment. The same windshields are then treated, and the experiment is run again. A hypothesis test is conducted.

Exercise 10.2.6

The known standard deviation in salary for all mid-level professionals in the financial industry is $11,000. Company A and Company B are in the financial industry. Suppose samples are taken of mid-level professionals from Company A and from Company B. The sample mean salary for mid-level professionals in Company A is $80,000. The sample mean salary for mid-level professionals in Company B is $96,000. Company A and Company B management want to know if their mid-level professionals are paid differently, on average.

Exercise 10.2.7

The average worker in Germany gets eight weeks of paid vacation.

Exercise 10.2.8

According to a television commercial, 80% of dentists agree that Ultrafresh toothpaste is the best on the market.

Exercise 10.2.9

It is believed that the average grade on an English essay in a particular school system for females is higher than for males. A random sample of 31 females had a mean score of 82 with a standard deviation of three, and a random sample of 25 males had a mean score of 76 with a standard deviation of four.

  • independent group means, population standard deviations and/or variances unknown

Exercise 10.2.10

The league mean batting average is 0.280 with a known standard deviation of 0.06. The Rattlers and the Vikings belong to the league. The mean batting average for a sample of eight Rattlers is 0.210, and the mean batting average for a sample of eight Vikings is 0.260. There are 24 players on the Rattlers and 19 players on the Vikings. Are the batting averages of the Rattlers and Vikings statistically different?

Exercise 10.2.11

In a random sample of 100 forests in the United States, 56 were coniferous or contained conifers. In a random sample of 80 forests in Mexico, 40 were coniferous or contained conifers. Is the proportion of conifers in the United States statistically more than the proportion of conifers in Mexico?

Exercise 10.2.12

A new medicine is said to help improve sleep. Eight subjects are picked at random and given the medicine. The mean hours slept for each person were recorded before starting the medication and after.

Exercise 10.2.13

It is thought that teenagers sleep more than adults on average. A study is done to verify this. A sample of 16 teenagers has a mean of 8.9 hours slept and a standard deviation of 1.2. A sample of 12 adults has a mean of 6.9 hours slept and a standard deviation of 0.6.

Exercise 10.2.14

Varsity athletes practice five times a week, on average.

Exercise 10.2.15

A sample of 12 in-state graduate school programs at school A has a mean tuition of $64,000 with a standard deviation of $8,000. At school B, a sample of 16 in-state graduate programs has a mean of $80,000 with a standard deviation of $6,000. On average, are the mean tuitions different?

Exercise 10.2.16

A new WiFi range booster is being offered to consumers. A researcher tests the native range of 12 different routers under the same conditions. The ranges are recorded. Then the researcher uses the new WiFi range booster and records the new ranges. Does the new WiFi range booster do a better job?

Exercise 10.2.17

A high school principal claims that 30% of student athletes drive themselves to school, while 4% of non-athletes drive themselves to school. In a sample of 20 student athletes, 45% drive themselves to school. In a sample of 35 non-athlete students, 6% drive themselves to school. Is the percent of student athletes who drive themselves to school more than the percent of nonathletes?

Use the following information to answer the next three exercises: A study is done to determine which of two soft drinks has more sugar. There are 13 cans of Beverage A in a sample and six cans of Beverage B. The mean amount of sugar in Beverage A is 36 grams with a standard deviation of 0.6 grams. The mean amount of sugar in Beverage B is 38 grams with a standard deviation of 0.8 grams. The researchers believe that Beverage B has more sugar than Beverage A, on average. Both populations have normal distributions.

Exercise 10.2.18

Are standard deviations known or unknown?

Exercise 10.2.19

What is the random variable?

The random variable is the difference between the mean amounts of sugar in the two soft drinks.

Exercise 10.2.20

Is this a one-tailed or two-tailed test?

Use the following information to answer the next 12 exercises: The U.S. Center for Disease Control reports that the mean life expectancy was 47.6 years for whites born in 1900 and 33.0 years for nonwhites. Suppose that you randomly survey death records for people born in 1900 in a certain county. Of the 124 whites, the mean life span was 45.3 years with a standard deviation of 12.7 years. Of the 82 nonwhites, the mean life span was 34.1 years with a standard deviation of 15.6 years. Conduct a hypothesis test to see if the mean life spans in the county were the same for whites and nonwhites.

Exercise 10.2.21

Is this a test of means or proportions?

Exercise 10.2.22

State the null and alternative hypotheses.

  • \(H_{0}\): __________
  • \(H_{a}\): __________

Exercise 10.2.23

Is this a right-tailed, left-tailed, or two-tailed test?

Exercise 10.2.24

In symbols, what is the random variable of interest for this test?

Exercise 10.2.25

In words, define the random variable of interest for this test.

the difference between the mean life spans of whites and nonwhites

Exercise 10.2.26

Which distribution (normal or Student's t) would you use for this hypothesis test?

Exercise 10.2.27

Explain why you chose the distribution you did for Exercise .

This is a comparison of two population means with unknown population standard deviations.

Exercise 10.2.28

Calculate the test statistic and \(p\text{-value}\).

Exercise 10.2.29

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized difference and the sample difference. Shade the area corresponding to the \(p\text{-value}\).


  • Check student’s solution.

Exercise 10.2.30

Find the \(p\text{-value}\).

Exercise 10.2.31

At a pre-conceived \(\alpha = 0.05\), what is your decision, reason for the decision, and conclusion (written out in a complete sentence)?

  • Decision: Reject the null hypothesis
  • Reason for the decision: \(p\text{-value} < 0.05\)
  • Conclusion: There is enough evidence at the 5% level of significance to conclude that the mean life spans in the county were different for whites and nonwhites born in 1900.
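The numbers for this exercise can be checked directly from the summary statistics, using the conservative degrees of freedom (the smaller of n₁ − 1 and n₂ − 1):

```python
# Two-sample t-test for the life-span exercise, from summary statistics.
import math
from scipy import stats

x1, s1, n1 = 45.3, 12.7, 124   # whites born in 1900 in the county
x2, s2, n2 = 34.1, 15.6, 82    # nonwhites born in 1900 in the county

t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1, n2) - 1            # conservative degrees of freedom
p = 2 * stats.t.sf(abs(t), df)  # two-tailed p-value

reject = p < 0.05               # large t, tiny p: reject equal means
```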

Exercise 10.2.32

Does it appear that the means are the same? Why or why not?

DIRECTIONS: For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in Appendix E. Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's t -distribution for a homework problem in what follows, including for paired data, you may assume that the underlying population is normally distributed. (When using these tests in a real situation, you must first prove that assumption, however.)

The mean number of English courses taken in a two–year time period by male and female college students is believed to be about the same. An experiment is conducted and data are collected from 29 males and 16 females. The males took an average of three English courses with a standard deviation of 0.8. The females took an average of four English courses with a standard deviation of 1.0. Are the means statistically the same?

A student at a four-year college claims that mean enrollment at four–year colleges is higher than at two–year colleges in the United States. Two surveys are conducted. Of the 35 two–year colleges surveyed, the mean enrollment was 5,068 with a standard deviation of 4,777. Of the 35 four-year colleges surveyed, the mean enrollment was 5,466 with a standard deviation of 8,191.

Subscripts: 1: two-year colleges; 2: four-year colleges

  • \(H_{0}: \mu_{1} \geq \mu_{2}\)
  • \(H_{a}: \mu_{1} < \mu_{2}\)
  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean enrollments of the two-year colleges and the four-year colleges.
  • Student’s t
  • test statistic: -0.2480
  • \(p\text{-value}: 0.4019\)
  • Alpha: 0.05
  • Decision: Do not reject
  • Reason for Decision: \(p\text{-value} > \alpha\)
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean enrollment at four-year colleges is higher than at two-year colleges.

At Rachel’s 11th birthday party, eight girls were timed to see how long (in seconds) they could hold their breath in a relaxed position. After a two-minute rest, they timed themselves while jumping. The girls thought that the mean difference between their jumping and relaxed times would be zero. Test their hypothesis.

Mean entry-level salaries for college graduates with mechanical engineering degrees and electrical engineering degrees are believed to be approximately the same. A recruiting office thinks that the mean mechanical engineering salary is actually lower than the mean electrical engineering salary. The recruiting office randomly surveys 50 entry level mechanical engineers and 60 entry level electrical engineers. Their mean salaries were $46,100 and $46,700, respectively. Their standard deviations were $3,450 and $4,210, respectively. Conduct a hypothesis test to determine if you agree that the mean entry-level mechanical engineering salary is lower than the mean entry-level electrical engineering salary.

Subscripts: 1: mechanical engineering; 2: electrical engineering

  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean entry level salaries of mechanical engineers and electrical engineers.
  • \(t_{108}\)
  • test statistic: \(t = -0.82\)
  • \(p\text{-value}: 0.2061\)
  • \(\alpha: 0.05\)
  • Decision: Do not reject the null hypothesis.
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean entry-level salaries of mechanical engineers is lower than that of electrical engineers.
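The salary comparison above can also be checked numerically; this sketch follows the answer key's df of 108 and the un-pooled form of the statistic:

```python
# Checking the entry-level salary test from its summary statistics.
import math
from scipy import stats

x1, s1, n1 = 46100, 3450, 50   # mechanical engineering
x2, s2, n2 = 46700, 4210, 60   # electrical engineering

t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
p = stats.t.cdf(t, df=108)     # left-tailed: H_a is mu1 < mu2
```

This reproduces the test statistic of −0.82 and a p-value near the stated 0.2061, which exceeds α = 0.05, so we do not reject the null hypothesis.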

Marketing companies have collected data implying that teenage girls use more ring tones on their cellular phones than teenage boys do. In one particular study of 40 randomly chosen teenage girls and boys (20 of each) with cellular phones, the mean number of ring tones for the girls was 3.2 with a standard deviation of 1.5. The mean for the boys was 1.7 with a standard deviation of 0.8. Conduct a hypothesis test to determine if the means are approximately the same or if the girls’ mean is higher than the boys’ mean.

Use the information from [link] to answer the next four exercises.

Using the data from Lap 1 only, conduct a hypothesis test to determine if the mean time for completing a lap in races is the same as it is in practices.

  • \(H_{0}: \mu_{1} = \mu_{2}\)

\(H_{a}: \mu_{1} \neq \mu_{2}\)

  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean times for completing a lap in races and in practices.
  • \(t_{20.32}\)
  • test statistic: –4.70
  • \(p\text{-value}: 0.0001\)
  • Decision: Reject the null hypothesis.
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean time for completing a lap in races is different from that in practices.

Repeat the test in Exercise 10.83, but use Lap 5 data this time.

Repeat the test in Exercise 10.83, but this time combine the data from Laps 1 and 5.

  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean times for completing a lap in races and in practices.
  • \(t_{40.94}\)
  • test statistic: –5.08
  • \(p\text{-value}: 0\)
  • Decision: Reject the null hypothesis.
  • Reason for Decision: \(p\text{-value} < \alpha\)

In two to three complete sentences, explain in detail how you might use Terri Vogel’s data to answer the following question. “Does Terri Vogel drive faster in races than she does in practices?”

Use the following information to answer the next two exercises. The Eastern and Western Major League Soccer conferences have a new Reserve Division that allows new players to develop their skills. Data for a randomly picked date showed the following annual goals.

Conduct a hypothesis test to answer the next two exercises.

The exact distribution for the hypothesis test is:

  • the normal distribution
  • the Student's t-distribution
  • the uniform distribution
  • the exponential distribution

If the level of significance is 0.05, the conclusion is:

  • There is sufficient evidence to conclude that the W Division teams score fewer goals, on average, than the E teams.
  • There is insufficient evidence to conclude that the W Division teams score more goals, on average, than the E teams.
  • There is insufficient evidence to conclude that the W teams score fewer goals, on average, than the E teams score.
  • Unable to determine

Suppose a statistics instructor believes that there is no significant difference between the mean class scores of statistics day students on Exam 2 and statistics night students on Exam 2. She takes random samples from each of the populations. The mean and standard deviation for 35 statistics day students were 75.86 and 16.91. The mean and standard deviation for 37 statistics night students were 75.41 and 19.73. The “day” subscript refers to the statistics day students. The “night” subscript refers to the statistics night students. A concluding statement is:

  • There is sufficient evidence to conclude that statistics night students' mean on Exam 2 is better than the statistics day students' mean on Exam 2.
  • There is insufficient evidence to conclude that the statistics day students' mean on Exam 2 is better than the statistics night students' mean on Exam 2.
  • There is insufficient evidence to conclude that there is a significant difference between the means of the statistics day students and night students on Exam 2.
  • There is sufficient evidence to conclude that there is a significant difference between the means of the statistics day students and night students on Exam 2.

Researchers interviewed street prostitutes in Canada and the United States. The mean age of the 100 Canadian prostitutes upon entering prostitution was 18 with a standard deviation of six. The mean age of the 130 United States prostitutes upon entering prostitution was 20 with a standard deviation of eight. Is the mean age of entering prostitution in Canada lower than the mean age in the United States? Test at a 1% significance level.

Test: two independent sample means, population standard deviations unknown.

Random variable:

\[\bar{X}_{1} - \bar{X}_{2}\]

Hypotheses: \(H_{0}: \mu_{1} = \mu_{2}\), \(H_{a}: \mu_{1} < \mu_{2}\) (the mean age of entering prostitution in Canada is lower than the mean age in the United States).

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the left of zero extends from the axis to the curve. The region under the curve to the left of the line is shaded representing p-value = 0.0151.

Graph: left-tailed

\(p\text{-value}: 0.0151\)

Decision: Do not reject \(H_{0}\).

Conclusion: At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that the mean age of entering prostitution in Canada is lower than the mean age in the United States.
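Because the \(p\text{-value}\) (0.0151) falls between 0.01 and 0.05, this problem illustrates how the chosen significance level drives the decision. A standard-library sketch of the computation (the Welch degrees of freedom here exceed 200, so the normal CDF is a reasonable stand-in for the t tail):

```python
from math import sqrt, erf

# Subscripts: 1 = Canada, 2 = United States
x1, s1, n1 = 18, 6, 100
x2, s2, n2 = 20, 8, 130

t = (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
p = 0.5 * (1 + erf(t / sqrt(2)))  # left tail, normal approximation

alpha = 0.01
decision = "do not reject H0" if p >= alpha else "reject H0"
print(round(t, 2), round(p, 3), decision)  # -2.17 0.015 do not reject H0
```

At \(\alpha = 0.05\) the same data would lead to rejection, which is why the stated 1% level matters here.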

A powder diet is tested on 49 people, and a liquid diet is tested on 36 different people. Of interest is whether the liquid diet yields a higher mean weight loss than the powder diet. The powder diet group had a mean weight loss of 42 pounds with a standard deviation of 12 pounds. The liquid diet group had a mean weight loss of 45 pounds with a standard deviation of 14 pounds.

Suppose a statistics instructor believes that there is no significant difference between the mean class scores of statistics day students on Exam 2 and statistics night students on Exam 2. She takes random samples from each of the populations. The mean and standard deviation for 35 statistics day students were 75.86 and 16.91, respectively. The mean and standard deviation for 37 statistics night students were 75.41 and 19.73. The “day” subscript refers to the statistics day students. The “night” subscript refers to the statistics night students. An appropriate alternative hypothesis for the hypothesis test is:

  • \(\mu_{day} > \mu_{night}\)
  • \(\mu_{day} < \mu_{night}\)
  • \(\mu_{day} = \mu_{night}\)
  • \(\mu_{day} \neq \mu_{night}\)

10.3: Two Population Means with Known Standard Deviations

Use the following information to answer the next five exercises. The mean speeds of fastball pitches from two different baseball pitchers are to be compared. A sample of 14 fastball pitches is measured from each pitcher. The populations have normal distributions. Table shows the result. Scouters believe that Rodriguez pitches a speedier fastball.

Exercise 10.3.2

The difference in mean speeds of the fastball pitches of the two pitchers

Exercise 10.3.3

Exercise 10.3.4

What is the test statistic?

Exercise 10.3.5

What is the \(p\text{-value}\)?

Exercise 10.3.6

At the 1% significance level, we can reject the null hypothesis. There is sufficient data to conclude that the mean speed of Rodriguez’s fastball is faster than Wesley’s.

Use the following information to answer the next five exercises. A researcher is testing the effects of plant food on plant growth. Nine plants have been given the plant food. Another nine plants have not been given the plant food. The heights of the plants are recorded after eight weeks. The populations have normal distributions. The following table is the result. The researcher thinks the food makes the plants grow taller.

Exercise 10.3.7

Is the population standard deviation known or unknown?

Exercise 10.3.8

Subscripts: 1 = Food, 2 = No Food

  • \(H_{0}: \mu_{1} \leq \mu_{2}\)
  • \(H_{a}: \mu_{1} > \mu_{2}\)

Exercise 10.3.9

Exercise 10.3.10

Draw the graph of the \(p\text{-value}\).

This is a normal distribution curve with mean equal to zero. The values 0 and 0.1 are labeled on the horizontal axis. A vertical line extends from 0.1 to the curve. The region under the curve to the right of the line is shaded to represent p-value = 0.0198.

Exercise 10.3.11

At the 1% significance level, what is your conclusion?

Use the following information to answer the next five exercises. Two metal alloys are being considered as material for ball bearings. The mean melting point of the two alloys is to be compared. 15 pieces of each metal are being tested. Both populations have normal distributions. The following table is the result. It is believed that Alloy Zeta has a different melting point.

Exercise 10.3.12

Subscripts: 1 = Gamma, 2 = Zeta

Exercise 10.3.13

Is this a right-, left-, or two-tailed test?

Exercise 10.3.14

Exercise 10.3.15

Exercise 10.3.16

There is sufficient evidence to reject the null hypothesis. The data support that the melting point for Alloy Zeta is different from the melting point of Alloy Gamma.

DIRECTIONS: For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in [link] . Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's t-distribution for one of the following homework problems, including for paired data, you may assume that the underlying population is normally distributed. (When using these tests in a real situation, you must first prove that assumption, however.)

A study is done to determine if students in the California state university system take longer to graduate, on average, than students enrolled in private universities. One hundred students from both the California state university system and private universities are surveyed. Suppose that from years of research, it is known that the population standard deviations are 1.5811 years and 1 year, respectively. The following data are collected. The California state university system students took on average 4.5 years with a standard deviation of 0.8. The private university students took on average 4.1 years with a standard deviation of 0.3.

Parents of teenage boys often complain that auto insurance costs more, on average, for teenage boys than for teenage girls. A group of concerned parents examines a random sample of insurance bills. The mean annual cost for 36 teenage boys was $679. For 23 teenage girls, it was $559. From past years, it is known that the population standard deviation for each group is $180. Determine whether or not you believe that the mean cost for auto insurance for teenage boys is greater than that for teenage girls.

Subscripts: 1 = boys, 2 = girls

  • \(H_{0}: \mu_{1} \leq \mu_{2}\)
  • \(H_{a}: \mu_{1} > \mu_{2}\)
  • The random variable is the difference in the mean auto insurance costs for boys and girls.
  • test statistic: \(z = 2.50\)
  • \(p\text{-value}: 0.0062\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean cost of auto insurance for teenage boys is greater than that for girls.
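With the population standard deviations known, the test statistic is a z rather than a t, and no approximation is needed. A standard-library sketch of the computation behind the answer above:

```python
from math import sqrt, erf

# Subscripts: 1 = boys, 2 = girls; known population sigma = 180 for both groups
x1, n1 = 679, 36
x2, n2 = 559, 23
sigma = 180

z = (x1 - x2) / sqrt(sigma ** 2 / n1 + sigma ** 2 / n2)
p = 0.5 * (1 - erf(z / sqrt(2)))  # right tail: H_a is mu_1 > mu_2
print(round(z, 2), round(p, 3))  # 2.5 0.006
```

Since \(p \approx 0.006 < 0.05\), we reject \(H_{0}\), in line with the conclusion above.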

A group of transfer bound students wondered if they will spend the same mean amount on texts and supplies each year at their four-year university as they have at their community college. They conducted a random survey of 54 students at their community college and 66 students at their local four-year university. The sample means were $947 and $1,011, respectively. The population standard deviations are known to be $254 and $87, respectively. Conduct a hypothesis test to determine if the means are statistically the same.

Some manufacturers claim that non-hybrid sedan cars have a lower mean miles-per-gallon (mpg) than hybrid ones. Suppose that consumers test 21 hybrid sedans and get a mean of 31 mpg with a standard deviation of seven mpg. Thirty-one non-hybrid sedans get a mean of 22 mpg with a standard deviation of four mpg. Suppose that the population standard deviations are known to be six and three, respectively. Conduct a hypothesis test to evaluate the manufacturers’ claim.

Subscripts: 1 = non-hybrid sedans, 2 = hybrid sedans

  • The random variable is the difference in the mean miles per gallon of non-hybrid sedans and hybrid sedans.
  • test statistic: 6.36
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean miles per gallon of non-hybrid sedans is less than that of hybrid sedans.

A baseball fan wanted to know if there is a difference between the number of games played in a World Series when the American League won the series versus when the National League won the series. From 1922 to 2012, the population standard deviation of games won by the American League was 1.14, and the population standard deviation of games won by the National League was 1.11. Of 19 randomly selected World Series games won by the American League, the mean number of games won was 5.76. The mean number of 17 randomly selected games won by the National League was 5.42. Conduct a hypothesis test.

One of the questions in a study of marital satisfaction of dual-career couples was to rate the statement “I’m pleased with the way we divide the responsibilities for childcare.” The ratings went from one (strongly agree) to five (strongly disagree). Table contains ten of the paired responses for husbands and wives. Conduct a hypothesis test to see if the mean difference in the husband’s versus the wife’s satisfaction level is negative (meaning that, within the partnership, the husband is happier than the wife).

  • \(H_{0}: \mu_{d} = 0\)

\(H_{a}: \mu_{d} < 0\)

  • The random variable \(\bar{X}_{d}\) is the average difference between husband’s and wife’s satisfaction level.
  • test statistic: \(t = –1.86\)
  • \(p\text{-value}: 0.0479\)
  • Check student’s solution
  • Decision: Reject the null hypothesis, but run another test.
  • Conclusion: This is a weak test because \(\alpha\) and the \(p\text{-value}\) are close. We reject the null hypothesis, so there is some evidence that the mean difference is negative, but the result should be confirmed with further testing.
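The paired-responses table is not reproduced here, so the sketch below uses hypothetical differences (husband's rating minus wife's rating, not the textbook data) purely to illustrate the matched-pairs recipe: form the differences, then run a one-sample t test on them.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired differences (husband minus wife) -- NOT the textbook data
d = [-2, -1, 0, -1, 1, -2, -1, 0, -1, -1]

n = len(d)
d_bar = mean(d)               # sample mean difference
s_d = stdev(d)                # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(n))   # test statistic with df = n - 1
print(round(t, 2), n - 1)  # -2.75 9
```

A negative t with a small left-tail \(p\text{-value}\) would support \(H_{a}: \mu_{d} < 0\).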

10.4: Comparing Two Independent Population Proportions

Use the following information for the next five exercises. Two types of phone operating system are being tested to determine if there is a difference in the proportions of system failures (crashes). Fifteen out of a random sample of 150 phones with OS 1 had system failures within the first eight hours of operation. Nine out of another random sample of 150 phones with OS 2 had system failures within the first eight hours of operation. OS 2 is believed to be more stable (have fewer crashes) than OS 1 .

Exercise 10.4.2

Exercise 10.4.3

\(P'_{OS_{1}} - P'_{OS_{2}} =\) difference in the proportions of phones that had system failures within the first eight hours of operation with OS 1 and OS 2 .

Exercise 10.4.4

Exercise 10.4.5

Exercise 10.4.6

What can you conclude about the two operating systems?

Use the following information to answer the next twelve exercises. In the recent Census, three percent of the U.S. population reported being of two or more races. However, the percent varies tremendously from state to state. Suppose that two random surveys are conducted. In the first random survey, out of 1,000 North Dakotans, only nine people reported being of two or more races. In the second random survey, out of 500 Nevadans, 17 people reported being of two or more races. Conduct a hypothesis test to determine if the population percents are the same for the two states or if the percent for Nevada is statistically higher than for North Dakota.

Exercise 10.4.7

proportions

Exercise 10.4.8

  • \(H_{0}\): _________
  • \(H_{a}\): _________

Exercise 10.4.9

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

right-tailed

Exercise 10.4.10

What is the random variable of interest for this test?

Exercise 10.4.11

In words, define the random variable for this test.

The random variable is the difference in proportions (percents) of the populations that are of two or more races in Nevada and North Dakota.

Exercise 10.4.12

Exercise 10.4.13

Explain why you chose the distribution you did for Exercise 10.56.

Our sample sizes are much greater than five each, so we use the normal distribution for two proportions for this hypothesis test.

Exercise 10.4.14

Calculate the test statistic.

Exercise 10.4.15

Sketch a graph of the situation. Mark the hypothesized difference and the sample difference. Shade the area corresponding to the \(p\text{-value}\).

This is a horizontal axis with arrows at each end. The axis is labeled \(p'_{N} - p'_{ND}\).

Exercise 10.4.16

Exercise 10.4.17

  • Reject the null hypothesis.
  • \(p\text{-value} < \alpha\)
  • At the 5% significance level, there is sufficient evidence to conclude that the proportion (percent) of the population that is of two or more races in Nevada is statistically higher than that in North Dakota.
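The answer above can be checked with the pooled two-proportion z test, using the survey counts given earlier (17 of 500 Nevadans, 9 of 1,000 North Dakotans). A standard-library sketch:

```python
from math import sqrt, erf

x1, n1 = 17, 500      # Nevada: reported two or more races
x2, n2 = 9, 1000      # North Dakota

p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)                    # pooled proportion
se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))  # pooled standard error
z = (p1 - p2) / se
p = 0.5 * (1 - erf(z / sqrt(2)))              # right tail: H_a is p_1 > p_2
print(round(z, 2), p < 0.05)  # 3.5 True
```

Both pooled expected counts \(n\hat{p}_{c}\) (about 8.7 and 17.3) are above five, which justifies the normal approximation discussed above.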

Exercise 10.4.18

Does it appear that the proportion of Nevadans who are two or more races is higher than the proportion of North Dakotans? Why or why not?

If you are using a Student's t-distribution for one of the following homework problems, including for paired data, you may assume that the underlying population is normally distributed. (In general, you must first prove that assumption, however.)

A recent drug survey showed an increase in the use of drugs and alcohol among local high school seniors as compared to the national percent. Suppose that a survey of 100 local seniors and 100 national seniors is conducted to see if the proportion of drug and alcohol use is higher locally than nationally. Locally, 65 seniors reported using drugs or alcohol within the past month, while 60 national seniors reported using them.

We are interested in whether the proportions of female suicide victims for ages 15 to 24 are the same for the whites and the blacks races in the United States. We randomly pick one year, 1992, to compare the races. The number of suicides estimated in the United States in 1992 for white females is 4,930. Five hundred eighty were aged 15 to 24. The estimate for black females is 330. Forty were aged 15 to 24. We will let female suicide victims be our population.

  • \(H_{0}: P_{W} = P_{B}\)
  • \(H_{a}: P_{W} \neq P_{B}\)
  • The random variable is the difference in the proportions of white and black suicide victims, aged 15 to 24.
  • normal for two proportions
  • test statistic: –0.1944
  • \(p\text{-value}: 0.8458\)
  • Reason for decision: \(p\text{-value} > \alpha\)
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the proportions of white and black female suicide victims, aged 15 to 24, are different.

Elizabeth Mjelde, an art history professor, was interested in whether the value from the Golden Ratio formula, \(\left(\frac{\text{larger} + \text{smaller dimension}}{\text{larger dimension}}\right)\), was the same in the Whitney Exhibit for works from 1900 to 1919 as for works from 1920 to 1942. Thirty-seven early works were sampled, averaging 1.74 with a standard deviation of 0.11. Sixty-five of the later works were sampled, averaging 1.746 with a standard deviation of 0.1064. Do you think that there is a significant difference in the Golden Ratio calculation?

A recent year was randomly picked from 1985 to the present. In that year, there were 2,051 Hispanic students at Cabrillo College out of a total of 12,328 students. At Lake Tahoe College, there were 321 Hispanic students out of a total of 2,441 students. In general, do you think that the percent of Hispanic students at the two colleges is basically the same or different?

Subscripts: 1 = Cabrillo College, 2 = Lake Tahoe College

  • \(H_{0}: p_{1} = p_{2}\)
  • \(H_{a}: p_{1} \neq p_{2}\)
  • The random variable is the difference between the proportions of Hispanic students at Cabrillo College and Lake Tahoe College.
  • test statistic: 4.29
  • \(p\text{-value}: 0.00002\)
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: There is sufficient evidence to conclude that the proportions of Hispanic students at Cabrillo College and Lake Tahoe College are different.
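The test statistic and \(p\text{-value}\) above can be verified with the same pooled two-proportion z test, this time two-tailed. A standard-library sketch using the enrollment counts from the problem:

```python
from math import sqrt, erf

x1, n1 = 2051, 12328   # Cabrillo College: Hispanic students
x2, n2 = 321, 2441     # Lake Tahoe College: Hispanic students

p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)                    # pooled proportion
se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p = 1 - erf(abs(z) / sqrt(2))                  # two-tailed: 2 * P(Z > |z|)
print(round(z, 2))  # 4.29
```

The computed \(p \approx 0.00002\), matching the solution's reason for rejecting \(H_{0}\).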

Use the following information to answer the next three exercises. Neuroinvasive West Nile virus is a severe disease that affects a person’s nervous system. It is spread by the Culex species of mosquito. In the United States in 2010 there were 629 reported cases of neuroinvasive West Nile virus out of a total of 1,021 reported cases and there were 486 neuroinvasive reported cases out of a total of 712 cases reported in 2011. Is the 2011 proportion of neuroinvasive West Nile virus cases more than the 2010 proportion of neuroinvasive West Nile virus cases? Using a 1% level of significance, conduct an appropriate hypothesis test.

  • “2011” subscript: 2011 group.
  • “2010” subscript: 2010 group
This is:

  • a test of two proportions
  • a test of two independent means
  • a test of a single mean
  • a test of matched pairs.

An appropriate null hypothesis is:

  • \(p_{2011} \leq p_{2010}\)
  • \(p_{2011} \geq p_{2010}\)
  • \(\mu_{2011} \leq \mu_{2010}\)
  • \(p_{2011} > p_{2010}\)

The \(p\text{-value}\) is 0.0022. At a 1% level of significance, the appropriate conclusion is

  • There is sufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is less than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.
  • There is insufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is more than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.
  • There is insufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is less than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.
  • There is sufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is more than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.

Researchers conducted a study to find out if there is a difference in the use of eReaders by different age groups. Randomly selected participants were divided into two age groups. In the 16- to 29-year-old group, 7% of the 628 surveyed use eReaders, while 11% of the 2,309 participants 30 years old and older use eReaders.

Test: two independent sample proportions.

Random variable: \(p′_{1} - p′_{2}\)

Distribution:

The proportion of eReader users among 16- to 29-year-olds is different from the proportion among those 30 and older.

Graph: two-tailed

This is a normal distribution curve with mean equal to zero. Both the right and left tails of the curve are shaded. Each tail represents 1/2(p-value) = 0.0017.

\(p\text{-value}: 0.0033\)

Conclusion: At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that the proportion of eReader users 16 to 29 years old is different from the proportion of eReader users 30 and older.

Adults are considered obese if their body mass index (BMI) is at least 30. The researchers wanted to determine if the proportion of women who are obese in the south is less than the proportion of southern men who are obese. The results are shown in Table . Test at the 1% level of significance.

Two computer users were discussing tablet computers. A higher proportion of people ages 16 to 29 use tablets than the proportion of people age 30 and older. Table details the number of tablet owners for each age group. Test at the 1% level of significance.

Test: two independent sample proportions

  • \(H_{a}: p_{1} > p_{2}\)

A higher proportion of tablet owners are aged 16 to 29 years old than are 30 years old and older.

Graph: right-tailed

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the right of zero extends from the axis to the curve. The region under the curve to the right of the line is shaded representing p-value = 0.2354.

\(p\text{-value}: 0.2354\)

Decision: Do not reject the \(H_{0}\).

Conclusion: At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that a higher proportion of tablet owners are aged 16 to 29 years old than are 30 years old and older.

A group of friends debated whether more men use smartphones than women. They consulted a research study of smartphone use among adults. The results of the survey indicate that of the 973 men randomly sampled, 379 use smartphones. For women, 404 of the 1,304 who were randomly sampled use smartphones. Test at the 5% level of significance.

While her husband spent 2½ hours picking out new speakers, a statistician decided to determine whether the percent of men who enjoy shopping for electronic equipment is higher than the percent of women who enjoy shopping for electronic equipment. The population was Saturday afternoon shoppers. Out of 67 men, 24 said they enjoyed the activity. Eight of the 24 women surveyed claimed to enjoy the activity. Interpret the results of the survey.

Subscripts: 1: men; 2: women

  • \(H_{0}: p_{1} \leq p_{2}\)
  • \(H_{a}: p_{1} > p_{2}\)
  • \(P'_{1} - P'_{2}\) is the difference between the proportions of men and women who enjoy shopping for electronic equipment.
  • test statistic: 0.22
  • \(p\text{-value}: 0.4133\)
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the proportion of men who enjoy shopping for electronic equipment is more than the proportion of women.

We are interested in whether children’s educational computer software costs less, on average, than children’s entertainment software. Thirty-six educational software titles were randomly picked from a catalog. The mean cost was $31.14 with a standard deviation of $4.69. Thirty-five entertainment software titles were randomly picked from the same catalog. The mean cost was $33.86 with a standard deviation of $10.87. Decide whether children’s educational software costs less, on average, than children’s entertainment software.

Joan Nguyen recently claimed that the proportion of college-age males with at least one pierced ear is as high as the proportion of college-age females. She conducted a survey in her classes. Out of 107 males, 20 had at least one pierced ear. Out of 92 females, 47 had at least one pierced ear. Do you believe that the proportion of males has reached the proportion of females?

  • \(P'_{1} - P'_{2}\) is the difference between the proportions of men and women that have at least one pierced ear.
  • test statistic: –4.82
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportions of males and females with at least one pierced ear are different.

Use the data sets found in [link] to answer this exercise. Is the proportion of race laps Terri completes slower than 130 seconds less than the proportion of practice laps she completes slower than 135 seconds?

"To Breakfast or Not to Breakfast?" by Richard Ayore

In American society, birthdays are among the days everyone looks forward to. People of different ages and peer groups gather to mark the 18th, 20th, …, birthdays. During this time, one looks back to see what he or she has achieved in the past year and also looks ahead to more to come.

If, by any chance, I am invited to one of these parties, my experience is always different. Instead of dancing around with my friends while the music is booming, I get carried away by memories of my family back home in Kenya. I remember the good times I had with my brothers and sister while we did our daily routine.

Every morning, I remember we went to the shamba (garden) to weed our crops. I remember one day arguing with my brother as to why he always remained behind just to join us an hour later. In his defense, he said that he preferred waiting for breakfast before he came to weed. He said, “This is why I always work more hours than you guys!”

And so, to prove him wrong or right, we decided to give it a try. One day we went to work as usual without breakfast, and recorded the time we could work before getting tired and stopping. On the next day, we all ate breakfast before going to work. We recorded how long we worked again before getting tired and stopping. Of interest was our mean increase in work time. Though not sure, my brother insisted that it was more than two hours. Using the data in Table , solve our problem.

  • \(H_{a}: \mu_{d} > 0\)
  • The random variable \(\bar{X}_{d}\) is the mean difference in work times on days when eating breakfast and on days when not eating breakfast.
  • test statistic: 4.8963
  • \(p\text{-value}: 0.0004\)
  • Reason for Decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% level of significance, there is sufficient evidence to conclude that the mean difference in work times is positive; that is, the brothers worked longer on days when they ate breakfast.

10.5: Matched or Paired Samples

Use the following information to answer the next five exercises. A study was conducted to test the effectiveness of a software patch in reducing system failures over a six-month period. Results for randomly selected installations are shown in Table . The “before” value is matched to an “after” value, and the differences are calculated. The differences have a normal distribution. Test at the 1% significance level.

Exercise 10.5.4

the mean difference of the system failures

Exercise 10.5.5

Exercise 10.5.6

Exercise 10.5.7

Exercise 10.5.8

What conclusion can you draw about the software patch?

With a \(p\text{-value}\) of 0.0067, we can reject the null hypothesis. There is enough evidence to conclude that the software patch is effective in reducing the number of system failures.

Use the following information to answer next five exercises. A study was conducted to test the effectiveness of a juggling class. Before the class started, six subjects juggled as many balls as they could at once. After the class, the same six subjects juggled as many balls as they could. The differences in the number of balls are calculated. The differences have a normal distribution. Test at the 1% significance level.

Exercise 10.5.9

Exercise 10.5.10

Exercise 10.5.11

What is the sample mean difference?

Exercise 10.5.12

This is a normal distribution curve with mean equal to zero. The values 0 and 1.67 are labeled on the horizontal axis. A vertical line extends from 1.67 to the curve. The region under the curve to the right of the line is shaded to represent p-value = 0.0021.

Exercise 10.5.13

What conclusion can you draw about the juggling class?

Use the following information to answer the next five exercises. A doctor wants to know if a blood pressure medication is effective. Six subjects have their blood pressures recorded. After twelve weeks on the medication, the same six subjects have their blood pressure recorded again. For this test, only systolic pressure is of concern. Test at the 1% significance level.

Exercise 10.5.14

\(H_{0}: \mu_{d} \geq 0\)

Exercise 10.5.15

Exercise 10.5.16

Exercise 10.5.17

Exercise 10.5.18

What is the conclusion?

We decline to reject the null hypothesis. There is not sufficient evidence to support that the medication is effective.

Bringing It Together

Use the following information to answer the next ten exercises. Indicate which of the following choices best identifies the hypothesis test.

  • independent group means, population standard deviations and/or variances known
  • independent group means, population standard deviations and/or variances unknown
  • matched or paired samples
  • single mean
  • two proportions
  • single proportion

Exercise 10.5.19

A powder diet is tested on 49 people, and a liquid diet is tested on 36 different people. The population standard deviations are two pounds and three pounds, respectively. Of interest is whether the liquid diet yields a higher mean weight loss than the powder diet.

Exercise 10.5.20

A new chocolate bar is taste-tested on consumers. Of interest is whether the proportion of children who like the new chocolate bar is greater than the proportion of adults who like it.

Exercise 10.5.21

The mean number of English courses taken in a two–year time period by male and female college students is believed to be about the same. An experiment is conducted and data are collected from nine males and 16 females.

Exercise 10.5.22

A football league reported that the mean number of touchdowns per game was five. A study is done to determine if the mean number of touchdowns has decreased.

Exercise 10.5.23

A study is done to determine if students in the California state university system take longer to graduate than students enrolled in private universities. One hundred students from both the California state university system and private universities are surveyed. From years of research, it is known that the population standard deviations are 1.5811 years and one year, respectively.

Exercise 10.5.24

According to a YWCA Rape Crisis Center newsletter, 75% of rape victims know their attackers. A study is done to verify this.

Exercise 10.5.25

According to a recent study, U.S. companies have a mean maternity-leave of six weeks.

Exercise 10.5.26

A recent drug survey showed an increase in use of drugs and alcohol among local high school students as compared to the national percent. Suppose that a survey of 100 local youths and 100 national youths is conducted to see if the proportion of drug and alcohol use is higher locally than nationally.

Exercise 10.5.27

A new SAT study course is tested on 12 individuals. Pre-course and post-course scores are recorded. Of interest is the mean increase in SAT scores. The following data are collected:

Exercise 10.5.28

University of Michigan researchers reported in the Journal of the National Cancer Institute that quitting smoking is especially beneficial for those under age 49. In this American Cancer Society study, the risk (probability) of dying of lung cancer was about the same as for those who had never smoked.

Exercise 10.5.29

Lesley E. Tan investigated the relationship between left-handedness vs. right-handedness and motor competence in preschool children. Random samples of 41 left-handed preschool children and 41 right-handed preschool children were given several tests of motor skills to determine if there is evidence of a difference between the children based on this experiment. The experiment produced the means and standard deviations shown in Table. Determine the appropriate test and best distribution to use for that test.

  • Two independent means, normal distribution
  • Two independent means, Student’s-t distribution
  • Matched or paired samples, Student’s-t distribution
  • Two population proportions, normal distribution

Exercise 10.5.30

A golf instructor is interested in determining if her new technique for improving players’ golf scores is effective. She takes four (4) new students. She records their 18-hole scores before learning the technique and then after having taken her class. She conducts a hypothesis test. The data are as shown in Table.

  • a test of two independent means.
  • a test of two proportions.
  • a test of a single mean.
  • a test of a single proportion.

If you are using a Student's t -distribution for the homework problems, including for paired data, you may assume that the underlying population is normally distributed. (When using these tests in a real situation, you must first prove that assumption, however.)

Ten individuals went on a low-fat diet for 12 weeks to lower their cholesterol. The data are recorded in Table. Do you think that their cholesterol levels were significantly lowered?

\(p\text{-value} = 0.1494\)

At the 5% significance level, there is insufficient evidence to conclude that the diet lowered cholesterol levels after 12 weeks.

Use the following information to answer the next two exercises. A new AIDS prevention drug was tried on a group of 224 HIV positive patients. Forty-five patients developed AIDS after four years. In a control group of 224 HIV positive patients, 68 developed AIDS after four years. We want to test whether the method of treatment reduces the proportion of patients that develop AIDS after four years or if the proportions of the treated group and the untreated group stay the same.

Let the subscript \(t =\) treated patient and \(ut =\) untreated patient.

The appropriate hypotheses are:

  • \(H_{0}: p_{t} < p_{ut}\) and \(H_{a}: p_{t} \geq p_{ut}\)
  • \(H_{0}: p_{t} \leq p_{ut}\) and \(H_{a}: p_{t} > p_{ut}\)
  • \(H_{0}: p_{t} = p_{ut}\) and \(H_{a}: p_{t} \neq p_{ut}\)
  • \(H_{0}: p_{t} = p_{ut}\) and \(H_{a}: p_{t} < p_{ut}\)

If the \(p\text{-value}\) is 0.0062 what is the conclusion (use \(\alpha = 0.05\))?

  • The method has no effect.
  • There is sufficient evidence to conclude that the method reduces the proportion of HIV positive patients who develop AIDS after four years.
  • There is sufficient evidence to conclude that the method increases the proportion of HIV positive patients who develop AIDS after four years.
  • There is insufficient evidence to conclude that the method reduces the proportion of HIV positive patients who develop AIDS after four years.
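The quoted p-value can be reproduced with a standard two-proportion z-test; here is a short sketch using only the Python standard library, with the counts taken from the exercise statement:

```python
from statistics import NormalDist
from math import sqrt

# Counts from the exercise: 45 of 224 treated patients and
# 68 of 224 untreated patients developed AIDS after four years.
x_t, n_t = 45, 224
x_ut, n_ut = 68, 224

p_t = x_t / n_t
p_ut = x_ut / n_ut

# Pooled proportion under H0: p_t = p_ut.
p_pool = (x_t + x_ut) / (n_t + n_ut)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_ut))

# Left-tailed test statistic and p-value for Ha: p_t < p_ut.
z = (p_t - p_ut) / se
p_value = NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4))  # p-value ≈ 0.0062
```

The computed p-value matches the 0.0062 given in the exercise, which at α = 0.05 supports the conclusion that the method reduces the proportion of patients who develop AIDS.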

Use the following information to answer the next two exercises. An experiment is conducted to show that blood pressure can be consciously reduced in people trained in a “biofeedback exercise program.” Six subjects were randomly selected and blood pressure measurements were recorded before and after the training. The difference between blood pressures was calculated (after − before), producing the following results: \(\bar{x}_{d} = -10.2\), \(s_{d} = 8.4\). Using the data, test the hypothesis that the blood pressure has decreased after the training.

The distribution for the test is:

  • \(N(-10.2, 8.4)\)
  • \(N\left(-10.2, \frac{8.4}{\sqrt{6}}\right)\)

If \(\alpha = 0.05\), the \(p\text{-value}\) and the conclusion are

  • 0.0014; There is sufficient evidence to conclude that the blood pressure decreased after the training.
  • 0.0014; There is sufficient evidence to conclude that the blood pressure increased after the training.
  • 0.0155; There is sufficient evidence to conclude that the blood pressure decreased after the training.
  • 0.0155; There is sufficient evidence to conclude that the blood pressure increased after the training.
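The test statistic and left-tailed p-value can be checked from the summary statistics with a short SciPy sketch (a paired t-test on the differences, with df = n − 1):

```python
import numpy as np
from scipy import stats

# Summary statistics from the exercise: n = 6 differences (after - before),
# sample mean difference -10.2 and sample standard deviation 8.4.
n = 6
x_bar_d = -10.2
s_d = 8.4

# Test statistic for H0: mu_d = 0 vs Ha: mu_d < 0 (left-tailed).
t_stat = x_bar_d / (s_d / np.sqrt(n))

# p-value from the Student's t distribution with n - 1 = 5 degrees of freedom.
p_value = stats.t.cdf(t_stat, n - 1)

print(round(t_stat, 3), round(p_value, 4))  # p-value ≈ 0.0155
```

This points to the first 0.0155 option: sufficient evidence that blood pressure decreased after the training.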

A golf instructor is interested in determining if her new technique for improving players’ golf scores is effective. She takes four new students. She records their 18-hole scores before learning the technique and then after having taken her class. She conducts a hypothesis test. The data are as follows.

The correct decision is:

  • Reject \(H_{0}\).
  • Do not reject the \(H_{0}\).

A local cancer support group believes that the estimate for new female breast cancer cases in the south is higher in 2013 than in 2012. The group compared the estimates of new female breast cancer cases by southern state in 2012 and in 2013. The results are in Table .

Test: two matched pairs or paired samples (t-test)

Random variable: \(\bar{X}_{d}\)

Distribution: \(t_{12}\)

\(H_{0}: \mu_{d} = 0\); \(H_{a}: \mu_{d} > 0\)

The mean of the differences of new female breast cancer cases in the south between 2013 and 2012 is greater than zero. The estimate for new female breast cancer cases in the south is higher in 2013 than in 2012.

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the right of zero extends from the axis to the curve. The region under the curve to the right of the line is shaded representing p-value = 0.0004.

Decision: Reject \(H_{0}\)

Conclusion: At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that there was a higher estimate of new female breast cancer cases in 2013 than in 2012.

A traveler wanted to know if the prices of hotels are different in the ten cities that he visits the most often. The list of the cities with the corresponding hotel prices for his two favorite hotel chains is in Table. Test at the 1% level of significance.

A politician asked his staff to determine whether the underemployment rate in the northeast decreased from 2011 to 2012. The results are in Table.

Test: matched or paired samples (t-test)

Difference data: \(\{–0.9, –3.7, –3.2, –0.5, 0.6, –1.9, –0.5, 0.2, 0.6, 0.4, 1.7, –2.4, 1.8\}\)

Random Variable: \(\bar{X}_{d}\)

Distribution: \(t_{12}\)

\(H_{0}: \mu_{d} = 0\); \(H_{a}: \mu_{d} < 0\)

The mean of the differences of the rate of underemployment in the northeastern states between 2012 and 2011 is less than zero. The underemployment rate went down from 2011 to 2012.

Graph: left-tailed.

This is a normal distribution curve with mean equal to zero. A vertical line near the left tail of the curve, to the left of zero, extends from the axis to the curve. The region under the curve to the left of the line is shaded, representing the p-value = 0.1207.

\(p\text{-value}: 0.1207\)

Conclusion: At the 5% level of significance, from the sample data, there is not sufficient evidence to conclude that there was a decrease in the underemployment rates of the northeastern states from 2011 to 2012.
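The p-value can be reproduced from the difference data with SciPy (the `alternative="less"` keyword of `ttest_1samp` requires SciPy 1.6 or later):

```python
import numpy as np
from scipy import stats

# Difference data (2012 rate - 2011 rate) for the 13 northeastern states.
diffs = np.array([-0.9, -3.7, -3.2, -0.5, 0.6, -1.9, -0.5,
                  0.2, 0.6, 0.4, 1.7, -2.4, 1.8])

# One-sample (paired) t-test of H0: mu_d = 0 against Ha: mu_d < 0.
t_stat, p_value = stats.ttest_1samp(diffs, 0, alternative="less")

print(round(t_stat, 3), round(p_value, 4))  # p-value ≈ 0.1207
```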

10.6: Hypothesis Testing for Two Means and Two Proportions

Learn Statistics for Data Science, Machine Learning, and AI – Full Handbook

Tatev Aslanyan

Karl Pearson was a British mathematician who once said, "Statistics is the grammar of science." This holds especially true for Computer and Information Sciences, Physical Science, and Biological Science.

When you are getting started on your journey in Data Science, Data Analytics, Machine Learning, or AI (including Generative AI), having statistical knowledge will help you better leverage data insights and actually understand the algorithms beyond their implementation approach.

I can't overstate the importance of statistics in data science and Artificial Intelligence. Statistics provides tools and methods to find structure and give deeper data insights. Both Statistics and Mathematics love facts and hate guesses. Knowing the fundamentals of these two important subjects will allow you to think critically and be creative when using data to solve business problems and make data-driven decisions.

Key statistical concepts for your data science or data analysis journey with Python Code

In this handbook, I will cover the following Statistics topics for data science, machine learning, and artificial intelligence (including GenAI):

  • Random variables
  • Mean, Variance, Standard Deviation
  • Covariance and Correlation
  • Probability distribution functions (PDFs)
  • Bayes Theorem
  • Linear Regression and Ordinary Least Squares (OLS)
  • Gauss-Markov Theorem
  • Parameter properties (Bias, Consistency, Efficiency)
  • Confidence intervals
  • Hypothesis testing
  • Statistical significance
  • Type I & Type II Error
  • Statistical tests (Student's t-test, F-test, 2-Sample T-Test, 2-Sample Z-Test, Chi-Square Test)
  • p-value and its limitations
  • Inferential Statistics
  • Central Limit Theorem & Law of Large Numbers
  • Dimensionality reduction techniques (PCA, FA)
  • Interview Prep - Top 7 Statistics Questions with Answers
  • About The Author

How Can You Dive Deeper?

If you have no prior statistical knowledge and you want to identify and learn the essential statistical concepts from scratch and prepare for your job interviews, then this handbook is for you. It will also be a good read for anyone who wants to refresh their statistical knowledge.

Prerequisites

Before you start reading this handbook about key concepts in Statistics for Data Science, Machine Learning, and Artificial Intelligence, there are a few prerequisites that will help you make the most out of it.

This list is designed to ensure you are well-prepared and can fully grasp the statistical concepts discussed:

  • Basic Mathematical Skills : Comfort with high school level mathematics, including algebra and basic calculus, is essential. These skills are crucial for understanding statistical formulas and methods.
  • Logical Thinking : Ability to think logically and methodically to solve problems will aid in understanding statistical reasoning and applying these concepts to data-driven scenarios.
  • Computer Literacy : Basic knowledge of using computers and the internet is necessary since many examples and exercises might require the use of statistical software or coding.
  • Basic knowledge of Python, such as the creation of variables and working with some basic data structures and coding is also required (if you are not familiar with these concepts, check out my Python for Data Science 2024 -Full Course for Beginners here).
  • Curiosity and Willingness to Learn : A keen interest in learning and exploring data is perhaps the most important prerequisite. The field of data science is constantly evolving, and a proactive approach to learning will be incredibly beneficial.

This handbook assumes no prior knowledge of statistics, making it accessible to beginners. Still, familiarity with the above concepts will greatly enhance your understanding and ability to apply statistical methods effectively in various domains.

If you want to learn Mathematics, Statistics, Machine Learning or AI check out our YouTube Channel and LunarTech.ai for free resources.

Random Variables

Random variables form the cornerstone of many statistical concepts. It might be hard to digest the formal mathematical definition of a random variable, but simply put, it's a way to map the outcomes of random processes, such as flipping a coin or rolling a die, to numbers.

For instance, we can define the random process of flipping a coin by a random variable X which takes the value 1 if the outcome is heads and 0 if the outcome is tails.

In this example, we have a random process of flipping a coin where this experiment can produce two possible outcomes: {0, 1}. This set of all possible outcomes is called the sample space of the experiment. Each time the random process is repeated, it is referred to as an event.

In this example, flipping a coin and getting a tail as an outcome is an event. The chance or the likelihood of this event occurring with a particular outcome is called the probability of that event.

The probability of an event is the likelihood that a random variable takes a specific value x, which can be described by P(x). In the example of flipping a coin, the likelihood of getting heads or tails is the same, that is 0.5 or 50%. So we have the following setting:

where the probability of an event, in this example, can only take values in the range [0,1].

To understand the concepts of mean, variance, and many other statistical topics, it is important to learn the concepts of population and sample.

The population is the set of all observations (individuals, objects, events, or procedures) and is usually very large and diverse. On the other hand, a sample is a subset of observations from the population that ideally is a true representation of the population.


Given that experimenting with an entire population is either impossible or simply too expensive, researchers or analysts use samples rather than the entire population in their experiments or trials.

To make sure that the experimental results are reliable and hold for the entire population, the sample needs to be a true representation of the population. That is, the sample needs to be unbiased.

For this purpose, we can use statistical sampling techniques such as Random Sampling, Systematic Sampling, Clustered Sampling, Weighted Sampling, and Stratified Sampling.

The mean, also known as the average, is a central value of a finite set of numbers. Let’s assume a random variable X in the data has the following values:

$$x_1, x_2, x_3, \ldots, x_N$$

where N is the number of observations or data points in the sample set, or simply the data frequency. Then the sample mean, denoted by \(\bar{x}\), which is very often used to approximate the population mean \(\mu\), can be expressed as follows:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The mean is also referred to as the expectation, which is often denoted by E(·) or by the random variable with a bar on top. For example, the expectations of random variables X and Y, that is E(X) and E(Y), can be written as \(\bar{X}\) and \(\bar{Y}\), respectively.

Now that we have a solid understanding of the mean as a statistical measure, let's see how we can apply this knowledge practically using Python. Python is a versatile programming language that, with the help of libraries like NumPy, makes it easy to perform complex mathematical operations—including calculating the mean.

In the following code snippet, we demonstrate how to compute the mean of a set of numbers using NumPy. We will start by showing the calculation for a simple array of numbers. Then, we'll address a common scenario encountered in data science: calculating the mean of a dataset that includes undefined or missing values, represented as NaN (Not a Number). NumPy provides a function specifically designed to handle such cases, allowing us to compute the mean while ignoring these NaN values.

Here is how you can perform these operations in Python:
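A minimal sketch of these operations with NumPy (the data values are illustrative):

```python
import numpy as np

# Mean of a simple array of observations.
x = np.array([2, 4, 6, 8, 10])
print(np.mean(x))  # 6.0

# A dataset with a missing value: np.mean would return nan,
# so np.nanmean is used instead to ignore the NaN entries.
x_with_nan = np.array([2, 4, np.nan, 8, 10])
print(np.mean(x_with_nan))     # nan
print(np.nanmean(x_with_nan))  # 6.0
```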

The variance measures how far the data points are spread out from the average value. It is equal to the average of the squared differences between the data values and the mean.

We can express the population variance as follows:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

For deriving expectations and variances of different popular probability distribution functions, check out this GitHub repo.

Standard Deviation

The standard deviation is simply the square root of the variance and measures the extent to which data varies from its mean. The standard deviation, defined by σ (sigma), can be expressed as follows:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

Standard deviation is often preferred over the variance because it has the same units as the data points, which means you can interpret it more easily.

To compute the population variance using Python, we utilize the var function from the NumPy library. By default, this function calculates the population variance by setting the ddof (Delta Degrees of Freedom) parameter to 0. However, when dealing with samples and not the entire population, you would typically set ddof to 1 to get the sample variance.

The code snippet provided shows how to calculate the variance for a set of data. Additionally, it shows how to calculate the variance when there are NaN values in the data. NaN values represent missing or undefined data. When calculating the variance, these NaN values must be handled correctly; otherwise, they can result in a variance that is not a number (NaN), which is uninformative.

Here is how you can calculate the population variance in Python, taking into account the potential presence of NaN values:
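A minimal sketch with NumPy (illustrative data; note that ddof=0 gives the population variance and ddof=1 the sample variance):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])

# Population variance (ddof=0 is the NumPy default).
print(np.var(x))          # 8.0
# Sample variance (ddof=1 divides by N - 1 instead of N).
print(np.var(x, ddof=1))  # 10.0

# With NaN values, np.var returns nan; np.nanvar ignores them.
x_with_nan = np.array([2, 4, np.nan, 8, 10])
print(np.nanvar(x_with_nan))  # 10.0, the population variance of [2, 4, 8, 10]
```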

The covariance is a measure of the joint variability of two random variables and describes the relationship between these two variables. It is defined as the expected value of the product of the two random variables’ deviations from their means.

The covariance between two random variables X and Z can be described by the following expression, where E(X) and E(Z) represent the means of X and Z, respectively:

$$\mathrm{Cov}(X, Z) = E\big[(X - E(X))(Z - E(Z))\big]$$

Covariance can take negative or positive values as well as a value of 0. A positive value of covariance indicates that two random variables tend to vary in the same direction, whereas a negative value suggests that these variables vary in opposite directions. Finally, the value 0 means that they don’t vary together.

To explore the concept of covariance practically, we will use Python with the NumPy library, which provides powerful numerical operations. The np.cov function can be used to calculate the covariance matrix for two or more datasets. In the matrix, the diagonal elements represent the variance of each dataset, and the off-diagonal elements represent the covariance between each pair of datasets.

Let's look at an example of calculating the covariance between two sets of data:
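A small illustrative sketch with np.cov (the datasets are made up):

```python
import numpy as np

# Two datasets that tend to increase together.
x = np.array([1, 2, 3, 4, 5])
z = np.array([2, 4, 6, 8, 10])

# np.cov returns the 2x2 covariance matrix: variances on the diagonal,
# covariance between x and z off the diagonal (sample normalization by default).
cov_matrix = np.cov(x, z)
print(cov_matrix)
print(cov_matrix[0, 1])  # 5.0, positive: x and z vary in the same direction
```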

Correlation

The correlation is also a measure of a relationship. It measures both the strength and the direction of the linear relationship between two variables.

If a correlation is detected, then it means that there is a relationship or a pattern between the values of two target variables. The correlation between two random variables X and Z is equal to the covariance between these two variables divided by the product of their standard deviations. This can be described by the following expression:

$$\mathrm{Cor}(X, Z) = \frac{\mathrm{Cov}(X, Z)}{\sigma_X\, \sigma_Z}$$

Correlation coefficients’ values range between -1 and 1. Keep in mind that the correlation of a variable with itself is always 1, that is Cor(X, X) = 1 .
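The same idea in code, using np.corrcoef (illustrative data):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
z = np.array([10, 8, 6, 4, 2])  # decreases linearly as x increases

# np.corrcoef returns the matrix of pairwise correlation coefficients:
# Cov(X, Z) divided by the product of the standard deviations.
corr = np.corrcoef(x, z)
print(corr[0, 0])  # 1.0: a variable is perfectly correlated with itself
print(corr[0, 1])  # ≈ -1.0: perfect negative linear relationship
```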

Another thing to keep in mind when interpreting correlation is to not confuse it with causation , given that a correlation is not necessarily a causation. Even if there is a correlation between two variables, you cannot conclude that one variable causes a change in the other. This relationship could be coincidental, or a third factor might be causing both variables to change.


Probability Distribution Functions

A probability distribution function (pdf), or probability density, is a function that describes all the possible values (the sample space) and the corresponding probabilities that a random variable can take within a given range, bounded between the minimum and maximum possible values.

Every pdf needs to satisfy the following two criteria:

$$0 \leq \Pr(x) \leq 1 \qquad \text{and} \qquad \sum_{x} \Pr(x) = 1$$

where the first criterion states that all probabilities should be numbers in the range [0,1] and the second criterion states that the sum of all possible probabilities should be equal to 1.

Probability functions are usually classified into two categories: discrete and continuous .

A discrete distribution function describes a random process with a countable sample space, as in the example of tossing a coin that has only two possible outcomes. A continuous distribution function describes a random process with a continuous sample space.

Examples of discrete distribution functions are Bernoulli , Binomial , Poisson , Discrete Uniform . Examples of continuous distribution functions are Normal , Continuous Uniform , Cauchy .

Binomial Distribution

The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each with the boolean-valued outcome: success (with probability p ) or failure (with probability q = 1 − p).

Let's assume a random variable X follows a Binomial distribution, then the probability of observing k successes in n independent trials can be expressed by the following probability density function:

$$\Pr(X = k) = \binom{n}{k} p^k q^{n-k}$$

The binomial distribution is useful when analyzing the results of repeated independent experiments, especially if you're interested in the probability of meeting a particular threshold given a specific error rate.

Binomial Distribution Mean and Variance

The mean of a binomial distribution, denoted as E(X) = np, tells you the average number of successes you can expect if you conduct n independent trials of a binary experiment.

A binary experiment is one where there are only two outcomes: success (with probability p) or failure (with probability q = 1 − p).

For example, if you were to flip a coin 100 times and you define a success as the coin landing on heads (let's say the probability of heads is 0.5), the binomial distribution would tell you how likely it is to get any number of heads in those 100 flips. The mean E ( X ) would be 100×0.5=50, indicating that on average, you’d expect to get 50 heads.

The variance Var(X) = npq measures the spread of the distribution, indicating how much the number of successes is likely to deviate from the mean.

Continuing with the coin flip example, the variance would be 100×0.5×0.5=25, which informs you about the variability of the outcomes. A smaller variance would mean the results are more tightly clustered around the mean, whereas a larger variance indicates they’re more spread out.
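The arithmetic in the coin-flip example can be verified in a couple of lines (a trivial sketch):

```python
# Mean and variance of a binomial experiment: 100 coin flips with p = 0.5.
n, p = 100, 0.5
q = 1 - p

mean = n * p          # expected number of heads
variance = n * p * q  # spread around that expectation

print(mean, variance)  # 50.0 25.0
```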

These concepts are crucial in many fields. For instance:

  • Quality Control : Manufacturers might use the binomial distribution to predict the number of defective items in a batch, helping them understand the quality and consistency of their production process.
  • Healthcare : In medicine, it could be used to calculate the probability of a certain number of patients responding to a treatment, based on past success rates.
  • Finance : In finance, binomial models are used to evaluate the risk of portfolio or investment strategies by predicting the number of times an asset will reach a certain price point.
  • Polling and Survey Analysis : When predicting election results or customer preferences, pollsters might use the binomial distribution to estimate how many people will favor a candidate or a product, given the probability drawn from a sample.

Understanding the mean and variance of the binomial distribution is fundamental to interpreting the results and making informed decisions based on the likelihood of different outcomes.

The figure below visualizes an example of Binomial distribution where the number of independent trials is equal to 8 and the probability of success in each trial is equal to 16%.


The Python code below creates a histogram to visualize the distribution of outcomes from 1000 experiments, each consisting of 8 trials with a success probability of 0.16. It uses NumPy to generate the binomial distribution data and Matplotlib to plot the histogram, showing the probability of the number of successes in those trials.
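A sketch of that simulation (the random seed is an arbitrary addition for reproducibility, and the Agg backend keeps the script headless-friendly):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# 1000 experiments, each with n = 8 trials and success probability p = 0.16.
rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility
successes = rng.binomial(n=8, p=0.16, size=1000)

# Histogram of the number of successes, normalized to show probabilities.
plt.hist(successes, bins=np.arange(10) - 0.5, density=True, edgecolor="black")
plt.xlabel("Number of successes in 8 trials")
plt.ylabel("Probability")
plt.title("Binomial distribution (n = 8, p = 0.16)")
plt.savefig("binomial_hist.png")
```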

Poisson Distribution

The Poisson distribution is the discrete probability distribution of the number of events occurring in a specified time period, given the average number of times the event occurs over that time period.

Let's assume a random variable X follows a Poisson distribution. Then the probability of observing k events over a time period can be expressed by the following probability function:

$$\Pr(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where e is Euler’s number and λ (lambda), the arrival rate parameter, is the expected value of X. The Poisson distribution function is very popular for its usage in modeling countable events occurring within a given time interval.

Poisson Distribution Mean and Variance

The Poisson distribution is particularly useful for modeling the number of times an event occurs within a specified time frame. The mean E(X) and variance Var(X) of a Poisson distribution are both equal to λ, which is the average rate at which events occur (also known as the rate parameter). This makes the Poisson distribution unique, as it is characterized by this single parameter.

The fact that the mean and variance are equal means that as we observe more events, the distribution of the number of occurrences becomes more predictable. It’s used in various fields such as business, engineering, and science for tasks like:

  • Predicting the number of customer arrivals at a store within an hour.
  • Estimating the number of emails you'd receive in a day.
  • Understanding the number of defects in a batch of materials.

So, the Poisson distribution helps in making probabilistic forecasts about the occurrence of rare or random events over intervals of time or space.

For example, Poisson distribution can be used to model the number of customers arriving in the shop between 7 and 10 pm, or the number of patients arriving in an emergency room between 11 and 12 pm.

The figure below visualizes an example of the Poisson distribution, where we count the number of web visitors arriving at a website and the arrival rate, lambda, is assumed to be equal to 7 visitors per time interval.


In practical data analysis, it is often helpful to simulate the distribution of events. Below is a Python code snippet that demonstrates how to generate a series of data points that follow a Poisson distribution using NumPy. We then create a histogram using Matplotlib to visualize the distribution of the number of visitors (as an example) we might expect to see, based on our average rate λ = 7.
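A sketch of that simulation (the seed is an arbitrary addition for reproducibility):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Simulated counts of web visitors per interval with arrival rate lambda = 7.
rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
visitors = rng.poisson(lam=7, size=1000)

# Normalized histogram of the visitor counts.
plt.hist(visitors, bins=np.arange(visitors.max() + 2) - 0.5, density=True,
         edgecolor="black")
plt.xlabel("Number of visitors per interval")
plt.ylabel("Probability")
plt.title("Poisson distribution (lambda = 7)")
plt.savefig("poisson_hist.png")

# The sample mean and variance should both be close to lambda = 7.
print(visitors.mean(), visitors.var())
```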

This histogram helps in understanding the distribution's shape and variability. The most likely number of visitors is around the mean λ, but the distribution shows the probability of seeing fewer or greater numbers as well.

Normal Distribution

The Normal probability distribution is the continuous probability distribution for a real-valued random variable. The Normal distribution, also called the Gaussian distribution, is arguably one of the most popular distribution functions, commonly used in the social and natural sciences for modeling purposes. For example, it is used to model people’s height or test scores.

Let's assume a random variable X follows a Normal distribution. Then its probability density function can be expressed as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$

where the parameter μ (mu) is the mean of the distribution, also referred to as the location parameter, and the parameter σ (sigma) is the standard deviation of the distribution, also referred to as the scale parameter. The number π (pi) is a mathematical constant approximately equal to 3.14.

Normal Distribution Mean and Variance

The figure below visualizes an example of the Normal distribution with a mean of 0 (μ = 0) and a standard deviation of 1 (σ = 1), which is referred to as the standard Normal distribution and is symmetric.


The visualization of the standard normal distribution is crucial because this distribution underpins many statistical methods and probability theory. When data is normally distributed with a mean ( μ ) of 0 and standard deviation (σ) of 1, it is referred to as the standard normal distribution. It's symmetric around the mean, with the shape of the curve often called the "bell curve" due to its bell-like shape.

The standard normal distribution is fundamental for the following reasons:

  • Central Limit Theorem: This theorem states that, under certain conditions, the sum of a large number of random variables will be approximately normally distributed. It allows for the use of normal probability theory for sample means and sums, even when the original data is not normally distributed.
  • Z-Scores: Values from any normal distribution can be transformed into the standard normal distribution using Z-scores, which indicate how many standard deviations an element is from the mean. This allows for the comparison of scores from different normal distributions.
  • Statistical Inference and AB Testing: Many statistical tests, such as t-tests and ANOVAs, assume that the data follows a normal distribution, or they rely on the central limit theorem. Understanding the standard normal distribution helps in the interpretation of these tests' results.
  • Confidence Intervals and Hypothesis Testing: The properties of the standard normal distribution are used to construct confidence intervals and to perform hypothesis testing.

We will cover all of these topics below!

So, being able to visualize and understand the standard normal distribution is key to applying many statistical techniques accurately.

The Python code below uses NumPy to generate 1000 random samples from a normal distribution with a mean (μ) of 0 and a standard deviation (σ) of 1, which are standard parameters for the standard normal distribution. These generated samples are stored in the variable X.

To visualize the distribution of these samples, the code employs Matplotlib to create a histogram. The plt.hist function is used to plot the histogram of the samples with 30 bins, and the density parameter is set to True to normalize the histogram so that the area under it sums to 1. This effectively turns the histogram into a probability density plot.

Additionally, the SciPy library is used to overlay the probability density function (PDF) of the theoretical normal distribution on the histogram. The norm.pdf function generates the y-values for the PDF given an array of x-values. This theoretical curve is plotted in yellow over the histogram to show how closely the random samples fit the expected distribution.

The resulting graph displays the histogram of the generated samples in purple, with the theoretical normal distribution overlaid in yellow. The x-axis represents the range of values that the samples can take, while the y-axis represents the probability density. This visualization is a powerful tool for comparing the empirical distribution of the data with the theoretical model, allowing us to see whether our samples follow the expected pattern of a normal distribution.
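Since the snippet itself isn't reproduced here, the following is a minimal sketch matching the description above (the seed, bin count, and colors are assumptions):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from scipy.stats import norm

np.random.seed(42)
mu, sigma = 0, 1
X = np.random.normal(mu, sigma, 1000)  # 1000 draws from the standard normal

# Histogram normalized to a probability density (area under it sums to 1)
plt.hist(X, bins=30, density=True, color="purple", alpha=0.6)

# Overlay the theoretical standard normal PDF
x = np.linspace(-4, 4, 200)
plt.plot(x, norm.pdf(x, mu, sigma), color="gold", linewidth=2)
plt.xlabel("Value")
plt.ylabel("Probability density")
plt.savefig("std_normal.png")
```

Because `density=True` puts the histogram and the overlaid PDF on the same scale, the two should track each other closely for a sample of this size.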



Bayes' Theorem

The Bayes' Theorem (often called Bayes' Law ) is arguably the most powerful rule of probability and statistics. It was named after famous English statistician and philosopher, Thomas Bayes.

Bayes' theorem is a powerful probability law that brings the concept of subjectivity into the world of Statistics and Mathematics where everything is about facts. It describes the probability of an event, based on the prior information of conditions that might be related to that event.

For instance, if the risk of getting Coronavirus or Covid-19 is known to increase with age, then Bayes' Theorem allows the risk to an individual of a known age to be determined more accurately by conditioning on their age, rather than simply assuming that this individual is typical of the population as a whole.

The concept of conditional probability , which plays a central role in Bayes' theorem, is a measure of the probability of an event happening, given that another event has already occurred.

Bayes' theorem can be described by the following expression, where X and Y stand for events X and Y, respectively:

Pr (X|Y) = Pr (Y|X) * Pr (X) / Pr (Y)

  • Pr (X|Y): the probability of event X occurring given that event or condition Y has occurred or is true
  • Pr (Y|X): the probability of event Y occurring given that event or condition X has occurred or is true
  • Pr (X) & Pr (Y): the probabilities of observing events X and Y, respectively

In the case of the earlier example, the probability of getting Coronavirus (event X) conditional on being at a certain age is Pr (X|Y). This is equal to the probability of being at a certain age given that the person has Coronavirus, Pr (Y|X), multiplied by the probability of getting Coronavirus, Pr (X), divided by the probability of being at a certain age, Pr (Y).
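As a hypothetical illustration with made-up numbers (none of these probabilities come from the text), Bayes' theorem can be applied like this:

```python
# Hypothetical probabilities, for illustration only
p_covid = 0.05            # Pr(X): baseline probability of having Covid
p_age_given_covid = 0.20  # Pr(Y|X): probability of being in the age group, given Covid
p_age = 0.10              # Pr(Y): probability of being in the age group overall

# Bayes' theorem: Pr(X|Y) = Pr(Y|X) * Pr(X) / Pr(Y)
p_covid_given_age = p_age_given_covid * p_covid / p_age
print(p_covid_given_age)  # 0.1
```

Conditioning on age doubles the estimate from the 5% baseline to 10% in this made-up setup.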

Linear Regression

Earlier, we introduced the concept of causation between variables, which happens when a variable has a direct impact on another variable.

When the relationship between two variables is linear, Linear Regression is a statistical method that can model the impact of a unit change in one variable, the independent variable , on the values of another variable, the dependent variable .

Dependent variables are often referred to as response variables or explained variables, whereas independent variables are often referred to as regressors or explanatory variables .

When the Linear Regression model is based on a single independent variable, then the model is called Simple Linear Regression . When the model is based on multiple independent variables, it’s referred to as Multiple Linear Regression.

Simple Linear Regression can be described by the following expression:

Y = β0 + β1X + u

where Y is the dependent variable, X is the independent variable which is part of the data, β0 is the intercept which is unknown and constant, β1 is the slope coefficient or a parameter corresponding to the variable X which is unknown and constant as well. Finally, u is the error term that the model makes when estimating the Y values.

The main idea behind linear regression is to find the best-fitting straight line, the regression line, through a set of paired ( X, Y ) data.

One example of the Linear Regression application is modeling the impact of flipper length on penguins’ body mass, which is visualized below:

[Figure: scatter plot of penguin flipper length vs. body mass with a fitted regression line]

The R code snippet you've shared is for creating a scatter plot with a linear regression line using the ggplot2 package in R, which is a powerful and widely-used library for creating graphics and visualizations. The code uses a dataset named penguins from the palmerpenguins package, presumably containing data about penguin species, including measurements like flipper length and body mass.

Multiple Linear Regression with three independent variables can be described by the following expression:

Y = β0 + β1X1 + β2X2 + β3X3 + u

Ordinary Least Squares

The ordinary least squares (OLS) is a method for estimating the unknown parameters such as β0 and β1 in a linear regression model. The model is based on the principle of least squares . This minimizes the sum of the squares of the differences between the observed dependent variable and its values that are predicted by the linear function of the independent variable (often referred to as fitted values ).

This difference between the real and predicted values of dependent variable Y is referred to as residual . So OLS minimizes the sum of squared residuals.

This optimization problem results in the following OLS estimates for the unknown parameters β0 and β1 which are also known as coefficient estimates :

β1_hat = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)²
β0_hat = Ȳ − β1_hat · X̄

Once these parameters of the Simple Linear Regression model are estimated, the fitted values of the response variable can be computed as follows:

Ŷ_i = β0_hat + β1_hat · X_i

Standard Error

The residuals or the estimated error terms can be determined as follows:

û_i = Y_i − Ŷ_i

It is important to keep in mind the difference between the error terms and residuals. Error terms are never observed, while the residuals are calculated from the data. The OLS estimates the error terms for each observation but not the actual error term. So, the true error variance is still unknown.

Also, these estimates are subject to sampling uncertainty. This means that we will never be able to determine the exact estimate, the true value, of these parameters from sample data in an empirical application. But we can estimate it by calculating the sample residual variance by using the residuals as follows:

σ̂² = Σ û_i² / (N − 2)

This estimate for the variance of sample residuals helps us estimate the variance of the estimated parameters, which is often expressed as follows:

Var(β1_hat) = σ̂² / Σ (X_i − X̄)²

The square root of this variance term is called the standard error of the estimate. This is a key component in assessing the accuracy of the parameter estimates. It is used to calculate test statistics and confidence intervals.

The standard error can be expressed as follows:

SE(β1_hat) = √( σ̂² / Σ (X_i − X̄)² )


OLS Assumptions

The OLS estimation method makes the following assumptions which need to be satisfied to get reliable prediction results:

  • The Linearity assumption states that the model is linear in parameters.
  • The Random Sample assumption states that all observations in the sample are randomly selected.
  • The Exogeneity assumption states that independent variables are uncorrelated with the error terms.
  • The Homoskedasticity assumption states that the variance of all error terms is constant.
  • The No Perfect Multi-Collinearity assumption states that none of the independent variables is constant and there are no exact linear relationships between the independent variables.

The Python code snippet you've shared performs Ordinary Least Squares (OLS) regression, which is a method used in statistics to estimate the relationship between independent variables and a dependent variable. This process involves calculating the best-fit line through the data points that minimizes the sum of the squared differences between the observed values and the values predicted by the model.

The code defines a function runOLS(Y, X) that takes in a dependent variable Y and an independent variable X and performs the following steps:

  • Estimates the OLS coefficients (beta_hat) using the linear algebra solution to the least squares problem.
  • Makes predictions ( Y_hat ) using the estimated coefficients and calculates the residuals.
  • Computes the residual sum of squares (RSS), total sum of squares (TSS), mean squared error (MSE), root mean squared error (RMSE), and R-squared value, which are common metrics used to assess the fit of the model.
  • Calculates the standard error of the coefficient estimates, t-statistics, p-values, and confidence intervals for the estimated coefficients.

These calculations are standard in regression analysis and are used to interpret and understand the strength and significance of the relationship between the variables. The result of this function includes the estimated coefficients and various statistics that help evaluate the model's performance.
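The original `runOLS` snippet isn't reproduced here, but based on the description above, a minimal sketch might look like the following (the intercept handling and the return signature are assumptions):

```python
import numpy as np
from scipy import stats

def runOLS(Y, X):
    """Sketch of the OLS routine described above; names are assumptions."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    n, k = X.shape

    # OLS coefficients via the normal equations: (X'X)^(-1) X'Y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

    # Predictions and residuals
    Y_hat = X @ beta_hat
    residuals = Y - Y_hat

    # Fit metrics: RSS, TSS, MSE, RMSE, R-squared
    RSS = float(residuals @ residuals)
    TSS = float(((Y - Y.mean()) ** 2).sum())
    MSE = RSS / n
    RMSE = np.sqrt(MSE)
    R2 = 1 - RSS / TSS

    # Standard errors, t-statistics, p-values, and 95% CIs for the coefficients
    sigma2 = RSS / (n - k)
    var_beta = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(var_beta))
    t_stats = beta_hat / se
    p_values = 2 * (1 - stats.t.cdf(np.abs(t_stats), df=n - k))
    t_crit = stats.t.ppf(0.975, df=n - k)
    ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])

    return beta_hat, se, t_stats, p_values, ci, R2, RMSE
```

Fitting simulated data generated as `Y = 2 + 3X + noise` should recover coefficient estimates close to 2 and 3 with an R-squared near 1.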

Parameter Properties

Under the assumption that the OLS criteria/assumptions we just discussed are satisfied, the OLS estimators of coefficients β0 and β1 are BLUE and Consistent . So what does this mean?

This theorem highlights the properties of OLS estimates where the term BLUE stands for Best Linear Unbiased Estimator . Let's explore what this means in more detail.

The bias of an estimator is the difference between its expected value and the true value of the parameter being estimated. It can be expressed as follows:

Bias(β_hat) = E(β_hat) − β

When we state that the estimator is unbiased , we mean that the bias is equal to zero. This implies that the expected value of the estimator is equal to the true parameter value, that is:

E(β_hat) = β

Unbiasedness does not guarantee that the obtained estimate with any particular sample is equal or close to β. What it means is that, if we repeatedly draw random samples from the population and then compute the estimate each time, then the average of these estimates would be equal to or very close to β.

The term Best in the Gauss-Markov theorem relates to the variance of the estimator and is referred to as efficiency . A parameter can have multiple estimators but the one with the lowest variance is called efficient.

Consistency

The term consistency goes hand in hand with the terms sample size and convergence . If the estimator converges to the true parameter as the sample size becomes very large, then this estimator is said to be consistent, that is:

β_hat → β as N → ∞ (that is, plim β_hat = β)

All these properties hold for OLS estimates as summarized in the Gauss-Markov theorem. In other words, OLS estimates have the smallest variance, they are unbiased, linear in parameters, and are consistent. These properties can be mathematically proven by using the OLS assumptions made earlier.

Confidence Intervals

The Confidence Interval is the range that contains the true population parameter with a certain pre-specified probability. This is referred to as the confidence level of the experiment, and it's obtained by using the sample results and the margin of error .

Margin of Error

The margin of error is the difference between the sample results and what the results would have been if you had used the entire population.

Confidence Level

The Confidence Level describes the level of certainty in the experimental results. For example, a 95% confidence level means that if you were to perform the same experiment repeatedly 100 times, then 95 of those 100 trials would lead to similar results.

Note that the confidence level is defined before the start of the experiment because it will affect how big the margin of error will be at the end of the experiment.

Confidence Interval for OLS Estimates

As I mentioned earlier, the OLS estimates of the Simple Linear Regression, the estimates for intercept β0 and slope coefficient β1, are subject to sampling uncertainty. But we can construct Confidence Intervals (CIs) for these parameters which will contain the true value of these parameters in 95% of all samples.

That is, 95% confidence interval for β can be interpreted as follows:

  • The confidence interval is the set of values for which a hypothesis test cannot be rejected at the 5% significance level.
  • The confidence interval has a 95% chance to contain the true value of β.

95% confidence interval of OLS estimates can be constructed as follows:

CI_0.95 = β_hat ± 1.96 · SE(β_hat)

This is based on the parameter estimate, the standard error of that estimate, and the value 1.96, the critical value corresponding to the 5% rejection rule, which determines the margin of error.
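As a hypothetical illustration (the estimate and standard error below are made-up numbers), the interval is computed like this:

```python
# Hypothetical slope estimate and standard error, for illustration only
beta_hat = 49.7  # estimated coefficient
se = 1.5         # standard error of the estimate
z = 1.96         # critical value for a 95% confidence level

ci_lower = beta_hat - z * se
ci_upper = beta_hat + z * se
print((ci_lower, ci_upper))  # ≈ (46.76, 52.64)
```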

This value is determined using the Normal Distribution table , which we'll discuss later on in this handbook.

Meanwhile, the following figure illustrates the idea of 95% CI:

[Figure: illustration of a 95% confidence interval]

Note that the confidence interval depends on the sample size as well, given that it is calculated using the standard error which is based on sample size.

Statistical Hypothesis Testing

Testing a hypothesis in Statistics is a way to test the results of an experiment or survey to determine how meaningful the results are.

Basically, you're testing whether the obtained results are valid by figuring out the odds that they have occurred by chance. If it is the latter, then the results are not reliable and neither is the experiment. Hypothesis Testing is part of the Statistical Inference .

Null and Alternative Hypothesis

Firstly, you need to determine the thesis you wish to test. Then you need to formulate the Null Hypothesis and the Alternative Hypothesis. The test can have two possible outcomes. Based on the statistical results, you can either reject the stated hypothesis or accept it.

As a rule of thumb, statisticians tend to put the version or formulation of the hypothesis under the Null Hypothesis that needs to be rejected , whereas the acceptable and desired version is stated under the Alternative Hypothesis .

Statistical Significance

Let’s look at the earlier mentioned example where we used the Linear Regression model to investigate whether a penguin's Flipper Length, the independent variable, has an impact on Body Mass , the dependent variable.

We can formulate this model with the following statistical expression:

Body Mass = β0 + β1 · Flipper Length + u

Then, once the OLS estimates of the coefficients are estimated, we can formulate the following Null and Alternative Hypothesis to test whether the Flipper Length has a statistically significant impact on the Body Mass:

H0 : Flipper Length has no impact on Body Mass
H1 : Flipper Length has an impact on Body Mass

where H0 and H1 represent Null Hypothesis and Alternative Hypothesis, respectively.

Rejecting the Null Hypothesis would mean that a one-unit increase in Flipper Length has a direct impact on the Body Mass (given that the parameter estimate of β1 is describing this impact of the independent variable, Flipper Length, on the dependent variable, Body Mass). We can reformulate this hypothesis as follows:

H0 : β1 = 0   H1 : β1 ≠ 0

where H0 states that the parameter estimate of β1 is equal to 0, that is Flipper Length effect on Body Mass is statistically insignificant whereas H1 states that the parameter estimate of β1 is not equal to 0, suggesting that Flipper Length effect on Body Mass is statistically significant .

Type I and Type II Errors

When performing Statistical Hypothesis Testing, you need to consider two conceptual types of errors: Type I error and Type II error.

Type I errors occur when the Null is incorrectly rejected, and Type II errors occur when the Null Hypothesis is incorrectly not rejected. A confusion matrix can help you clearly visualize the severity of these two types of errors.


[Figure: confusion matrix of Type I and Type II errors]

Statistical Tests

Once you've stated the Null and the Alternative Hypotheses and defined the test assumptions, the next step is to determine which statistical test is appropriate and to calculate the test statistic .

Whether to reject the Null can be determined by comparing the test statistic with the critical value . This comparison shows whether or not the observed test statistic is more extreme than the defined critical value.

It can have two possible results:

  • The test statistic is more extreme than the critical value → the null hypothesis can be rejected
  • The test statistic is not as extreme as the critical value → the null hypothesis cannot be rejected

The critical value is based on a pre-specified significance level α (usually chosen to be equal to 5%) and the type of probability distribution the test statistic follows.

The critical value divides the area under this probability distribution curve into the rejection region(s) and non-rejection region . There are numerous statistical tests used to test various hypotheses. Examples are the Student’s t-test , F-test , Chi-squared test , Durbin-Wu-Hausman Endogeneity test , and White Heteroskedasticity test . In this handbook, we will look at two of these statistical tests: the Student's t-test and the F-test.

Student’s t-test

One of the simplest and most popular statistical tests is the Student’s t-test. You can use it to test various hypotheses, especially when dealing with a hypothesis where the main area of interest is to find evidence for the statistically significant effect of a single variable .

The test statistic of the t-test follows Student’s t distribution and can be determined as follows:

t = (β_hat − h0) / SE(β_hat)

where h0 in the numerator is the value against which the parameter estimate is being tested. So, the t-test statistic is equal to the parameter estimate minus the hypothesized value, divided by the standard error of the coefficient estimate.

Let's use this for our earlier hypothesis, where we wanted to test whether Flipper Length has a statistically significant impact on Body Mass. This test can be performed using a t-test, and in that case h0 is equal to 0, since the slope coefficient estimate is tested against a value of 0.
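With made-up numbers for the slope estimate and its standard error (neither comes from the text), the calculation looks like this:

```python
# Hypothetical t-test of the Flipper Length slope against h0 = 0
beta1_hat = 49.7  # hypothetical slope estimate
se_beta1 = 1.5    # hypothetical standard error of the estimate
h0 = 0            # value tested under the Null

t_stat = (beta1_hat - h0) / se_beta1
print(t_stat)  # ≈ 33.13
```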

Two-sided vs one-sided t-test

There are two versions of the t-test: a two-sided t-test and a one-sided t-test . Whether you need the former or the latter version of the test depends entirely on the hypothesis that you want to test.

You can use the two-sided or two-tailed t-test when the hypothesis is testing equal versus not equal relationship under the Null and Alternative Hypotheses. It would be similar to the following example:

H0 : β1 = 0   H1 : β1 ≠ 0

The two-sided t-test has two rejection regions as visualized in the figure below:

[Figure: rejection regions of the two-sided t-test]

In this version of the t-test, the Null is rejected if the calculated t-statistic is either too small or too large.

Here, the test statistic is compared to the critical values based on the sample size and the chosen significance level. To determine the exact value of the cutoff point, you can use a two-sided t-distribution table .

On the other hand, you can use the one-sided or one-tailed t-test when the hypothesis is testing a positive/negative versus negative/positive relationship under the Null and Alternative Hypotheses, for example:

H0 : β1 ≤ 0   H1 : β1 > 0

One-sided t-test has a single rejection region . Depending on the hypothesis side, the rejection region is either on the left-hand side or the right-hand side as visualized in the figure below:

[Figure: rejection region of the one-sided t-test]

In this version of the t-test, the Null is rejected if the calculated t-statistic is smaller/larger than the critical value.


F-test

The F-test is another very popular statistical test, often used to test hypotheses about the joint statistical significance of multiple variables . This is the case when you want to test whether multiple independent variables have a statistically significant impact on a dependent variable.

Following is an example of a statistical hypothesis that you can test using the F-test:

H0 : β1 = β2 = β3 = 0   H1 : at least one of β1, β2, β3 ≠ 0

where the Null states that the three variables corresponding to these coefficients are jointly statistically insignificant, and the Alternative states that these three variables are jointly statistically significant.

The test statistic of the F-test follows the F distribution and can be determined as follows:

F = ( (SSRrestricted − SSRunrestricted) / q ) / ( SSRunrestricted / (N − k − 1) )

  • the SSRrestricted is the sum of squared residuals of the restricted model , which is the same model but excluding the target variables stated as insignificant under the Null
  • the SSRunrestricted is the sum of squared residuals of the unrestricted model , which is the model that includes all variables
  • the q represents the number of variables that are being jointly tested for the insignificance under the Null
  • N is the sample size
  • and the k is the total number of variables in the unrestricted model.

SSR values are provided next to the parameter estimates after running the OLS regression, and the same holds for the F-statistics as well.
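As a hypothetical illustration, with made-up SSR values and model dimensions (none come from the text), the F-statistic and its critical value can be computed like this:

```python
from scipy import stats

# Hypothetical inputs, for illustration only
ssr_restricted = 1200.0    # SSR of the model without the q tested variables
ssr_unrestricted = 1000.0  # SSR of the full (unrestricted) model
q = 3    # number of variables tested jointly under the Null
N = 100  # sample size
k = 5    # total number of variables in the unrestricted model

# F-statistic: relative increase in SSR per restriction, scaled by residual variance
F = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (N - k - 1))

# Critical value at the 5% significance level
F_crit = stats.f.ppf(0.95, dfn=q, dfd=N - k - 1)
print(F, F_crit)
```

Here F exceeds the critical value, so in this made-up example the Null of joint insignificance would be rejected.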

Following is an example of MLR model output where the SSR and F-statistics values are marked.

[Figure: MLR regression output with the SSR and F-statistic values marked]

F-test has a single rejection region as visualized below:

[Figure: rejection region of the F-test]

If the calculated F-statistic is bigger than the critical value, then the Null can be rejected. This suggests that the independent variables are jointly statistically significant. The rejection rule can be expressed as follows:

F > F_crit → reject H0

2-sample T-test

If you want to test whether there is a statistically significant difference between the Control and Experimental groups’ metrics that are in the form of averages (for example, average purchase amount), the metric follows the Student-t distribution, and the sample size is smaller than 30, you can use a 2-sample T-test to test the following hypothesis:

H0 : mu_con = mu_exp   H1 : mu_con ≠ mu_exp

where the sampling distribution of means of the Control group follows a Student-t distribution with N_con − 1 degrees of freedom, and the sampling distribution of means of the Experimental group follows a Student-t distribution with N_exp − 1 degrees of freedom.

Note that the N_con and N_exp are the number of users in the Control and Experimental groups, respectively.

Then you can calculate an estimate for the pooled variance of the two samples as follows:

S²_pooled = ( (N_con − 1) · σ²_con + (N_exp − 1) · σ²_exp ) / (N_con + N_exp − 2)

where σ²_con and σ²_exp are the sample variances of the Control and Experimental groups, respectively. Then the Standard Error is equal to the square root of the estimate of the pooled variance, and can be defined as:

SE = √( S²_pooled · (1/N_con + 1/N_exp) )

Consequently, the test statistic of the 2-sample T-test with the hypothesis stated earlier can be calculated as follows:

T = (mu_con − mu_exp) / SE

In order to test the statistical significance of the observed difference between sample means, we need to calculate the p-value of our test statistics.

The p-value is the probability of observing values at least as extreme as the observed one purely by random chance. Stated differently, the p-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the null hypothesis is true.

Then the p-value of the test statistic can be calculated as follows:

p-value = 2 · (1 − CDF_t(|T|)), where CDF_t is the Student-t CDF with N_con + N_exp − 2 degrees of freedom

The interpretation of a p -value is dependent on the chosen significance level, alpha, which you choose before running the test during the power analysis .

If the calculated p -value appears to be smaller than or equal to alpha (for example, 0.05 for a 5% significance level), we can reject the null hypothesis and state that there is a statistically significant difference between the primary metrics of the Control and Experimental groups.

Finally, to determine how accurate the obtained results are and also to comment about the practical significance of the obtained results, you can compute the Confidence Interval of your test by using the following formula:

CI_0.95 = (mu_con − mu_exp) ± t_(1−alpha/2) · SE

where the t_(1-alpha/2) is the critical value of the test corresponding to the two-sided t-test with alpha significance level. It can be found using the t-table .

The Python code provided performs a two-sample t-test, which is used in statistics to determine if two sets of data are significantly different from each other. This particular snippet simulates two groups (control and experimental) with data following a t-distribution, calculates the mean and variance for each group, and then performs the following steps:

  • It calculates the pooled variance, which is a weighted average of the variances of the two groups.
  • It computes the standard error of the difference between the two means.
  • It calculates the t-statistic, which is the difference between the two sample means divided by the standard error. This statistic measures how much the groups differ in units of standard error.
  • It determines the critical t-value from the t-distribution for the given significance level and degrees of freedom, which is used to decide whether the t-statistic is large enough to indicate a statistically significant difference between the groups.
  • It calculates the p-value, which indicates the probability of observing such a difference between means if the null hypothesis (that there is no difference) is true.
  • It computes the margin of error and constructs the confidence interval around the difference in means.

Finally, the code prints out the t-statistic, critical t-value, p-value, and confidence interval. These results can be used to infer whether the observed differences in means are statistically significant or likely due to random variation.
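Based on the steps described, a hedged reconstruction of such a snippet might look like the following (the group sizes, the shift between groups, and the seed are all made-up):

```python
import numpy as np
from scipy.stats import t

np.random.seed(1)
alpha = 0.05

# Simulated Control and Experimental samples (values are assumptions)
N_con, N_exp = 20, 20
X_con = np.random.standard_t(df=N_con - 1, size=N_con) + 10.0   # control
X_exp = np.random.standard_t(df=N_exp - 1, size=N_exp) + 10.5   # experimental

mu_con, mu_exp = X_con.mean(), X_exp.mean()
var_con, var_exp = X_con.var(ddof=1), X_exp.var(ddof=1)

# Pooled variance and standard error of the difference in means
pooled_var = ((N_con - 1) * var_con + (N_exp - 1) * var_exp) / (N_con + N_exp - 2)
SE = np.sqrt(pooled_var * (1 / N_con + 1 / N_exp))

# t-statistic, critical value, p-value, and 95% confidence interval
T = (mu_con - mu_exp) / SE
df = N_con + N_exp - 2
t_crit = t.ppf(1 - alpha / 2, df)
p_value = 2 * (1 - t.cdf(abs(T), df))
margin = t_crit * SE
ci = (mu_con - mu_exp - margin, mu_con - mu_exp + margin)

print(T, t_crit, p_value, ci)
```

If `abs(T)` exceeds `t_crit` (equivalently, if `p_value` falls below `alpha`), the Null of equal means is rejected.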

2-sample Z-test

There are various situations when you may want to use a 2-sample z-test:

  • if you want to test whether there is a statistically significant difference between the control and experimental groups’ metrics that are in the form of averages (for example, average purchase amount) or proportions (for example, Click Through Rate)
  • if the metric follows Normal distribution
  • when the sample size is larger than 30, such that you can use the Central Limit Theorem (CLT) to state that the sampling distributions of the Control and Experimental groups are asymptotically Normal

Here we will make a distinction between two cases: where the primary metric is in the form of proportions (like Click Through Rate) and where the primary metric is in the form of averages (like average purchase amount).

Case 1: Z-test for comparing proportions (2-sided)

If you want to test whether there is a statistically significant difference between the Control and Experimental groups’ metrics that are in the form of proportions (like CTR) and if the click event occurs independently, you can use a 2-sample Z-test to test the following hypothesis:

H0 : p_con = p_exp   H1 : p_con ≠ p_exp

where each click event can be described by a random variable that can take two possible values: 1 (success) and 0 (failure). It also follows a Bernoulli distribution (click: success and no click: failure) where p_con and p_exp are the probabilities of clicking (probability of success) of Control and Experimental groups, respectively.

So, after collecting the interaction data of the Control and Experimental users, you can calculate the estimates of these two probabilities as follows:

p̂_con = #successes_con / N_con   p̂_exp = #successes_exp / N_exp

Since we are testing for the difference in these probabilities, we need to obtain an estimate for the pooled probability of success and an estimate for pooled variance, which can be done as follows:

p̂_pooled = (p̂_con · N_con + p̂_exp · N_exp) / (N_con + N_exp)
S²_pooled = p̂_pooled · (1 − p̂_pooled) · (1/N_con + 1/N_exp)

Then the Standard Error is equal to the square root of the estimate of the pooled variance. It can be defined as:

SE = √(S²_pooled)

And so, the test statistic of the 2-sample Z-test for the difference in proportions can be calculated as follows:

Z = (p̂_con − p̂_exp) / SE

Then the p-value of this test statistic can be calculated as follows:

p-value = 2 · (1 − Φ(|Z|)), where Φ is the CDF of the standard Normal distribution

Finally, you can compute the Confidence Interval of the test as follows:

CI_0.95 = (p̂_con − p̂_exp) ± z_(1−alpha/2) · SE

where the z_(1-alpha/2) is the critical value of the test corresponding to the two-sided Z-test with alpha significance level. You can find it using the Z-table .

The rejection region of this two-sided 2-sample Z-test can be visualized by the following graph:

[Figure: rejection regions of the two-sided 2-sample Z-test. Image Source: LunarTech]

The Python code snippet you’ve provided performs a two-sample Z-test for proportions. This type of test is used to determine whether there is a significant difference between the proportions of two groups. Here’s a brief explanation of the steps the code performs:

  • Calculates the sample proportions for both the control and experimental groups.
  • Computes the pooled sample proportion, which is an estimate of the proportion assuming the null hypothesis (that there is no difference between the group proportions) is true.
  • Calculates the pooled sample variance based on the pooled proportion and the sizes of the two samples.
  • Derives the standard error of the difference in sample proportions.
  • Calculates the Z-test statistic, which measures the number of standard errors between the sample proportion difference and the null hypothesis.
  • Finds the critical Z-value from the standard normal distribution for the given significance level.
  • Computes the p-value to assess the evidence against the null hypothesis.
  • Calculates the margin of error and the confidence interval for the difference in proportions.
  • Outputs the test statistic, critical value, p-value, and confidence interval, and based on the test statistic and critical value, it may print a statement to either reject or not reject the null hypothesis.

The latter part of the code uses Matplotlib to create a visualization of the standard normal distribution and the rejection regions for the two-sided Z-test. This visual aid helps to understand where the test statistic falls in relation to the distribution and the critical values.
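A hedged reconstruction of the statistical portion of such a snippet (click counts and group sizes are made-up; the plotting part is omitted):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05

# Hypothetical click counts and user counts, for illustration only
X_con, N_con = 320, 4000  # clicks / users in the Control group
X_exp, N_exp = 380, 4000  # clicks / users in the Experimental group

# Sample proportions
p_con_hat = X_con / N_con
p_exp_hat = X_exp / N_exp

# Pooled proportion and pooled variance under the Null
p_pooled = (X_con + X_exp) / (N_con + N_exp)
pooled_var = p_pooled * (1 - p_pooled) * (1 / N_con + 1 / N_exp)
SE = np.sqrt(pooled_var)

# Z statistic, critical value, p-value, and 95% confidence interval
Z = (p_con_hat - p_exp_hat) / SE
z_crit = norm.ppf(1 - alpha / 2)
p_value = 2 * (1 - norm.cdf(abs(Z)))
margin = z_crit * SE
ci = (p_con_hat - p_exp_hat - margin, p_con_hat - p_exp_hat + margin)

print(Z, z_crit, p_value, ci)
```

If `abs(Z)` exceeds `z_crit`, the Null of equal proportions is rejected at the chosen significance level.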

Case 2: Z-test for Comparing Means (2-sided)

If you want to test whether there is a statistically significant difference between the Control and Experimental groups’ metrics that are in the form of averages (like average purchase amount) you can use a 2-sample Z-test to test the following hypothesis:

H0 : mu_con = mu_exp   H1 : mu_con ≠ mu_exp

where the sampling distribution of means of the Control group follows a Normal distribution with mean mu_con and variance σ²_con/N_con. Moreover, the sampling distribution of means of the Experimental group follows a Normal distribution with mean mu_exp and variance σ²_exp/N_exp.

Then the difference in the means of the Control and Experimental groups also follows a Normal distribution with mean mu_con − mu_exp and variance σ²_con/N_con + σ²_exp/N_exp.

Consequently, the test statistic of the 2-sample Z-test for the difference in means can be calculated as follows:

Z = (mu_con − mu_exp) / SE

The Standard Error is equal to the square root of the estimate of the pooled variance and can be defined as:

SE = √( σ²_con/N_con + σ²_exp/N_exp )

The Python code provided appears to be set up for conducting a two-sample Z-test, typically used to determine if there is a significant difference between the means of two independent groups. In this context, the code might be comparing two different processes or treatments.

  • It generates two arrays of random integers to represent data for a control group ( X_A ) and an experimental group ( X_B ).
  • It calculates the sample means ( mu_con , mu_exp ) and variances ( variance_con , variance_exp ) for both groups.
  • The pooled variance is computed, which is used in the denominator of the test statistic formula for the Z-test, providing a measure of the data's common variance.
  • The Z-test statistic ( T ) is calculated by taking the difference between the two sample means and dividing it by the standard error of the difference.
  • The p-value is calculated to test the hypothesis of whether the means of the two groups are statistically different from each other.
  • The critical Z-value ( Z_crit ) is determined from the standard normal distribution, which defines the cutoff points for significance.
  • A margin of error is computed, and a confidence interval for the difference in means is constructed.
  • The test statistic, critical value, p-value, and confidence interval are printed to the console.

Lastly, the code uses Matplotlib to plot the standard normal distribution and highlight the rejection regions for the Z-test. This visualization can help in understanding the result of the Z-test in terms of where the test statistic lies relative to the distribution and the critical values for a two-sided test.
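A hedged reconstruction of the statistical steps described (the data ranges, group sizes, and seed are made-up; the plotting part is omitted):

```python
import numpy as np
from scipy.stats import norm

np.random.seed(7)
alpha = 0.05

# Simulated purchase amounts (values are assumptions for illustration)
N_con, N_exp = 100, 100
X_A = np.random.randint(20, 80, N_con)  # control group
X_B = np.random.randint(20, 85, N_exp)  # experimental group

mu_con, mu_exp = X_A.mean(), X_B.mean()
variance_con, variance_exp = X_A.var(ddof=1), X_B.var(ddof=1)

# Standard error of the difference in means (CLT-based)
SE = np.sqrt(variance_con / N_con + variance_exp / N_exp)

# Z statistic, critical value, p-value, margin of error, confidence interval
T = (mu_con - mu_exp) / SE
Z_crit = norm.ppf(1 - alpha / 2)
p_value = 2 * (1 - norm.cdf(abs(T)))
margin = Z_crit * SE
ci = (mu_con - mu_exp - margin, mu_con - mu_exp + margin)

print(T, Z_crit, p_value, ci)
```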

Chi-Squared test

If you want to test whether there is a statistically significant difference between the Control and Experimental groups’ performance metrics (for example their conversions) and you don’t really want to know the nature of this relationship (which one is better) you can use a Chi-Squared test to test the following hypothesis:

Note that the metric should be in the form of a binary variable (for example, conversion or no conversion/click or no click). The data can then be represented in the form of the following table, where O and T correspond to observed and theoretical values, respectively.

[Table: observed (O) and theoretical (T) conversion counts for the control and experimental groups]

Then the test statistic of the Chi-squared test can be expressed as follows:

Chi² = Σᵢ (Observedᵢ – Expectedᵢ)² / Expectedᵢ

where Observed corresponds to the observed data, Expected corresponds to the theoretical value, and i can take the values 0 (no conversion) and 1 (conversion). It’s important to see that each of these terms has its own denominator. The formula for the test statistic when you have two groups only can be represented as follows:

Chi² = Σᵢ [ (O_con,i – T_con,i)² / T_con,i + (O_exp,i – T_exp,i)² / T_exp,i ]

The expected value is simply equal to the number of times each version of the product is viewed multiplied by the probability of it leading to conversion (or to a click in the case of CTR).

Note that, since the Chi-squared test is not a parametric test, its Standard Error and Confidence Interval can’t be calculated in the standard way we used for the parametric Z-test and t-test.

Image Source: LunarTech

The Python code for this test conducts a Chi-squared test, a statistical hypothesis test used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

Here, the test is used to compare two categorical datasets:

  • It calculates the Chi-squared test statistic by summing the squared differences between observed (O) and expected (T) frequencies, each divided by the expected frequency for its category. This is known as the squared relative distance and is used as the test statistic for the Chi-squared test.
  • It then calculates the p-value for this test statistic using the degrees of freedom, which is 1 here (for a contingency table, the degrees of freedom equal the number of rows minus one times the number of columns minus one).
  • The Matplotlib library is used to plot the probability density function (pdf) of the Chi-squared distribution with one degree of freedom. It also highlights the rejection region for the test, which corresponds to the critical value of the Chi-squared distribution that the test statistic must exceed for the difference to be considered statistically significant.

The visualization helps to understand the Chi-squared test by showing where the test statistic lies in relation to the Chi-squared distribution and its critical value. If the test statistic is within the rejection region, the null hypothesis of no difference in frequencies can be rejected.
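As that snippet is likewise not reproduced here, the following is a minimal sketch of the computation with assumed observed (O) and expected (T) counts. For one degree of freedom, the tail probability of the Chi-squared distribution can be derived from the standard normal, which keeps the sketch dependency-free:

```python
import math
from statistics import NormalDist

# Assumed counts for one group, indexed as [no conversion, conversion]
O = [950, 50]   # observed frequencies
T = [960, 40]   # expected (theoretical) frequencies under the null

# Chi-squared statistic: sum of squared relative distances
chi2_stat = sum((o - t) ** 2 / t for o, t in zip(O, T))

# For 1 degree of freedom: P(Chi2 > x) = 2 * (1 - Phi(sqrt(x)))
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2_stat)))

print(f"chi2 = {chi2_stat:.3f}, p-value = {p_value:.4f}")
```

With more categories (higher degrees of freedom) you would use a Chi-squared distribution directly, for example via scipy.stats.chi2.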

Another quick way to determine whether to reject or support the Null Hypothesis is by using p-values . The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic. The smaller the p-value, the stronger the evidence against the Null Hypothesis, suggesting that it can be rejected.

The interpretation of a p -value is dependent on the chosen significance level. Most often, 1%, 5%, or 10% significance levels are used to interpret the p-value. So, instead of using the t-test and the F-test, p-values of these test statistics can be used to test the same hypotheses.

The following figure shows a sample output of an OLS regression with two independent variables. In this table, the p-value of the t-test, testing the statistical significance of class_size variable’s parameter estimate, and the p-value of the F-test, testing the joint statistical significance of the class_size, and el_pct variables parameter estimates, are underlined.

[Figure: OLS regression output; the p-values of the class_size t-test and the joint F-test are underlined]

The p-value corresponding to the class_size variable is 0.011. When we compare this value to the significance levels 1% (0.01), 5% (0.05), and 10% (0.1), we can make the following conclusions:

  • 0.011 > 0.01 → Null of the t-test can’t be rejected at 1% significance level
  • 0.011 < 0.05 → Null of the t-test can be rejected at 5% significance level
  • 0.011 < 0.10 → Null of the t-test can be rejected at 10% significance level

So, this p-value suggests that the coefficient of the class_size variable is statistically significant at the 5% and 10% significance levels. The p-value corresponding to the F-test is 0.0000. Since this is smaller than all three cutoff values (0.01, 0.05, 0.10), we can conclude that the Null of the F-test can be rejected in all three cases.

This suggests that the coefficients of class_size and el_pct variables are jointly statistically significant at 1%, 5%, and 10% significance levels.
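The comparison logic above is mechanical and easy to express in code. As a small sketch, using the p-values read off the regression table:

```python
p_t_test = 0.011   # p-value of the t-test for the class_size coefficient
p_f_test = 0.0000  # p-value of the joint F-test

# Reject the null whenever the p-value falls below the significance level
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha}: "
          f"t-test null rejected: {p_t_test < alpha}, "
          f"F-test null rejected: {p_f_test < alpha}")
```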

Limitation of p-values

Using p-values has many benefits, but it also has limitations. One of the main ones is that the p-value depends on both the magnitude of the association and the sample size. If the magnitude of the effect is small and practically insignificant, the p-value might still indicate statistical significance simply because the sample size is large. The opposite can occur as well: an effect can be large, but fail to meet the p < 0.01, 0.05, or 0.10 criteria if the sample size is small.

Inferential statistics uses sample data to make reasonable judgments about the population from which the sample data originated. We use it to investigate the relationships between variables within a sample and make predictions about how these variables will relate to a larger population.

Both the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) play a significant role in inferential statistics because they show that, when the sample is large enough, results hold regardless of the shape of the original population distribution.

The more data is gathered, the more accurate the statistical inferences become – hence, the more accurate parameter estimates are generated.

Law of Large Numbers (LLN)

Suppose X1, X2, . . . , Xn are all independent random variables with the same underlying distribution (also called independent identically-distributed, or i.i.d.), where all X’s have the same mean μ and standard deviation σ. As the sample size grows, the sample average of the X’s converges to the mean μ with probability 1.

The Law of Large Numbers can be summarized as follows:

X̄n = (X1 + X2 + … + Xn)/n → μ as n → ∞, with probability 1
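A quick simulation illustrates the LLN. This sketch assumes draws from a Uniform(0, 1) distribution, whose true mean is 0.5, and shows the sample average settling near it as n grows:

```python
import random

random.seed(1)
mu = 0.5  # true mean of Uniform(0, 1)

# The sample mean gets closer to mu as the sample size increases
for n in (100, 10_000, 1_000_000):
    draws = [random.random() for _ in range(n)]
    avg = sum(draws) / n
    print(f"n={n:>9}: sample mean = {avg:.4f}")
```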

Central Limit Theorem (CLT)

Suppose X1, X2, . . . , Xn are all independent random variables with the same underlying distribution (also called independent identically-distributed, or i.i.d.), where all X’s have the same mean μ and standard deviation σ. As the sample size grows, the distribution of the sample mean converges to a Normal distribution with mean μ and variance σ²/n.

The Central Limit Theorem can be summarized as follows:

√n (X̄n – μ)/σ → N(0, 1) as n → ∞

Stated differently, when you have a population with mean μ and standard deviation σ and you take sufficiently large random samples from that population with replacement, then the distribution of the sample means will be approximately normally distributed.
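The CLT is easy to see in simulation. This sketch draws repeated samples from a (skewed) exponential distribution and checks that the sample means behave like a Normal with mean μ and standard deviation σ/√n:

```python
import random
import statistics

random.seed(2)

lam = 1.0             # rate of the exponential distribution
mu, sigma = 1.0, 1.0  # its mean and standard deviation
n = 50                # size of each sample
trials = 5000         # number of samples drawn

# Distribution of sample means across many repeated samples
sample_means = [
    statistics.mean(random.expovariate(lam) for _ in range(n))
    for _ in range(trials)
]

# CLT: sample means are approximately Normal(mu, sigma^2 / n)
print(f"mean of sample means: {statistics.mean(sample_means):.3f} (theory: {mu})")
print(f"std of sample means:  {statistics.stdev(sample_means):.3f} (theory: {sigma / n ** 0.5:.3f})")
```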

Dimensionality Reduction Techniques

Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space such that this low-dimensional representation of the data still contains the meaningful properties of the original data as much as possible.

With the rise of Big Data, the demand for these dimensionality reduction techniques, which cut out unnecessary data and features, has increased as well. Examples of popular dimensionality reduction techniques are Principal Component Analysis , Factor Analysis , Canonical Correlation , and Random Forest .

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique that is very often used to reduce the dimensionality of large data sets. It does this by transforming a large set of variables into a smaller set that still contains most of the information or the variation in the original large dataset.

Let’s assume we have a dataset X with p variables X1, X2, …, Xp, with eigenvectors e1, …, ep and eigenvalues λ1, …, λp. Each eigenvalue gives the share of the total variance that is explained by the corresponding component.

The idea behind PCA is to create new (independent) variables, called Principal Components, that are linear combinations of the existing variables. The i-th principal component can be expressed as follows:

Yi = ei1 X1 + ei2 X2 + … + eip Xp

Then using the Elbow Rule or Kaiser Rule , you can determine the number of principal components that optimally summarize the data without losing too much information.

It is also important to look at the proportion of total variation (PRTV) that is explained by each principal component to decide whether it is beneficial to include or to exclude it. PRTV for the i-th principal component can be calculated using eigenvalues as follows:

PRTVi = λi / (λ1 + λ2 + … + λp)

The elbow rule or the elbow method is a heuristic approach that we can use to determine the number of optimal principal components from the PCA results.

The idea behind this method is to plot the explained variation as a function of the number of components and pick the elbow of the curve as the number of optimal principal components.

Following is an example of such a scatter plot where the PRTV (Y-axis) is plotted against the number of principal components (X-axis). The elbow corresponds to the X-axis value 2, which suggests that the number of optimal principal components is 2.

[Figure: scree plot of PRTV against the number of principal components; the elbow is at 2]
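As a sketch of how PRTV comes out of the eigenvalues, here is a small numpy example on synthetic data (the correlated-variable construction is assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: 3 variables, 500 observations; the first two share a latent driver z
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)),
               2 * z + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 1))])

# Eigen-decomposition of the covariance matrix (eigh returns ascending order)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
eigenvalues = eigenvalues[::-1]  # sort from largest to smallest

# Proportion of total variation explained by each principal component
PRTV = eigenvalues / eigenvalues.sum()
print("PRTV:", np.round(PRTV, 3))
```

The PRTV values sum to 1, and the first component dominates because the first two variables share most of their variance.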

Factor Analysis (FA)

Factor analysis or FA is another statistical method for dimensionality reduction. It is one of the most commonly used inter-dependency techniques. We can use it when the relevant set of variables shows a systematic inter-dependence and our objective is to find out the latent factors that create a commonality.

Let’s assume we have a dataset X with p variables X1, X2, …, Xp. The FA model can be expressed as follows:

X = µ + AF + u

where:

  • X is a [p x N] matrix of p variables and N observations
  • µ is a [p x N] population mean matrix
  • A is a [p x k] common factor loadings matrix
  • F is a [k x N] matrix of common factors
  • u is a [p x N] matrix of specific factors.

So, to put it differently, a factor model is a series of multiple regressions, predicting each of the variables Xi from the values of the unobservable common factors:

Each variable has k of its own common factors, and these are related to a single observation via the factor loading matrix as follows:

Xi = µi + ai1 F1 + ai2 F2 + … + aik Fk + ui

In factor analysis, the factors are calculated to maximize between-group variance while minimizing in-group variance. They are factors because they group the underlying variables. Unlike PCA, in FA the data needs to be normalized, given that FA assumes that the dataset follows a Normal distribution.
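As an illustration of the matrix dimensions above, here is a toy numpy sketch that generates data from the factor model X = µ + AF + u, with assumed sizes p = 4 variables, k = 2 factors, and N = 100 observations:

```python
import numpy as np

rng = np.random.default_rng(3)

p, k, N = 4, 2, 100  # variables, common factors, observations

mu = np.zeros((p, 1))              # population means (broadcast over observations)
A = rng.normal(size=(p, k))        # common factor loadings matrix
F = rng.normal(size=(k, N))        # matrix of common factors
u = 0.1 * rng.normal(size=(p, N))  # matrix of specific factors (noise)

# Factor model: X = mu + A F + u
X = mu + A @ F + u
print(X.shape)  # (4, 100)
```

In practice A and F are not observed; they are estimated from X, for example by maximum likelihood.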

Interview Prep – Top 7 Statistics Questions with Answers

Are you preparing for interviews in statistics, data analysis, or data science? It's crucial to know key statistical concepts and their applications.

Below I've included seven important statistics questions with answers, covering basic statistical tests, probability theory, and the use of statistics in decision-making, like A/B testing.

Question 1: What is the difference between a t-test and a Z-test?

The question "What is the difference between a t-test and Z-test?" is a common question in data science interviews because it tests the candidate's understanding of basic statistical concepts used in comparing group means.

This knowledge is crucial because choosing the right test affects the validity of conclusions drawn from data, which is a daily task in a data scientist's role when it comes to interpreting experiments, analyzing survey results, or evaluating models.

Both t-tests and Z-tests are statistical methods used to determine if there are significant differences between the means of two groups. But they have key differences:

  • Assumptions : You can use a t-test when the sample size is small and the population standard deviation is unknown. Thanks to the Central Limit Theorem, it doesn't require the underlying population to be normally distributed if the sample size is sufficiently large. The Z-test assumes that the sampling distribution of the mean is normal, which holds when the population is normal or the sample is large, and that the population standard deviation is known.
  • Sample Size : T-tests are typically used for sample sizes smaller than 30, whereas Z-tests are used for larger sample sizes (greater than or equal to 30) when the population standard deviation is known.
  • Test Statistic : The t-test uses the t-distribution to calculate the test statistic, taking into account the sample standard deviation. The Z-test uses the standard normal distribution, utilizing the known population standard deviation.
  • P-Value : The p-value in a t-test is determined based on the t-distribution, which accounts for the variability in smaller samples. The Z-test uses the standard normal distribution to calculate the p-value, suitable for larger samples or known population parameters.
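Numerically, the two tests share the same numerator (the difference in sample means); only the variance estimate and the reference distribution differ. As a small illustration, a two-sample t-test with pooled variance on two small assumed samples, using scipy:

```python
from scipy import stats

# Two small assumed samples (n < 30, population SD unknown, so a t-test applies)
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
group_b = [12.6, 12.4, 12.9, 12.5, 12.7, 12.8, 12.3, 12.6]

# Two-sample t-test with pooled variance (equal_var=True is the default)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```

With a large sample and a known population standard deviation, you would instead compute a Z statistic and look up the p-value in the standard normal distribution.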

Question 2: What is a p-value?

The question "What is a p-value?" requires an understanding of a fundamental concept in hypothesis testing that we discussed in this blog in detail with examples. It's not just a number: it's a bridge between the data you collect and the conclusions you draw for data-driven decision making.

P-values quantify the evidence against a null hypothesis—how likely it is to observe the collected data if the null hypothesis were true.

For data scientists, p-values are part of everyday language in statistical analysis, model validation, and experimental design. They have to interpret p-values correctly to make informed decisions and often need to explain their implications to stakeholders who might not have deep statistical knowledge.

Thus, understanding p-values helps data scientists to convey the level of certainty or doubt in their findings and to justify subsequent actions or recommendations.

So here you need to show your understanding of what p-value measures and connect it to statistical significance and hypothesis testing.

The p-value measures the probability of observing a test statistic at least as extreme as the one observed, under the assumption that the null hypothesis is true. It helps in deciding whether the observed data significantly deviate from what would be expected under the null hypothesis.

If the p-value is lower than a predetermined threshold (alpha level, usually set at 0.05), the null hypothesis is rejected, indicating that the observed result is statistically significant.

Question 3: What are limitations of p-values?

P-values are a staple of inferential statistics, providing a metric for evaluating evidence against a null hypothesis. In this question you need to name a couple of their limitations.

  • Dependence on Sample Size : The p-value is sensitive to the sample size. Large samples might yield significant p-values even for trivial effects, while small samples may not detect significant effects even if they exist.
  • Not a Measure of Effect Size or Importance : A small p-value does not necessarily mean the effect is practically significant – it simply indicates it's unlikely to have occurred by chance.
  • Misinterpretation : P-values can be misinterpreted as the probability that the null hypothesis is true, which is incorrect. They only measure the evidence against the null hypothesis.

Question 4: What is a Confidence Level?

A confidence level represents the frequency with which an estimated confidence interval would contain the true population parameter if the same process were repeated multiple times.

For example, a 95% confidence level means that if the study were repeated 100 times, approximately 95 of the confidence intervals calculated from those studies would be expected to contain the true population parameter.
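This frequency interpretation can be checked by simulation. The sketch below assumes a Normal population with known σ, builds a 95% interval around each sample mean, and counts how often the interval covers the true mean:

```python
import random
from statistics import NormalDist

random.seed(0)

mu, sigma, n, trials = 10.0, 2.0, 30, 1000
z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    half_width = z * sigma / n ** 0.5
    # Does this interval contain the true population mean?
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

print(f"coverage: {covered / trials:.3f}")  # close to 0.95
```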

Question 5: What is the Probability of Picking 5 Red and 5 Blue Balls Without Replacement?

What is the probability of picking exactly 5 red balls and 5 blue balls in 10 picks without replacement from a set of 100 balls, where there are 70 red balls and 30 blue balls? This probability can be calculated using combinatorial mathematics and the hypergeometric distribution.

In this question, you're dealing with a classic probability problem that involves combinatorial principles and the concept of probability without replacement. The context is a finite set of balls, each draw affecting the subsequent ones because the composition of the set changes with each draw.

To approach this problem, you need to consider:

  • The total number of balls : If the question doesn't specify this, you need to ask or make a reasonable assumption based on the context.
  • Initial proportion of balls : Know the initial count of red and blue balls in the set.
  • Sequential probability : Remember that each time you draw a ball, you don't put it back, so the probability of drawing a ball of a certain color changes with each draw.
  • Combinations : Calculate the number of ways to choose 5 red balls from the total red balls and 5 blue balls from the total blue balls, then divide by the number of ways to choose any 10 balls from the total.

Thinking through these points will guide you in formulating the solution based on the hypergeometric distribution, which describes the probability of a given number of successes in draws without replacement from a finite population.

This question tests your ability to apply probability theory to a dynamic scenario, a skill that's invaluable in data-driven decision-making and statistical modeling.

To find the probability of picking exactly 5 red balls and 5 blue balls in 10 picks without replacement, we calculate the probability of picking 5 red balls out of 70 and 5 blue balls out of 30, and then divide by the total ways to pick 10 balls out of 100:

P(5 red and 5 blue) = [C(70, 5) × C(30, 5)] / C(100, 10)

Let's calculate this probability:

P = (12,103,014 × 142,506) / 17,310,309,456,440 ≈ 0.0996, so there is roughly a 10% chance.
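This hypergeometric probability can be checked directly with Python's math.comb:

```python
from math import comb

# Ways to choose 5 of the 70 red balls and 5 of the 30 blue balls,
# divided by the ways to choose any 10 of the 100 balls
p = comb(70, 5) * comb(30, 5) / comb(100, 10)
print(f"P(5 red and 5 blue) = {p:.4f}")  # ≈ 0.0996
```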

Question 6: Explain Bayes' Theorem and its importance in calculating posterior probabilities.

Provide an example of how it might be used in genetic testing to determine the likelihood of an individual carrying a certain gene.

Bayes' Theorem is a cornerstone of probability theory that enables the updating of initial beliefs (prior probabilities) with new evidence to obtain updated beliefs (posterior probabilities). This question tests the candidate's ability to explain the concept and the mathematical framework for incorporating new evidence into existing predictions or models.

Bayes' Theorem is a fundamental theorem in probability theory and statistics that describes the probability of an event, based on prior knowledge of conditions that might be related to the event. It's crucial for calculating posterior probabilities, which are the probabilities of hypotheses given observed evidence.

P(A∣B) = P(B∣A) × P(A) / P(B)

  • P(A∣B) is the posterior probability: the probability of hypothesis A given the evidence B.
  • P(B∣A) is the likelihood: the probability of observing evidence B given that hypothesis A is true.
  • P(A) is the prior probability: the initial probability of hypothesis A, before observing evidence B.
  • P(B) is the marginal probability: the total probability of observing evidence B under all possible hypotheses.
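For the genetic-testing example the question asks for, here is a small sketch with assumed, purely hypothetical numbers: a 1% prior prevalence of the gene, a test with 99% sensitivity, and a 5% false-positive rate:

```python
# Hypothetical numbers -- not from any real test
p_gene = 0.01                # prior: P(carrier)
p_pos_given_gene = 0.99      # sensitivity: P(positive | carrier)
p_pos_given_no_gene = 0.05   # false-positive rate: P(positive | not carrier)

# Marginal probability of a positive test (law of total probability)
p_pos = p_pos_given_gene * p_gene + p_pos_given_no_gene * (1 - p_gene)

# Bayes' Theorem: posterior probability of carrying the gene given a positive test
p_gene_given_pos = p_pos_given_gene * p_gene / p_pos
print(f"P(carrier | positive test) = {p_gene_given_pos:.3f}")  # ≈ 0.167
```

Even with an accurate test, the posterior is only about 17% because the gene is rare, which is exactly the kind of prior-driven intuition Bayes' Theorem formalizes.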

Question 7: Describe how you would statistically determine if the results of an A/B test are significant. Walk me through the A/B testing process.

In this question, the interviewer is assessing your comprehensive knowledge of the A/B testing framework. They are looking for evidence that you can navigate the full spectrum of A/B testing procedures, which is essential for data scientists and AI professionals tasked with optimizing features, making data-informed decisions, and testing software products.

The interviewer wants to confirm that you understand each step in the process, beginning with formulating statistical hypotheses derived from business objectives. They are interested in your ability to conduct a power analysis and discuss its components, including determining effect size, significance level, and power, all critical in calculating the minimum sample size needed to detect a true effect and prevent p-hacking.

The discussion on randomization, data collection, and monitoring checks whether you grasp how to maintain the integrity of the test conditions. You should also be prepared to explain the selection of appropriate statistical tests, calculation of test statistics, p-values, and interpretation of results for both statistical and practical significance.

Ultimately, the interviewer is testing whether you can act as a data advocate: someone who can meticulously run A/B tests, interpret the results, and communicate findings and recommendations effectively to stakeholders, thereby driving data-driven decision-making within the organization.

To learn A/B testing, check out my A/B Testing Crash Course on YouTube .

In an A/B test, my first step is to establish clear business and statistical hypotheses. For example, if we’re testing a new webpage layout, the business hypothesis might be that the new layout increases user engagement. Statistically, this translates to expecting a higher mean engagement score for the new layout compared to the old.

Next, I’d conduct a power analysis. This involves deciding on an effect size that's practically significant for our business context—say, a 10% increase in engagement. I'd choose a significance level, commonly 0.05, and aim for a power of 80%, reducing the likelihood of Type II errors.

The power analysis, which takes into account the effect size, significance level, and power, helps determine the minimum sample size needed. This is crucial for ensuring that our test is adequately powered to detect the effect we care about and for avoiding p-hacking by committing to a sample size upfront.
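The sample-size arithmetic behind such a power analysis can be sketched with the standard two-sample formula n = 2(z_alpha/2 + z_beta)² / d², assuming a Z-approximation and a standardized effect size d (both assumptions, chosen here for illustration):

```python
import math
from statistics import NormalDist

d = 0.1        # assumed standardized effect size (difference in means / SD)
alpha = 0.05   # significance level
power = 0.80   # desired power (1 - probability of a Type II error)

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96
z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84

# Minimum sample size per group for a two-sample comparison of means
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
print(f"n per group: {n_per_group}")
```

Small effect sizes drive the required sample size up quadratically, which is why committing to n before the test starts matters.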

With our sample size determined, I’d ensure proper randomization in assigning users to the control and test groups, to eliminate selection bias. During the test, I’d closely monitor data collection for any anomalies or necessary adjustments.

Upon completion of the data collection, I’d choose an appropriate statistical test based on the data distribution and variance homogeneity—typically a t-test if the sample size is small or a normal distribution can’t be assumed, or a Z-test for larger samples with a known variance.

Calculating the test statistic and the corresponding p-value allows us to test the null hypothesis. If the p-value is less than our chosen alpha level, we reject the null hypothesis, suggesting that the new layout has a statistically significant impact on engagement.

In addition to statistical significance, I’d evaluate the practical significance by looking at the confidence interval for the effect size and considering the business impact.

Finally, I’d document the entire process and results, then communicate them to stakeholders in clear, non-technical language. This includes not just the statistical significance, but also how the results translate to business outcomes. As a data advocate, my goal is to support data-driven decisions that align with our business objectives and user experience strategy.

For getting more interview questions from Stats to Deep Learning - with over 400 Q&A as well as personalized interview preparation check out our Free Resource Hub and our Data Science Bootcamp with Free Trial .

Thank you for choosing this guide as your learning companion. As you continue to explore the vast field of machine learning, I hope you do so with confidence, precision, and an innovative spirit. Best wishes in all your future endeavors!

About the Author

I am Tatev Aslanyan , Senior Machine Learning and AI Researcher, and Co-Founder of LunarTech where we are making Data Science and AI accessible to everyone. I have had the privilege of working in Data Science across numerous countries, including the US, UK, Canada, and the Netherlands.

With an MSc and BSc in Econometrics under my belt, my journey in Machine Learning and AI has been nothing short of incredible. Drawing on my technical studies during my Bachelor's and Master's, along with over 5 years of hands-on experience in the Data Science industry, in Machine Learning and AI, I've gathered this high-level summary of ML topics to share with you.

After studying this guide, if you're keen to dive even deeper and structured learning is your style, consider joining us at LunarTech , where we offer individual courses and a Bootcamp in Data Science, Machine Learning and AI.

We provide a comprehensive program that offers an in-depth understanding of the theory, hands-on practical implementation, extensive practice material, and tailored interview preparation to set you up for success at your own pace.

You can check out our Ultimate Data Science Bootcamp and join a free trial to try the content firsthand. It has earned recognition as one of the Best Data Science Bootcamps of 2023 , and has been featured in esteemed publications like Forbes , Yahoo , Entrepreneur and more. This is your chance to be a part of a community that thrives on innovation and knowledge. Here is the Welcome message!

Connect with Me


  • Follow me on LinkedIn and  on YouTube
  • Check LunarTech.ai for FREE Resources
  • Subscribe to my The Data Science and AI Newsletter


If you want to learn more about a career in Data Science, Machine Learning and AI, and learn how to secure a Data Science job, you can download this free Data Science and AI Career Handbook .

Co-founder of LunarTech, I harness power of Statistics, Machine Learning, Artificial Intelligence to deliver transformative solutions. Applied Data Scientist, MSc/BSc Econometrics



2-Sample Hypothesis Test

To add output from a 2-sample hypothesis test, go to Add and complete a form .

In This Topic

  • 2 Proportions
  • 2-Sample t
  • Mann-Whitney
  • Paired t

2 Proportions

For example, you can test whether the process proportion defective is the same before and after a change has been made to the process. To see an example, go to Minitab Help: Example of 2 Proportions .

Data considerations

Your data must contain only two categories, such as pass/fail. For more details, go to Minitab Help: Data considerations for 2 Proportions .

2-Sample t

For example, you can test whether the process mean is the same before and after a change has been made to the process. To see an example, go to Minitab Help: Example of 2-Sample t .

Your data must be continuous values for Y (output). The sample data should not be severely skewed, and each sample size should be greater than 15. For more details, go to Minitab Help: Data considerations for 2-Sample t .

Mann-Whitney

This test is an alternative to the 2-sample t-test and is used when the data from the two samples are not reasonably normal.

For example, a consultant compares the payrolls of two companies to determine whether their median salaries differ. If the medians from the two companies are different, the consultant uses the confidence interval to determine whether the difference is practically significant. To see an example, go to Minitab Help: Example of Mann-Whitney .

The populations of each sample must have the same shape and spread. The data do not need to be normally distributed. However, if you have more than 15 observations in each sample or your data are not severely skewed, use a 2-Sample t-test because the test has more power. For more details, go to Minitab Help: Data considerations for Mann-Whitney .

Paired t

The paired t-test is useful for analyzing the same set of items that were measured under two different conditions, differences in measurements made on the same subject before and after a treatment, or differences between two treatments given to the same subject.

For example, a physiologist wants to determine whether a particular fitness program has an effect on resting heart rate. The heart rates of 15 randomly selected people were measured prior the program and then measured again one year later. Therefore, the before and after measurements for each person are a pair of observations. To see an example, go to Minitab Help: Example of Paired t .

Your data must be continuous values for Y (output). You should have a set of paired (dependent) observations, such as measurements made on the same item under different conditions. For more details, go to Minitab Help: Data considerations for Paired t .


IMAGES

  1. Hypothesis Testing Example Two Sample t-Test

    2 sample hypothesis testing

  2. Hypothesis Testing Solved Examples(Questions and Solutions)

    2 sample hypothesis testing

  3. Hypothesis Testing With Two Proportions

    2 sample hypothesis testing

  4. PPT

    2 sample hypothesis testing

  5. Ch8: Hypothesis Testing (2 Samples)

    2 sample hypothesis testing

  6. PPT

    2 sample hypothesis testing

VIDEO

  1. Two-Sample Hypothesis Testing

  2. What is a hypothesis test? A beginner's guide to hypothesis testing!

  3. Two Sample Hypothesis Test for Independent Means using Stapplet

  4. 24. Hypothesis Testing for Two Population Variances

  5. (1) One Sample Hypothesis Testing SD Known Large Sample BADM 3933

  6. Two-Sample Hypothesis: Pooled t-Test

COMMENTS

  1. Two Sample t-test: Definition, Formula, and Example

    Fortunately, a two sample t-test allows us to answer this question. Two Sample t-test: Formula. A two-sample t-test always uses the following null hypothesis: H 0: μ 1 = μ 2 (the two population means are equal) The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  2. PDF Two Samples Hypothesis Testing

    • In a previous learning module, we discussed how to perform hypothesis tests for a single variable x. • Here, we extend the concept of hypothesis testing to the comparison of two variables x A and x B. Two Samples Hypothesis Testing when n is the same for the two Samples . Two-tailed paired samples hypothesis test: • In engineering ...

  3. 10: Hypothesis Testing with Two Samples

    A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. 10.5: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small.

  4. Hypothesis Testing

    Present the findings in your results and discussion section. Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps. Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test.

  5. Two-sample hypothesis testing

    In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently obtained from a different given population. The purpose of the test is to determine whether the difference between these two populations is statistically significant . There are a large number of statistical tests that ...

  6. Two-Sample t-Test

    The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not. ... Since 2.80 > 2.080, we reject the null hypothesis that the mean body fat for men and women are equal, and conclude that we have evidence body fat in the population is different ...

  7. Hypothesis Testing for 2 Samples: Introduction

    The mean for the last recorded percentage was less than half of the initial score: 30.27 (SD 34.03). The decrease was found to be statistically significant using a paired sample t-test (t = 4.36, 36 df, p < .001).". This is a hypothesis test for matched pairs, sometimes known as 2 means, dependent samples.

  8. Hypotheses for a two-sample t test (video)

    If that's below your significance level, then you would reject your null hypothesis, and it would suggest the alternative: that maybe this mean is greater than zero. On the other hand, a two-sample t test is where you're thinking about two different populations. For example, you could be thinking about a population of men ...

  9. Statistical Hypothesis Testing Overview

    Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. This post provides an overview of statistical hypothesis testing.

  10. Chapter 15 Hypothesis Testing: Two Sample Tests

    15.1.2 Two Sample t test approach. For this we can use the two-sample t-test to compare the means of these two distinct populations. Here the alternative hypothesis is that the lottery players score more points, H_A: μ_L > μ_NL, and thus the null hypothesis is H_0: μ_L ≤ μ_NL. We can now perform the test ...
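
The one-tailed decision rule implied above compares t against the one-sided critical value t_{α, df} rather than t_{α/2, df}. A minimal sketch with hypothetical scores (the chapter's actual lottery data is not given in the snippet):

```python
import math

def pooled_t(x, y):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

lottery = [12, 14, 16, 18, 20, 22]      # hypothetical scores
non_lottery = [10, 12, 14, 16, 18, 20]  # hypothetical scores
t_stat = pooled_t(lottery, non_lottery)
# One-tailed at alpha = 0.05 with df = 10: reject only if t > t_{0.05, 10} = 1.812
reject = t_stat > 1.812
```

With these toy numbers t ≈ 0.93, so H_0: μ_L ≤ μ_NL is not rejected; all of the α probability sits in the upper tail because the alternative is directional.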

  11. Two-sample t test for difference of means

    And let's assume that we are working with a significance level of 0.05. So pause the video, and conduct the two sample T test here, to see whether there's evidence that the sizes of tomato plants differ between the fields. Alright, now let's work through this together. So like always, let's first construct our null hypothesis.

  12. Hypothesis Testing

    This statistics video explains how to perform hypothesis testing with two sample means using the t-test with the Student's t-distribution and the z-test with ...

  13. Two Sample T Test (Defined w/ 7 Step-by-Step Examples!)

    00:37:48 - Create a two sample t-test and confidence interval with pooled variances (Example #4)
    00:51:23 - Construct a two-sample t-test (Example #5)
    00:59:47 - Matched pairs one sample t-test (Example #6)
    01:09:38 - Use a matched-pairs hypothesis test and provide a confidence interval for difference of means (Example #7)
    Practice ...

  14. Hypothesis Testing: Two Samples

    The Population Mean: This image shows a series of histograms for a large number of sample means taken from a population. Recall that as more sample means are taken, the closer the mean of these means will be to the population mean. In this section, we explore hypothesis testing of two independent population means (and proportions) and also tests for paired samples of population means.

  15. 5.5 - Hypothesis Testing for Two-Sample Proportions

    5.5 - Hypothesis Testing for Two-Sample Proportions. We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test follows the same steps as one group. These notes are going to go into a little bit of math and formulas to help demonstrate the logic behind hypothesis testing ...
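
The two-proportion test described above is usually carried out with a z statistic built from the pooled proportion. A minimal sketch with hypothetical counts (the notes' own example is not shown in the snippet):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # combined successes over combined trials
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical data: 40/100 successes in group 1 vs 30/100 in group 2
z = two_prop_z(40, 100, 30, 100)
reject = abs(z) > 1.96  # two-sided 5% critical value for the standard normal
```

Here z ≈ 1.48, short of 1.96, so the difference in sample proportions is not statistically significant at the 5% level.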

  17. Example of hypotheses for paired and two-sample t tests

    First of all, if you have two groups, one treatment group and one placebo group, then it's 2 samples. If it is the same group before and after, then it's a paired t-test. I'm trying to run a dependent sample t-test/paired sample t-test using data from a Qualtrics survey measuring two groups of people (one with social anxiety and one without) on the effects of ...

  18. Hypothesis Testing: 2 Means (Independent Samples)

    Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the home run distances), so we will conduct a Test of 2 Means. n1 = 70 is the sample size for the first group. n2 = 66 is the sample size for the second group.

  20. 10.E: Hypothesis Testing with Two Samples (Exercises)

    Use the following information to answer the next 15 exercises. Indicate if the hypothesis test is for:
    • independent group means, population standard deviations and/or variances known
    • independent group means, population standard deviations and/or variances unknown
    • matched or paired samples
    • single mean

  21. PDF Hypothesis testing: two samples

    Pearson chi2 Goodness of Fit Test • Assume there is a sample of size n from a population with k classes (e.g. 6 M&M colors) • Null hypothesis H_0: class i has frequency f_i in the population • Alternative hypothesis H_1: some population frequencies are inconsistent with f_i • Let O
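
The Pearson statistic the PDF introduces is the sum of (O − E)²/E over the k classes. A minimal sketch using the snippet's 6-color M&M setup with hypothetical counts:

```python
def chi2_stat(observed, expected):
    """Pearson chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for 6 M&M colors, n = 100; H0: all colors equally likely
observed = [20, 15, 25, 10, 15, 15]
expected = [100 / 6] * 6
x2 = chi2_stat(observed, expected)
reject = x2 > 11.070  # chi-square critical value at alpha = 0.05, df = k - 1 = 5
```

For these counts the statistic works out to exactly 8.0, below the 5% critical value 11.070 on 5 degrees of freedom, so the uniform-color null would not be rejected.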

  22. PDF Two Samples Hypothesis Testing

    Two Samples Hypothesis Testing, Page 2 ... that an appropriate null hypothesis is δ < 0, i.e., the modification caused the population mean between the two samples to decrease (the least likely scenario, since we are assuming here that our experiments show that x_B > x_A). Thus, we set: [This is a one-tailed hypothesis test.] • Null hypothesis: Critical value is μ0 = 0; the least likely scenario is ...

  23. Learn Statistics for Data Science, Machine Learning, and AI

    If you want to test whether there is a statistically significant difference between the Control and Experimental groups' metrics that are in the form of averages (like average purchase amount) you can use a 2-sample Z-test to test the following hypothesis:
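
For large A/B-test samples like the one described above, the two-sample z statistic divides the difference of the group averages by the standard error built from the per-group variances. The numbers below are hypothetical, not from the source:

```python
import math

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic for H0: mu_A = mu_B with large samples (sds treated as known)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se

# Hypothetical A/B test: average purchase amounts for Experimental vs Control
z = two_sample_z(52.0, 10.0, 400, 50.0, 10.0, 400)
reject = abs(z) > 1.96  # two-sided test at alpha = 0.05
```

Here z ≈ 2.83 > 1.96, so the $2 difference in average purchase amount would be declared statistically significant at the 5% level; with samples this large the normal approximation is reasonable.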

  24. 2-Sample Hypothesis Test

    2-sample t. Use a 2-sample t-test to determine whether the population means of two groups differ. You can also calculate a range of values that is likely to include the difference between the population means. For example, you can test whether the process mean is the same before and after a change has been made to the process.
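
The companion confidence interval mentioned above is the point difference plus or minus a t critical value times the pooled standard error. A minimal sketch with made-up before/after process measurements:

```python
import math

def pooled_ci(x1, x2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled-variance standard error."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    half = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)  # half-width of the interval
    return (m1 - m2) - half, (m1 - m2) + half

# Hypothetical process measurements; 95% interval uses t_{0.025, 4} = 2.776
lo, hi = pooled_ci([10, 12, 14], [8, 10, 12], 2.776)
contains_zero = lo <= 0 <= hi
```

Here the interval is roughly (−2.53, 6.53); because it contains 0, it agrees with a two-sided test that fails to reject equal means, which is the duality the snippet alludes to.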

  25. PDF If testing a 2-sided hypothesis, use a 2-sided test! → for null

    If testing a 2-sided hypothesis, use a 2-sided test! Morals of the sidedness (or tail) tale:
    + A single, 1-sided test is fine if one has prior information and makes *a* 1-sided hypothesis.
    + For all other cases, use *a* 2-sided test.
    + A pair of 1-sided tests with FPR = α is equivalent to one 2-sided test with FPR = 2α, i.e.,
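
The doubling relationship in the last bullet can be checked numerically: for a symmetric null distribution, the two-sided p-value is exactly twice the one-sided p-value. A small sketch using the standard normal CDF from the stdlib error function:

```python
import math

def phi(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.96
p_one_sided = 1.0 - phi(z)               # upper-tail p-value
p_two_sided = 2.0 * (1.0 - phi(abs(z)))  # symmetric two-tailed p-value
# Running a pair of 1-sided tests at alpha each gives overall FPR 2*alpha,
# which is exactly the two-sided test: p_two_sided == 2 * p_one_sided.
```

At z = 1.96 the one-sided p-value is about 0.025 and the two-sided p-value about 0.05, matching the FPR = 2α equivalence stated above.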