Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about a population from sample data. It tests an assumption about the population and assesses how plausible that assumption is, given the sample, at a chosen significance level. In this way, hypothesis testing helps to judge whether the results of an experiment reflect a real effect or could have arisen by chance.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.

What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution. It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It states that there is a real difference between the two possibilities being compared and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It is the probability, computed under the assumption that the null hypothesis is true, of obtaining a result at least as extreme as the one observed. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), also called the significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%. If the p value is less than or equal to \(\alpha\), the null hypothesis is rejected.

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the sample size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The formulas for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a method of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the means of two samples. The z test statistic is computed as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).
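The two z formulas above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names and the sample numbers are made up for the example and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)), with sigma the population SD."""
    return (x_bar - mu) / (sigma / sqrt(n))


def two_sample_z(x_bar1, x_bar2, sigma1, sigma2, n1, n2, mu_diff=0.0):
    """z for a difference of means; mu_diff is the hypothesized mu1 - mu2."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x_bar1 - x_bar2) - mu_diff) / standard_error


def right_tail_p_value(z):
    """P(Z >= z) for a standard normal Z."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical inputs, chosen only to exercise the functions.
z1 = one_sample_z(x_bar=52.0, mu=50.0, sigma=6.0, n=36)
z2 = two_sample_z(52.0, 50.0, sigma1=6.0, sigma2=8.0, n1=36, n2=64)
print(z1, z2, right_tail_p_value(z1))
```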

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known, so the sample standard deviation is used instead. The means of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).
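As a rough sketch, the t statistics above can be computed directly from raw samples. The helper names and the data below are hypothetical; `statistics.stdev` is the sample standard deviation (divisor n - 1), which matches s in the formulas.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    t = (mean(sample) - mu) / (stdev(sample) / sqrt(n))
    return t, n - 1


def two_sample_t(sample1, sample2, mu_diff=0.0):
    """t for a difference of means, using each sample's own variance."""
    standard_error = sqrt(
        stdev(sample1) ** 2 / len(sample1) + stdev(sample2) ** 2 / len(sample2)
    )
    return ((mean(sample1) - mean(sample2)) - mu_diff) / standard_error


# Made-up data for illustration only.
group_a = [9, 10, 11, 12, 13]
group_b = [7, 8, 9, 10, 11]
t_one, df_one = one_sample_t(group_a, mu=10)
t_two = two_sample_t(group_a, group_b)
print(t_one, df_one, t_two)
```

The statistic is then compared with a critical value from a t table for the appropriate degrees of freedom.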

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.
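A minimal sketch of the chi-square statistic for independence, \(\sum (O_{i}-E_{i})^{2}/E_{i}\), is shown below. It assumes the data are given as a contingency table of counts and that the expected count of each cell is (row total × column total) / grand total; the function name and the table are invented for the example.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table of observed counts.
observed_counts = [[30, 20], [20, 30]]
chi2, dof = chi_square_independence(observed_counts)
print(chi2, dof)
```

For one degree of freedom and \(\alpha\) = 0.05, the standard table critical value is 3.841 (a value not given in the text), so a statistic above that would lead to rejecting independence.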

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value, then the null hypothesis is rejected.


Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value less than the critical value.


Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional hypothesis testing method. The two-tailed test is used to determine whether the population parameter differs from some value, in either direction. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

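The null hypothesis is rejected if the test statistic falls in either tail, that is, if it is less than the lower critical value or greater than the upper critical value. For a symmetric distribution, this means that the absolute value of the test statistic exceeds the critical value.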
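The right-tailed, left-tailed, and two-tailed rejection rules can be sketched for a z statistic as follows. This is a minimal illustration that assumes a standard normal test statistic and uses Python's standard-library `statistics.NormalDist`; the names `critical_values` and `reject_null` are made up for this example.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Critical z value(s) for a 'right', 'left' or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split over two tails
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left' or 'two'")


def reject_null(z, alpha, tail):
    """Apply the rejection rule for the chosen tail."""
    cv = critical_values(alpha, tail)
    if tail == "right":
        return z > cv[0]
    if tail == "left":
        return z < cv[0]
    return z < cv[0] or z > cv[1]


print(critical_values(0.05, "right"), critical_values(0.05, "two"))
print(reject_null(1.8, 0.05, "right"), reject_null(1.8, 0.05, "two"))
```

The same statistic can be significant in a one-tailed test and not in a two-tailed test, because the two-tailed test places only \(\alpha\)/2 in each tail.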


Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t or \(\chi^{2}\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg, and the population standard deviation is known to be 15 kg. A sample of 30 men has a mean weight of 112.5 kg. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence level is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. The z test is appropriate because the sample size is 30 (n ≥ 30) and the population standard deviation is known, along with the sample and population means.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645, the null hypothesis can be rejected. There is enough evidence to support the researcher's claim.
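The arithmetic of this worked example can be checked with a short script. It is a sketch using only the Python standard library; the variable names are chosen for the example, and the p-value is an addition that the worked solution does not compute.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example above.
mu_0, x_bar, sigma, n, alpha = 100, 112.5, 15, 30, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))       # test statistic
critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)            # right-tailed p-value

print(f"z = {z:.2f}, critical value = {critical:.3f}, p-value = {p_value:.2e}")
print("reject H0" if z > critical else "fail to reject H0")
```

Both comparisons, z against the critical value and the p-value against \(\alpha\), lead to the same decision.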

Hypothesis Testing and Confidence Intervals

Confidence intervals are closely related to hypothesis testing, because the alpha level can be determined from a given confidence level. Suppose the confidence level is given as 95%. Subtract the confidence level from 100%. This gives 100% - 95% = 5%, or \(\alpha\) = 0.05. For a one-tailed test, the whole of \(\alpha\) lies in a single tail. For a two-tailed test, \(\alpha\) is split equally between the two tails, so the area in each tail is 0.05 / 2 = 0.025.


Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • Three commonly used tests in hypothesis testing are the z test, the t test, and the chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells has an average weight of 110lbs and a standard deviation of 18lbs. Using hypothesis testing, check if the physical trainer's claim can be supported at a 95% confidence level.
    Solution: As the sample size is less than 30 and only the sample standard deviation is known, the t-test is used.
    \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90
    \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18, \(\alpha\) = 0.05
    Using the t-distribution table with n - 1 = 4 degrees of freedom, the critical value is 2.132.
    t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{110-90}{\frac{18}{\sqrt{5}}}\) ≈ 2.485
    As 2.485 > 2.132, the null hypothesis is rejected.
    Answer: The data support the claim that the average weight of the dumbbells is greater than 90lbs.
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced, it is believed that this score will change. On randomly testing the scores of 36 students, the mean was found to be 88. With a 0.05 significance level, is there any evidence to support this claim?
    Solution: This is an example of two-tail hypothesis testing. The z test will be used.
    \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80
    \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10
    \(\alpha\) = 0.05, so the area in each tail is 0.05 / 2 = 0.025. The critical value using the normal distribution table is 1.96.
    z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8
    As 4.8 > 1.96, the null hypothesis is rejected.
    Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level, use hypothesis testing to check if this claim is true.
    Solution: The t test will be used.
    \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90
    \(\overline{x}\) = 82, \(\mu\) = 90, n = 6, s = 18
    The critical value from the t table, with n - 1 = 5 degrees of freedom, is -2.015.
    t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) ≈ -1.089
    As -1.089 > -2.015, we fail to reject the null hypothesis.
    Answer: There is not enough evidence to support the claim.
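The three test statistics in these examples can be re-derived with a small script. This is only a checking sketch: the helper name is invented, the critical values are copied from the solutions rather than computed, and Example 2 uses n = 36 as in its solution.

```python
from math import sqrt


def standardized_statistic(x_bar, mu, sd, n):
    """(x_bar - mu) / (sd / sqrt(n)); serves for both the z and t formulas."""
    return (x_bar - mu) / (sd / sqrt(n))


ex1 = standardized_statistic(110, 90, 18, 5)   # right-tailed t test
ex2 = standardized_statistic(88, 80, 10, 36)   # two-tailed z test
ex3 = standardized_statistic(82, 90, 18, 6)    # left-tailed t test

# Critical values as quoted in the solutions above.
reject1 = ex1 > 2.132
reject2 = abs(ex2) > 1.96
reject3 = ex3 < -2.015

print(round(ex1, 3), reject1)
print(round(ex2, 3), reject2)
print(round(ex3, 3), reject3)
```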


FAQs on Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data. The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the test statistic follows a Student's t distribution under the null hypothesis. It is used when the sample size is less than 30 and the standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

In a two tail hypothesis test, the alpha level itself does not change, but it is split equally between the two tails, so each rejection region has area \(\alpha\) / 2. This is done as there are two rejection regions in the curve.


Hypothesis Testing


A hypothesis test is a statistical inference method used to test the significance of a proposed (hypothesized) relation between population parameters and their corresponding sample estimators. In other words, hypothesis tests are used to determine if there is enough evidence in a sample to support a hypothesis about the entire population.

The test considers two hypotheses: the null hypothesis, which is a statement meant to be tested, usually something like "there is no effect" with the intention of proving this false, and the alternate hypothesis, which is the statement meant to stand after the test is performed. The two hypotheses must be mutually exclusive; moreover, in most applications, the two are complementary (one being the negation of the other). The test works by comparing the \(p\)-value to the level of significance (a chosen target). If the \(p\)-value is less than or equal to the level of significance, then the null hypothesis is rejected.

When analyzing data, only samples of a certain size might be manageable as efficient computations. In some situations the error terms follow a continuous or infinite distribution, hence the use of samples to suggest accuracy of the chosen test statistics. The method of hypothesis testing gives an advantage over guessing what distribution or which parameters the data follows.

Definitions and Methodology


In statistical inference, properties (parameters) of a population are analyzed by sampling data sets. Given assumptions on the distribution, i.e. a statistical model of the data, certain hypotheses can be deduced from the known behavior of the model. These hypotheses must be tested against sampled data from the population.

The null hypothesis (denoted \(H_0\)) is a statement that is assumed to be true. If the null hypothesis is rejected, then there is enough evidence (statistical significance) to accept the alternate hypothesis (denoted \(H_1\)). Before doing any test for significance, both hypotheses must be clearly stated and must be non-conflicting, i.e. mutually exclusive, statements.

Rejecting the null hypothesis, given that it is true, is called a type I error; its probability of occurrence is denoted \(\alpha\). Failing to reject the null hypothesis, given that it is false, is called a type II error; its probability of occurrence is denoted \(\beta\). Also, \(\alpha\) is known as the significance level, and \(1-\beta\) is known as the power of the test.

\[\begin{array}{l|c|c} & H_0 \textbf{ is true} & H_0 \textbf{ is false} \\ \hline \textbf{Reject } H_0 & \text{Type I error} & \text{Correct decision} \\ \textbf{Fail to reject } H_0 & \text{Correct decision} & \text{Type II error} \end{array}\]

The test statistic is the standardized value computed from the sampled data under the assumption that the null hypothesis is true, for a chosen particular test. These tests depend on the statistic to be studied and the assumed distribution it follows, e.g. the population mean following a normal distribution. The \(p\)-value is the probability of observing a test statistic at least as extreme as the one obtained, in the direction of the alternate hypothesis, given that the null hypothesis is true. The critical value is the value of the assumed distribution of the test statistic beyond which the null hypothesis is rejected; it is chosen so that the probability of making a type I error equals the significance level \(\alpha\).
Methodologies: Given an estimator \(\hat \theta\) of a population parameter \(\theta\), following a probability distribution \(P(T)\), computed from a sample \(\mathcal{S},\) and given a significance level \(\alpha,\) define \(H_0\) and \(H_1\) and compute the test statistic \(t^*.\)

  • \(p\)-value Approach (most prevalent): Find the \(p\)-value using \(t^*\) (right-tailed). If the \(p\)-value is at most \(\alpha,\) reject \(H_0\). Otherwise, fail to reject \(H_0\).
  • Critical Value Approach: Find the critical value \(t_\alpha\) solving the equation \(P(T\geq t_\alpha)=\alpha\) (right-tailed). If \(t^*>t_\alpha\), reject \(H_0\). Otherwise, fail to reject \(H_0\).

Note: Failing to reject \(H_0\) only means inability to accept \(H_1\), and it does not mean to accept \(H_0\).

Assume a normally distributed population has recorded cholesterol levels with various statistics computed. From a sample of 100 subjects in the population, the sample mean was 214.12 mg/dL (milligrams per deciliter), with a sample standard deviation of 45.71 mg/dL. Perform a hypothesis test, with significance level 0.05, to test if there is enough evidence to conclude that the population mean is larger than 200 mg/dL.

Hypothesis Test

We will perform a hypothesis test using the \(p\)-value approach with significance level \(\alpha=0.05:\)

  • Define \(H_0\): \(\mu=200\).
  • Define \(H_1\): \(\mu>200\).
  • Since our values are normally distributed, the test statistic is \(z^*=\frac{\bar X - \mu_0}{\frac{s}{\sqrt{n}}}=\frac{214.12 - 200}{\frac{45.71}{\sqrt{100}}}\approx 3.09\).
  • Using a standard normal distribution, we find that our \(p\)-value is approximately \(0.001\).
  • Since the \(p\)-value is at most \(\alpha=0.05,\) we reject \(H_0\).

Therefore, we can conclude that the test shows sufficient evidence to support the claim that \(\mu\) is larger than \(200\) mg/dL.
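The \(p\)-value approach and the critical value approach can be run side by side on this example. The following is a sketch using Python's standard library; the variable names are assumptions made for the illustration, and the standard normal distribution is used as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist

# Sample summary from the cholesterol example.
x_bar, mu_0, s, n, alpha = 214.12, 200.0, 45.71, 100, 0.05

z_star = (x_bar - mu_0) / (s / sqrt(n))

# p-value approach (right-tailed): P(Z >= z_star).
p_value = 1 - NormalDist().cdf(z_star)
reject_by_p_value = p_value <= alpha

# Critical value approach: solve P(Z >= z_alpha) = alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject_by_critical_value = z_star > z_alpha

print(f"z* = {z_star:.2f}, p-value = {p_value:.4f}, z_alpha = {z_alpha:.3f}")
print(reject_by_p_value, reject_by_critical_value)
```

The two approaches always agree, since \(z^* > z_\alpha\) exactly when the right-tail area beyond \(z^*\) is smaller than \(\alpha\).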

If the sample size is smaller, the normal and \(t\)-distributions differ noticeably, so the \(t\)-distribution should be used. In the next example the question also changes, so that a two-tailed test is needed instead.

Assume a population's cholesterol levels are recorded and various statistics are computed. From a sample of 25 subjects, the sample mean was 214.12 mg/dL (milligrams per deciliter), with a sample standard deviation of 45.71 mg/dL. Perform a hypothesis test, with significance level 0.05, to test if there is enough evidence to conclude that the population mean is not equal to 200 mg/dL.

Hypothesis Test

We will perform a hypothesis test using the \(p\)-value approach with significance level \(\alpha=0.05\) and the \(t\)-distribution with 24 degrees of freedom:

  • Define \(H_0\): \(\mu=200\).
  • Define \(H_1\): \(\mu\neq 200\).
  • Using the \(t\)-distribution, the test statistic is \(t^*=\frac{\bar X - \mu_0}{\frac{s}{\sqrt{n}}}=\frac{214.12 - 200}{\frac{45.71}{\sqrt{25}}}\approx 1.54\).
  • Using a \(t\)-distribution with 24 degrees of freedom, we find that our \(p\)-value is approximately \(2(0.068)=0.136\). We have multiplied by two since this is a two-tailed argument, i.e. the mean can be smaller than or larger than the hypothesized value.
  • Since the \(p\)-value is larger than \(\alpha=0.05,\) we fail to reject \(H_0\).

Therefore, the test does not show sufficient evidence to support the claim that \(\mu\) is not equal to \(200\) mg/dL.
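The two-tailed \(p\)-value in this example can be approximated without a statistics package. The sketch below integrates the Student \(t\) density numerically with Simpson's rule; the helper names `t_pdf` and `t_cdf` and the choice of numerical method are assumptions of this illustration, not part of the original solution.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


x_bar, mu_0, s, n, alpha = 214.12, 200.0, 45.71, 25, 0.05
t_star = (x_bar - mu_0) / (s / sqrt(n))
p_two_tailed = 2 * (1 - t_cdf(abs(t_star), n - 1))  # double the one-tail area

print(f"t* = {t_star:.2f}, two-tailed p-value = {p_two_tailed:.3f}")
print("reject H0" if p_two_tailed <= alpha else "fail to reject H0")
```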

The complement of the rejection region of a two-tailed hypothesis test (with significance level \(\alpha\)) for a population parameter \(\theta\) is equivalent to a confidence interval (with confidence level \(1-\alpha\)) for the population parameter \(\theta\). If the hypothesized value of \(\theta\) falls inside the confidence interval, then the test fails to reject the null hypothesis (with \(p\)-value greater than \(\alpha\)). Otherwise, if the hypothesized value does not fall in the confidence interval, then the null hypothesis is rejected in favor of the alternate (with \(p\)-value at most \(\alpha\)).
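This equivalence can be illustrated with the 25-subject cholesterol sample. The sketch below builds a 95% confidence interval for \(\mu\) and checks whether the hypothesized value 200 lies inside it. The critical value 2.064 is the standard \(t\)-table entry for 24 degrees of freedom and a two-tailed \(\alpha=0.05\); it is an assumption brought in for the example, since the text does not quote it.

```python
from math import sqrt

x_bar, s, n, mu_0 = 214.12, 45.71, 25, 200.0

# Assumed table value: t critical for df = 24, two-tailed alpha = 0.05.
t_critical = 2.064

margin = t_critical * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# The two-tailed test fails to reject H0 exactly when mu_0 is inside the interval.
mu_0_inside = lower <= mu_0 <= upper

print(f"95% CI: ({lower:.2f}, {upper:.2f}); mu_0 inside: {mu_0_inside}")
```

Here 200 lies inside the interval, which agrees with the two-tailed test failing to reject \(H_0\).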


Cambridge University Faculty of Mathematics


Published 2018 Revised 2019

What Is a Hypothesis Test?

The null hypothesis significance testing (NHST) framework

Our simple scenario

  • Our null hypothesis is $H_0\colon \pi=\frac{1}{2}$.  This says that the proportion is what we believe it should be.
  • Our alternative hypothesis is $H_1\colon \pi\ne\frac{1}{2}$.  This says that the proportion has changed.

Testing our hypotheses

  • We can work out the critical region for $X$, that is, those extreme values of $X$ which would lead us to reject the null hypothesis at 5% significance.  (This can be done even before performing the experiment.)  The probability of $X$ taking a value in this critical region, assuming that the null hypothesis is true, should be 5%, or as close as we can get to 5% without going over it.  In symbols, we can say: $$\mathrm{P}(\text{$X$ in critical region} | \text{$H_0$ is true}) \le 0.05.$$ Then we reject the null hypothesis if $X$ lies in that region.
  • We can work out the probability of $X$ taking the value it did or a more extreme value, assuming that the null hypothesis is true.  This is known as the p-value .  If the p-value is less than 0.05, then we will reject the null hypothesis at 5% significance. [ note 1 ]  In symbols, we can write $$\text{p-value} = \mathrm{P}(\text{$X$ taking this or a more extreme value} | \text{$H_0$ is true}).$$
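Both approaches can be sketched for the null hypothesis $\pi=\frac{1}{2}$ with a binomial count $X$. The code below assumes $X\sim\mathrm{B}(n,\frac{1}{2})$ under $H_0$ with a hypothetical $n=50$ trials, puts at most 0.025 in each tail of the critical region, and doubles the smaller tail probability to get a two-tailed p-value. The function names are invented for the illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail(k, n, p):
    """P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def upper_tail(k, n, p):
    """P(X >= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def critical_region(n, p0=0.5, alpha=0.05):
    """Return (lo, hi): reject H0 when X <= lo or X >= hi."""
    lo = -1
    while lower_tail(lo + 1, n, p0) <= alpha / 2:
        lo += 1
    hi = n + 1
    while upper_tail(hi - 1, n, p0) <= alpha / 2:
        hi -= 1
    return lo, hi


def two_tailed_p_value(x, n, p0=0.5):
    """Double the smaller tail probability, capped at 1."""
    return min(1.0, 2 * min(lower_tail(x, n, p0), upper_tail(x, n, p0)))


n_trials = 50  # hypothetical number of trials
lo, hi = critical_region(n_trials)
print(f"reject H0 if X <= {lo} or X >= {hi}")
print(two_tailed_p_value(33, n_trials), two_tailed_p_value(32, n_trials))
```

An observed value lies in the critical region exactly when its two-tailed p-value is at most 0.05, so the two approaches give the same decision.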

Other types of scenario

  • Does this drug/treatment/intervention/... have any effect?
  • Which of these drugs/... is more effective, or are they equally effective?
  • Is the mean height/mass/intelligence/test score/... of this population equal to some predicted value?
  • Is the standard deviation of the height/mass/... equal to some predicted value?
  • For two distinct groups of people, is their mean height/mass/... of each group the same?
  • Does this group of people's heights/masses/... appear to be following the probability distribution we expect?
  • Do these two populations' heights/masses/... appear to have the same distribution as each other?
  • Do this population's heights and weights appear to be correlated?

Interpreting the results

The key question that hypothesis testing (NHST) answers

What a hypothesis test does not tell us

A non-significant result

  • It could be that the null hypothesis is true.  In this case, we would have to be unlucky to get a significant p-value, so most of the time, we will end up not rejecting the null hypothesis.  (If the null hypothesis is true, we would reject it with a probability of only 0.05.)
  • On the other hand, it could be that the alternative hypothesis is true, but we did not use a large enough sample to obtain a significant result (or we were just unlucky).  In such a case, we could say that our test was insensitive .  In this situation (the alternative hypothesis is true but we do not reject the null hypothesis), we say that we have made a Type II error .  The probability of this happening depends on the sample size and on how different the true $\pi$ is from $\frac{1}{2}$ (or whatever our null hypothesis says), as is explored in Powerful Hypothesis Testing .

A significant result

  • It could be that the null hypothesis is true.  In this case, we reject the null hypothesis with a probability of $0.05=\frac{1}{20}$, that is, one time in 20 (at a significance level of 5%), so we were just unlucky.  
  • On the other hand, the alternative hypothesis could indeed be true.  Either the sample was large enough to obtain a significant result, or the sample size wasn't that large, but we were just lucky.

Using this tree diagram, we can work out the probabilities of $H_0$ being true or $H_1$ being true given our experimental results.  To avoid the expressions becoming unwieldy, we will write $H_0$ for "$\text{$H_0$ true}$", $H_1$ for "$\text{$H_1$ true}$" and "$\text{p}^+$" for "observed p-value or more extreme".  Then we can write (conditional) probabilities on the branches of the tree diagram leading to our observed p-value: [ note 2 ]

The two routes which give our observed p-value (or more extreme) have the following probabilities: $$\begin{align*} \mathrm{P}(H_0\cap \text{p}^+) &= \mathrm{P}(H_0) \times \mathrm{P}(\text{p}^+ | H_0) \\ \mathrm{P}(H_1\cap \text{p}^+) &= \mathrm{P}(H_1) \times \mathrm{P}(\text{p}^+ | H_1) \end{align*}$$ (Recall that $\mathrm{P}(H_0\cap \text{p}^+)$ means "the probability of $H_0$ being true and the p-value being that observed or more extreme".)

We can therefore work out the probability of the alternative hypothesis being true given the observed p-value, using conditional probability: $$\begin{align*} \mathrm{P}(H_1|\text{p}^+) &= \frac{\mathrm{P}(H_1\cap \text{p}^+)}{\mathrm{P}(\text{p}^+)} \\ &= \frac{\mathrm{P}(H_1\cap \text{p}^+)}{\mathrm{P}(H_0\cap\text{p}^+)+\mathrm{P}(H_1\cap\text{p}^+)} \\ &= \frac{\mathrm{P}(H_1) \times \mathrm{P}(\text{p}^+ | H_1)}{\mathrm{P}(H_0) \times \mathrm{P}(\text{p}^+ | H_0) + \mathrm{P}(H_1) \times \mathrm{P}(\text{p}^+ | H_1)} \end{align*}$$ Though this is a mouthful, it is a calculation which only involves the four probabilities on the above tree diagram.  (This is an example of Bayes' Theorem, discussed further in this resource.)

However, we immediately hit a big difficulty if we try to calculate this for a given experiment.  We know $\mathrm{P}(\text{p}^+ | H_0)$: this is just the p-value itself.  (The p-value tells us the probability of obtaining a result at least this extreme given that the null hypothesis is true.)  But we don't know the probability of the null hypothesis being true or false (that is, $\mathrm{P}(H_0)$ and $\mathrm{P}(H_1)=1-\mathrm{P}(H_0)$), nor do we know the probability of the observed result if the alternative hypothesis is true ($\mathrm{P}(\text{p}^+|H_1)$), as knowing that the proportion of greens is not $\frac{1}{2}$ does not tell us what it actually is.  (Similar issues apply to all the other contexts of hypothesis testing listed above.)

So we are quite stuck: in the null hypothesis significance testing model, it is impossible to give a numerical answer to our key question: "Given our results, what is the probability that the alternative hypothesis is true?"  This is because we don't know two of the three probabilities that we need in order to answer the question.

An example might highlight the issue a little better.  Let us suppose that we are trying to work out whether a coin is biased (alternative hypothesis), or whether the probability of heads is exactly $\frac{1}{2}$ (null hypothesis).  We toss the coin 50 times and obtain a p-value of 0.02.  Do we now believe that the coin is biased?  Most people believe that coins are not biased, and so are much more likely to attribute this result to chance or poor coin-tossing technique than to the coin being biased.

On the other hand, consider a case of a road planner who introduces a traffic-calming feature to reduce the number of fatalities along a certain stretch of road.  The null hypothesis is that there is no change in fatality rate, while the alternative hypothesis is that the fatality rate has decreased.  A hypothesis test is performed on data collected for 24 months before and 24 months after the feature is built.  Again, the p-value was 0.02.  Do we believe that the alternative hypothesis is true?  In this case, we are more likely to believe that the alternative hypothesis is true, because it makes a lot of sense that this feature will reduce the number of fatalities.

Our "instinctive" responses to these results are tied up with assigning values to the unknown probabilities in the formula above.  For the coin, we would probably take $\mathrm{P}(H_0)$ to be close to 1, say $0.99$, as we think it is very unlikely that the coin is biased, and $\mathrm{P}(\text{p}^+|H_1)$ will be, say, $0.1$: if the coin is biased, the bias is not likely to be very large, and so it is only a bit more likely that the result will be significant in this case.
Putting these figures into the formula above gives: $$\mathrm{P}(H_1|\text{p}^+) = \frac{0.01 \times 0.1}{0.99 \times 0.02 + 0.01 \times 0.1} \approx 0.05,$$ that is, we are still very doubtful that this coin is biased, even after performing the experiment.  Note that in this case, the probability of these results given that the null hypothesis is true is 0.02, whereas the probability that the null hypothesis is true given these results is $1-0.05=0.95$, which is very different.  This shows how dramatically different the answers to the two questions can be. On the other hand, for the fatalities situation, we might assume quite the opposite: we are pretty confident that the traffic-calming feature will help, so we might take $\mathrm{P}(H_0)$ to be $0.4$, and $\mathrm{P}(\text{p}^+|H_1)$ will be, say, $0.25$ (though the traffic-calming may help, the impact may be relatively small).  Putting these figures into the formula gives: $$\mathrm{P}(H_1|\text{p}^+) = \frac{0.6 \times 0.25}{0.4 \times 0.02 + 0.6 \times 0.25} \approx 0.95,$$ so we are now much more convinced that the traffic-calming feature is helping than we were before we had the data.  This time, the probability of these results given that the null hypothesis is true is still 0.02, whereas the probability that the null hypothesis is true given these results is $1-0.95=0.05$, which is not that different. This approach may seem very disturbing, as we have to make assumptions about what we believe before we do the hypothesis test.  But as we have seen, we cannot answer our key question without making such assumptions.  
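
The two calculations above can be reproduced with a few lines of Python. This is only a sketch: the function name `posterior_h1` and its parameter names are our own, and the inputs are the assumed beliefs quoted in the text, not measured quantities.

```python
def posterior_h1(prior_h0, p_given_h0, p_given_h1):
    """Return P(H1 | p+) from the three probabilities in the formula above."""
    prior_h1 = 1.0 - prior_h0
    route_h0 = prior_h0 * p_given_h0  # P(H0 and p+)
    route_h1 = prior_h1 * p_given_h1  # P(H1 and p+)
    return route_h1 / (route_h0 + route_h1)


# Coin: we strongly believe H0 beforehand.
coin = posterior_h1(prior_h0=0.99, p_given_h0=0.02, p_given_h1=0.1)
# Traffic calming: we lean towards H1 beforehand.
road = posterior_h1(prior_h0=0.4, p_given_h0=0.02, p_given_h1=0.25)

print(round(coin, 2), round(road, 2))  # prints: 0.05 0.95
```

The p-value (0.02) is the same in both calls; only the assumed prior and the assumed $\mathrm{P}(\text{p}^+|H_1)$ change, and that alone moves the answer from about 0.05 to about 0.95.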

Other approaches and some warnings

  • Because our test is two-tailed (in the alternative hypothesis, the true proportion could be less than $\frac{1}{2}$ or more than $\frac{1}{2}$), we must be careful when calculating the p-value: we calculate the probability of the observed outcome or more extreme occurring, and then double the answer to account for the other tail.  We could also compare the probability of the value or more extreme to 0.025 instead of 0.05, but that would not be called a p-value. Likewise, when we determine the critical region, we will have two parts: a tail with large values of $X$ and a tail with small values of $X$; we require that the probability of $X$ lying in the large-value tail is as close as possible to 0.025 without going over it, and the same for the probability of $X$ lying in the small-value tail.  
  • There are complications here when working with two-tail tests as opposed to one-tail tests.  We will ignore this problem, as it does not significantly affect the overall discussion.  
  • "Likelihood" is a technical term.  For a discrete test statistic $X$, the likelihood of $H_0$ given the data $X=x$ means $P(X=x|H_0)$, in other words, how likely would this data be if $H_0$ were true.  It is not the probability of $H_0$ being true given the data.
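
The doubling rule in the first note can be sketched in Python for a test of whether a proportion is $\frac{1}{2}$. The function names are our own, the figures 33 out of 50 are made up for illustration, and capping the doubled value at 1 is a convention we have assumed rather than something stated in the notes.

```python
from math import comb


def binomial_upper_tail(x, n, p=0.5):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def two_tailed_p_value(x, n):
    """Double the smaller tail probability for a test of p = 1/2 (capped at 1)."""
    upper = binomial_upper_tail(x, n)            # P(X >= x)
    lower = 1.0 - binomial_upper_tail(x + 1, n)  # P(X <= x)
    return min(1.0, 2 * min(upper, lower))


# Hypothetical data: 33 greens observed in 50 trials.
print(two_tailed_p_value(33, 50))
```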

Further reading

  • Hypothesis testing

by Marco Taboga, PhD

Hypothesis testing is a method of making statistical inferences in which:

we establish a hypothesis, called the null hypothesis;

we use some data to decide whether to reject or not to reject the hypothesis.

This lecture provides a rigorous introduction to the mathematics of hypothesis tests, and it provides several links to other pages where the single steps of a test of hypothesis can be studied in more detail.

Table of contents

What you need to know to get started

Testing restrictions, parametric tests, null hypothesis.

  • Alternative hypothesis

Types of errors

Critical region, test statistic, power function, size of a test, criteria to evaluate tests.

Remember that a statistical inference is a statement about the probability distribution from which a sample has been drawn.

[eq3]

The statement we make is chosen between two possible statements:

[eq8]

For concreteness, we will focus on parametric hypothesis testing in this lecture, but most of the things we will say apply with straightforward modifications to hypothesis testing in general.

[eq16]

Understanding how to formulate a null hypothesis is a fundamental step in hypothesis testing. We suggest reading a thorough discussion of null hypotheses here.

When we decide whether to reject a restriction or not to reject it, we can commit two types of errors:

[eq24]

This mathematical formulation is made more concrete in the next section.

The critical region is often implicitly defined in terms of a test statistic and a critical region for the test statistic.

[eq29]

Example In our example, where we are testing that the mean of the normal distribution is zero, we could use a test statistic called z-statistic. If you want to read the details, go to the lecture on hypothesis tests about the mean .

[eq31]

This maximum probability is called the size of the test .

The size of the test is also called by some authors the level of significance of the test. However, according to other authors, who assign a slightly different meaning to the term, the level of significance of a test is an upper bound on the size of the test.

Tests of hypothesis are most commonly evaluated based on their size and power.

An ideal test should have:

Of course, such an ideal test is never found in practice, but the best we can hope for is a test with a very small size and a very high probability of rejecting a false hypothesis. Nevertheless, this ideal is routinely used to choose among different tests.

For example:

Several other criteria, beyond power and size, are used to evaluate tests of hypothesis. We do not discuss them here, but we refer the reader to the very nice exposition in Berger and Casella (2002).

Examples of how the mathematics of hypothesis testing works can be found in the following lectures:

Hypothesis tests about the mean (examples of tests of hypothesis about the mean of an unknown distribution);

Hypothesis tests about the variance (examples of tests of hypothesis about the variance of an unknown distribution).

Berger, R. L. and G. Casella (2002) "Statistical inference", Duxbury Advanced Series.

How to cite

Please cite as:

Taboga, Marco (2021). "Hypothesis testing", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/hypothesis-testing.


17 Introduction to Hypothesis Testing

Jenna Lehmann

What is Hypothesis Testing?

Hypothesis testing is a big part of what we would actually consider testing for inferential statistics. It’s a procedure and set of rules that allow us to move from descriptive statistics to make inferences about a population based on sample data. It is a statistical method that uses sample data to evaluate a hypothesis about a population.

This type of test is usually used within the context of research. If we expect a treated group to differ from an untreated group (in some cases the untreated group is represented by the known population parameters), we expect the means of the two groups to differ while the standard deviation stays the same, as if a constant had been added to or subtracted from each individual score.

Steps of Hypothesis Testing

The following steps are tailored to the first kind of hypothesis test we will learn: single-sample z-tests. There are many other kinds of tests, so keep this in mind.

  • Null Hypothesis (H0): states that in the general population there is no change, no difference, or no relationship, or in the context of an experiment, it predicts that the independent variable has no effect on the dependent variable.
  • Alternative Hypothesis (H1): states that there is a change, a difference, or a relationship for the general population, or in the context of an experiment, it predicts that the independent variable has an effect on the dependent variable.

Alpha level (level of significance): for example, α = 0.05.

  • Critical Region: Composed of the extreme sample values that are very unlikely to be obtained if the null hypothesis is true. Determined by alpha level. If sample data fall in the critical region, the null hypothesis is rejected, because it’s very unlikely they’ve fallen there by chance.
  • After collecting the data, we find the sample mean. Now we can compare the sample mean with the null hypothesis by computing a z-score that describes where the sample mean is located relative to the hypothesized population mean. We use the z-score formula.
  • We decided previously what the two z-score boundaries of the critical region are. If the z-score we get after plugging the numbers into the aforementioned equation falls in the critical region (beyond either boundary), we reject the null hypothesis. Otherwise, we say that we failed to reject the null hypothesis.
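
The steps above can be sketched in Python for a two-tailed single-sample z-test, using the usual z-score for a sample mean, z = (M − μ) / (σ / √n). The function name and the numbers (sample mean 105, hypothesized mean 100, σ = 15, n = 36) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def single_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed single-sample z-test; returns (z, critical z, reject H0?)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Boundary of the critical region: alpha is split between the two tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


z, z_crit, reject = single_sample_z_test(sample_mean=105, mu0=100, sigma=15, n=36)
print(round(z, 2), round(z_crit, 2), reject)  # prints: 2.0 1.96 True
```

Here z = 2.0 lies beyond the boundary of 1.96, so it falls in the critical region and the null hypothesis is rejected.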

Regions of the Distribution

Because we’re making judgments based on probability and proportion, our normal distributions and certain regions within them come into play.

The Critical Region is composed of the extreme sample values that are very unlikely to be obtained if the null hypothesis is true. Determined by alpha level. If sample data fall in the critical region, the null hypothesis is rejected, because it’s very unlikely they’ve fallen there by chance.

These regions come into play when talking about different errors.

A Type I Error occurs when a researcher rejects a null hypothesis that is actually true; the researcher concludes that a treatment has an effect when it actually doesn’t. This happens when a researcher unknowingly obtains an extreme, non-representative sample. This goes back to alpha level: it’s the probability that the test will lead to a Type I error if the null hypothesis is true.

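A Type II Error occurs when a researcher fails to reject a null hypothesis that is actually false; the researcher concludes that a treatment has no effect when it actually does. The probability of a Type II error is denoted β.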

A result is said to be significant or statistically significant if it is very unlikely to occur when the null hypothesis is true. That is, the result is sufficient to reject the null hypothesis. For instance, two means can be significantly different from one another.

Factors that Influence and Assumptions of Hypothesis Testing

Assumptions of Hypothesis Testing:

  • Random sampling: it is assumed that the participants used in the study were selected randomly so that we can confidently generalize our findings from the sample to the population.
  • Independent observation: two observations are independent if there is no consistent, predictable relationship between the first observation and the second.
  • The value of σ is unchanged by the treatment; if the population standard deviation is unknown, we assume that the standard deviation for the unknown population (after treatment) is the same as it was for the population before treatment. There are ways of checking to see if this is true in SPSS or Excel.
  • Normal sampling distribution: in order to use the unit normal table to identify the critical region, we need the distribution of sample means to be normal (which means we need the population to be distributed normally and/or each sample size needs to be 30 or greater based on what we know about the central limit theorem).

Factors that influence hypothesis testing:

  • The variability of the scores, which is measured by either the standard deviation or the variance. The variability influences the size of the standard error in the denominator of the z-score.
  • The number of scores in the sample. This value also influences the size of the standard error in the denominator.

Test statistic: indicates that the sample data are converted into a single, specific statistic that is used to test the hypothesis (in this case, the z-score statistic).

Directional Hypotheses and Tailed Tests

In a directional hypothesis test, also known as a one-tailed test, the statistical hypotheses specify either an increase or a decrease in the population mean. That is, they make a statement about the direction of the effect.

The Hypotheses for a Directional Test:

  • H0: The test scores are not increased/decreased (the treatment doesn’t work)
  • H1: The test scores are increased/decreased (the treatment works as predicted)

Because we’re only worried about scores that are either greater or less than the scores predicted by the null hypothesis, we only worry about what’s going on in one tail meaning that the critical region only exists within one tail. This means that all of the alpha is contained in one tail rather than split up into both (so the whole 5% is located in the tail we care about, rather than 2.5% in each tail). So before, we cared about what’s going on at the 0.025 mark of the unit normal table to look at both tails, but now we care about 0.05 because we’re only looking at one tail.

A one-tailed test allows you to reject the null hypothesis when the difference between the sample and the population is relatively small, as long as that difference is in the direction that you predicted. A two-tailed test, on the other hand, requires a relatively large difference independent of direction. In practice, researchers hypothesize using a one-tailed method but base their findings off of whether the results fall into the critical region of a two-tailed method. For the purposes of this class, make sure to calculate your results using the test that is specified in the problem.

Effect Size

A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used. Usually done with Cohen’s d. If you imagine the two distributions, they’re layered over one another. The more they overlap, the smaller the effect size (the means of the two distributions are close). The more they are spread apart, the greater the effect size (the means of the two distributions are farther apart).
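
As a rough illustration, assuming the usual single-sample definition of Cohen's d (mean difference divided by the standard deviation), the sketch below shows that d stays fixed as the sample grows, while the z-score does not. The function names and numbers are hypothetical.

```python
from math import sqrt


def cohens_d(sample_mean, mu0, sigma):
    """Cohen's d for a single sample: mean difference in standard-deviation units."""
    return (sample_mean - mu0) / sigma


def z_statistic(sample_mean, mu0, sigma, n):
    """z-score of a sample mean; the standard error shrinks as n grows."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


for n in (9, 36, 144):
    d = cohens_d(105, 100, 15)
    z = z_statistic(105, 100, 15, n)
    print(n, round(d, 2), round(z, 2))
# prints:
# 9 0.33 1.0
# 36 0.33 2.0
# 144 0.33 4.0
```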

Statistical Power

The power of a statistical test is the probability that the test will correctly reject a false null hypothesis. It’s usually what we’re hoping to get when we run an experiment. It’s displayed in the table posted above. Power and effect size are connected. So, we know that the greater the distance between the means, the greater the effect size. If the two distributions overlapped very little, there would be a greater chance of selecting a sample that leads to rejecting the null hypothesis.
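
The link between power and the distance between the means can be sketched for an upper-tailed z-test. This is an illustration under assumed values: the function name, the true mean `mu1`, and the numbers are ours, and the population standard deviation is taken as known.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean is mu1 (mu1 >= mu0)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # boundary of the critical region
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true mean in standard-error units
    # Probability that the z-score lands beyond z_crit when the true mean is mu1.
    return 1 - std_normal.cdf(z_crit - shift)


# A larger true difference between the means gives higher power.
for mu1 in (102, 105, 110):
    print(mu1, round(z_test_power(mu0=100, mu1=mu1, sigma=15, n=36), 3))
```

When `mu1` equals `mu0` the function returns alpha itself, which is the Type I error rate.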

This chapter was originally posted to the Math Support Center blog at the University of Baltimore on June 11, 2019.

Math and Statistics Guides from UB's Math & Statistics Center Copyright © by Jenna Lehmann is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Calcworkshop

Hypothesis Test Comprehensive Walkthrough

Advance Hypothesis Test Skills — Boost Statistical Savvy — Excel in Practical Situations

Hypothesis Testing

1 hr 17 min 21 Examples

  • Introduction to Video: Statistical Hypotheses
  • Overview of Hypothesis Testing and determining a correctly stated hypothesis testing problem (Examples #1-7)
  • State the Null Hypothesis and the Alternative Hypothesis for each scenario (Examples #8-12)
  • Hypothesis Testing Steps and Overview of Type I and Type II errors (Examples #13-14)
  • Describe a Type 1 error and a Type 2 error (Examples #15-16)
  • Overview of p-value and Tails of the Hypothesis Test
  • Find the probability of a Type I and Type II error (Example #17)
  • Identify null hypothesis, alternative hypothesis, and state whether the scenario is a one-tail or two-tailed test (Examples #18-21)

Population Proportion

1 hr 10 min 7 Examples

  • Introduction to Video: Hypothesis Test for Population Proportions
  • Overview of hypothesis tests for proportions and understanding the p-value and significance level
  • Test for significance using a one-tail z-test (Examples #1-2)
  • Construct a hypothesis test for a two-tail z-test (Example #3)
  • Create a hypothesis test and provide a confidence interval (Example #4)
  • How to create a hypothesis test for the difference of two population proportions (Example #5)
  • Construct a hypothesis test and provide a confidence interval for the difference of proportions (Example #6)
  • Create a hypothesis test for the difference of population proportions (Example #7)

One Sample T Test

59 min 6 Examples

  • Introduction to Video: One Sample t-test
  • Steps for conducting a hypothesis test for population means (one sample z-test or one sample t-test)
  • Conduct a hypothesis test and confidence interval when population standard deviation is known (Example #1)
  • Test the hypothesis when population standard deviation is known (Example #2)
  • Use a one-sample t-test to test a claim (Example #3)
  • Conduct a hypothesis test and confidence interval when population standard deviation is unknown (Example #4)
  • Conduct a hypothesis test by using a one-sample t-test and provide a confidence interval (Example #5)
  • Test the hypothesis by first finding the sample mean and standard deviation (Example #6)

Two Sample T Test

1 hr 22 min 7 Examples

  • Introduction to Video: Two Sample Hypothesis Test for Population Means
  • How to write a two sample hypothesis test when population standard deviation is known? (Example #1)
  • Construct a two sample hypothesis test when population standard deviation is known (Example #2)
  • What is a Two-Sample t-test? Pooled variances or non-pooled variances?
  • Use a two sample t-test with un-pooled variances (Example #3)
  • Create a two sample t-test and confidence interval with pooled variances (Example #4)
  • Construct a two-sample t-test (Example #5)
  • Matched Pair one sample t-test (Example #6)
  • Use a match paired hypothesis test and provide a confidence interval for difference of means (Example #7)

Chi Square Test

1 hr 34 min 8 Examples

  • Introduction to Video: Chi-Square Goodness of Fit
  • Overview of the Chi-Square Distribution and Goodness of Fit Test
  • Use the Chi Square Goodness-of-fit Test to determine if observed frequencies match expected frequencies (Examples #1-2)
  • Determine if observed frequencies match expected frequencies using Chi-Square Goodness of fit test (Examples #1-2)
  • Chi Square Test explained for Independence with (Example #5)
  • Test for independence (Example #6)
  • Overview of the Chi Square Test for Homogeneity with (Example #7)
  • Determine if the population proportions are homogeneous (Example #8)

Chapter Test

1 hr 27 min 11 Practice Problems

  • One sample hypothesis test with confidence interval for population proportions (Problem #1)
  • One sample hypothesis test with confidence interval for population means (Problem #2)
  • Describe Type I and Type II errors (Problem #3)
  • Two sample hypothesis test for population proportions (Problem #4)
  • Two sample t-test with pooled variances for difference of means (Problem #5)
  • Chi-Square Goodness of Fit Test (Problem #6)
  • Two sample t-test without pooling for difference of means (Problem #7)
  • Chi-Square Test for Independence (Problem #8)
  • Matched Pair Test (Problem #9)
  • Chi-Square Test for Homogeneity (Problem #10)
  • One sample t-test with confidence interval for difference of means (Problem #11)


Hypothesis Testing

Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps.

2. Identify a test statistic that can be used to assess the truth of the null hypothesis .


Cite this as:

Weisstein, Eric W. "Hypothesis Testing." From MathWorld --A Wolfram Web Resource. https://mathworld.wolfram.com/HypothesisTesting.html


Mathematics LibreTexts

12.3: Steps in Hypothesis Testing

  • Page ID 109929

CHAPTER OBJECTIVES

By the end of this chapter, the student should be able to:

  • Differentiate between Type I and Type II Errors
  • Describe hypothesis testing in general and in practice
  • Conduct and interpret hypothesis tests for a single population mean, population standard deviation known.
  • Conduct and interpret hypothesis tests for a single population mean, population standard deviation unknown.
  • Conduct and interpret hypothesis tests for a single population proportion

One job of a statistician is to make statistical inferences about populations based on samples taken from the population. Confidence intervals are one way to estimate a population parameter. Another way to make a statistical inference is to make a decision about a parameter. For instance, a car dealer advertises that its new small truck gets 35 miles per gallon, on average. A tutoring service claims that its method of tutoring helps 90% of its students get an A or a B. A company says that women managers in their company earn an average of $60,000 per year.


A statistician will make a decision about these claims. This process is called "hypothesis testing." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analysis of the data, to reject the null hypothesis. In this chapter, you will conduct hypothesis tests on single means and single proportions. You will also learn about the errors associated with these tests.

Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test, a statistician will:

  • Set up two contradictory hypotheses.
  • Collect sample data (in homework problems, the data or summary statistics will be given to you).
  • Determine the correct distribution to perform the hypothesis test.
  • Analyze sample data by performing the calculations that ultimately will allow you to reject or decline to reject the null hypothesis.
  • Make a decision and write a meaningful conclusion.

To do the hypothesis test homework problems for this chapter and later chapters, make copies of the appropriate special solution sheets. See Appendix E .

  • The desired confidence level.
  • Information that is known about the distribution (for example, known standard deviation).
  • The sample and its size.

Exercises - Hypothesis Testing Basics


  • The mean annual starting salary for computer science majors is greater than $\$70,000$.
  • The standard deviation for human body temperatures equals $0.62^{\circ} F$.
  • The proportion of people that suffer from diabetes in America is less than $9\%$
  • The standard deviation of duration times (in seconds) of the Old Faithful geyser is less than 40 seconds.
  • $H_0 : \mu \le 70,000$ and $H_1 : \mu \gt 70,000$
  • $H_0 : \sigma = 0.62$ and $H_1 : \sigma \ne 0.62$
  • $H_0 : p \ge 0.09$ and $H_1 : p \lt 0.09$
  • $H_0 : \sigma \ge 40$ and $H_1 : \sigma \lt 40$
  • Two-tailed test; $\alpha = 0.01$
  • Right-tailed test; $\alpha = 0.02$
  • $\alpha = 0.05$; $H_1 : \mu \ne 98.6^{\circ} F$
  • $\alpha = 0.005$; $H_1 : \mu \lt 5280 \textrm{ ft}$
  • $z = \pm 2.5758$
  • $z = \pm 1.96$
  • $z = -2.5758$
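
Critical values like those listed above can be checked with Python's standard library. This is a sketch, assuming the test statistic follows a standard normal distribution; the function name `critical_z` and the tail labels are our own.

```python
from statistics import NormalDist


def critical_z(alpha, tail):
    """Critical z value(s) for a standard normal test statistic."""
    std_normal = NormalDist()
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between both tails
        return (-z, z)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


print(critical_z(0.01, "two"))    # approximately (-2.5758, 2.5758)
print(critical_z(0.02, "right"))  # approximately 2.0537
print(critical_z(0.05, "two"))    # approximately (-1.96, 1.96)
print(critical_z(0.005, "left"))  # approximately -2.5758
```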

The test statistic for hypothesis tests involving a single proportion is given by:
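
The formula did not survive in this copy of the page. Assuming the standard normal-approximation statistic for a single proportion is the one intended, it reads:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p\,q}{n}}}$$

where $\hat{p}$ is the sample proportion, $p$ is the proportion stated in the null hypothesis, $q = 1 - p$, and $n$ is the sample size. The symbols $\hat{p}$ and $q$ are our notation, not necessarily the page's.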

For each situation below, find the $p$-value using a $0.05$ level of significance, and state the conclusion (i.e., reject or fail to reject the null hypothesis):

  • The test statistic in a left-tailed test is $z=-1.25$
  • The test statistic in a two-tailed test is $z=1.75$
  • With $H_1 : p \ne 0.707$, the test statistic is $z = -2.75$
  • With $H_1 : p \gt 1/4$, the test statistic is $z = 2.30$
  • $0.1056$; fail to reject the null hypothesis
  • $0.0802$; fail to reject the null hypothesis
  • $0.0060$; reject the null hypothesis
  • $0.0107$; reject the null hypothesis
  • In testing the claim that the proportion of blue M & M's is greater than $5\%$, the null hypothesis is rejected.
  • In testing the claim that the mean length of a pregnancy for American women taking a particular drug is no longer its expected length of $268$ days, we fail to reject the null hypothesis.
  • There is statistically significant evidence the proportion of blue M & M's is greater than $5\%$.
  • There is no statistically significant evidence the length of a pregnancy for American women taking this drug is no longer its expected length of $268$ days.
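
The four $p$-values in the exercise above (for $z = -1.25$, $1.75$, $-2.75$ and $2.30$) can be checked with a short Python sketch. The function name and tail labels are our own. Exact normal probabilities can differ from table-based answers in the fourth decimal place; for example, $z = 1.75$ gives about $0.0801$ rather than $0.0802$.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """p-value for a standard normal test statistic."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


alpha = 0.05
cases = [(-1.25, "left"), (1.75, "two"), (-2.75, "two"), (2.30, "right")]
for z, tail in cases:
    p = p_value_from_z(z, tail)
    decision = "reject" if p < alpha else "fail to reject"
    print(f"z = {z:+.2f} ({tail}-tailed): p = {p:.4f}, {decision} the null hypothesis")
```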


Statistics LibreTexts

8.4: Hypothesis Test Examples for Proportions

  • Page ID 11533

  • In a hypothesis test problem, you may see words such as "the level of significance is 1%." The "1%" is the preconceived or preset \(\alpha\).
  • The statistician setting up the hypothesis test selects the value of α to use before collecting the sample data.
  • If no level of significance is given, a common standard to use is \(\alpha = 0.05\).
  • When you calculate the \(p\)-value and draw the picture, the \(p\)-value is the area in the left tail, the right tail, or split evenly between the two tails. For this reason, we call the hypothesis test left, right, or two tailed.
  • The alternative hypothesis, \(H_{a}\), tells you if the test is left, right, or two-tailed. It is the key to conducting the appropriate test.
  • \(H_{a}\) never has a symbol that contains an equal sign.
  • Thinking about the meaning of the \(p\)-value: A data analyst (and anyone else) should have more confidence that he made the correct decision to reject the null hypothesis with a smaller \(p\)-value (for example, 0.001 as opposed to 0.04) even if using the 0.05 level for alpha. Similarly, for a large p -value such as 0.4, as opposed to a \(p\)-value of 0.056 (\(\alpha = 0.05\) is less than either number), a data analyst should have more confidence that she made the correct decision in not rejecting the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.

Full Hypothesis Test Examples

Example \(\PageIndex{7}\)

Joon believes that 50% of first-time brides in the United States are younger than their grooms. She performs a hypothesis test to determine if the percentage is the same or different from 50%. Joon samples 100 first-time brides and 53 reply that they are younger than their grooms. For the hypothesis test, she uses a 1% level of significance.

Set up the hypothesis test:

The 1% level of significance means that α = 0.01. This is a test of a single population proportion.

\(H_{0}: p = 0.50\)  \(H_{a}: p \neq 0.50\)

The words "is the same or different from" tell you this is a two-tailed test.

Calculate the distribution needed:

Random variable: \(P′ =\) the percent of first-time brides who are younger than their grooms.

Distribution for the test: The problem contains no mention of a mean. The information is given in terms of percentages. Use the distribution for P′ , the estimated proportion.

\[P' \sim N\left(p, \sqrt{\frac{p \cdot q}{n}}\right)\nonumber \]

\[P' \sim N\left(0.5, \sqrt{\frac{0.5 \cdot 0.5}{100}}\right)\nonumber \]

where \(p = 0.50, q = 1−p = 0.50\), and \(n = 100\)

Calculate the p -value using the normal distribution for proportions:

\[p\text{-value} = P(p' < 0.47 \text{ or } p' > 0.53) = 0.5485\nonumber \]

where \[x = 53, p' = \frac{x}{n} = \frac{53}{100} = 0.53\nonumber \].

Interpretation of the \(p\text{-value}\): If the null hypothesis is true, there is a 0.5485 probability (54.85%) that the sample (estimated) proportion \(p'\) is 0.53 or more OR 0.47 or less (see the graph in Figure).

Normal distribution curve of the percent of first time brides who are younger than the groom with values of 0.47, 0.50, and 0.53 on the x-axis. Vertical upward lines extend from 0.47 and 0.53 to the curve. 1/2(p-values) are calculated for the areas on outsides of 0.47 and 0.53.

\(\mu = p = 0.50\) comes from \(H_{0}\), the null hypothesis.

\(p′ = 0.53\). Since the curve is symmetrical and the test is two-tailed, the \(p′\) for the left tail is equal to \(0.50 – 0.03 = 0.47\) where \(\mu = p = 0.50\). (0.03 is the difference between 0.53 and 0.50.)

Compare \(\alpha\) and the \(p\text{-value}\):

Since \(\alpha = 0.01\) and \(p\text{-value} = 0.5485\), \(\alpha < p\text{-value}\).

Make a decision: Since \(\alpha < p\text{-value}\), you cannot reject \(H_{0}\).

Conclusion: At the 1% level of significance, the sample data do not show sufficient evidence that the percentage of first-time brides who are younger than their grooms is different from 50%.

The \(p\text{-value}\) can easily be calculated.

Press STAT and arrow over to TESTS. Press 5:1-PropZTest. Enter .5 for \(p_{0}\), 53 for \(x\) and 100 for \(n\). Arrow down to Prop and arrow to not equals \(p_{0}\). Press ENTER. Arrow down to Calculate and press ENTER. The calculator calculates the \(p\text{-value}\) (\(p = 0.5485\)) and the test statistic (\(z\)-score). Prop not equals .5 is the alternate hypothesis. Do this set of instructions again except arrow to Draw (instead of Calculate). Press ENTER. A shaded graph appears with \(z = 0.6\) (test statistic) and \(p = 0.5485\) (\(p\text{-value}\)). Make sure when you use Draw that no other equations are highlighted in \(Y =\) and the plots are turned off.
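
For readers without the calculator, the same two-tailed test can be sketched in Python with the standard library. The function name `one_prop_z_test` is our own, and the sketch assumes the normal approximation used in this example.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0):
    """Two-tailed one-proportion z-test using the normal approximation."""
    p_hat = x / n                             # estimated proportion p'
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard deviation of P' under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_prop_z_test(x=53, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))  # prints: 0.6 0.5485
```

Since 0.5485 is larger than α = 0.01, the decision is the same as above: do not reject \(H_{0}\).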

The Type I and Type II errors are as follows:

The Type I error is to conclude that the proportion of first-time brides who are younger than their grooms is different from 50% when, in fact, the proportion is actually 50%. (Reject the null hypothesis when the null hypothesis is true).

The Type II error is there is not enough evidence to conclude that the proportion of first time brides who are younger than their grooms differs from 50% when, in fact, the proportion does differ from 50%. (Do not reject the null hypothesis when the null hypothesis is false.)

Exercise \(\PageIndex{7}\)

A teacher believes that 85% of students in the class will want to go on a field trip to the local zoo. She performs a hypothesis test to determine if the percentage is the same or different from 85%. The teacher samples 50 students and 39 reply that they would want to go to the zoo. For the hypothesis test, use a 1% level of significance.

First, determine what type of test this is, set up the hypothesis test, find the \(p\text{-value}\), sketch the graph, and state your conclusion.

Since the problem is about percentages, this is a test of single population proportions.

  • \(H_{0} : p = 0.85\)
  • \(H_{a}: p \neq 0.85\)
  • \(p = 0.7554\)

Because \(p > \alpha\), we fail to reject the null hypothesis. There is not sufficient evidence to suggest that the proportion of students that want to go to the zoo is not 85%.

Example \(\PageIndex{8}\)

Suppose a consumer group suspects that the proportion of households that have three cell phones is 30%. A cell phone company has reason to believe that the proportion is not 30%. Before they start a big advertising campaign, they conduct a hypothesis test. Their marketing people survey 150 households with the result that 43 of the households have three cell phones.

Set up the Hypothesis Test:

\(H_{0}: p = 0.30, H_{a}: p \neq 0.30\)

Determine the distribution needed:

The random variable is \(P′ =\) proportion of households that have three cell phones.

The distribution for the hypothesis test is \(P' \sim N\left(0.30, \sqrt{\frac{(0.30)(0.70)}{150}}\right)\)

Exercise 9.6.8.2

a. The value that helps determine the \(p\text{-value}\) is \(p′\). Calculate \(p′\).

a. \(p' = \frac{x}{n}\) where \(x\) is the number of successes and \(n\) is the total number in the sample.

\(x = 43, n = 150\)

\(p' = \frac{43}{150}\)

Exercise 9.6.8.3

b. What is a success for this problem?

b. A success is having three cell phones in a household.

Exercise 9.6.8.4

c. What is the level of significance?

c. The level of significance is the preset \(\alpha\). Since \(\alpha\) is not given, assume that \(\alpha = 0.05\).

Exercise 9.6.8.5

d. Draw the graph for this problem. Draw the horizontal axis. Label and shade appropriately.

Calculate the \(p\text{-value}\).

d. \(p\text{-value} = 0.7216\)

Exercise 9.6.8.6

e. Make a decision. _____________(Reject/Do not reject) \(H_{0}\) because____________.

e. Assuming that \(\alpha = 0.05, \alpha < p\text{-value}\). The decision is do not reject \(H_{0}\) because there is not sufficient evidence to conclude that the proportion of households that have three cell phones is not 30%.
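As a check on part d, the \(p\text{-value}\) can be reproduced from the distribution stated earlier in this example. The following is a sketch with rounded intermediate values, assuming the normal approximation applies:

\[z = \frac{p' - p_{0}}{\sqrt{\frac{p_{0}(1 - p_{0})}{n}}} = \frac{\frac{43}{150} - 0.30}{\sqrt{\frac{(0.30)(0.70)}{150}}} \approx \frac{-0.0133}{0.0374} \approx -0.356\nonumber \]

\[p\text{-value} = 2 \cdot P(Z < -0.356) \approx 2(0.3608) = 0.7216\nonumber \]

The probability is doubled because \(H_{a}: p \neq 0.30\) makes this a two-tailed test.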

Exercise \(\PageIndex{8}\)

Marketers believe that 92% of adults in the United States own a cell phone. A cell phone manufacturer believes that number is actually lower. 200 American adults are surveyed, of whom 174 report having cell phones. Use a 5% level of significance. State the null and alternative hypotheses, find the \(p\text{-value}\), state your conclusion, and identify the Type I and Type II errors.

  • \(H_{0}: p = 0.92\)
  • \(H_{a}: p < 0.92\)
  • \(p\text{-value} = 0.0046\)

Because \(p < 0.05\), we reject the null hypothesis. There is sufficient evidence to conclude that fewer than 92% of American adults own cell phones.

  • Type I Error: To conclude that fewer than 92% of American adults own cell phones when, in fact, 92% of American adults do own cell phones (reject the null hypothesis when the null hypothesis is true).
  • Type II Error: To fail to conclude that fewer than 92% of American adults own cell phones when, in fact, fewer than 92% of American adults own cell phones (do not reject the null hypothesis when the null hypothesis is false).

The next example is a poem written by a statistics student named Nicole Hart. The solution to the problem follows the poem. Notice that the hypothesis test is for a single population proportion. This means that the null and alternate hypotheses use the parameter \(p\). The distribution for the test is normal. The estimated proportion \(p′\) is the proportion of fleas killed to the total fleas found on Fido. This is sample information. The problem gives a preconceived \(\alpha = 0.01\), for comparison, and a 95% confidence interval computation. The poem is clever and humorous, so please enjoy it!

Example \(\PageIndex{9}\)

My dog has so many fleas,
They do not come off with ease.
As for shampoo, I have tried many types
Even one called Bubble Hype,
Which only killed 25% of the fleas,
Unfortunately I was not pleased.

I've used all kinds of soap,
Until I had given up hope
Until one day I saw
An ad that put me in awe.

A shampoo used for dogs
Called GOOD ENOUGH to Clean a Hog
Guaranteed to kill more fleas.

I gave Fido a bath
And after doing the math
His number of fleas
Started dropping by 3's!
Before his shampoo
I counted 42.

At the end of his bath,
I redid the math
And the new shampoo had killed 17 fleas.
So now I was pleased.

Now it is time for you to have some fun
With the level of significance being .01,
You must help me figure out
Use the new shampoo or go without?

\(H_{0}: p \leq 0.25\)   \(H_{a}: p > 0.25\)

In words, CLEARLY state what your random variable \(\bar{X}\) or \(P′\) represents.

\(P′ =\) The proportion of fleas that are killed by the new shampoo

State the distribution to use for the test.

\[N\left(0.25, \sqrt{\frac{(0.25)(1-0.25)}{42}}\right)\nonumber \]

Test Statistic: \(z = 2.3163\)

Calculate the \(p\text{-value}\) using the normal distribution for proportions:

\[p\text{-value} = 0.0103\nonumber \]

In one to two complete sentences, explain what the \(p\text{-value}\) means for this problem.

If the null hypothesis is true (the proportion is 0.25), then there is a 0.0103 probability that the sample (estimated) proportion is 0.4048 \(\left(\frac{17}{42}\right)\) or more.

Use the previous information to sketch a picture of this situation. CLEARLY, label and scale the horizontal axis and shade the region(s) corresponding to the \(p\text{-value}\).

Normal distribution graph of the proportion of fleas killed by the new shampoo with values of 0.25 and 0.4048 on the x-axis. A vertical upward line extends from 0.4048 to the curve and the area to the right of this is shaded in. The test statistic of the sample proportion is listed.

Indicate the correct decision (“reject” or “do not reject” the null hypothesis), the reason for it, and write an appropriate conclusion, using complete sentences.

Conclusion: At the 1% level of significance, the sample data do not show sufficient evidence that the percentage of fleas that are killed by the new shampoo is more than 25%.

Construct a 95% confidence interval for the true mean or proportion. Include a sketch of the graph of the situation. Label the point estimate and the lower and upper bounds of the confidence interval.

Normal distribution graph of the proportion of fleas killed by the new shampoo with values of 0.26, 17/42, and 0.55 on the x-axis. A vertical upward line extends from 0.26 and 0.55. The area between these two points is equal to 0.95.

Confidence Interval: (0.26, 0.55). We are 95% confident that the true population proportion \(p\) of fleas that are killed by the new shampoo is between 26% and 55%.

This test result is not very definitive since the \(p\text{-value}\) is very close to alpha. In reality, one would probably do more tests by giving the dog another bath after the fleas have had a chance to return.
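The 95% confidence interval reported in this example can be reproduced with a short sketch. It assumes the usual normal-approximation interval for a proportion, which uses the sample proportion \(p′\) (not the hypothesized 0.25) in the standard error; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 17, 42                 # fleas killed, fleas counted before the bath
confidence = 0.95

p_hat = x / n                                            # point estimate p'
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # error bound

lower, upper = p_hat - margin, p_hat + margin
print(f"p' = {p_hat:.4f}, interval = ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the bounds should match the interval (0.26, 0.55) stated above.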

Example \(\PageIndex{11}\)

In a study of 420,019 cell phone users, 172 of the subjects developed brain cancer. Test the claim that cell phone users developed brain cancer at a greater rate than that for non-cell phone users (the rate of brain cancer for non-cell phone users is 0.0340%). Since this is a critical issue, use a 0.005 significance level. Explain why the significance level should be so low in terms of a Type I error.

We will follow the four-step process.

  • \(H_{0}: p \leq 0.00034\)
  • \(H_{a}: p > 0.00034\)

If we commit a Type I error, we are essentially accepting a false claim. Since the claim describes cancer-causing environments, we want to minimize the chances of incorrectly identifying causes of cancer.

  • We will be testing a sample proportion with \(x = 172\) and \(n = 420,019\). The sample is sufficiently large because we have \(np = 420,019(0.00034) = 142.8\), \(nq = 420,019(0.99966) = 419,876.2\), two independent outcomes, and a fixed probability of success \(p = 0.00034\). Thus we will be able to generalize our results to the population.

Figure 9.6.11.

Figure 9.6.12.

  • Since the \(p\text{-value} = 0.0073\) is greater than our alpha value \(= 0.005\), we cannot reject the null. Therefore, we conclude that there is not enough evidence to support the claim of higher brain cancer rates for the cell phone users.
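As a rough check on the \(p\text{-value}\) quoted above, here is the calculation under the normal approximation, with rounded intermediate values:

\[p' = \frac{172}{420,019} \approx 0.0004095, \qquad z = \frac{0.0004095 - 0.00034}{\sqrt{\frac{(0.00034)(0.99966)}{420,019}}} \approx \frac{0.0000695}{0.00002845} \approx 2.44\nonumber \]

\[p\text{-value} = P(Z > 2.44) \approx 0.0073\nonumber \]

Only the right tail is used because \(H_{a}: p > 0.00034\).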

Example \(\PageIndex{12}\)

According to the US Census there are approximately 268,608,618 residents aged 12 and older. Statistics from the Rape, Abuse, and Incest National Network indicate that, on average, 207,754 rapes occur each year (male and female) for persons aged 12 and older. This translates into a percentage of sexual assaults of 0.078%. In Daviess County, KY, there were reported 11 rapes for a population of 37,937. Conduct an appropriate hypothesis test to determine if there is a statistically significant difference between the local sexual assault percentage and the national sexual assault percentage. Use a significance level of 0.01.

We will follow the four-step plan.

  • We need to test whether the proportion of sexual assaults in Daviess County, KY is significantly different from the national average.
  • \(H_{0}: p = 0.00078\)
  • \(H_{a}: p \neq 0.00078\)

Figure 9.6.13.

Figure 9.6.14.

  • Since the \(p\text{-value}\), \(p = 0.00063\), is less than the alpha level of 0.01, the sample data indicate that we should reject the null hypothesis. In conclusion, the sample data support the claim that the proportion of sexual assaults in Daviess County, Kentucky is different from the national average proportion.
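The \(p\text{-value}\) in the last step can be checked by hand. This is a sketch under the normal approximation, with rounded intermediate values:

\[p' = \frac{11}{37,937} \approx 0.000290, \qquad z = \frac{0.000290 - 0.00078}{\sqrt{\frac{(0.00078)(0.99922)}{37,937}}} \approx \frac{-0.000490}{0.0001433} \approx -3.42\nonumber \]

\[p\text{-value} = 2 \cdot P(Z < -3.42) \approx 2(0.000313) \approx 0.00063\nonumber \]

The tail probability is doubled because \(H_{a}: p \neq 0.00078\) is two-sided.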

The hypothesis test itself has an established process. This can be summarized as follows:

  • Determine \(H_{0}\) and \(H_{a}\). Remember, they are contradictory.
  • Determine the random variable.
  • Determine the distribution for the test.
  • Draw a graph, calculate the test statistic, and use the test statistic to calculate the \(p\text{-value}\). (A \(z\)-score and a \(t\)-score are examples of test statistics.)
  • Compare the preconceived \(\alpha\) with the \(p\text{-value}\), make a decision (reject or do not reject \(H_{0}\)), and write a clear conclusion using English sentences.

Notice that in performing the hypothesis test, you use \(\alpha\) and not \(\beta\). \(\beta\) is needed to help determine the sample size of the data that is used in calculating the \(p\text{-value}\). Remember that the quantity \(1 - \beta\) is called the Power of the Test. A high power is desirable. If the power is too low, statisticians typically increase the sample size while keeping \(\alpha\) the same. If the power is low, the null hypothesis might not be rejected when it should be.
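To illustrate how power grows with sample size while \(\alpha\) stays fixed, here is a minimal sketch for a left-tailed one-proportion z-test. The function `power_left_tailed` and the assumed true proportion of 0.87 are hypothetical choices made for illustration; only the null value 0.92 and \(\alpha = 0.05\) come from the exercise earlier in this section.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_left_tailed(p0, p_true, n, alpha=0.05):
    """Approximate power of a left-tailed one-proportion z-test.

    p0 is the null value; p_true is an assumed true proportion below p0.
    """
    se0 = sqrt(p0 * (1 - p0) / n)                    # standard error under H0
    cutoff = p0 + STD_NORMAL.inv_cdf(alpha) * se0    # reject H0 when p' < cutoff
    se1 = sqrt(p_true * (1 - p_true) / n)            # standard error under the assumed truth
    return STD_NORMAL.cdf((cutoff - p_true) / se1)   # P(p' < cutoff | p_true)


for n in (50, 100, 200, 400):
    print(n, round(power_left_tailed(0.92, 0.87, n), 3))
```

Each printed power is \(1 - \beta\) for that sample size; the values should rise as \(n\) increases, which is why a larger sample is the usual remedy for low power.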

  • Data from Amit Schitai. Director of Instructional Technology and Distance Learning. LBCC.
  • Data from Bloomberg Businessweek. Available online at http://www.businessweek.com/news/2011-09-15/nyc-smoking-rate-falls-to-record-low-of-14-bloomberg-says.html.
  • Data from energy.gov. Available online at http://energy.gov (accessed June 27, 2013).
  • Data from Gallup®. Available online at www.gallup.com (accessed June 27, 2013).
  • Data from Growing by Degrees by Allen and Seaman.
  • Data from La Leche League International. Available online at www.lalecheleague.org/Law/BAFeb01.html.
  • Data from the American Automobile Association. Available online at www.aaa.com (accessed June 27, 2013).
  • Data from the American Library Association. Available online at www.ala.org (accessed June 27, 2013).
  • Data from the Bureau of Labor Statistics. Available online at http://www.bls.gov/oes/current/oes291111.htm .
  • Data from the Centers for Disease Control and Prevention. Available online at www.cdc.gov (accessed June 27, 2013)
  • Data from the U.S. Census Bureau, available online at quickfacts.census.gov/qfd/states/00000.html (accessed June 27, 2013).
  • Data from the United States Census Bureau. Available online at www.census.gov/hhes/socdemo/language/.
  • Data from Toastmasters International. Available online at http://toastmasters.org/artisan/deta...eID=429&Page=1 .
  • Data from Weather Underground. Available online at www.wunderground.com (accessed June 27, 2013).
  • Federal Bureau of Investigations. “Uniform Crime Reports and Index of Crime in Daviess in the State of Kentucky enforced by Daviess County from 1985 to 2005.” Available online at http://www.disastercenter.com/kentucky/crime/3868.htm (accessed June 27, 2013).
  • “Foothill-De Anza Community College District.” De Anza College, Winter 2006. Available online at research.fhda.edu/factbook/DA...t_da_2006w.pdf.
  • Johansen, C., J. Boice, Jr., J. McLaughlin, J. Olsen. “Cellular Telephones and Cancer—a Nationwide Cohort Study in Denmark.” Institute of Cancer Epidemiology and the Danish Cancer Society, 93(3):203-7. Available online at http://www.ncbi.nlm.nih.gov/pubmed/11158188 (accessed June 27, 2013).
  • Rape, Abuse & Incest National Network. “How often does sexual assault occur?” RAINN, 2009. Available online at www.rainn.org/get-information...sexual-assault (accessed June 27, 2013).

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .

Hypothesis testing

When interpreting research findings, researchers need to assess whether these findings may have occurred by chance. Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.

Hypothesis testing uses sample data to evaluate a hypothesis about a population . A hypothesis test assesses how unusual the result is, whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation.

Basic concepts

  • Null and research hypotheses
  • Probability value and types of errors
  • Effect size and statistical significance
  • Directional and non-directional hypotheses

Null and research hypotheses

To carry out statistical hypothesis testing, a research hypothesis and a null hypothesis are employed:

  • Research hypothesis : this is the hypothesis that you propose, also known as the alternative hypothesis (H A ). For example:

H A : There is a relationship between intelligence and academic results.

H A : First year university students obtain higher grades after an intensive Statistics course.

H A : Males and females differ in their levels of stress.

  • The null hypothesis (H o ) is the opposite of the research hypothesis and expresses that there is no relationship between variables, or no differences between groups; for example:

H o : There is no relationship between intelligence and academic results.

H o : First year university students do not obtain higher grades after an intensive Statistics course.

H o : Males and females will not differ in their levels of stress.

The purpose of hypothesis testing is to test whether the null hypothesis (there is no difference, no effect) can be rejected. If the null hypothesis is rejected, then the research hypothesis is supported. If the null hypothesis is not rejected, then there is not enough evidence to support the research hypothesis.

In hypothesis testing, a value is set to assess whether the null hypothesis is rejected or not and whether the result is statistically significant:

  • A critical value is the score the sample would need to decide against the null hypothesis.
  • A probability value is used to assess the significance of the statistical test. If the null hypothesis is rejected, then the alternative to the null hypothesis is accepted.

The probability value, or p value, is the probability of obtaining a research result at least as extreme as the one observed, assuming that the null hypothesis is true. Usually, the threshold (alpha level) is set at 0.05: the null hypothesis will be rejected if the probability value of the statistical test is less than 0.05. There are two types of errors associated with hypothesis testing:

  • What if we observe a difference – but none exists in the population?
  • What if we do not find a difference – but it does exist in the population?

These situations are known as Type I and Type II errors:

  • Type I Error: is the type of error that involves the rejection of a null hypothesis that is actually true (i.e. a false positive).
  • Type II Error:  is the type of error that occurs when we do not reject a null hypothesis that is false (i.e. a false negative).

hypothesis testing process and types of errors

These errors cannot be eliminated; they can be minimised, but minimising one type of error will increase the probability of committing the other type.

The probability of making a Type I error depends on the criterion that is used to reject the null hypothesis: the alpha level. The alpha is set by the researcher, usually at .05, and is the chance of a Type I error the researcher is willing to take and still claim the significance of the statistical test. Choosing a smaller alpha level will decrease the likelihood of committing a Type I error.

For example, with an alpha level of 0.05, there are 5 chances in 100 that a difference as large as the one observed would arise from sampling error alone when the null hypothesis is true; that is, a Type I error will occur 5% of the time when the null hypothesis is true. It does not mean that there is a 5% chance that the null hypothesis itself is true or false.

With an alpha level of 0.01, there is 1 chance in 100 that such a difference would arise from sampling error alone; a Type I error will occur 1% of the time when the null hypothesis is true.

The alpha level is specified before analysing the data. If the data analysis results in a probability value below the α (alpha) level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected.
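The link between the alpha level and the Type I error rate can be checked with a small simulation. The sketch below repeatedly draws samples for which the null hypothesis is true and counts how often a two-tailed z-test rejects it anyway. The population mean, standard deviation, sample size, and number of trials are hypothetical values chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)                                # fixed seed so the run is repeatable

MU0, SIGMA, N = 19.0, 4.0, 25                 # hypothetical null mean, known SD, sample size
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cut-off, about 1.96


def null_is_rejected():
    """Draw one sample for which H0 is true and run a two-tailed z-test."""
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / sqrt(N))
    return abs(z) > Z_CRIT


TRIALS = 20_000
type_i_rate = sum(null_is_rejected() for _ in range(TRIALS)) / TRIALS
print(f"Estimated Type I error rate: {type_i_rate:.3f}")
```

The estimated rate should land close to the chosen alpha; lowering `ALPHA` to 0.01 should lower it accordingly.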

When the null hypothesis is rejected, the effect is said to be statistically significant. However, statistical significance does not mean that the effect is important.

A result can be statistically significant, but the effect size may be small. Finding that an effect is significant does not provide information about how large or important the effect is. In fact, a small effect can be statistically significant if the sample size is large enough.
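A quick numerical sketch makes the point about sample size concrete. It assumes a one-sample z-test in which the observed mean differs from the null value by a fixed, small amount (0.1 standard deviations, a hypothetical figure) and shows how the two-tailed p value changes as the sample grows.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(effect_in_sd, n):
    """Two-tailed p-value of a one-sample z-test for a mean shift
    of `effect_in_sd` standard deviations observed in a sample of size n."""
    z = effect_in_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


EFFECT = 0.1  # hypothetical small effect: one tenth of a standard deviation
p_values = {n: two_sided_p(EFFECT, n) for n in (25, 100, 400, 1600)}
for n, p in p_values.items():
    print(f"n = {n:5d}  p = {p:.4f}")
```

The effect is the same in every row; only the sample size changes, yet the p value moves from clearly non-significant to below .05.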

Information about the effect size, or magnitude of the result, is given by effect size statistics. For example, the strength of the correlation between two variables is given by the absolute value of the coefficient of correlation, which varies from 0 to 1.

  • A hypothesis that states that students who attend an intensive Statistics course will obtain higher grades than students who do not attend would be directional.
  • A non-directional hypothesis states that there will be differences between students who do or don’t attend an intensive Statistics course, but we don’t know which group will get higher grades than the other. The hypothesis only states that they will obtain different grades.

The hypothesis testing process

The hypothesis testing process can be divided into five steps:

  • Restate the research question as a research hypothesis and a null hypothesis about the populations.
  • Determine the characteristics of the comparison distribution.
  • Determine the cut off sample score on the comparison distribution at which the null hypothesis should be rejected.
  • Determine your sample’s score on the comparison distribution.
  • Decide whether to reject the null hypothesis.

This example illustrates how these five steps can be applied to test a hypothesis:

  • Let’s say that you conduct an experiment to investigate whether students’ ability to memorise words improves after they have consumed caffeine.
  • The experiment involves two groups of students: the first group consumes caffeine; the second group drinks water.
  • Both groups complete a memory test.
  • A randomly selected individual in the experimental condition (i.e. the group that consumes caffeine) has a score of 27 on the memory test. The scores of people in general on this memory measure are normally distributed with a mean of 19 and a standard deviation of 4.
  • The researcher predicts an effect (differences in memory for these groups) but does not predict a particular direction of effect (i.e. which group will have higher scores on the memory test). Using the 5% significance level, what should you conclude?

Step 1 : There are two populations of interest.

Population 1: People who go through the experimental procedure (drink coffee).

Population 2: People who do not go through the experimental procedure (drink water).

  • Research hypothesis: Population 1 will score differently from Population 2.
  • Null hypothesis: There will be no difference between the two populations.

Step 2 : We know that the characteristics of the comparison distribution (student population) are:

Population M = 19, Population SD= 4, normally distributed. These are the mean and standard deviation of the distribution of scores on the memory test for the general student population.

Step 3 : For a two-tailed test (the direction of the effect is not specified) at the 5% level (2.5% at each tail), the cut off sample scores are +1.96 and -1.96.

hypothesis test math problem

Step 4 : Your sample score of 27 needs to be converted into a Z value: Z = (27-19)/4 = 2 (check the Converting into Z scores section if you need to review how to do this process).

Step 5 : A ‘Z’ score of 2 is more extreme than the cut off Z of +1.96 (see figure above). The result is significant and, thus, the null hypothesis is rejected.
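Steps 3 to 5 can be sketched in a few lines of Python. The numbers are the ones given in the example; the variable names are illustrative, and the two-tailed p value is an extra that the worked example does not report.

```python
from statistics import NormalDist

score, pop_mean, pop_sd = 27, 19, 4     # values from the example
alpha = 0.05

z = (score - pop_mean) / pop_sd                   # Step 4: sample score as a Z value
cutoff = NormalDist().inv_cdf(1 - alpha / 2)      # Step 3: two-tailed cut-off
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value

reject_null = abs(z) > cutoff                     # Step 5: decision
print(f"Z = {z:.2f}, cut-off = {cutoff:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Because the Z value of 2 lies beyond the cut-off, the sketch reaches the same decision as Step 5.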

You can find more examples here:

  • Statistics (RMIT Learning Lab)

Some commonly used statistical techniques

  • Correlation analysis
  • Multiple regression
  • Analysis of variance
  • Chi-square test for independence

Correlation analysis explores the association between variables . The purpose of correlational analysis is to discover whether there is a relationship between variables, which is unlikely to occur by sampling error. The null hypothesis is that there is no relationship between the two variables. Correlation analysis provides information about:

  • The direction of the relationship: positive or negative, given by the sign of the correlation coefficient.
  • The strength or magnitude of the relationship between the two variables, given by the absolute value of the correlation coefficient, which varies from 0 (no relationship between the variables) to 1 (perfect relationship between the variables).

1. Direction of the relationship

A positive correlation indicates that high scores on one variable are associated with high scores on the other variable; low scores on one variable are associated with low scores on the second variable. For instance, in the figure below, higher scores on negative affect are associated with higher scores on perceived stress.

example of positive correlation graph

A negative correlation indicates that high scores on one variable are associated with low scores on the other variable. The graph shows that a person who scores high on perceived stress will probably score low on mastery. The slope of the graph is downwards as it moves to the right. In the figure below, higher scores on mastery are associated with lower scores on perceived stress.

example of negative correlation graph

Fig 2. Negative correlation between two variables. Adapted from Pallant, J. (2013). SPSS survival manual: A step by step guide to data analysis using IBM SPSS (5th ed.). Sydney, Melbourne, Auckland, London: Allen & Unwin

2. The strength or magnitude of the relationship

The strength of a linear relationship between two variables is measured by a statistic known as the correlation coefficient , which varies from 0 to -1, and from 0 to +1. There are several correlation coefficients; the most widely used are Pearson’s r and Spearman’s rho. The strength of the relationship is interpreted as follows:

  • Small/weak: r= .10 to .29
  • Medium/moderate: r= .30 to .49
  • Large/strong: r= .50 to 1
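As an illustration, Pearson's r can be computed directly from its definition and labelled with the bands listed above. The scores below are hypothetical (they are not the data behind the figures), and the "negligible" label for values under .10 is an added assumption, since the list does not name that range.

```python
from math import sqrt

# Hypothetical scores for eight people (not data from the text).
negative_affect = [12, 15, 17, 20, 22, 25, 27, 30]
perceived_stress = [18, 17, 22, 21, 27, 24, 30, 29]


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def strength(r):
    """Label |r| using the bands listed above."""
    size = abs(r)
    if size >= 0.50:
        return "large/strong"
    if size >= 0.30:
        return "medium/moderate"
    if size >= 0.10:
        return "small/weak"
    return "negligible"  # assumed label; the bands above start at .10


r = pearson_r(negative_affect, perceived_stress)
print(f"r = {r:.2f} ({strength(r)}, {'positive' if r > 0 else 'negative'})")
```

The sign of `r` gives the direction and its absolute value gives the strength, so a value of -.35 would be labelled medium/moderate and negative.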

It is important to note that correlation analysis does not imply causality. Correlation is used to explore the association between variables; however, it does not indicate that one variable causes the other. The correlation between two variables could be due to the fact that a third variable is affecting the two variables.

Multiple regression is an extension of correlation analysis. Multiple regression is used to explore the relationship between one dependent variable and a number of independent variables or predictors . The purpose of a multiple regression model is to predict values of a dependent variable based on the values of the independent variables or predictors. For example, a researcher may be interested in predicting students’ academic success (e.g. grades) based on a number of predictors, for example, hours spent studying, satisfaction with studies, relationships with peers and lecturers.

A multiple regression model can be conducted using statistical software (e.g. SPSS). The software will test the significance of the model (i.e. does the model significantly predict scores on the dependent variable using the independent variables introduced in the model?), how much of the variance in the dependent variable is explained by the model, and the individual contribution of each independent variable.

Example of multiple regression model

example of multiple regression model to predict help-seeking

From Dunn et al. (2014). Influence of academic self-regulation, critical thinking, and age on online graduate students' academic help-seeking.

In this model, help-seeking is the dependent variable; there are three independent variables or predictors. The coefficients show the direction (positive or negative) and magnitude of the relationship between each predictor and the dependent variable. The model was statistically significant and predicted 13.5% of the variance in help-seeking.

t-Tests are employed to compare the mean score on some continuous variable for two groups . The null hypothesis to be tested is there are no differences between the two groups (e.g. anxiety scores for males and females are not different).

If the significance value of the t-test is equal to or less than .05, there is a significant difference in the mean scores on the variable of interest for each of the two groups. If the value is above .05, there is no significant difference between the groups.

t-Tests can be employed to compare the mean scores of two different groups (independent-samples t-test ) or to compare the same group of people on two different occasions ( paired-samples t-test) .

In addition to assessing whether the difference between the two groups is statistically significant, it is important to consider the effect size or magnitude of the difference between the groups. The effect size is given by partial eta squared (proportion of variance of the dependent variable that is explained by the independent variable) and Cohen’s d (difference between groups in terms of standard deviation units).
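To show how the t statistic and Cohen's d relate, here is a minimal sketch of an independent-samples comparison computed by hand. The anxiety scores are hypothetical, and the pooled-variance form (which assumes equal variances in both groups) is one common choice, not necessarily the one SPSS reports. The significance value itself needs the t distribution, which is not in the Python standard library, so it is left out.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical anxiety scores for two independent groups (not data from the text).
group_a = [24, 27, 22, 30, 26, 28, 25, 29]
group_b = [21, 23, 20, 25, 22, 24, 19, 23]

n_a, n_b = len(group_a), len(group_b)
mean_diff = mean(group_a) - mean(group_b)

# Pooled variance: assumes both groups share the same population variance.
pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)

t_stat = mean_diff / sqrt(pooled_var * (1 / n_a + 1 / n_b))  # independent-samples t
df = n_a + n_b - 2                                           # degrees of freedom
cohens_d = mean_diff / sqrt(pooled_var)                      # difference in SD units

print(f"t({df}) = {t_stat:.2f}, Cohen's d = {cohens_d:.2f}")
```

The t statistic depends on the group sizes, while Cohen's d does not, which is why the effect size is reported alongside the significance test.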

In this example, an independent samples t-test was conducted to assess whether males and females differ in their perceived anxiety levels. The significance of the test is .004. Since this value is less than .05, we can conclude that there is a statistically significant difference between males and females in their perceived anxiety levels.

t-test results obtained using SPSS

Whilst t-tests compare the mean score on one variable for two groups, analysis of variance is used to test more than two groups . Following the previous example, analysis of variance would be employed to test whether there are differences in anxiety scores for students from different disciplines.

Analysis of variance compares the variance (variability in scores) between the different groups (believed to be due to the independent variable) with the variability within each group (believed to be due to chance). An F ratio is calculated; a large F ratio indicates that there is more variability between the groups (caused by the independent variable) than there is within each group (error term). A significant F test indicates that we can reject the null hypothesis, i.e. the hypothesis that there is no difference between the groups.
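The F ratio described above can be computed step by step. The sketch below uses hypothetical anxiety scores for three disciplines and reports eta squared as the effect size; the group names and numbers are illustrative only.

```python
from statistics import mean

# Hypothetical anxiety scores for three disciplines (not data from the text).
groups = {
    "Psychology": [22, 25, 27, 24, 26],
    "Business": [30, 28, 33, 31, 29],
    "Education": [24, 23, 27, 25, 26],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Variability between groups: group means around the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Variability within groups: scores around their own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_ratio = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / (ss_between + ss_within)   # proportion of variance explained

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, eta squared = {eta_squared:.2f}")
```

A large F ratio here reflects group means that are far apart relative to the spread of scores inside each group.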

Again, effect size statistics such as Cohen’s d and eta squared are employed to assess the magnitude of the differences between groups.

In this example, we examined differences in perceived anxiety between students from different disciplines. The results of the ANOVA test show that the significance level is .005. Since this value is below .05, we can conclude that there are statistically significant differences between students from different disciplines in their perceived anxiety levels.

ANOVA results obtained using SPSS

Chi-square test for independence is used to explore the relationship between two categorical variables. Each variable can have two or more categories.

For example, a researcher can use a Chi-square test for independence to assess the relationship between study disciplines (e.g. Psychology, Business, Education,…) and help-seeking behaviour (Yes/No). The test compares the observed frequencies of cases with the values that would be expected if there was no association between the two variables of interest. A statistically significant Chi-square test indicates that the two variables are associated (e.g. Psychology students are more likely to seek help than Business students). The effect size is assessed using effect size statistics: Phi and Cramer’s V .
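The comparison of observed and expected frequencies can be sketched as follows. The 2x2 counts are hypothetical, and the calculation is the plain Pearson chi-square without a continuity correction. The p value uses a shortcut that holds only for 1 degree of freedom (a 2x2 table).

```python
from math import erfc, sqrt

# Hypothetical 2x2 counts (not data from the text): rows are two disciplines,
# columns are help-seeking Yes / No.
observed = [
    [30, 20],   # Psychology
    [18, 32],   # Business
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# With 1 degree of freedom (a 2x2 table), the chi-square tail probability
# equals the two-tailed normal probability of sqrt(chi_square).
p_value = erfc(sqrt(chi_square / 2))
phi = sqrt(chi_square / n)                 # effect size for a 2x2 table

print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}, phi = {phi:.2f}")
```

For larger tables the degrees of freedom change and Cramer's V replaces phi, so this shortcut would not apply as written.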

In this example, a Chi-square test was conducted to assess whether males and females differ in their help-seeking behaviour (Yes/No). The crosstabulation table shows the percentage of males and females who sought/didn't seek help. The table 'Chi square tests' shows the significance of the test (Pearson Chi square asymp sig: .482). Since this value is above .05, we conclude that there is no statistically significant difference between males and females in their help-seeking behaviour.

Chi-square test results obtained using SPSS


Hypothesis test

A significance test, also referred to as a statistical hypothesis test, is a method of statistical inference in which observed data is compared to a claim (referred to as a hypothesis) in order to assess the truth of the claim. For example, one might wonder whether age affects the number of apples a person can eat, and may use a significance test to determine whether there is any evidence to suggest that it does.

Generally, the process of statistical hypothesis testing involves the following steps:

  • State the null hypothesis.
  • State the alternative hypothesis.
  • Select the appropriate test statistic and select a significance level.
  • Compute the observed value of the test statistic and its corresponding p-value.
  • Reject the null hypothesis in favor of the alternative hypothesis, or do not reject the null hypothesis.

The null hypothesis

The null hypothesis, H 0 , is the claim that is being tested in a statistical hypothesis test. It typically is a statement that there is no difference between the populations being studied, or that there is no evidence to support a claim being made. For example, "age has no effect on the number of apples a person can eat."

A significance test is designed to test the evidence against the null hypothesis. This is because it is easier to prove that a claim is false than to prove that it is true; demonstrating that the claim is false in one case is sufficient, while proving that it is true requires that the claim be true in all cases.

The alternative hypothesis

The alternative hypothesis is the opposite of the null hypothesis in that it is a statement that there is some difference between the populations being studied. For example, "younger people can eat more apples than older people."

The alternative hypothesis is typically the hypothesis that researchers are trying to prove. A significance test is meant to determine whether there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. Note that the results of a significance test should either be to reject the null hypothesis in favor of the alternative hypothesis, or to not reject the null hypothesis. The result should not be to reject the alternative hypothesis or to accept the null hypothesis.

Test statistics and significance level

A test statistic is a statistic that is calculated as part of hypothesis testing that compares the distribution of observed data to the expected distribution, based on the null hypothesis. Examples of test statistics include the Z-score, T-statistic, F-statistic, and the Chi-square statistic. The test statistic used is dependent on the significance test used, which is dependent on the type of data collected and the type of relationship to be tested.

In many cases, the chosen significance level is 0.05, though 0.01 is also used. A significance level of 0.05 indicates that there is a 5% chance of rejecting the null hypothesis when the null hypothesis is actually true. Thus, a smaller selected significance level will require more evidence if the null hypothesis is to be rejected in favor of the alternative hypothesis.

After the test statistic is computed, the p-value can be determined from it. The p-value is the probability of obtaining test results that are at least as extreme as the observed results, under the assumption that the null hypothesis is correct. It tells us how likely such a result would be if only chance variation were at work. The smaller the p-value, the less compatible the observed data are with the null hypothesis; a larger p-value means the data are more compatible with it. For example, a p-value of 0.01 means that, if the null hypothesis were true, there would be a 1% chance of obtaining a result at least as extreme as the one observed; a p-value of 0.90 means there would be a 90% chance. The p-value is not the probability that the null hypothesis is true.
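As a rough illustration of how a p-value follows from a test statistic, here is a minimal sketch for the case where the statistic is a Z-score with a standard normal distribution under the null hypothesis. The function names `normal_cdf` and `p_value_from_z` are chosen for this example only and use nothing beyond Python's standard library.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def p_value_from_z(z: float, tail: str = "two-sided") -> float:
    """Probability of a Z-score at least as extreme as z, assuming H0 is true."""
    if tail == "two-sided":
        return 2.0 * normal_cdf(-abs(z))
    if tail == "greater":
        return normal_cdf(-z)  # upper tail, P(Z >= z)
    if tail == "less":
        return normal_cdf(z)  # lower tail, P(Z <= z)
    raise ValueError("tail must be 'two-sided', 'greater', or 'less'")


if __name__ == "__main__":
    for z in (0.5, 1.96, 2.58):
        print(z, round(p_value_from_z(z), 4))
```

A Z-score near 1.96 gives a two-sided p-value close to 0.05, which is why 1.96 is the familiar cutoff for a 0.05 significance level.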

A p-value is strongly affected by sample size. For a given non-zero difference between populations, a larger sample tends to produce a smaller p-value, so with a very large sample even a difference too small to be meaningful can be statistically significant. On the other hand, if the sample size is too small, a meaningful difference may not be detected.

The last step in a significance test is to determine whether the p-value provides evidence that the null hypothesis should be rejected in favor of the alternative hypothesis. This is based on the selected significance level. If the p-value is less than or equal to the selected significance level, the null hypothesis is rejected in favor of the alternative hypothesis, and the result is deemed statistically significant. If the p-value is greater than the selected significance level, the null hypothesis is not rejected, and the result is deemed not statistically significant.

The Genius Blog

Hypothesis Testing Solved Examples(Questions and Solutions)

Here is a list of hypothesis testing exercises and solutions. Try to solve each question yourself before you look at the solution.

Question 1 In the population, the average IQ is 100 with a standard deviation of 15. A team of scientists wants to test a new medication to see if it has a positive effect, a negative effect, or no effect at all on intelligence. A sample of 30 participants who have taken the medication has a mean of 140. Did the medication affect intelligence? Solution to Question 1

Question 2 A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get the following scores: 62, 92, 75, 68, 83, 95. Can the professor have 90% confidence that the mean score for the class on the test would be above 70? Solution to Question 2
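One way to approach Question 2 is a one-sample t-test, sketched below. The sketch reads "90% confidence" as a one-sided test at a 0.10 significance level, which is an assumption about the question's intent. The critical value 1.476 is the upper 10% point of the t-distribution with 5 degrees of freedom, taken from a standard t-table rather than computed.

```python
import math
import statistics

scores = [62, 92, 75, 68, 83, 95]
mu0 = 70  # H0: class mean = 70; H1: class mean > 70

n = len(scores)
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Assumed one-sided test at alpha = 0.10, df = n - 1 = 5 (value from a t-table).
T_CRITICAL = 1.476
reject_h0 = t_stat > T_CRITICAL

if __name__ == "__main__":
    print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}, t = {t_stat:.3f}")
    print("reject H0" if reject_h0 else "do not reject H0")
```

Under these assumptions the statistic comes out at about 1.705, which exceeds the critical value, so the null hypothesis would be rejected at the 10% level. With only six observations, the conclusion also leans on the scores being roughly normally distributed.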

Question 3 In a packaging plant, a machine packs cartons with jars. It is supposed that a new machine would pack faster on average than the machine currently used. To test the hypothesis, the time it takes each machine to pack ten cartons is recorded. The results in seconds are as follows.

Do the data provide sufficient evidence to conclude that, on the average, the new machine packs faster? Perform  the required hypothesis test at the 5% level of significance. Solution to Question 3 

Question 4 We want to compare the heights in inches of two groups of individuals. Here are the measurements: X: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179 Y:  120, 180, 125, 188, 130, 190, 110, 185, 112, 188 Solution to Question 4 

Question 5 A clinic provides a program to help its clients lose weight and asks a consumer agency to investigate the effectiveness of the program. The agency takes a sample of 15 people, weighing each person in the sample before the program begins and 3 months later. The results are tabulated below.

Determine if the program is effective. Solution to Question 5

Question 6 A sample of 20 students was selected and given a test before studying a diagnostic module. They were then given the test again after completing the module. The students' scores on the test before and after the module are tabulated below.

We want to see if there is significant improvement in the students' performance due to this teaching method. Solution to Question 6 

Question 7 A study was performed to test whether cars get better mileage on premium gas than on regular gas. Each of 10 cars was first filled with regular or premium gas, decided by a coin toss, and the mileage for the tank was recorded. The mileage was recorded again for the same cars using the other kind of gasoline. Determine whether cars get significantly better mileage with premium gas.

Mileage with regular gas: 16,20,21,22,23,22,27,25,27,28 Mileage with premium gas: 19, 22,24,24,25,25,26,26,28,32 Solution to Question 7 
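Because each car is measured twice, Question 7 can be treated as a paired design. The sketch below runs a paired t-test on the per-car differences. The 0.05 significance level is an assumption (the question does not state one), and the critical value 1.833 is the upper 5% point of the t-distribution with 9 degrees of freedom, taken from a standard t-table.

```python
import math
import statistics

regular = [16, 20, 21, 22, 23, 22, 27, 25, 27, 28]
premium = [19, 22, 24, 24, 25, 25, 26, 26, 28, 32]

# Per-car difference; H0: mean difference = 0, H1: mean difference > 0.
diffs = [p - r for p, r in zip(premium, regular)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
t_stat = mean_diff / (sd_diff / math.sqrt(n))

# Assumed one-sided test at alpha = 0.05, df = n - 1 = 9 (value from a t-table).
T_CRITICAL = 1.833
reject_h0 = t_stat > T_CRITICAL

if __name__ == "__main__":
    print(f"differences = {diffs}")
    print(f"mean difference = {mean_diff}, t = {t_stat:.3f}")
    print("reject H0" if reject_h0 else "do not reject H0")
```

Working with differences removes car-to-car variation, which is why the paired test is more sensitive here than comparing the two group means directly.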

Question 8 An automatic cutting machine must cut steel strips to a length of 1200 mm. From preliminary data, we found that the lengths of the pieces produced by the machine can be treated as normal random variables with a standard deviation of 3 mm. We want to make sure that the machine is set correctly. Therefore, 16 pieces are randomly selected and measured. The lengths in mm were: 1193, 1196, 1198, 1195, 1198, 1199, 1204, 1193, 1203, 1201, 1196, 1200, 1191, 1196, 1198, 1191. Examine whether there is any significant deviation from the required size. Solution to Question 8
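Since Question 8 gives a known population standard deviation, a two-sided z-test is a natural fit. The following is a minimal sketch using only the standard library; the helper name `normal_cdf` and the 0.05 significance level are assumptions made for the example.

```python
import math
import statistics

lengths = [1193, 1196, 1198, 1195, 1198, 1199, 1204, 1193,
           1203, 1201, 1196, 1200, 1191, 1196, 1198, 1191]
mu0 = 1200.0   # H0: mean length = 1200 mm; H1: mean length != 1200 mm
sigma = 3.0    # known population standard deviation in mm
ALPHA = 0.05   # assumed significance level (not stated in the question)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


n = len(lengths)
sample_mean = statistics.mean(lengths)
z_stat = (sample_mean - mu0) / (sigma / math.sqrt(n))
p_value = 2.0 * normal_cdf(-abs(z_stat))  # two-sided
reject_h0 = p_value <= ALPHA

if __name__ == "__main__":
    print(f"mean = {sample_mean}, z = {z_stat:.2f}, p = {p_value:.6f}")
    print("reject H0" if reject_h0 else "do not reject H0")
```

The sample mean is 1197 mm, three millimetres short of the target, and with a standard error of 0.75 mm that corresponds to a z-score of -4, so under these assumptions the deviation would be judged significant.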

Question 9 Blood pressure readings of ten patients before and after medication for reducing blood pressure are as follows:

Patient: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Before treatment: 86, 84, 78, 90, 92, 77, 89, 90, 90, 86
After treatment: 80, 80, 92, 79, 92, 82, 88, 89, 92, 83

Test the null hypothesis of no effect against the alternative hypothesis that the medication is effective. Use the Wilcoxon test. Solution to Question 9

Question 10 (ANOVA) Sussan Sound predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides 24 students into three groups of 8 each. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test on the material. Their scores are tabulated below.

Group 1 (constant sound): 7, 4, 6, 8, 6, 6, 2, 9
Group 2 (random sound): 5, 5, 3, 4, 4, 7, 2, 2
Group 3 (no sound at all): 2, 4, 7, 1, 2, 1, 5, 5
Solution to Question 10
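The F-statistic for a one-way analysis of variance on these three groups can be computed by hand, as the sketch below does. The 0.05 significance level is an assumption (the question does not give one), and the critical value 3.47 is the upper 5% point of the F-distribution with 2 and 21 degrees of freedom, taken from a standard F-table.

```python
import statistics

groups = {
    "constant": [7, 4, 6, 8, 6, 6, 2, 9],
    "random": [5, 5, 3, 4, 4, 7, 2, 2],
    "none": [2, 4, 7, 1, 2, 1, 5, 5],
}

all_scores = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_scores)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
)
# Within-group sum of squares: spread of scores around their own group mean.
ss_within = sum(
    sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups.values()
)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Assumed alpha = 0.05 with (2, 21) degrees of freedom (value from an F-table).
F_CRITICAL = 3.47
reject_h0 = f_stat > F_CRITICAL

if __name__ == "__main__":
    print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}")
    print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
    print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic only just clears the critical value, so the conclusion would flip at a stricter level such as 0.01. A significant F also says only that the group means are not all equal, not which group differs.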

Question 11 Using the following three groups of data, perform a one-way analysis of variance using α  = 0.05.

Solution to Question 11

Question 12 In a packaging plant, a machine packs cartons with jars. It is supposed that a new machine would pack faster on average than the machine currently used. To test the hypothesis, the time it takes each machine to pack ten cartons is recorded. The results in seconds are as follows.

New machine: 42, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7
Old machine: 42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1

Perform an F-test to determine if the null hypothesis should be accepted. Solution to Question 12
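One common reading of "perform an F-test" for two samples is a variance-ratio test of whether the two machines have equal variability, and the sketch below follows that reading as an assumption. It places the larger sample variance in the numerator and compares the ratio with 3.18, the upper 5% point of the F-distribution with 9 and 9 degrees of freedom from a standard F-table.

```python
import statistics

new_machine = [42, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old_machine = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1]

var_new = statistics.variance(new_machine)  # sample variance (n - 1 divisor)
var_old = statistics.variance(old_machine)

# H0: the two population variances are equal.
# Larger variance goes in the numerator so the ratio is at least 1.
f_stat = max(var_new, var_old) / min(var_new, var_old)

# Assumed upper 5% critical value for (9, 9) degrees of freedom (from an F-table).
F_CRITICAL = 3.18
reject_h0 = f_stat > F_CRITICAL

if __name__ == "__main__":
    print(f"var(new) = {var_new:.4f}, var(old) = {var_old:.4f}")
    print(f"F = {f_stat:.3f}")
    print("reject H0" if reject_h0 else "do not reject H0")
```

Under these assumptions the ratio is about 1.2, well below the critical value, so the hypothesis of equal variances would not be rejected.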

Question 13 A random sample of 500 U.S. adults is questioned about their political affiliation and their opinion on a tax reform bill. We need to test whether political affiliation and opinion on the tax reform bill are dependent, at the 5% level of significance. The observed contingency table is given below.

Solution to Question 13

Question 14 Can a die be considered fair if it shows the following frequency distribution over 1000 throws?

Solution to Question 14

Solution to Question 15

Question 16 A newly developed muesli contains five types of seeds (A, B, C, D and E). According to the product information, their proportions are 35%, 25%, 20%, 10% and 10%, respectively. In a randomly selected muesli, the following volume distribution was found.

Let us decide, at the alpha = 0.1 significance level, about the null hypothesis that the composition of the sample corresponds to the distribution indicated on the packaging. Solution to Question 16

Question 17 A research team investigated whether there was any significant correlation between the severity of the course of a certain disease and the age of the patients. During the study, data for n = 200 patients were collected and grouped according to the severity of the disease and the age of the patient. The table below shows the results.

Let us decide about the correlation between the age of the patients and the severity of disease progression. Solution to Question 17

Question 18 A publisher is interested in determining which of three book covers is most attractive. He interviews 400 people in each of three states (California, Illinois and New York), and asks each person which of the covers he or she prefers. The number of preferences for each cover is as follows:

Do these data indicate that there are regional differences in people’s preferences concerning these covers? Use the 0.05 level of significance. Solution to Question 18

Question 19 Trees planted along the road were checked for which ones are healthy(H) or diseased (D) and the following arrangement of the trees were obtained:

H H H H D D D H H H H H H H D D H H D D D

Test at the alpha = 0.05 significance level whether this arrangement may be regarded as random.

Solution to Question 19 

Question 20 Suppose we flip a coin n = 15 times and come up with the following arrangements

H T T T H H T T T T H H T H H

(H = head, T = tail)

Test at the alpha = 0.05 significance level whether this arrangement may be regarded as random.
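Questions 19 and 20 both ask whether a two-symbol sequence is random, which is the setting of the Wald-Wolfowitz runs test. The sketch below uses the large-sample normal approximation for the number of runs; that approximation is an assumption and is only rough for sequences this short, where exact critical-value tables are often preferred. The function names are chosen for this example.

```python
import math


def count_runs(seq: list[str]) -> int:
    """Number of maximal blocks of identical consecutive symbols."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def runs_test_z(seq: list[str]) -> float:
    """Z-score of the observed run count under H0: the arrangement is random."""
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("the runs test needs exactly two distinct symbols")
    n1 = seq.count(symbols[0])
    n2 = seq.count(symbols[1])
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (count_runs(seq) - expected) / math.sqrt(variance)


trees = "H H H H D D D H H H H H H H D D H H D D D".split()
coins = "H T T T H H T T T T H H T H H".split()

Z_CRITICAL = 1.96  # two-sided cutoff for alpha = 0.05

if __name__ == "__main__":
    for name, seq in (("trees", trees), ("coins", coins)):
        z = runs_test_z(seq)
        verdict = "reject randomness" if abs(z) > Z_CRITICAL else "do not reject"
        print(f"{name}: runs = {count_runs(seq)}, z = {z:.3f}, {verdict}")
```

Under the approximation, the tree sequence has far fewer runs than expected (6 against roughly 10.9), suggesting clustering of diseased trees, while the coin sequence is consistent with randomness.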

Solution to Question 20

kindsonthegenius


The relationship between numerical magnitude processing and math anxiety, and their joint effect on adult math performance, varied by indicators of numerical tasks

  • Research Article
  • Open access
  • Published: 22 April 2024


  • Monika Szczygieł   ORCID: orcid.org/0000-0001-6544-0734 1 &
  • Mehmet Hayri Sarı   ORCID: orcid.org/0000-0002-7159-2635 2  

According to the hypothesis of Maloney et al. (Cognition 114(2):293–297, 2010. https://doi.org/10.1016/j.cognition.2009.09.013), math anxiety is related to deficits in numerical magnitude processing, which in turn compromises the development of advanced math skills. Because previous studies on this topic are contradictory, which may be due to methodological differences in the measurement of numerical magnitude processing, we tested Maloney et al.’s hypothesis using different tasks and their indicators: numerical magnitude processing (symbolic and non-symbolic comparison tasks: accuracy, reaction time, numerical ratio, distance and size effects, and Weber fraction; number line estimation task: estimation error), math anxiety (combined scores of learning, testing, math problem solving, and general math anxiety), and math performance. The results of our study conducted on 119 young adults mostly support the hypothesis proposed by Maloney et al. that deficiency in symbolic magnitude processing is related to math anxiety, but the relationship between non-symbolic processes and math anxiety was opposite to the assumptions. Moreover, the results indicate that estimation processes (but not comparison processes) and math anxiety are related to math performance in adults. Finally, high math anxiety moderated the relationship between reaction time in the symbolic comparison task, reaction time in the non-symbolic comparison task, numerical ratio effect in the symbolic comparison task, and math performance. Because the results of the joint effect of numerical magnitude processing and math anxiety on math performance were inconsistent, this part of the hypothesis is called into question.


Introduction

Symbolic and non-symbolic magnitude processing and math anxiety are considered to be important predictors of math performance (Braham and Libertus 2018; Ramirez et al. 2016). However, relatively little is known about the relationship between numerical magnitude processing and math anxiety and their combined effect on math performance. Maloney et al. (2010) formulated the hypothesis that math anxiety develops because of deficits in numerical magnitude processing, which in turn compromises the development of advanced mathematical skills. It should be noted that this hypothesis is difficult to verify in one cross-sectional study. Testing whether math anxiety stems from poor numerical processing and in turn affects math performance requires longitudinal studies, preferably from early childhood to adulthood, and control for multiple covariates. However, the hypothesis of Maloney et al. (2010) is commonly tested by examining the relationship between numerical magnitude processing and math anxiety, by comparing the level of numerical magnitude processing in low and high math anxiety individuals, and by testing the mediation or moderation effect of math anxiety in the relationship between numerical magnitude processing and math performance. Research on this topic is most often conducted in a cross-sectional design, mainly in adults, and previous results are contradictory.

Conflicting research results may stem from different definitions of numerical magnitude processing and the use of different methodologies for their measurement. Indeed, there is an ongoing debate in the field of mathematical cognition about the uniformity of numerical systems, the cognitive processes involved in processing numerical quantities, and the validity of magnitude processing indicators (Dietrich et al. 2015 ; Krajcsi 2017 ; Krajcsi & Szűcs 2022 ; Krajcsi et al. 2023 ; Lyons et al. 2012 , 2015 ; Piazza et al. 2004 ; Price et al. 2012 ; Smets et al. 2014 ). Therefore, we are interested in verifying the hypothesis regarding the relationship between numerical magnitude processing and math anxiety, and their joint effect on math performance, taking into account various numerical magnitude processes (symbolic vs. non-symbolic; comparison vs. estimation) and ways of measuring them (comparison tasks and its indicators: accuracy, reaction time, numerical ratio effect, distance effect, size effect, and Weber fraction; number line estimation task and its indicator: estimation error). Below, we present the theoretical foundations of numerical magnitude processing, math anxiety, and all relevant research on the relationships between them and their relationships to math performance. We present methodological details of previous relevant studies to highlight possible differences in results due to study design.

Numerical magnitude processing

Although there are many concepts of numerical magnitude processing, generally it is understood as the mental manipulation of quantitative information of either non-symbolic quantities (e.g., dot arrays) or symbolic numbers (e.g., Arabic digits; Chen et al. 2021 ; Schneider et al. 2017 , 2018a , 2018b ). Numerical magnitudes have spatial representations, which are described via the metaphor of the mental number line (Dehaene 2001 , 2011 ; Restle 1970 ; Schneider et al. 2018a ), in which numbers are represented on a left-to-right continuum (Dehaene et al. 1993 ; Zorzi et al. 2002 ). Representing and processing of magnitudes is supported by an approximate number system (ANS; Cantlon et al. 2009 ; Dehaene 2001 ; 2011 ). ANS is a language-independent and innate system that is shared across many species (Pica et al. 2004 ; Wynn 1992 ) and represents quantities approximately (Feigenson et al. 2004 ; Li et al. 2018 ). The precision of the non-symbolic numerical system increases with age (Halberda et al. 2008 ). Accordingly, the symbolic number representation system is an acquired and language-dependent system that represents quantities precisely (Dehaene 2001 ; Li et al. 2018 ) and develops gradually over the school years and allows for the processing of discrete numbers (Marinova & Reynvoet 2020 ; Schleepen et al. 2016 ). There is an ongoing debate about the relationship between the symbolic and non-symbolic systems (Krajcsi, et al., 2022 ; Price et al. 2012 ; Smets et al. 2014 ) as both are viewed as either related (Piazza et al. 2004 ) or independent (Dietrich et al. 2015 ; Honoré & Noël, 2016 ; Lyons et al. 2012 , 2015 ). Moreover, previous research results provide arguments that comparison and estimation processes are unrelated (Guillaume et al. 2016 ; Sasanguie & Reynvoet 2013 ) or weakly related (Tokita & Hirota 2021 ). Therefore, the search for correlates of numerical processing of quantities requires taking this diversity into account.

Non-symbolic and symbolic numerical abilities are most often measured with number line estimation and magnitude comparison tasks (Schneider et al. 2018b ). The number line estimation task requires participants to indicate the position of a given number on a line that is anchored at both ends (e.g., from 0 to 100; Núñez-Peña et al. 2019 ; Pantoja et al. 2020 ). The sum of errors (the difference between the target and marked point on a number line in each trial) is the most used indicator of the number line estimation task. To decide which of two quantities is larger, the comparison task relies on the comparison of Arabic numbers or dot arrays, presented in paired, sequential, or intermixed ways (Price et al. 2012 ). Accuracy (the sum of correct answers; Landerl et al. 2004 ), reaction time (the average reaction time for correct answers; Schwenk et al. 2017 ), individual Weber fraction ( W is the internal Weber fraction, which determines the degree of accuracy of the representation of the internal quantity; Krajcsi 2020 ; Pica et al. 2004 ), numerical ratio effect (it is easier to process number pairs with higher ratios than pairs with smaller ratios; Price et al. 2012 ), numerical distance effect (it is easier to compare number pairs that are further apart; Maloney et al. 2010 ), and numerical size effect (for a constant distance, it is easier to compare smaller number pairs; Hohol et al. 2020 ) are the most commonly used indicators of the performance of comparison tasks. Although these indicators of comparison tasks are considered to reflect the precision of numerical representations, they are usually more or less correlated to each other (e.g., numerical ratio, distance, and size effects are related to each other; Krajcsi 2020 ; Weber fraction and numerical ratio effect are not; Price et al. 2012 ).

Despite the diversity of basic numerical systems (symbolic vs. non-symbolic) and task related processes (estimation vs. comparison), the results of many previous studies suggest a reliable relationship between numerical magnitude processing and math performance in children, adolescents, and adults (Schneider et al. 2017 ; 2018a , 2018b ). However, the relationship between math performance and number line estimation is stronger than the relationship between magnitude comparison and math performance (Schneider et al. 2018b ); also, the strength of the correlation with math performance is significantly higher for the symbolic comparison task than for the non-symbolic. Additionally, the associations between math performance and magnitude comparison tasks decrease very slightly with age (Schneider et al. 2017 ), whereas the relationship between math performance and number line estimation increases slightly with age (Schneider et al. 2018a ).

Math Anxiety

Math anxiety can be defined as “[…] a feeling of tension and anxiety that interferes with the manipulation of numbers and the solving of mathematical problems in a wide variety of ordinary life and academic situations” (Richardson & Suinn 1972 , p. 551). It is a multidimensional construct whose various types have been tested by other researchers, e.g., math learning anxiety, math testing anxiety (Abbreviated Math Anxiety Scale; AMAS; Hopko et al. 2003 ), math problem solving anxiety (Math Anxiety Questionnaire for Adults; MAQA; Szczygieł, 2021a ). As various dimensions of math anxiety are usually positively and highly correlated to each other, math anxiety may also be treated as unidimensional (Single Item Math Anxiety Scale; SIMA; Núñez-Peña et al. 2014 ). Math anxiety begins in childhood and develops during the first years of primary school (Petronzi et al. 2019 ; Szczygieł & Pieronkiewicz 2022 ). It increases as the child gets older, peaking at 14 or 16 years old, followed by plateaus, but continuing through the school years and beyond (Yáñez-Marquina, & Villardón-Gallego 2017 ). Emotions accompanying learning mathematics are so intense that negative math attitude and high math anxiety may persist among adults many years after graduation (Hart & Ganley 2019 ; Szczygieł, 2021a , 2022 ).

Recent meta-analyses have clearly demonstrated a small-to-moderate negative association between math anxiety and math performance in children, adolescents, and adults (Barroso et al. 2021 ; Namkung et al. 2019 ; Zhang et al. 2019 ). The negative relationship between math anxiety and math performance is weaker in primary school children than in secondary school children and adults (Zhang et al. 2019 ). This relationship is observed regardless of the dimensions of math anxiety and the type of mathematical tasks.

Numerical magnitude processing and math anxiety

Although numerical magnitude processing and math anxiety have often been tested as predictors of math performance, little attention has been paid to the relationship between them. Recently, Maloney et al. ( 2010 , 2011 ) proposed the hypothesis that math anxiety is related to deficits in numerical magnitude processing. The development of math anxiety may be a result of a basic low-level deficit in numerical processing which in turn compromises the development of advanced mathematical skills. Maloney et al. ( 2010 ) indicated that math-anxious adults (AMAS) present higher reaction times in a visual enumeration task than their low math anxiety counterparts. In two follow-up studies, Maloney et al. ( 2011 ) demonstrated that high math anxiety adults (AMAS) have a stronger numerical distance effect on response time than low math anxiety individuals in a symbolic comparison task; this suggests that those with a high level of math anxiety have less precise numerical magnitude representations. Núñez-Peña & Suárez-Pellicioni ( 2014 ) tested high and low math anxiety adults (Math Anxiety Rating Scale, MARS, Richardson & Suinn 1972 ) using a single-digit comparison task. They revealed that numerical distance and size effects were marginally larger for the high math anxiety group of adults, thus supporting Maloney et al.’s hypothesis ( 2010 ). However, the hypothesis that numerical deficit underlies math anxiety (AMAS) was challenged in a study by Dietrich et al. ( 2015 ), who conducted a study with symbolic and non-symbolic comparison tasks (indicators: accuracy and reaction time for both, distance and size effects for both, Weber fraction for a non-symbolic task) in adults. Although they replicated previous findings showing that high math anxiety individuals had a larger distance effect in a symbolic comparison task than low math anxiety individuals, there was no relationship between math anxiety and non-symbolic comparison task indicators. 
Different results regarding symbolic processes were provided by Colomé ( 2019 ), who tested high and low math anxiety (MARS) groups of adults performing symbolic and non-symbolic comparison tasks and the counting Stroop task. The results indicated that high and low math anxiety groups did not differ in terms of accuracy, reaction time, Weber fraction, and numerical ratio effect in a non-symbolic task; they also did not differ in reaction time, numerical distance and size effects in symbolic tasks; and they also did not differ in accuracy, reaction time, and distance effect in the counting Stroop task. Núñez-Peña et al. ( 2019 ) are the only researchers who have checked whether math anxiety level (short MARS; Alexander & Martray 1989 ) is related to performance in the number line estimation task (two tasks – typical and easy: 0–100, 0–1,000; two tasks – untypical and difficult: 0–100,000, 267–367) in adults. They observed that math anxiety is negatively related to performance only in less familiar and more difficult number line estimation tasks, thus again challenging the hypothesis proposed by Maloney et al. ( 2010 ). Summing up, most results indicate that math anxiety is more related to symbolic magnitude processing, especially manipulation of large numbers, than to the processing of non-symbolic quantities. These results suggest that symbolic rather than non-symbolic representation is important for the formation of math anxiety.

Numerical magnitude processing, math anxiety, and math performance

Studies by Maloney et al. (2010, 2011), Núñez-Peña & Suárez-Pellicioni (2014), Dietrich et al. (2015), Colomé (2019), and Núñez-Peña et al. (2019) focused on the relationship between magnitude processing and math anxiety, but not on math performance. However, other researchers have examined the relationship between all the aforementioned variables in adults, which to some extent enabled the verification of the hypothesis formulated by Maloney et al. (2010). Lindskog et al. (2017) tested adults using a non-symbolic intermixed comparison task, and the proportion of correct trials in the task was used as an ANS indicator. They revealed that math anxiety (revised MARS; Hopko 2003) mediates the relationship between ANS accuracy and math performance (arithmetic fluency test) as well as the relationship between math performance and ANS. Moreover, ANS accuracy predicted math anxiety and math anxiety predicted ANS, even though other variables were controlled for. Skagerlund et al. (2019) tested the relationship between ANS (a latent variable that consisted of one-digit and two-digit comparison tasks and reaction time as an indicator), math anxiety (Mathematics Anxiety Scale-UK; Hunt et al. 2011), and math performance (standardized mathematical test) in adults. They observed that symbolic magnitude processing mediates the relationship between math anxiety and math performance. Slightly different results were obtained by Maldonado Moscoso et al. (2020), who revealed that math anxiety (AMAS) mediates the link between ANS (measured by a non-symbolic comparison task with the Weber fraction as an indicator) and math performance (standardized mathematical test) in adults with high math anxiety. These researchers also found a significant correlation between ANS and math anxiety, but only in the high math anxiety group. However, their further study (Maldonado Moscoso et al. 2022) showed that the precision of numerosity estimation (Weber fraction) was negatively related to math anxiety (AMAS) and that math anxiety fully accounted for the relationship between ANS and math performance in adults. Inconsistent results were provided by Braham & Libertus (2018), who evaluated levels of non-symbolic magnitude processing (non-symbolic comparison task: accuracy as an indicator), math anxiety (MARS), and math performance (standardized mathematical test) in adults. They observed that ANS and math anxiety independently predict calculation, math fluency, and applied problem solving, but they interact only in the case of the last. Braham & Libertus (2018) concluded that better ANS may be a protective factor against the negative effect of math anxiety on math performance in certain types of math. In contrast, Silver et al. (2022) more recently showed that ANS (two non-symbolic comparison tasks, accuracy as an indicator) is not related to math anxiety (AMAS) in adults, and that math anxiety, but not ANS, predicts math performance in the structural equation model.

Previous studies also tested the relationship between numerical magnitude processing, math anxiety, and math performance in children. Cargnelutti et al. ( 2017 ) did not find any significant relationship between ANS (accuracy of non-symbolic comparison, addition, and estimation tasks) and math anxiety (Scale for Early Math Anxiety; Wu et al. 2012 ) in second grade children. In two studies on first- to third-grade children, Szczygieł ( 2021b ) mostly found no relationship between non-symbolic magnitude processing (comparison task with accuracy as an indicator) and math anxiety (modified Abbreviated Math Anxiety Scale for Elementary Children; Szczygieł, 2019 ; Math Anxiety Questionnaire for Children; Szczygieł, 2020a ). In a recent study, Sarı and Szczygieł ( 2023 ) observed that math anxiety (Math Anxiety Scale; Şentürk 2010 ) mostly does not mediate the relationship between mental number representation (error rate in number line estimation task) and math performance (standardized mathematical tests). However, they showed that a higher sum of errors in number line task performance is related to a higher level of math anxiety, and the accuracy of mental representation of numbers in high math anxiety children is a key factor contributing to math performance.

Findings regarding the relationship between symbolic and non-symbolic magnitude processing, math anxiety, and math performance are inconsistent. Because studies on children have largely failed to show a joint effect of numerical magnitude processing and math anxiety on math performance, and some studies in adults have found that such associations are observed in groups with high math anxiety, it can be assumed that the protective effect of numerical magnitude representations appears in individuals with a high level of math anxiety.

Objectives and hypotheses of the present study

The main aim of our study was to test the hypothesis formulated by Maloney et al. ( 2010 ), namely that math anxiety is related to deficits in numerical magnitude processing and both variables interact to predict math performance. Because the results of previous studies on this topic have been inconsistent and used different methodology, we suppose that the relationship between numerical magnitude processing, math anxiety, and math performance may depend primarily on the different cognitive processes involved in processing numerical information (symbolic vs. non-symbolic; comparison vs. estimation) and type of their indicators (accuracy, reaction time, ratio effect, size effect, distance effect, and Weber fraction). Our assumption is based on the following premises. First, it has been observed that symbolic and non-symbolic systems are related (Piazza et al. 2004 ) or independent (Dietrich et al. 2015 ; Lyons et al. 2012 , 2015 ); second, the processes of comparison and estimation have been found to be unrelated (Sasanguie & Reynvoet 2013 ); third, numerical magnitude indicators have been found to be weakly correlated or unrelated to each other (Krajcsi 2017 ; Krajcsi et al. 2022 ; Price et al. 2012 ; Smets et al. 2014 ). Therefore, the type of measurement and indicators used in a study may change the nature of the relationship between numerical magnitude processing, math anxiety, and math performance; they may also explain previously inconsistent results.

To test whether the relationship between numerical magnitude processing, math anxiety, and math performance depends on the cognitive resources involved in performing numerical tasks, we designed a study that included different types of measures of numerical processing (number line estimation task, symbolic and non-symbolic comparison tasks). In addition, we calculated numerous indicators for numerical comparison and estimation tasks (accuracy, reaction time, ratio effect, size effect, distance effect, and Weber fraction). We were interested in whether the results would be consistent regardless of the numerical tasks and indicators used. We expect that the strength of the relationships between numerical magnitude processing, math anxiety, and math performance will be greater in symbolic magnitude processing than in non-symbolic processing. Although previous studies have used various measures of math anxiety, we assume that this has not had a significant impact on the results because different dimensions of math anxiety are strongly related to each other (Oszwa 2020 ; Szczygieł, 2021a ). However, we examined different types of math anxiety (math learning anxiety, math testing anxiety, math problem solving anxiety, general math anxiety), taking into account the multidimensional math anxiety index in the study (we created one math anxiety indicator; see Data Analysis section and Appendix, Tables 4 and 5 ). Our study focuses on adults because most of the previous conflicting research verified the hypothesis of Maloney et al. ( 2010 ) in this age group. We were also interested in studying the relationship between numerical magnitude processing, math anxiety, and math performance in adults because developmental and educational changes are less rapid at this stage of life compared to childhood. 
Finally, we expect that the combined effect of numerical magnitude processing and math anxiety on math performance will emerge primarily among high math anxiety individuals, of which there may be many among adults.

Based on the previous inconsistent research results, we formulated four hypotheses: weaker numerical magnitude processing is related to higher math anxiety (H1); stronger numerical magnitude processing is positively related to math performance (H2); higher math anxiety is related to lower math performance (H3); stronger numerical magnitude processing is positively related to math performance in those with a high level of math anxiety (H4).

Participants

We recruited 121 people for the study, but the results of two participants were excluded because they used the same pseudonym and their results were overwritten. The final sample included 119 young adults (90 women, 29 men) between the ages of 18 and 32 (M = 21.42, SD = 2.99). Participants differed in their fields of education and profession. They declared that their high school class profile was related to science, technology, engineering, and mathematics (STEM; N = 52), humanities and social sciences (N = 42), or other fields (N = 25). Study or work in the field of STEM was declared by 40 people, humanities and social sciences by 60 people, and 'other' was indicated by 19 people. We used convenience sampling: participants were recruited through an advertisement posted on the nationwide internet platform olx.pl. The minimum number of subjects was determined a priori with G*Power for the planned statistical analyses (α = 0.05, power (1 − β) = 0.80, r = 0.25, two-tailed test). Because in some cases the result did not meet the criteria for a reliable indicator of the variable (see Measurements), the number of observations differs across tasks.
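The a priori sample size calculation can be approximated by hand. The sketch below uses the Fisher z approximation for a two-tailed test of a correlation. It is not the routine G*Power itself uses, so the result may differ by a participant or two from the G*Power output. The function name `required_n_for_correlation` is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate N needed to detect a correlation r (Fisher z method).

    Assumption: this is a textbook approximation, not G*Power's exact
    bivariate-normal routine.
    """
    z = NormalDist()
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail)   # critical value for the significance level
    z_power = z.inv_cdf(power)      # quantile corresponding to the desired power
    n = ((z_alpha + z_power) / atanh(r)) ** 2 + 3
    return ceil(n)


# Parameters reported in the text: alpha = 0.05, power = 0.80, r = 0.25.
n_min = required_n_for_correlation(0.25)
print(n_min)
```

With these inputs the approximation lands in the low 120s, the same order as the recruited sample.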

Measurements

The Number Line Estimation Task (NLE) is a computer task that measures the mental representation of numbers. Participants were presented with number lines bounded by 0 at the origin and 1000 at the endpoint, with the target number displayed in Arabic digits above the line. The participants' task was to mark the place on the line corresponding to the given number (2, 5, 18, 34, 56, 78, 100, 122, 147, 150, 163, 179, 246, 366, 486, 606, 722, 725, 738, 754, 818, 938). We used the 22 numbers proposed by Opfer and Siegler (2007). After the location of a number was marked, the next number was displayed, and so on (Schneider et al. 2018a). Participants were not provided with feedback during the research session. The stimuli were presented in random order, and no time pressure was applied. The indicator of NLE is the sum of errors (the difference between the target and the marked point on the number line) across all trials. A higher error in NLE indicates a worse mental representation of numbers.
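As an illustration of the NLE indicator, the sketch below sums the per-trial errors for one participant. It assumes that "error" means the absolute difference between the target and the marked position. The text does not state this explicitly, and the function name `nle_error` is hypothetical.

```python
# Target numbers listed in the task description (Opfer & Siegler, 2007).
TARGETS = [2, 5, 18, 34, 56, 78, 100, 122, 147, 150, 163,
           179, 246, 366, 486, 606, 722, 725, 738, 754, 818, 938]


def nle_error(targets, responses):
    """Sum of per-trial estimation errors on the 0-1000 number line.

    Assumption: each trial's error is the absolute difference between
    the target number and the position the participant marked.
    """
    if len(targets) != len(responses):
        raise ValueError("targets and responses must have the same length")
    return sum(abs(t - r) for t, r in zip(targets, responses))


# Hypothetical participant who overshoots every target by 10 units.
example_responses = [t + 10 for t in TARGETS]
print(nle_error(TARGETS, example_responses))  # 22 trials x 10 units = 220
```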

The Non-Symbolic Comparison Task (NS) is used to measure the accuracy of ANS. In each trial, two boards with white dots appear on the screen; the participants’ task is to choose the one with more elements. The choice is made by pressing the marked keys on the keyboard: "A" (for the left board) or "L" (for the right board). The boards contained 8, 10, 12, 13, 14, 18, 20, 22, 26, or 32 dots. The second board always contained 16 dots. The set size and set ratio were balanced, which means the same number of sets with a given number and the same number of sets for each ratio were displayed on the screen. The boards differed in the size of the dots. In the consistent condition, the larger set was marked with larger dots; in the inconsistent condition, the larger set had dots of smaller size. In both the training and the test sessions, the boards presented the consistent and inconsistent conditions equally and in random order, as recommended by Nuerk et al. ( 2004 ). There were 30 trials in both conditions, and each trial was repeated twice, giving a total of 120 trials in the test session. Each pair of boards was displayed for 7 s, followed immediately by the next pair. The task started with a training session (4 trials). The whole task was implemented using DMDX software (Forster & Forster 2003 ). MATLAB was used to generate the boards (Gebuis & Reynvoet 2011 ). Accuracy (NS ACC), reaction time (NS RT), numerical ratio effect calculated on RT (NS NRE), distance effect calculated on RT (NS NDE), size effect calculated on RT (NS NSE), and Weber fraction ( W ) were used as indicators of ANS. NS ACC was calculated as the sum of correct answers (9% errors). NS RT was based on correct answers, except for outliers (5% outliers by rule M  ± 3 SD ; M  = 1476.72 ms, SD = 959.04 ms). Slopes were calculated for NRE and NDE. NSE was calculated as mean RT differences between large and small numbers in the constant distance between numbers (distance 2, 4, 6). 
Hypotheses were tested using data from participants in which the effect was demonstrated (see N in Table 1). Values used for slope calculation were as follows: NRE 0.5, 0.6, 0.7, 0.8, 0.9 (rounded to one decimal place by mathematical rules), NDE 2, 3, 4, 6, 8, 10, 16. To make sure that numerical effects were observed, we conducted a one-sample Student's t-test that compared the mean effects to 0. In each case, the difference was significant (p < 0.001), thus showing NS NRE (\(t_{(116)} = 16.87\)), NS NDE (\(t_{(118)} = -17.26\)), and NS NSE (\(t_{(74)} = 9.08\)) effects. The Appendix includes figures (Figs. 4 and 5) for NS NRE and NS NDE effects (there is no figure for NSE as it was calculated for differences between two conditions, not slopes). W for individuals was calculated using Pica et al. (2004) rules. More precise ANS is reflected by more points in the comparison task (NS ACC), shorter reaction time (NS RT), weaker numerical ratio effect slope (NS NRE), weaker distance effect slope (NS NDE), smaller differences in RT between large and small numbers (NS NSE), and higher Weber fraction (W). The numerical representation is more precise as the NRE and NDE slope approaches zero. In the case of NRE, a higher positive slope means worse accuracy of numerical representation. In the case of NDE, a higher negative slope reflects worse accuracy of numerical representation. We calculated split-half reliability for the following indicators: NS ACC r = 0.67, p < 0.001, N = 119; NS RT r = 0.97, p < 0.001, N = 119; NS NRE r = 0.57, p < 0.001, N = 109; NS NDE r = 0.52, p < 0.001, N = 112; NS NSE r = 0.42, p < 0.01, N = 38.
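The reaction-time indicators described above can be sketched in a few lines. The code below shows M ± 3 SD trimming, a least-squares slope of mean RT on ratio (or distance), and the one-sample t statistic against 0. Whether trimming was done per participant or over the whole sample is not stated in the text, so the per-list trimming here is an assumption. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def trim_outliers(rts, k=3.0):
    """Keep reaction times within mean +/- k standard deviations."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= k * s]


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (e.g. mean RT on ratio)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1


# Hypothetical participant: mean RT (ms) rises as the ratio approaches 1.
ratios = [0.5, 0.6, 0.7, 0.8, 0.9]
mean_rts = [1000, 1050, 1100, 1150, 1200]
print(slope(ratios, mean_rts))  # positive slope: a ratio effect is present
```

A per-participant slope computed this way is the kind of value that would then enter the one-sample test and the correlation analyses.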

The Symbolic Comparison Task (S) is similar to NS with the exception that numbers were used instead of dots. Two-digit numbers in the range 21 to 98 were used. There were 104 research trials, preceded by two training trials. The procedure of the test was analogous to that described above. Accuracy (S ACC), reaction time (S RT), numerical ratio effect calculated on RT (S NRE), distance effect calculated on RT (S NDE), and numerical size effect calculated on RT (S NSE) were used as indicators of symbolic magnitude representation. We calculated S ACC as the sum of correct answers (2% errors). S RT was calculated on correct answers except outliers (2% outliers in accordance with rule M ± 3 SD; M = 892.27 ms, SD = 299.79 ms). The hypotheses were tested based on data from participants in which the effect was demonstrated (see N in Table 1). Values used for slope calculation for NRE were 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1 (rounded to one decimal place in accordance with mathematical rules). For NDE, the values used were 10, 20, 30, 40, 50, 60, 70 (rounded to the nearest ten in accordance with mathematical rules). The indicator for NSE was calculated as the difference in RT between the higher number and the lower number in number pairs with a fixed-unit distance. Again, we conducted a one-sample Student's t-test to compare the mean effects to 0. In each case, the difference was significant (p < 0.001), thus showing S NRE (\(t_{(117)} = 27.5\)), S NDE (\(t_{(117)} = -26.51\)), and S NSE (\(t_{(89)} = 13.71\)) effects. Figures for S NRE and S NDE effects (there is no figure for S NSE as it was calculated for differences between two conditions, not slopes) are presented in the Appendix (Figs. 6 and 7). 
More precise symbolic magnitude representation is reflected by more points in the comparison task (S ACC), shorter reaction time (S RT), weaker numerical ratio effect slope (S NRE), weaker distance effect slope (S NDE), and smaller differences between large and small numbers in RT (S NSE). As in NS, the numerical representation is more precise as the NRE and NDE slope approaches zero. In the case of NRE, a higher positive slope means worse accuracy of numerical representation; in the case of NDE, a higher negative slope reflects worse accuracy of numerical representation. We established split-half reliability for all indicators: S ACC r = 0.51, p < 0.001, N = 119; S RT r = 0.97, p < 0.001, N = 119; S NRE r = 0.30, p < 0.001, N = 113; S NDE r = 0.14, p = 0.198, N = 92; S NSE r = 0.19, p = 0.264, N = 37.
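Split-half reliability of a task indicator can be illustrated as follows. The sketch assumes an odd-even split of trials and reports both the raw correlation between halves and the Spearman-Brown corrected value. The text does not say which split or correction was used, so both choices are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(trials_by_participant):
    """Correlate each participant's mean score on odd vs. even trials.

    Returns (raw r, Spearman-Brown corrected r).
    Assumption: odd-even split; the source does not specify the split.
    """
    odd = [mean(trials[0::2]) for trials in trials_by_participant]
    even = [mean(trials[1::2]) for trials in trials_by_participant]
    r = pearson_r(odd, even)
    return r, 2 * r / (1 + r)


# Hypothetical reaction times (ms) for four participants, six trials each.
rts = [
    [800, 820, 790, 810, 805, 795],
    [950, 940, 970, 930, 960, 955],
    [700, 720, 710, 690, 705, 715],
    [1100, 1080, 1120, 1090, 1110, 1105],
]
raw_r, corrected_r = split_half_reliability(rts)
print(round(raw_r, 2), round(corrected_r, 2))
```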

  • Math anxiety

We wanted to consider the multidimensionality of math anxiety in the study, so we used several research tools and then created one math anxiety indicator (see Data Analysis and Appendix).

The Single-Item Math Anxiety Scale (SIMA) measures general math anxiety (Núñez-Peña et al. 2014 ). The scale consists of one question: "On a scale of 1 to 10, how mathematically anxious are you?". Respondents answer on a 10-point scale, where 1 means "no anxiety" and 10 means "very anxious". SIMA has good psychometric properties and is considered an interesting alternative to longer questionnaires measuring math anxiety. SIMA test–retest reliability was found to be r  = 0.72 in a Polish sample of adults (Szczygieł, 2022 ). As the scale contains 1 item, reliability was not estimated in the current research.

Math Anxiety Questionnaire for Adults (MAQA; Szczygieł, 2021a ) is a non-school-dependent questionnaire, which means it has no items related to formal mathematics education. Its purpose is to measure the level of anxiety associated with solving mathematical problems in everyday and academic life. MAQA was designed to measure math anxiety in adults in an ecologically valid way. The questionnaire requires referring to various situations related to mathematics by marking a response on a 4-point scale, where 1 means "I definitely do not feel anxiety" and 4 means "I definitely feel anxiety". The questionnaire includes 19 items. A higher sum of points in MAQA means a higher level of math anxiety. The MAQA has satisfactory psychometric properties (Szczygieł, 2021a ). McDonald’s ω calculated for latent MAQA factor was 0.87 in the current study.

The Abbreviated Math Anxiety Scale (AMAS) is a 9-item questionnaire on math anxiety, designed by Hopko et al. ( 2003 ) and adapted to the Polish context by Cipora et al. ( 2015 ). The AMAS total score includes two components: anxiety related to learning mathematics (AMAS-L) and anxiety related to being tested in mathematics (AMAS-T). Responses are given on a 5-point Likert scale, where 1 means low anxiety and 5 means high anxiety. A higher sum of points on each subscale indicates higher learning and testing math anxiety, respectively. The questionnaire is characterized by satisfactory psychometric properties in adults. The latent factor reliability for the Learning scale was ω = 0.74, and for the Testing scale it was ω = 0.82.

  • Math performance

The Math Performance Test (MATH) was used to test math competencies in adults. There are no standardized math tests for adults in Poland, so participants' math performance was measured using self-prepared tasks. The tasks were selected and compiled based on knowledge and competencies that should be mastered by high school students (counting, geometry). Each math task was a word problem formulated in an everyday life context. This decision was made because adults are diverse in terms of age and experience in learning mathematics. The test consists of 20 multiple-choice, closed-ended questions. We removed 3% of responses as outliers (results from four participants). A higher total score indicates better math performance. The average level of test difficulty for the whole group was 0.83, which means that the test was rather easy. The reliability of the test was Cronbach's α = 0.78.
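To make the two test statistics concrete, the sketch below computes item difficulty (the proportion of correct answers, so values near 1 indicate easy items) and Cronbach's α from a participants-by-items matrix of 0/1 scores. The data are made up for illustration and the function names are hypothetical.

```python
from statistics import mean, variance


def item_difficulty(scores):
    """Proportion of correct answers per item (1 = everyone correct)."""
    n_items = len(scores[0])
    return [mean([row[i] for row in scores]) for i in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants x items score matrix."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 participants, 3 items scored 0 (wrong) or 1 (correct).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(scores))   # [0.75, 0.5, 0.25]
print(cronbach_alpha(scores))    # 0.75
```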

Procedure

The study took place in the laboratory, and participants were tested individually by two researchers who were familiar with the procedure. The study was approved by the ethics committee. Participants were informed about the purpose of the study and the possibility of asking questions and withdrawing from participation; they were briefed on the GDPR rules and filled out a consent form. Then, the actual procedure started, with the following tasks being carried out in sequence: NS, NLE, S. Each task was preceded by instructions and a short training session. Next, the subject filled out the SIMA, AMAS, and MAQA questionnaires. The final part of the study was MATH, after which the subject filled out self-reported personal information (gender, education, profession). Finally, participants performed verbal and visuospatial working memory tasks, whose results are not presented in this study. Participants received remuneration in the form of a voucher to use at a selected store. The reward amount (€8–12) was random and did not depend on the level of task performance.

Data analysis

The descriptive statistics, zero-order correlation, and moderation analyses were prepared in IBM SPSS Statistics 29 and PROCESS macro (Model 1; Hayes 2017). Before hypothesis testing, we created one indicator of math anxiety (MA) from the sum of points in all math anxiety questionnaires (SIMA, MAQA, AMAS). Before that, we tested whether math anxiety can be treated as one factor using CFA (R, lavaan package, Rosseel 2012) with a maximum likelihood estimator. To evaluate the model's fit, we used the following interpretation criteria: \(\chi^{2}\) should be non-significant, RMSEA and SRMR should be < 0.08, and CFI and TLI should be > 0.95 (Hu & Bentler, 1999; Kline 2016).

We tested the model in which MA was a higher-order latent factor over four factors: SIMA (observed variable with only one item), MAQA (latent variable consisting of 19 items, λ from 0.31 to 0.67, p < 0.001), AMAS Learning (latent factor consisting of 5 items, λ from 0.48 to 0.71, p < 0.001), and AMAS Testing (latent factor consisting of 4 items, λ from 0.67 to 0.80, p < 0.001). Factor loadings for each item in all scales are provided in Table 4 in the Appendix. Correlations between all math anxiety scales are presented in Table 5 in the Appendix. The results confirmed the unidimensional model: \(\chi^{2}_{(374)} = 289.41\), p = 1.00, CFI = 1.00, TLI = 1.03, RMSEA = 0 [90% CI 0, 0], SRMR = 0.08. All paths loaded well on the math anxiety factor: SIMA λ = 0.86, p < 0.001, MAQA λ = 0.75, p < 0.001, AMAS Learning λ = 0.80, p < 0.001, AMAS Testing λ = 0.78. The reliability of the latent factor for MA was ω = 0.85.

Descriptive statistics and correlation analyses

Descriptive statistics of all examined variables are presented in Table  1 .

We tested the four hypotheses as follows. H1: MA is positively related to NS/S RT, NS/S NRE, NS/S NSE, and NLE error, and negatively related to NS/S ACC, NS/S NDE, and NS W. H2: MATH is negatively related to NS/S RT, NS/S NRE, NS/S NSE, and NLE error, and positively related to NS/S ACC, NS/S NDE, and NS W. H3: MA and MATH are negatively related. H4: in those with a high level of MA, a higher level of MATH is related to higher NS/S ACC, shorter NS/S RT, weaker NS/S NRE, weaker NS/S NDE, weaker NS/S NSE, higher Weber fraction (W), and lower NLE error. The results are presented in Table 2.

We observed that MA is negatively and weakly related to NS RT, NS NRE, and NS NSE (all opposite to H1). MA is positively and weakly related to NS NDE (opposite to H1). No relationship was observed between MA and NS ACC (H1 not confirmed, p = 0.91) or between MA and NS W (H1 not confirmed, p = 0.59). We then observed a weak negative relationship between MA and S ACC (H1 confirmed) and between MA and S NDE (H1 confirmed). We observed a weak positive relationship between MA and S RT (H1 confirmed) and between MA and S NRE (H1 confirmed). We did not observe a significant relationship between MA and S NSE (H1 not confirmed, p = 0.06). A weak positive relationship was observed between MA and NLE error (H1 confirmed).

H2 was confirmed only in the case of the relationship between MATH and NLE error (negative and moderate relationship). No significant correlations were observed between the MATH and NS indicators and between the MATH and S indicators.

MA and MATH were found to be negatively and weakly related, which confirms H3.

Interaction analysis

In the second step of analysis, we tested whether MA moderates the relationship between various indicators of numerical magnitude processing and MATH (see Table  3 , Figs.  1 , 2 and 3 ).

Fig. 1. Moderation Effect of Math Anxiety (MA) on the Relationship between Reaction Time in the Non-Symbolic Comparison Task (NS RT) and Math Performance (MATH). A Positive Relationship between NS RT and MATH is Observed in Adults with a High Level of MA

Fig. 2. Moderation Effect of Math Anxiety (MA) on the Relationship between Reaction Time in the Symbolic Comparison Task (S RT) and Math Performance (MATH). A Positive Relationship between S RT and MATH is Observed in Adults with a High Level of MA

Fig. 3. Moderation Effect of Math Anxiety (MA) on the Relationship between Numerical Distance Effect on Reaction Time in the Symbolic Comparison Task (S NRE RT) and Math Performance (MATH). A Positive Relationship between S NRE RT and MATH is Observed in Adults with a High Level of MA

The results mostly challenge hypothesis H4: more precise numerical magnitude processing was related to higher MATH in individuals with high MA in only three out of twelve analyses. There are positive relationships between NS RT and MATH (see Fig. 1) and between S RT and MATH (see Fig. 2) in high MA participants. The results also showed a negative relationship between S NRE RT and MATH (see Fig. 3) in high MA participants. The other tested interactions were non-significant (see Table 3).
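The logic of a simple moderation analysis such as the one above can be sketched as an ordinary least-squares regression with an interaction term, MATH = b0 + b1·X + b2·MA + b3·X·MA, followed by simple slopes of X at low and high MA. This is a minimal standard-library illustration and not the PROCESS macro itself. Mean-centering the predictors and probing at ±1 SD of the moderator are assumptions, since the text does not state which options were used. The function names are hypothetical.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_moderation(x, w, y):
    """OLS fit of y on centered x, centered moderator w, and their product.

    Returns the coefficients (b0, b1, b2, b3) and the simple slopes of x
    at one SD below and above the mean of w (assumed probing values).
    """
    xc = [v - mean(x) for v in x]
    wc = [v - mean(w) for v in w]
    rows = [[1.0, a, b, a * b] for a, b in zip(xc, wc)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(4)]
    b0, b1, b2, b3 = solve(xtx, xty)
    sd_w = stdev(w)
    return {
        "coefficients": (b0, b1, b2, b3),
        "slope_low_w": b1 - b3 * sd_w,    # effect of x at low moderator values
        "slope_high_w": b1 + b3 * sd_w,   # effect of x at high moderator values
    }
```

A non-zero b3 indicates that the slope of the numerical indicator on MATH changes with MA. The simple slopes show the direction of that relationship at low and high MA, which is what the interaction figures display.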

Discussion

Having high mathematical competencies is considered very important from an individual and socioeconomic point of view. Compared to those with lower levels, people with high mathematical competencies earn more, are more successful professionally (Estrada-Mejia et al. 2016, 2020), and are more likely to make better decisions regarding their education and health (Garcia-Retamero et al. 2019; Rivera-Batiz 1992; Reyna et al. 2009; Sobków et al. 2020). All over the world, great importance is attached to organizing an optimal educational system that enables the acquisition of strong mathematical competencies and helps those who have difficulties coping with math (Dyson et al. 2013; Ramirez et al. 2018). Previous results suggest that enhanced numerical magnitude processing (Honoré & Noël, 2016) and reduced math anxiety (Sammallahti et al. 2023) lead to better math performance. However, relatively little is known about the relationship between numerical magnitude processing and math anxiety.

Maloney et al. (2010, 2011) suggested that deficits in numerical magnitude processing are related to math anxiety, and that both variables interact to predict math performance. Although numerical magnitude processing and math anxiety have been tested as individual predictors of math performance (Barroso et al. 2021; Cueli et al. 2019), the relationship between the two variables and their joint effect on math performance have not been sufficiently explored, and existing studies have yielded conflicting results. Previous studies have revealed a significant (Sarı & Szczygieł, 2023) or non-significant (Cargnelutti et al. 2017; Szczygieł, 2021b) relationship between numerical magnitude processing and math anxiety in children, and a significant (Braham & Libertus 2018; Lindskog et al. 2017; Maloney et al. 2010, 2011; Maldonado Moscoso et al. 2020, 2022; Núñez-Peña & Suárez-Pellicioni 2014; Núñez-Peña et al. 2019; Skagerlund et al. 2019) or non-significant (Braham & Libertus 2018; Colomé, 2019; Dietrich et al. 2015; Maldonado Moscoso et al. 2020; Núñez-Peña et al., 2010; Silver et al. 2022) relationship in adults. Inconsistent results have also been observed for the joint effect of numerical magnitude processing and math anxiety on math performance, with studies showing significant mediation/interaction in adults (Braham & Libertus 2018; Lindskog et al. 2017; Skagerlund et al. 2019; Maldonado Moscoso et al. 2020), non-significant effects in adults (Braham & Libertus 2018; Silver et al. 2022), and non-significant effects in children (Szczygieł, 2021b; Sarı & Szczygieł, 2023).

Given that researchers used different study designs, measurements, variable indicators, and methods of statistical analysis, we assumed that the observed differences in results might be due to methodological differences. Therefore, we examined whether the relationship between numerical magnitude processing, math anxiety, and math performance depends on different numerical tasks and their indicators. Essentially, we tested whether the relationships depend on different cognitive processes involved in solving symbolic vs. non-symbolic tasks, and estimation vs. comparison tasks. We also examined the extent to which numerical magnitude processing and math anxiety separately determine math performance in adults.

Numerical magnitude processing and math anxiety relationship

We observed that when symbolic numerical tasks are used in analyses, the hypothesis formulated by Maloney et al. ( 2011 ) regarding the relationship between numerical magnitude processing and math anxiety was mostly confirmed. However, the findings from the non-symbolic task challenge the hypothesis that less precise magnitude representation is the basis for math anxiety development. Therefore, the results suggest that the relationship depends on the type of numerical magnitude measure and its indices (Mielicki et al. 2022 ), and thereby, on the cognitive processes engaged in processing symbolic and non-symbolic numerical representations.

We observed that greater error on the number line estimation task was associated with higher math anxiety, thus supporting the hypothesis that individuals with a less precise representation of symbolic magnitude have a higher level of math anxiety. Our results are consistent with previous findings in adults (Núñez-Peña et al. 2019) and children (Sarı & Szczygieł, 2023). To our knowledge, these two studies are the only ones that have examined the relationship between a symbolic number line estimation task and math anxiety. We believe that a less precise mental number line contributes to poorer math understanding and more math anxiety.

Moreover, almost all indicators of the symbolic comparison task were associated with greater math anxiety (lower accuracy, longer reaction time, larger numerical ratio and distance effects). The relationship between MA and a larger numerical size effect was non-significant, but the power of the test for this effect was lower than for the other numerical effects because fewer participants had a reliable numerical size effect. Although we did not divide the sample into low and high math anxiety individuals, we observed a similar pattern of results to previous findings comparing such groups (Maloney et al. 2010 – reaction time; Maloney et al. 2011 – numerical distance effect; Núñez-Peña & Suárez-Pellicioni 2014 – numerical distance effects). We also obtained results consistent with Dietrich et al. (2015) regarding the relationship between the numerical distance effect and math anxiety. More accurate and faster processing of symbolic numerical quantities may be crucial for the effectiveness of performing mathematical tasks, and this effectiveness is associated with positive (success) or negative (failure) emotions. As we used two-digit numbers in the symbolic comparison task, we should also note that multi-step cognitive processes were engaged in the comparison. Indeed, in accordance with previous findings (Verguts et al. 2005; Nuerk et al. 2015), it is likely that one numerical system is used for exact small and approximate large numbers and a second numerical system represents multidigit numbers. Therefore, the question arises whether our results support the hypothesis that basic or more advanced numerical processes are related to math anxiety.

Most of the indicators of the non-symbolic comparison task were related to math anxiety, but the direction of the correlation was opposite to the hypothesis. Shorter reaction time and weaker numerical ratio and size effects were related to higher math anxiety. The numerical distance effect was positively related to math anxiety. No relationship was observed between math anxiety and either accuracy or the Weber fraction. These results are mostly inconsistent with previous findings in adults (Dietrich et al. 2015 – accuracy, reaction time, numerical distance and size effect, Weber fraction; Colomé et al., 2019 – accuracy, reaction time, Weber fraction, and numerical ratio effect), which showed a non-significant relationship between the variables or a different direction of the relationship. However, our results are consistent with findings that accuracy in the non-symbolic comparison task is not related to math anxiety in children (Cargnelutti et al. 2017; Szczygieł, 2021b). Because performance in the non-symbolic comparison task may be influenced by methodological factors (e.g., visual stimulus parameters, presentation duration, set size; Dietrich et al. 2015), our surprising results are likely due to the low time pressure in the task. Thus, individuals with high math anxiety reacted faster than those with low math anxiety, which suggests that they wanted to reduce the time to decide. However, the results confirm Dietrich et al.'s (2015) hypothesis that the relationship between numerical magnitude tasks and math anxiety may depend on task-related decision-making processes (in this case, the trade-off between speed and accuracy resulting from the desire to complete the task as quickly as possible). We then observed that weaker ratio and size effects (both effects calculated from reaction time) were related to greater math anxiety. 
This means that better discrimination of similar patterns of dots, even at large magnitudes with the same distance effect, is related to higher math anxiety. These results suggest that math anxiety in adults may be driven primarily by factors other than basic numerical magnitude processing. Indeed, many factors have previously been identified as predictors of math anxiety (Szczygieł & Hohol 2024; Zhang et al. 2019). However, it should also be noted that the Weber fraction is considered the most adequate indicator of ANS (Krajcsi 2020; Pica et al. 2004; Price et al. 2012) and, in our study, W was not associated with math anxiety.

The fact that the relationship between numerical magnitude processing and math anxiety is task- and indicator-dependent is not surprising because previous findings (Bulthé et al. 2014; Honoré & Noël, 2016; Lyons et al. 2015) suggest that symbolic and non-symbolic quantities are encoded and processed differently. Previous research results suggest that non-symbolic numerical processing is directed by ANS, and such processes may be described interchangeably by numerical size and distance effects (both directed by the ratio effect), while symbolic numerical processing is likely handled not by ANS but by an alternative representation (e.g., Verguts et al. 2005; Krajcsi 2017; Krajcsi et al. 2022). For example, in accordance with the model of Krajcsi et al. (2022), the distance effect is directed by the semantic distance of units, and the size effect is driven by symbol frequency. Our results support claims of distinct mechanisms directing the processing of symbolic and non-symbolic magnitudes: in the non-symbolic format, all numerical effects (ratio, size, and distance) correlated significantly with one another, whereas in the symbolic comparison task the ratio and distance effects and the distance and size effects were correlated, but the ratio and size effects were not. Moreover, the strength of correlations in non-symbolic magnitude processing was weaker than in the symbolic comparison task. Finally, a debate has emerged regarding the most accurate indicator that reflects the functioning of the ANS. Current perspectives lean towards considering Weber's W as the most effective indicator for non-symbolic magnitude processing, especially when accounting for lapse rate and perceptual properties. However, it remains uncertain to what extent accuracy may be influenced by the lapse rate (Krajcsi et al. 2023). 
Therefore, we conclude that deficiencies in symbolic, but not non-symbolic, magnitude processing may be a risk factor for the development of math anxiety, although further studies on the nature of numerical processing are needed.

Numerical magnitude processing and math performance

We observed that more precise mental number representation is related to a higher level of math performance, while more effective processing of symbolic and non-symbolic numerical magnitudes is not related to math performance. Greater estimation error in the number line task was moderately related to poorer mathematical performance, which is consistent with previous findings in children (Friso-van den Bos et al. 2015; Schneider et al. 2018a). Individuals with a more precise mental representation of numbers perform better on math tasks, likely due to the more accurate decision-making strategies used when judging proportions (Slusser & Barth 2017). It is worth mentioning that previous findings have also shown that math education improves the mental representation of numbers (Friso-van den Bos et al. 2015; Schneider et al. 2018a). It should therefore be noted that this relationship may be bidirectional. More accurate representation of numbers improves math performance, and better math understanding improves estimation skills (Friso-van den Bos et al. 2015). The fact that the sum of error in the number line estimation task was related to math performance while the non-symbolic and symbolic comparison task indicators were unrelated may be explained by the different cognitive processes involved in the estimation and comparison tasks (Li et al. 2018). The non-symbolic comparison task is viewed as a test of the basic ability to process numerical quantities. It is likely that these skills do not translate into math performance in adulthood due to the development of more complex math skills that affect math performance. Similarly, it can be assumed that adults mastered numbers below 100 in childhood, so the role of processing simple symbols in their math performance decreases (Li et al. 2018). 
A large error variance was observed in the estimation task and a low error rate in the comparison tasks, which means that the NLE task differentiated individuals more than the NS and S comparison tasks did. Previous research also suggests that estimation is more important than comparison in predicting math performance (Schneider et al. 2018b). A simple estimation task makes it possible to identify students who may have difficulties in understanding mathematics, so it would be strongly advisable to use this type of task in educational practice (Nosworthy et al. 2013). In summary, consistent with previous findings, we conclude that the type of numerical representation measure (comparison or estimation) may determine the significance and strength of the relationship between numerical magnitude processing and math performance (Li et al. 2018; Schneider et al. 2017, 2018a), probably because the two task types involve different cognitive processes (Sasanguie & Reynvoet 2013).
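
As a rough illustration of the estimation measure discussed above, the sketch below sums the absolute deviations between a participant's marks and the target numbers on a number line. The function name, the choice of absolute deviations, and the example values are assumptions made for illustration only; they are not taken from the study's materials or data.

```python
def nle_sum_of_error(estimates, targets):
    """Sum of absolute deviations between marked positions and target numbers.

    Hypothetical helper: the study reports a "sum of error" indicator, but the
    exact scoring rule used here (absolute deviation per trial) is an assumption.
    """
    if len(estimates) != len(targets):
        raise ValueError("estimates and targets must have the same length")
    return float(sum(abs(e - t) for e, t in zip(estimates, targets)))


# Made-up responses on a 0-100 number line (illustrative values only)
targets = [12, 45, 90]
precise = nle_sum_of_error([10, 50, 90], targets)    # marks close to targets
imprecise = nle_sum_of_error([30, 20, 70], targets)  # marks far from targets
```

A larger value indicates a less precise placement of numbers on the line, which is the direction of the relationship with poorer math performance described in this section.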

Math anxiety and math performance

In line with our expectations and previous findings (Barroso et al. 2021; Zhang et al. 2019), math anxiety and math performance were negatively related. High math anxiety in adults may have a negative effect on their academic outcomes (e.g., students may drop out of STEM education due to negative emotions and math-related failures; Beilock & Maloney 2015; Picha 2018) and on their environment (e.g., early childhood teachers and parents may, under certain circumstances, pass on math anxiety to children; Sarı & Hunt 2020; Szczygieł 2020b). Although it is still debated whether poor math performance causes high math anxiety, whether high math anxiety causes poor math performance, or whether the relationship is reciprocal (Carey et al. 2016), the negative relationship between the two variables is observed across all age groups, including adults. Therefore, interventions to reduce math anxiety and/or improve math performance in adults who need them (e.g., STEM students, parents, and early childhood teachers; Maloney, 2015; Casad et al. 2015) should be considered.

The joint effect of numerical magnitude processing and math anxiety on math performance

The results largely challenge the hypothesis that numerical magnitude processing and math anxiety have a joint effect on math performance. However, we observed that longer reaction times in symbolic and non-symbolic comparison tasks and a smaller numerical ratio effect in the symbolic comparison task are related to better math performance in adults with high math anxiety. In our opinion, such results suggest that people with high math anxiety need more time to solve mathematical tasks correctly. Indeed, in adults with high math anxiety, as reaction times in non-symbolic and symbolic comparison tasks increase, accuracy on math tasks also increases. Moreover, better math performance depends on the ratio effect in the symbolic comparison task in adults with high math anxiety. Therefore, we hypothesize that people with high math anxiety need more time for solving math tasks, mainly when they operate on two-digit numbers that are close to each other.
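
The conditional pattern described above (a relationship between reaction time and math performance that appears only at high math anxiety) is what an interaction term in a regression captures. The following is a minimal sketch of such a moderation analysis, assuming ordinary least squares with mean-centered predictors and simple slopes evaluated one unit below and above the moderator mean. All function names, the synthetic data, and the effect sizes are hypothetical and chosen only for illustration; this is not the authors' analysis pipeline or data.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_moderation(x, m, y):
    """OLS fit of y = b0 + b1*x + b2*m + b3*(x*m), with x and m mean-centered."""
    mean_x, mean_m = sum(x) / len(x), sum(m) / len(m)
    rows = [[1.0, xi - mean_x, mi - mean_m, (xi - mean_x) * (mi - mean_m)]
            for xi, mi in zip(x, m)]
    k = 4
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simple_slope(coefs, moderator_value):
    """Slope of y on x at a given value of the centered moderator."""
    return coefs[1] + coefs[3] * moderator_value


# Synthetic data for illustration only (NOT the study's data).
rng = random.Random(0)
n = 1000
rt = [rng.gauss(0, 1) for _ in range(n)]       # standardized reaction time
anxiety = [rng.gauss(0, 1) for _ in range(n)]  # standardized math anxiety
performance = [0.15 * r + 0.15 * r * a - 0.3 * a + rng.gauss(0, 0.5)
               for r, a in zip(rt, anxiety)]

coefs = fit_moderation(rt, anxiety, performance)
slope_low_anxiety = simple_slope(coefs, -1.0)   # one unit below the mean
slope_high_anxiety = simple_slope(coefs, 1.0)   # one unit above the mean
```

In this synthetic setup the slope of performance on reaction time is close to zero at low anxiety and clearly positive at high anxiety, which mirrors the qualitative pattern reported in this section. A cross-sectional fit of this kind says nothing about causal direction.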

Summarizing the results on the relationship between numerical magnitude processing, math anxiety, and math performance, we hypothesize that the key to better math performance is the accuracy of the mental number line. Additionally, people with high math anxiety need more time to process magnitudes and perform mathematical tasks. This hypothesis needs to be tested in further research in which time pressure is imposed on numerical magnitude processing and mathematical tasks. Nevertheless, our results are consistent with those observed in children (Sarı & Szczygieł 2023 – negative relationship between number line estimation task error and math performance in children with high math anxiety; Szczygieł 2021b – no mediation effect of non-symbolic and symbolic comparison task accuracy between math anxiety and math performance) and adults (Silver et al. 2022 – no interaction effect of non-symbolic task accuracy and math anxiety on math performance; Skagerlund et al. 2019 – reaction time, as the indicator of the symbolic comparison task, mediated the relationship between math anxiety and math performance). Results opposite to ours were reported by Lindskog et al. (2017 – math anxiety mediated the relationship between accuracy in non-symbolic comparison tasks and math performance) and Maldonado Moscoso et al. (2020 – math anxiety mediated the relationship between the Weber fraction and math performance in individuals with high math anxiety; Maldonado Moscoso et al. 2022 – math anxiety fully accounted for the relationship between numerosity estimation precision and math abilities for all participants). However, the numerical magnitude processing tasks in those studies were presented under greater time pressure than in ours. Partially contradictory to our results were those obtained by Braham & Libertus (2018), who showed that accuracy in non-symbolic comparison tasks and math anxiety interact to predict math performance, but only for certain types of mathematical tasks. 
These results suggest that further studies should include various types of mathematical tasks.

Limitations and further research directions

Although our study was carefully designed, it has some important limitations. First, we tested a relatively small sample of adults, mainly young women and students living in a big city. Because we recruited respondents through an advertisement, the group we examined was limited to people seeking paid psychology studies. Moreover, people with high math anxiety were likely to refrain from participating in a study that involved solving math problems. It should also be noted that selected hypotheses were tested with fewer participants because some participants did not show a reliable numerical effect (primarily a numerical size effect in the symbolic and non-symbolic comparison tasks) or showed a floor effect in math performance. An advantage of our study, however, was that we examined a group of adults with diverse educational experience (STEM, HS, other fields of studies and professions). Although our results confirm that differences between studies may be explained by methodological details (type of measure and indicators of numerical magnitude processing), further studies are needed to establish whether the same pattern holds in children and adolescents. Indeed, it is well known that both non-symbolic and symbolic numerical representations and math anxiety develop over time (Friso-van den Bos et al. 2015; Petronzi et al. 2019). Therefore, the relationships between these constructs, and between them and math performance, may vary by age group. Moreover, the educational situation of children and adults is very different (e.g., children are obliged to attend math classes, while adults can, if they want, continue their studies in STEM), which also justifies conducting research in these groups. Above all, a longitudinal design that controls for various confounding variables would be the most appropriate way to test the hypothesis of Maloney et al. (2010). 
However, in our study, as in previous studies, we examined the relationships between variables in a cross-sectional design. Verifying this hypothesis is also difficult because developmental and educational processes unfold at the same time.

Second, we ensured that numerical magnitude processing and math anxiety were tested using a variety of measures, but we used only a single task to measure math performance. The difficulty level of this task, according to Dolna’s classification (2016), was quite easy. We conducted a pilot study to select tasks for the test and to check whether the tasks were understandable for adults, but we did not further test the properties of the tasks. An important challenge for future studies on psychological correlates of math performance among Polish adults is the development of a standardized multidimensional scale of mathematical performance. Nevertheless, it can be assumed that the role of numerical magnitude processing and math anxiety – as domain-specific correlates of math performance – should be revealed for most mathematical tasks (Barroso et al. 2021; Schneider et al. 2017, 2018a, 2018b; Zhang et al. 2019). The strength of the correlation between indices of numerical magnitude processing, math anxiety, and math performance depends not only on the true relationship but also on measurement error (Krajcsi 2017). We observed that the reliability of the symbolic and non-symbolic comparison tasks varied between their indicators, which may have affected the observed results. Although split-half reliability was satisfactory for most indicators and in line with previous studies (Price et al. 2012), we observed no relationship between results for two effects: S NSE and S NDE. This may be because only a few trials contributed to a given unit, given the numerous filters placed on the two-digit symbols to make these effects computable. Moreover, how to interpret numerical distance and size effects in multi-digit symbols is still debated, as comparing them requires multi-step processing (Verguts et al. 2005; Nuerk et al. 2015). 
Because we used only one set of stimuli in the NLE task (the stimuli were not repeated and the task was performed only once), we were unable to determine the reliability of the task. Because many computer-based tasks used in numerical cognition research lack documented reliability, it is unknown how the psychometric properties of these tasks have influenced many previous results. Providing such information is therefore one of the greatest challenges for further research on numerical cognition.
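
To make the point about measurement error concrete, the sketch below shows one common way to estimate split-half reliability (odd-even split with the Spearman-Brown correction) and the classical test theory relation under which an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. This section does not state how the halves were formed or whether a correction was applied, so the splitting rule, the function names, and all numbers are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(trials_by_participant):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    trials_by_participant: one list of trial scores per participant.
    """
    odd_means = [sum(t[0::2]) / len(t[0::2]) for t in trials_by_participant]
    even_means = [sum(t[1::2]) / len(t[1::2]) for t in trials_by_participant]
    r_half = pearson_r(odd_means, even_means)
    return 2 * r_half / (1 + r_half)


def attenuated_correlation(r_true, reliability_x, reliability_y):
    """Expected observed correlation given the true one and both reliabilities."""
    return r_true * math.sqrt(reliability_x * reliability_y)


# Made-up reaction times in ms, four trials per participant (illustration only)
reaction_times = [
    [510, 530, 520, 540],
    [610, 640, 600, 650],
    [450, 470, 460, 480],
    [700, 690, 720, 710],
]
reliability = split_half_reliability(reaction_times)

# A true correlation of 0.5 measured with reliabilities of 0.64 and 0.81
observed = attenuated_correlation(0.5, 0.64, 0.81)  # 0.5 * 0.72 = 0.36
```

Under these assumptions, an indicator with near-zero split-half reliability could show little or no correlation with math performance even if the underlying relationship were substantial.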

Moreover, we did not control for more general cognitive skills (intelligence, executive functions) or anxiety (trait anxiety, test anxiety), which poses a challenge for further studies. We assume that controlling for domain-general variables may weaken the tested relationships. Although numerical magnitude processing and math anxiety are math domain-specific variables, they are partially rooted in domain-general skills and emotions (Friso-van den Bos et al. 2014; Simms et al. 2016; Szczygieł 2021a). As recently observed, gender, spatial anxiety, emotional stability, state anxiety, and test anxiety explain 61% of the variation in math anxiety in adults (Szczygieł & Hohol 2024); however, further research is required on the cognitive nature of numerical magnitude representations (Nelwan et al. 2021). Assuming that symbolic and non-symbolic magnitude processing, and comparison and estimation, are independent (Dietrich et al. 2015; Guillaume et al. 2016; Lyons et al. 2012, 2015; Sasanguie & Reynvoet 2013), it is likely that they are determined to varying degrees by domain-general cognitive skills.

Conclusions

The results mostly support the hypothesis proposed by Maloney et al. (2010) that defective processing of symbolic magnitude is related to math anxiety; however, they call into question the assumption that non-symbolic processes underlie math anxiety. We observed that better numerical estimation (but not comparison processes) and lower math anxiety correlate with better math performance in adults, but results regarding the joint effect of numerical magnitude processing and math anxiety on math performance were inconsistent. Only in individuals with high math anxiety did we observe a relationship between math performance and reaction time in the symbolic comparison task, between math performance and reaction time in the non-symbolic comparison tasks, and between math performance and the numerical ratio effect in the symbolic comparison task. The results therefore suggest that timing and ratio (in symbolic processing) are crucial to the relationship between numerical magnitude processing and math performance in individuals with high math anxiety. Our results, like those of Braham and Libertus (2018), thus suggest that further studies on predictors of math performance should consider numerical magnitude processing and math anxiety together and examine their joint effect on math performance further.

Data availability

Data are publicly available in the OSF repository: https://doi.org/10.17605/OSF.IO/ZA4WS

References

Alexander L, Martray CR (1989) The development of an abbreviated version of the mathematics anxiety rating scale. Meas Eval Couns Dev 22(3):143–150

Barroso C, Ganley CM, McGraw AL, Geer EA, Hart SA, Daucourt MC (2021) A meta-analysis of the relation between math anxiety and math achievement. Psychol Bull 147(2):134–168. https://doi.org/10.1037/bul0000307

Beilock S, Maloney S (2015) Math Anxiety: A factor in math achievement not to be ignored. Policy Insights Behav Brain Sci 2(1):4–12. https://doi.org/10.1177/2372732215601438

Braham EJ, Libertus ME (2018) When approximate number acuity predicts math performance: the moderating role of math anxiety. PLoS ONE 13(5):e0195696. https://doi.org/10.1371/journal.pone.0195696

Bulthé J, De Smedt B, Op de Beeck HP (2014) Format-dependent representations of symbolic and non-symbolic numbers in the human cortex as revealed by multi-voxel pattern analyses. Neuroimage 87:311–322. https://doi.org/10.1016/j.neuroimage.2013.10.049

Cantlon JF, Libertus ME, Pinel P, Dehaene S, Brannon EM, Pelphrey KA (2009) The neural development of an abstract concept of number. J Cogn Neurosci 21(11):2217–2229. https://doi.org/10.1162/jocn.2008.21159

Carey E, Hill F, Devine A, Szücs D (2016) The chicken or the egg? The direction of the relationship between mathematics anxiety and mathematics performance. Front Psychol 6:1987. https://doi.org/10.3389/fpsyg.2015.01987

Cargnelutti E, Tomasetto C, Passolunghi MC (2017) The interplay between affective and cognitive factors in shaping early proficiency in mathematics. Trends Neurosci Educ 8–9:28–36. https://doi.org/10.1016/j.tine.2017.10.002

Casad BJ, Hale P, Wachs FL (2015) Parent-child math anxiety and math-gender stereotypes predict adolescents’ math education outcomes. Front Psychol 6:1597. https://doi.org/10.3389/fpsyg.2015.01597

Chen L, Wang Y, Wen H (2021) Numerical magnitude processing in deaf adolescents and its contribution to arithmetical ability. Front Psychol 12:584183. https://doi.org/10.3389/fpsyg.2021.584183

Cipora K, Szczygieł M, Willmes K, Nuerk H-C (2015) Math anxiety assessment with the abbreviated math anxiety scale: applicability and usefulness: insights from the polish adaptation. Front Psychol 6:1833. https://doi.org/10.3389/fpsyg.2015.01833

Colomé À (2019) Representation of numerical magnitude in math-anxious individuals. Q J Exper Psychol 72(3):424–435. https://doi.org/10.1177/1747021817752094

Cueli M, Areces D, McCaskey U, Álvarez-García D, González-Castro P (2019) Mathematics competence level: The contribution of non-symbolic and spatial magnitude comparison skills. Front Psychol 10:465. https://doi.org/10.3389/fpsyg.2019.00465

Dehaene S (2001) Précis of the number sense. Mind Lang 16(1):16–36

Dehaene S (2011) The number sense: how the mind creates mathematics (Rev and updated ed.). Oxford University Press, Oxford

Dehaene S, Bossini S, Giraux P (1993) The mental representation of parity and number magnitude. J Exp Psychol Gen 122(3):371–396. https://doi.org/10.1037/0096-3445.122.3.371

Dietrich JF, Huber S, Moeller K, Klein E (2015) The influence of math anxiety on symbolic and non-symbolic magnitude processing. Front Psychol 6:1621. https://doi.org/10.3389/fpsyg.2015.01621

Dolna J (2016) Test szóstoklasisty 2016: ranking szkół podstawowych. Nasze Miasto Kraków. Retrieved from: http://krakow.naszemiasto.pl/artykul/test-szostoklasisty-2016-ranking-szkol-podstawowych,3763864,artgal,t,id,tm.html [08.05.2022]

Dyson NI, Jordan NC, Glutting J (2013) A number sense intervention for low-income kindergartners at risk for mathematics difficulties. J Learn Disabil 46(2):166–181. https://doi.org/10.1177/0022219411410233

Estrada-Mejia C, de Vries M, Zeelenberg M (2016) Numeracy and wealth. J Econ Psychol 54:53–63

Estrada-Mejia C, Peters E, Dieckmann NF, Zeelenberg M, De Vries M, Baker DP (2020) Schooling, numeracy, and wealth accumulation: a study involving an agrarian population. J Consum Aff. https://doi.org/10.1111/joca.12294

Feigenson L, Dehaene S, Spelke E (2004) Core systems of number. Trends Cogn Sci 8(7):307–314. https://doi.org/10.1016/j.tics.2004.05.002

Forster KI, Forster JC (2003) DMDX: A Windows display program with millisecond accuracy. Behav Res Methods, Instrum, Comput 35(1):116–124. https://doi.org/10.3758/bf03195503

Friso-van den Bos I, Kroesbergen EH, Van Luit JE (2014) Number sense in kindergarten children: Factor structure and working memory predictors. Learn Individ Differ 33:23–29

Friso-van den Bos I, Kroesbergen EH, Van Luit JE, Xenidou-Dervou I, Jonkman LM, Van der Schoot M, Van Lieshout EC (2015) Longitudinal development of number line estimation and mathematics performance in primary school children. J Exp Child Psychol 134:12–29

Garcia-Retamero R, Sobkow A, Petrova DG, Garrido D, Traczyk J (2019) Numeracy and risk literacy: What have we learned so far? Span J Psychol e10:1–11

Gebuis T, Reynvoet B (2011) Generating nonsymbolic number stimuli. Behav Res Methods 43(4):981–986. https://doi.org/10.3758/s13428-011-0097-5

Guillaume M, Gevers W, Content A (2016) Assessing the approximate number system: no relation between numerical comparison and estimation tasks. Psychol Res 80:248–258. https://doi.org/10.1007/s00426-015-0657-x

Halberda J, Mazzocco MM, Feigenson L (2008) Individual differences in non-verbal number acuity correlate with maths achievement. Nature 455(7213):665–668. https://doi.org/10.1038/nature07246

Hart SA, Ganley CM (2019) The nature of math anxiety in adults: prevalence and correlates. Journal of Numerical Cognition 5(2):122–139. https://doi.org/10.5964/jnc.v5i2.195

Hayes AF (2017) Introduction to mediation, moderation, and conditional process analysis: a regression-based approach. Guilford Press

Hohol M, Willmes K, Nęcka E, Brożek B, Nuerk H-C, Cipora K (2020) Professional mathematicians do not differ from others in the symbolic numerical distance and size effects. Sci Rep 10:11531. https://doi.org/10.1038/s41598-020-68202-z

Honoré N, Noël M-P (2016) Improving preschoolers’ arithmetic through number magnitude training: the impact of non-symbolic and symbolic training. PLoS ONE 11(11):e0166685. https://doi.org/10.1371/journal.pone.0166685

Hopko DR (2003) Confirmatory factor analysis of the math anxiety rating scale–revised. Educ Psychol Measur 63(2):336–351

Hopko DR, Mahadevan R, Bare RL, Hunt MK (2003) The abbreviated math anxiety scale (AMAS): construction, validity, and reliability. Assessment 10(2):178–182. https://doi.org/10.1177/1073191103010002008

Hu L, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 6(1):1–55

Hunt TE, Clark-Carter D, Sheffield D (2011) The development and part validation of a U.K. Scale for Mathematics Anxiety. J Psychoeduc Assess 29(5):455–466. https://doi.org/10.1177/0734282910392892

Kline RB (2016) Principles and practice of structural equation modeling. Guilford publications, NY

Krajcsi A (2017) Numerical distance and size effects dissociate in Indo-Arabic number comparison. Psychon Bull Rev 24(3):927–934

Krajcsi A (2020) Ratio effect slope can sometimes be an appropriate metric of the approximate number system sensitivity. Atten Percept Psychophys 82(4):2165–2176. https://doi.org/10.3758/s13414-019-01939-6

Krajcsi A, Szűcs T (2022) Symbolic number comparison and number priming do not rely on the same mechanism. Psychon Bull Rev 29:1969–1977

Krajcsi A, Kojouharova P, Lengyel G (2022) Processing symbolic numbers: The example of distance and size effects. In: Gervain J, Csibra G, Kovács K (eds) A life in cognition language cognition and mind. Springer, Cham

Krajcsi A, Chesney D, Cipora K, Coolen IEJI, Gilmore C, Inglis M, Libertus M, Nuerk H-C, Reynvoet B (2023) Measuring the acuity of the approximate number system in young children. Dev Rev. https://doi.org/10.31234/osf.io/nyw94

Landerl K, Bevan A, Butterworth B (2004) Developmental dyscalculia and basic numerical capacities: a study of 8–9-year-old students. Cognition 93(2):99–125. https://doi.org/10.1016/j.cognition.2003.11.004

Li Y, Zhang M, Chen Y, Deng Z, Zhu X, Yan S (2018) Children’s non-symbolic and symbolic numerical representations and their associations with mathematical ability. Front Psychol 9:1035. https://doi.org/10.3389/fpsyg.2018.01035

Lindskog M, Winman A, Poom L (2017) Individual differences in nonverbal number skills predict math anxiety. Cognition 159:156–162. https://doi.org/10.1016/j.cognition.2016.11.014

Lyons IM, Ansari D, Beilock SL (2012) Symbolic estrangement: Evidence against a strong association between numerical symbols and the quantities they represent. J Exp Psychol Gen 141(4):635–641. https://doi.org/10.1037/a0027248

Lyons IM, Ansari D, Beilock SL (2015) Qualitatively different coding of symbolic and nonsymbolic numbers in the human brain. Hum Brain Mapp 36(2):475–488. https://doi.org/10.1002/hbm.22641

Maldonado Moscoso PA, Anobile G, Primi C, Arrighi R (2020) Math anxiety mediates the link between number sense and math achievements in high math anxiety young adults. Front Psychol 11:1095. https://doi.org/10.3389/fpsyg.2020.01095

Maldonado Moscoso PA, Castaldi E, Arrighi R, Primi C, Caponi C, Buonincontro S, Bolognini F, Anobile G (2022) Mathematics and numerosity but not visuo-spatial working memory correlate with mathematical anxiety in adults. Brain Sci 12(4):422. https://doi.org/10.3390/brainsci12040422

Maloney EA, Risko EF, Ansari D, Fugelsang J (2010) Mathematics anxiety affects counting but not subitizing during visual enumeration. Cognition 114(2):293–297. https://doi.org/10.1016/j.cognition.2009.09.013

Maloney EA, Ansari D, Fugelsang JA (2011) Rapid communication: the effect of mathematics anxiety on the processing of numerical magnitude. Q J Exper Psychol 64:10–16

Marinova M, Reynvoet B (2020) Can you trust your number sense: Distinct processing of numbers and quantities in elementary school children. J Numer Cognit 6(3):304–321

Mielicki M, Wilkey ED, Scheibe DA, Fitzsimmons C, Sidney PG, Bellon E, Ribner AD, Soltanlou M, Starling-Alves I, Coolen I (2022) Task features change the relation between math anxiety and number line estimation performance with rational numbers: two large-scale online studies. J Exper Psychol. https://doi.org/10.31219/osf.io/wvezm

Namkung JM, Peng P, Lin X (2019) The relation between mathematics anxiety and mathematics performance among school-aged students: a meta-analysis. Rev Educ Res 89(3):459–496. https://doi.org/10.3102/0034654319843494

Nelwan M, Friso-van den Bos I, Vissers C, Kroesbergen E (2021) The relation between working memory, number sense, and mathematics throughout primary education in children with and without mathematical difficulties. Child Neuropsychol 28(2):143–170

Nosworthy N, Bugden S, Archibald L, Evans B, Ansari D (2013) A two-minute paper-and-pencil test of symbolic and nonsymbolic numerical magnitude processing explains variability in primary school children’s arithmetic competence. PLoS ONE 8(7):e67918

Nuerk H-C, Weger U, Willmes K (2004) On the perceptual generality of the unit-decade compatibility effect. Exp Psychol 51(1):72–79. https://doi.org/10.1027/1618-3169.51.1.72

Nuerk H-C, Moeller K, Willmes K (2015) Multi-digit number processing: Overview, conceptual clarifications, and language influences. In: Kadosh RC, Dowker A (eds) The Oxford handbook of numerical cognition. Oxford University Press, pp 106–139

Núñez-Peña MI, Suárez-Pellicioni M (2014) Less precise representation of numerical magnitude in high math-anxious individuals: an ERP study of the size and distance effects. Biol Psychol 103:176–183

Núñez-Peña MI, Guilera G, Suárez-Pellicioni M (2014) The single-item math anxiety scale: an alternative way of measuring mathematical anxiety. J Psychoeduc Assess 32(4):306–317. https://doi.org/10.1177/0734282913508528

Núñez-Peña MI, Colomé À, Aguilar-Lleyda D (2019) Number line estimation in highly math-anxious individuals. Br J Psychol 110(1):40–59. https://doi.org/10.1111/bjop.12335

Opfer JE, Siegler RS (2007) Representational change and children’s numerical estimation. Cogn Psychol 55(3):169–195. https://doi.org/10.1016/j.cogpsych.2006.09.002

Oszwa U (2020) Lęk przed matematyką. Poglądy, badania, rozwiązania [Math anxiety. Views, research, solutions]. UMCS

Pantoja N, Schaeffer MW, Rozek CS, Beilock SL, Levine SC (2020) Children’s math anxiety predicts their math achievement over and above a key foundational math skill. J Cogn Dev 21(5):709–728. https://doi.org/10.1080/15248372.2020.1832098

Petronzi D, Staples P, Sheffield D, Hunt TE, Fitton-Wilde S (2019) Further development of the children’s mathematics anxiety scale UK (CMAS-UK) for ages 4–7 years. Educ Stud Math 100(3):231–249

Piazza M, Izard V, Pinel P, Le Bihan D, Dehaene S (2004) Tuning curves for approximate numerosity in the human intraparietal sulcus. Neuron 44(3):547–555. https://doi.org/10.1016/j.neuron.2004.10.014

Pica P, Lemer C, Izard V, Dehaene S (2004) Exact and approximate arithmetic in an Amazonian indigene group. Science 306(5695):499–503. https://doi.org/10.1126/science.1102085

Picha G (2018) STEM education has a math anxiety problem. Education Week, August 6. https://www.edweek.org/teaching-learning/opinion-stem-education-has-a-math-anxiety-problem/2018/08

Price GR, Palmer D, Battista Ch, Ansari D (2012) Nonsymbolic numerical magnitude comparison: reliability and validity of different task variants and outcome measures, and their relationship to arithmetic achievement in adults. Acta Psychol 140:50–57

Ramirez G, Chang H, Maloney EA, Levine SC, Beilock SL (2016) On the relationship between math anxiety and math achievement in early elementary school: the role of problem solving strategies. J Exper Child Psychol 141:83–100. https://doi.org/10.1016/j.jecp.2015.07.014

Ramirez G, Shaw ST, Maloney EA (2018) Math anxiety: Past research, promising interventions, and a new interpretation framework. Educational Psychologist 53(3):145–164

Restle F (1970) Speed of adding and comparing numbers. J Exper Psychol 83(2):274–278

Reyna VF, Nelson WL, Han PK, Dieckmann NF (2009) How numeracy influences risk comprehension and medical decision making. Psychol Bull 135(6):943–973

Richardson FC, Suinn RM (1972) The Mathematics Anxiety Rating Scale: Psychometric data. J Couns Psychol 19(6):551–554. https://doi.org/10.1037/h0033456

Rivera-Batiz F (1992) Quantitative literacy and the likelihood of employment among young adults in the United States. J Human Resour 27(2):313–328

Rosseel Y (2012) lavaan: an R package for structural equation modeling. J Stat Softw 48:1–36

Sammallahti E, Finell J, Jonsson B, Korhonen J (2023) A meta-analysis of math anxiety interventions. J Numer Cognit 9(2):346–362. https://doi.org/10.23668/psycharchives.12882

Sarı M, Hunt TE (2020) Parent-child mathematics affect as predictors of children’s mathematics achievement. Int Online J Primary Educ 9:85–96

Sarı MH, Szczygieł M (2023) The role of math anxiety in the relationship between approximate number system and math performance in young children. Psychol Sch 60(4):912–930

Sasanguie D, Reynvoet B (2013) Number comparison and number line estimation rely on different mechanisms. Psychologica Belgica 53(4):17–35. https://doi.org/10.5334/pb-53-4-17

Schleepen TMJ, Van Mier HI, De Smedt B (2016) The contribution of numerical magnitude comparison and phonological processing to individual differences in fourth graders’ multiplication fact ability. PLoS ONE 11(6):e0158335. https://doi.org/10.1371/journal.pone.0158335

Schneider M, Beeres K, Coban L, Merz S, Schmidt SS, Stricker J, De Smedt B (2017) Associations of non-symbolic and symbolic numerical magnitude processing with mathematical competence: a meta-analysis. Dev Sci. https://doi.org/10.1111/desc.12372

Schneider M, Merz S, Stricker J, De Smedt B, Torbeyns J, Verschaffel L, Luwel K (2018a) Associations of number line estimation with mathematical competence: a meta-analysis. Child Dev 89(5):1467–1484

Schneider M, Thompson CA, Rittle-Johnson B (2018b) Associations of magnitude comparison and number line estimation with mathematical competence: A comparative review. In: Lemaire P (ed) Cognitive development from a strategy perspective: A festschrift for Robert S. Siegler, Routledge/Taylor & Francis Group, pp 100–119

Schwenk C, Sasanguie D, Kuhn JT, Kempe S, Doebler P, Holling H (2017) (Non-) symbolic magnitude processing in children with mathematical difficulties: a meta-analysis. Res Dev Disabil 64:152–167. https://doi.org/10.1016/j.ridd.2017.03.003

Şentürk B (2010) İlköğretim beşinci sınıf öğrencilerinin genel başarıları, matematik başarıları, matematik dersine yönelik tutumları ve matematik kaygıları arasındaki ilişki. Afyon Kocatepe Üniversitesi, Sosyal Bilimler Enstitüsü

Silver AM, Elliott L, Reynvoet B, Sasanguie D, Libertus ME (2022) Teasing apart the unique contributions of cognitive and affective predictors of math performance. Ann N Y Acad Sci 1511(1):173–190. https://doi.org/10.1111/nyas.14747

Simms V, Clayton S, Cragg L, Gilmore C, Johnson S (2016) Explaining the relationship between number line estimation and mathematical achievement: The role of visuomotor integration and visuospatial skills. J Exp Child Psychol 145:22–33

Skagerlund K, Östergren R, Västfjäll D, Träff U (2019) How does mathematics anxiety impair mathematical abilities? Investigating the link between math anxiety, working memory, and number processing. PLoS ONE 14(1):e0211283. https://doi.org/10.1371/journal.pone.0211283

Slusser E, Barth H (2017) Intuitive proportion judgment in number-line estimation: converging evidence from multiple tasks. J Exp Child Psychol 162:181–198. https://doi.org/10.1016/j.jecp.2017.04.010

Smets K, Gebuis T, Defever E, Reynvoet B (2014) Concurrent validity of approximate number sense tasks in adults and children. Acta Psychol 150:120–128

Sobków A, Olszewska A, Traczyk J (2020) Multiple numeric competencies predict decision outcomes beyond fluid intelligence and cognitive reflection. Intelligence 80:101452

Szczygieł M (2019) How to measure math anxiety in young children? Psychometric properties of the modified abbreviated math anxiety scale for elementary children (mAMAS-E). Polish Psychol Bull 4(50):303–315. https://doi.org/10.24425/ppb.2019.131003

Szczygieł M (2020a) More evidence that math anxiety is specific to math in young children. Int Electr J Elem Educ 12(5):429–438

Szczygieł M (2020b) When does math anxiety in parents and teachers predict math anxiety and math achievement in elementary school children? The role of gender and grade year. Soc Psychol Educ 23:1023–1054. https://doi.org/10.1007/s11218-020-09570-2

Szczygieł M (2021a) Not only reliability! The importance of the ecological validity of the math anxiety questionnaire for adults. Eur J Psychol Assess 38(2):78–90. https://doi.org/10.1027/1015-5759/a000646

Szczygieł M (2021b) The relationship between math anxiety and math achievement in young children is mediated through working memory, not by number sense, and it is not direct. Contemp Educ Psychol 65:101949. https://doi.org/10.1016/j.cedpsych.2021.101949

Szczygieł M (2022) The psychometric properties of the mathematics attitude scale for adults (MASA). Curr Psychol. https://doi.org/10.1007/s12144-022-02980-9

Szczygieł M, Pieronkiewicz B (2022) Exploring the nature of math anxiety in young children: Intensity, prevalence, reasons. Math Think Learn 24(3):248–264. https://doi.org/10.1080/10986065.2021.1882363

Szczygieł M, Hohol M (2024) The gender gap in math anxiety (and in a link between math anxiety and math performance too) is not so salient when other anxieties are controlled for. https://doi.org/10.31234/osf.io/5trew

Tokita M, Hirota S (2021) Numerosity comparison, estimation and proportion estimation abilities may predict numeracy and cognitive reflection in adults. Front Hum Neurosci 15:762344. https://doi.org/10.3389/fnhum.2021.762344

Verguts T, Fias W, Stevens M (2005) A model of exact small-number representation. Psychon Bull Rev 12(1):66–80. https://doi.org/10.3758/BF0319634

Wu SS, Barth M, Amin H, Malcarne V, Menon V (2012) Math anxiety in second and third graders and its relation to mathematics achievement. Front Psychol. https://doi.org/10.3389/fpsyg.2012.00162

Wynn K (1992) Addition and subtraction by human infants. Nature 358(6389):749–750. https://doi.org/10.1038/358749a0

Yáñez-Marquina L, Villardón-Gallego L (2017) Math anxiety, a hierarchical construct: Development and validation of the Scale for assessing math anxiety in secondary education. Ansiedad y Estrés 23(2–3):59–65. https://doi.org/10.1016/j.anyes.2017.10.001

Zhang J, Zhao N, Kong QP (2019) The relationship between math anxiety and math performance: a meta-analytic investigation. Front Psychol 10:1613. https://doi.org/10.3389/fpsyg.2019.01613

Zorzi M, Priftis K, Umiltà C (2002) Neglect disrupts the mental number line. Nature 417(6885):138–139. https://doi.org/10.1038/417138a

Acknowledgements

We would like to thank Marzena Kutt for help with data collection and for comments on an earlier version of the manuscript, Attila Krajcsi for calculating the individual Weber fraction and for comments on an earlier version of the manuscript, and Michael Timberlake and Emily Cookson for English proofreading.

This study was funded by the University of the National Education Commission, Kraków (BN.610-104/PBU/2020).

Author information

Authors and affiliations.

Institute of Psychology, Jagiellonian University in Kraków, Kraków, Poland

Monika Szczygieł

Faculty of Education, Nevşehir Hacı Bektaş Veli University, Nevşehir, Türkiye

Mehmet Hayri Sarı

Corresponding author

Correspondence to Monika Szczygieł.

Ethics declarations

Conflict of interest.

The authors declare no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study obtained ethical approval from the Ethical Committee, Institute of Psychology, University of the National Education Commission, Kraków.

Human and animal rights

This article does not contain any studies with animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Editor: Thomas Lachmann (University of Kaiserslautern-Landau); Reviewers: Paula Andrea Maldonado Moscoso (University of Trento)

See Figs. 4, 5, 6 and 7.

Fig. 4 Numerical Ratio Effect (NRE) on Reaction Time (RT) in Non-Symbolic Comparison Task (NS)

Fig. 5 Numerical Distance Effect (NDE) on Reaction Time (RT) in Non-Symbolic Comparison Task (NS)

Fig. 6 Numerical Ratio Effect (NRE) on Reaction Time (RT) in Symbolic Comparison Task (S)

Fig. 7 Numerical Distance Effect (NDE) on Reaction Time (RT) in Symbolic Comparison Task (S)

See Tables 4 and 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Szczygieł, M., Sarı, M.H. The relationship between numerical magnitude processing and math anxiety, and their joint effect on adult math performance, varied by indicators of numerical tasks. Cogn Process (2024). https://doi.org/10.1007/s10339-024-01186-0

Received: 03 September 2022

Accepted: 21 March 2024

Published: 22 April 2024

DOI: https://doi.org/10.1007/s10339-024-01186-0

  • Symbolic magnitude processing
  • Non-symbolic magnitude processing
  • Mental number line

VIDEO

  1. 8.1: Basics of Hypothesis Testing

  2. Hypothesis testing complete review

  3. Two sample hypothesis tests

  4. A-Level Maths: O3-05 Sample Means: Hypothesis Test Example 4

  5. Chapter 09: Hypothesis testing: non-directional worked example

  6. Hypothesis Testing Problems

COMMENTS

  1. Simple hypothesis testing (practice)

    Let's test the hypothesis that each answer has an equal chance of 20% of appearing in the Magic 8-Ball versus the alternative that "Ask again later" has a greater probability. The table below sums up the results of 1000 simulations, each simulating 10 random answers with a 20% chance of getting "Ask again later".
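
    The results table is not reproduced in this excerpt, so the following is only a minimal sketch of how such a simulation could be run and how a P-value could be estimated from it. The function names, the random seed, and the observed count of 4 "Ask again later" answers are hypothetical and are not taken from the exercise.

```python
import random


def simulate_counts(n_sims=1000, n_answers=10, p=0.20, seed=0):
    """Simulate n_sims runs of n_answers answers under the null hypothesis.

    Each entry is the number of "Ask again later" answers in one run.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n_answers))
        for _ in range(n_sims)
    ]


def estimate_p_value(counts, observed):
    """Share of simulated runs at least as extreme as the observed count."""
    return sum(c >= observed for c in counts) / len(counts)


counts = simulate_counts()
observed = 4  # hypothetical observed count, not from the source
p_hat = estimate_p_value(counts, observed)
print(f"Estimated P-value for {observed} or more: {p_hat:.3f}")
```

    The estimate is then compared with the chosen significance level: if it falls below that level, the null hypothesis of a 20% chance is rejected in favor of the alternative.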

  2. Hypothesis Testing

    The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg, with a standard deviation of 15 kg. 30 men are chosen, with an average weight of 112.5 kg.
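
    The five steps themselves are not part of this excerpt, so the calculation below is a sketch under stated assumptions rather than the source's own solution: it treats 15 kg as a known population standard deviation, uses a right-tailed z-test, and assumes a significance level of 0.05. The function name `one_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Return the z statistic and the right-tailed p-value."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # P(Z >= z) under the null
    return z, p_value


alpha = 0.05  # assumed significance level, not stated in the excerpt
z, p_value = one_sample_z(sample_mean=112.5, mu0=100, sigma=15, n=30)
print(f"z = {z:.3f}, p-value = {p_value:.2g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

    Under these assumptions the z statistic is about 4.56, far beyond the usual right-tailed cutoff, so the null hypothesis that the mean is 100 kg would be rejected.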

  3. 8.1: The Elements of Hypothesis Testing

    Two Types of Errors. The format of the testing procedure in general terms is to take a sample and use the information it contains to come to a decision about the two hypotheses. As stated before, our decision will always be either: reject the null hypothesis \(H_0\) in favor of the alternative \(H_a\) presented, or ...

  4. Hypothesis Testing

    A hypothesis test is a statistical inference method used to test the significance of a proposed (hypothesized) relation between population statistics (parameters) and their corresponding sample estimators. In other words, hypothesis tests are used to determine if there is enough evidence in a sample to prove a hypothesis true for the entire population. The test considers two hypotheses: the ...

  5. What Is a Hypothesis Test?

    The null hypothesis is that there is no change in fatality rate, while the alternative hypothesis is that the fatality rate has decreased. A hypothesis test is performed on data collected for 24 months before and 24 months after the feature is built. Again, the p-value was 0.02.

  6. 9.E: Hypothesis Testing with One Sample (Exercises)

    An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (medium-sized, with approximately 1,200 students, in a small-city demographic) to determine if the local high school's percentage was lower. One hundred fifty students were chosen at random and surveyed.

  7. 10.2: Null and Alternative Hypotheses

    The alternative hypothesis (\(H_a\)) is a claim about the population that is contradictory to \(H_0\) and what we conclude when we reject \(H_0\). Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample ...

  8. 9.1: Introduction to Hypothesis Testing

    This page titled 9.1: Introduction to Hypothesis Testing is shared under a CC BY 2.0 license and was authored, remixed, and/or curated by Kyle Siegrist ( Random Services) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request. In hypothesis testing, the goal is ...

  9. Hypothesis testing

    A test of hypothesis is usually carried out by explicitly or implicitly subdividing the support into two disjoint subsets. One of the two subsets is called the critical region (or rejection region); it is the set of all values in the support for which the null hypothesis is rejected:

  10. Introduction to Hypothesis Testing

    Step 3: Collect Data and Compute Sample Statistics. After collecting the data, we find the sample mean. Now we can compare the sample mean with the null hypothesis by computing a z-score that describes where the sample mean is located relative to the hypothesized population mean. We use the z-score formula. Step 4: Make a Decision.
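
    The formula itself did not survive in this excerpt. The standard z-score for a sample mean, which is presumably what Step 3 refers to, is given below. The symbols are assumptions, since the source's notation is not shown: \(\bar{x}\) is the sample mean, \(\mu_0\) the hypothesized population mean, \(\sigma\) the population standard deviation, and \(n\) the sample size.

    \[
    z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
    \]

    The denominator is the standard error of the mean, so \(z\) counts how many standard errors the sample mean lies from the hypothesized value.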

  11. Hypothesis Test (Comprehensive Walkthrough)

    Overview of hypothesis tests for proportions and understanding the p-value and significance level. Test for significance using a one-tail z-test (Examples #1-2) Construct a hypothesis test for a two-tail z-test (Example #3) Create a hypothesis test and provide a confidence interval (Example #4) How to create a hypothesis test for the difference ...

  12. 8.1.1: Introduction to Hypothesis Testing Part 1

    The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints. \(H_0\): The null hypothesis: It is a statement of no difference between the variables; they are not related. This can often be considered the status quo and as a result if you cannot accept the null it requires some action.

  13. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  14. Hypothesis Testing -- from Wolfram MathWorld

    Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps. 1. Formulate the null hypothesis H_0 (commonly, that the observations are the result of pure chance) and the alternative hypothesis H_a (commonly, that the observations show a real effect combined with a component of chance ...

  15. 12.3: Steps in Hypothesis Testing

    Set up two contradictory hypotheses. Collect sample data (in homework problems, the data or summary statistics will be given to you). Determine the correct distribution to perform the hypothesis test. Analyze sample data by performing the calculations that ultimately will allow you to reject or decline to reject the null hypothesis.

  16. Exercises

    The test statistic for hypothesis tests involving a single proportion is given by: \(z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}\). Find the value of the test statistic for the claim that the proportion of peas with yellow pods equals 0.25, where the sample involved includes 580 peas with 152 of them having yellow pods. z = 0.67 ...
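
    As a quick check of the arithmetic, the test statistic can be computed directly from the numbers in the exercise. This is a minimal sketch; the function name `proportion_z` is hypothetical, and \(q\) is taken to be \(1 - p\), as is conventional.

```python
from math import sqrt


def proportion_z(successes, n, p0):
    """z statistic for a single proportion under the claimed value p0."""
    p_hat = successes / n          # sample proportion
    q0 = 1 - p0                    # complement of the claimed proportion
    return (p_hat - p0) / sqrt(p0 * q0 / n)


z = proportion_z(successes=152, n=580, p0=0.25)
print(round(z, 2))  # 0.67
```

    The sample proportion is 152/580, roughly 0.262, which is about 0.67 standard errors above the claimed 0.25.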

  17. 8.4: Hypothesis Test Examples for Proportions

    In a hypothesis test problem, you may see words such as "the level of significance is 1%." The "1%" is the preconceived or preset \(\alpha\).

  18. Hypothesis testing

    Step 4: Your sample score of 27 needs to be converted into a Z value. To calculate: Z = (27 - 19)/4 = 2 (check the Converting into Z scores section if you need to review how to do this process). Step 5: A Z score of 2 is more extreme than the cut-off Z of +1.96 (see figure above). The result is significant and, thus, the null hypothesis is ...
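
    The cut-off of 1.96 is the critical value of a two-tailed test at a significance level of 0.05, which is assumed here because the excerpt does not state the level explicitly. The sketch below derives that cut-off from the standard normal distribution and repeats the comparison from Steps 4 and 5; the variable names are illustrative only.

```python
from statistics import NormalDist

alpha = 0.05  # assumed two-tailed significance level implied by the 1.96 cut-off

# Critical value: the point leaving alpha/2 in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Z value from the excerpt: score 27, mean 19, standard deviation 4.
z = (27 - 19) / 4

reject_null = abs(z) > z_crit
print(f"critical value = {z_crit:.2f}, Z = {z:.1f}, reject H0: {reject_null}")
```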

  19. Hypothesis Testing Problems

    This statistics video tutorial provides practice problems on hypothesis testing. It explains how to tell if you should accept or reject the null hypothesis....

  20. Hypothesis test

    Hypothesis test. A significance test, also referred to as a statistical hypothesis test, is a method of statistical inference in which observed data is compared to a claim (referred to as a hypothesis) in order to assess the truth of the claim. For example, one might wonder whether age affects the number of apples a person can eat, and may use a significance test to determine whether there is ...

  21. Hypothesis Testing Solved Examples(Questions and Solutions)

    View Solution to Question 1. Question 2. A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get the following scores: 62, 92, 75 ...

  22. The relationship between numerical magnitude processing and math

    Math anxiety can be defined as "[…] a feeling of tension and anxiety that interferes with the manipulation of numbers and the solving of mathematical problems in a wide variety of ordinary life and academic situations" (Richardson & Suinn 1972, p. 551). It is a multidimensional construct whose various types have been tested by other researchers, e.g., math learning anxiety, math testing ...

  23. The effect of applying of interactive learning multimedia on

    The problem that occurs at Junior High School 43 Padang is the low mathematics outcomes of learning achieved by students. The use of interactive learning multimedia is the solution that is applied in this problem. ... The research hypothesis is analyzed by t-test. P-value = 0.000 at significance level is obtained based on the data analysis ...