
Correlational Research – Methods, Types and Examples


Correlational Research

Correlational Research is a type of research that examines the statistical relationship between two or more variables without manipulating them. It is a non-experimental research design that seeks to establish the degree of association or correlation between two or more variables.

Types of Correlational Research

There are three types of correlational research:

Positive Correlation

A positive correlation occurs when two variables increase or decrease together. This means that as one variable increases, the other variable also tends to increase. Similarly, as one variable decreases, the other variable also tends to decrease. For example, there is a positive correlation between the amount of time spent studying and academic performance. The more time a student spends studying, the higher their academic performance is likely to be. Similarly, there is a positive correlation between a person’s age and their income level. As a person gets older, they tend to earn more money.

Negative Correlation

A negative correlation occurs when one variable increases while the other decreases. This means that as one variable increases, the other variable tends to decrease. Similarly, as one variable decreases, the other variable tends to increase. For example, there is a negative correlation between the number of hours spent watching TV and physical activity level. The more time a person spends watching TV, the less physically active they are likely to be. Similarly, there is a negative correlation between the amount of stress a person experiences and their overall happiness. As stress levels increase, happiness levels tend to decrease.

Zero Correlation

A zero correlation occurs when there is no linear relationship between two variables. This means that knowing the value of one variable tells you nothing about whether the other tends to be higher or lower. For example, there is zero correlation between a person’s shoe size and their IQ score. The size of a person’s feet has no relationship to their level of intelligence. Similarly, a person’s height has no relationship to their favorite color. The two variables are unrelated to each other.

Correlational Research Methods

Correlational research can be conducted using different methods, including:

Surveys

Surveys are a common method used in correlational research. Researchers collect data by asking participants to complete questionnaires or surveys that measure different variables of interest. Surveys are useful for exploring the relationships between variables such as personality traits, attitudes, and behaviors.

Observational Studies

Observational studies involve observing and recording the behavior of participants in natural settings. Researchers can use observational studies to examine the relationships between variables such as social interactions, group dynamics, and communication patterns.

Archival Data

Archival data involves using existing data sources such as historical records, census data, or medical records to explore the relationships between variables. Archival data is useful for investigating the relationships between variables that cannot be manipulated or controlled.

Experimental Design (as a Follow-Up)

Experimental design is not itself a correlational method, because correlational research does not involve manipulating variables. However, researchers can follow up a correlational finding with an experiment to test for a cause-and-effect relationship. Experimental design involves manipulating one variable while holding other variables constant to determine the effect on the dependent variable.

Meta-Analysis

Meta-analysis involves combining and analyzing the results of multiple studies to explore the relationships between variables across different contexts and populations. Meta-analysis is useful for identifying patterns and inconsistencies in the literature and can provide insights into the strength and direction of relationships between variables.

Data Analysis Methods

Correlational research data analysis methods depend on the type of data collected and the research questions being investigated. Here are some common data analysis methods used in correlational research:

Correlation Coefficient

A correlation coefficient is a statistical measure that quantifies the strength and direction of the relationship between two variables. The correlation coefficient ranges from -1 to +1, with -1 indicating a perfect negative correlation, +1 indicating a perfect positive correlation, and 0 indicating no correlation. Researchers use correlation coefficients to determine the degree to which two variables are related.
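
As a rough illustration of how such a coefficient is computed, the sketch below implements the Pearson correlation coefficient in plain Python. The function name `pearson_r` and the study-hours data are hypothetical and chosen only for this example; other coefficients (for ranked data, for instance) are computed differently.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}")  # close to +1: a strong positive correlation
```

A value close to +1 here only describes how tightly the points follow an upward straight line; it says nothing about why the two variables move together.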

Scatterplots

A scatterplot is a graphical representation of the relationship between two variables. Each data point on the plot represents a single observation. The x-axis represents one variable, and the y-axis represents the other variable. The pattern of data points on the plot can provide insights into the strength and direction of the relationship between the two variables.

Regression Analysis

Regression analysis is a statistical method used to model the relationship between two or more variables. Researchers use regression analysis to predict the value of one variable based on the value of another variable. Regression analysis can help identify the strength and direction of the relationship between variables, as well as the degree to which one variable can be used to predict the other.
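
To make the prediction idea concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The helper `fit_line` and the data are hypothetical, and a single-predictor straight line is an assumption made for brevity; real analyses often involve several predictors.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when x is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

slope, intercept = fit_line(hours, scores)

# Predict the score for a value inside the observed range of hours
predicted = intercept + slope * 3.5
print(f"score = {intercept:.1f} + {slope:.1f} * hours; prediction at 3.5 h: {predicted:.1f}")
```

The prediction is made inside the observed range on purpose: a fitted line gives no assurance about values far outside the data it was fitted to.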

Factor Analysis

Factor analysis is a statistical method used to identify patterns among variables. Researchers use factor analysis to group variables into factors that are related to each other. Factor analysis can help identify underlying factors that account for the correlations among a larger set of observed variables.

Path Analysis

Path analysis is a statistical method used to model the relationship between multiple variables. Researchers use path analysis to test causal models and identify direct and indirect effects between variables.

Applications of Correlational Research

Correlational research has many practical applications in various fields, including:

  • Psychology: Correlational research is commonly used in psychology to explore the relationships between variables such as personality traits, behaviors, and mental health outcomes. For example, researchers may use correlational research to examine the relationship between anxiety and depression, or the relationship between self-esteem and academic achievement.
  • Education: Correlational research is useful in educational research to explore the relationships between variables such as teaching methods, student motivation, and academic performance. For example, researchers may use correlational research to examine the relationship between student engagement and academic success, or the relationship between teacher feedback and student learning outcomes.
  • Business: Correlational research can be used in business to explore the relationships between variables such as consumer behavior, marketing strategies, and sales outcomes. For example, marketers may use correlational research to examine the relationship between advertising spending and sales revenue, or the relationship between customer satisfaction and brand loyalty.
  • Medicine: Correlational research is useful in medical research to explore the relationships between variables such as risk factors, disease outcomes, and treatment effectiveness. For example, researchers may use correlational research to examine the relationship between smoking and lung cancer, or the relationship between exercise and heart health.
  • Social Science: Correlational research is commonly used in social science research to explore the relationships between variables such as socioeconomic status, cultural factors, and social behavior. For example, researchers may use correlational research to examine the relationship between income and voting behavior, or the relationship between cultural values and attitudes towards immigration.

Examples of Correlational Research

  • Psychology: Researchers might be interested in exploring the relationship between two variables, such as parental attachment and anxiety levels in young adults. The study could involve measuring levels of attachment and anxiety using established scales or questionnaires, and then analyzing the data to determine if there is a correlation between the two variables. This information could be useful in identifying potential risk factors for anxiety in young adults, and in developing interventions that could help improve attachment and reduce anxiety.
  • Education: In a correlational study in education, researchers might investigate the relationship between two variables, such as teacher engagement and student motivation in a classroom setting. The study could involve measuring levels of teacher engagement and student motivation using established scales or questionnaires, and then analyzing the data to determine if there is a correlation between the two variables. This information could be useful in identifying strategies that teachers could use to improve student motivation and engagement in the classroom.
  • Business: Researchers might explore the relationship between two variables, such as employee satisfaction and productivity levels in a company. The study could involve measuring levels of employee satisfaction and productivity using established scales or questionnaires, and then analyzing the data to determine if there is a correlation between the two variables. This information could be useful in identifying factors that could help increase productivity and improve job satisfaction among employees.
  • Medicine: Researchers might examine the relationship between two variables, such as smoking and the risk of developing lung cancer. The study could involve collecting data on smoking habits and lung cancer diagnoses, and then analyzing the data to determine if there is a correlation between the two variables. This information could be useful in identifying risk factors for lung cancer and in developing interventions that could help reduce smoking rates.
  • Sociology: Researchers might investigate the relationship between two variables, such as income levels and political attitudes. The study could involve measuring income levels and political attitudes using established scales or questionnaires, and then analyzing the data to determine if there is a correlation between the two variables. This information could be useful in understanding how socioeconomic factors can influence political beliefs and attitudes.

How to Conduct Correlational Research

Here are the general steps to conduct correlational research:

  • Identify the research question: Start by identifying the research question that you want to explore. It should involve two or more variables that you want to investigate for a correlation.
  • Choose the research method: Decide on the research method that will be most appropriate for your research question. The most common methods for correlational research are surveys, archival research, and naturalistic observation.
  • Choose the sample: Select the participants or data sources that you will use in your study. Your sample should be representative of the population you want to generalize the results to.
  • Measure the variables: Choose the measures that will be used to assess the variables of interest. Ensure that the measures are reliable and valid.
  • Collect the data: Collect the data from your sample using the chosen research method. Be sure to maintain ethical standards and obtain informed consent from your participants.
  • Analyze the data: Use statistical software to analyze the data and compute the correlation coefficient. This will help you determine the strength and direction of the correlation between the variables.
  • Interpret the results: Interpret the results and draw conclusions based on the findings. Consider any limitations or alternative explanations for the results.
  • Report the findings: Report the findings of your study in a research report or manuscript. Be sure to include the research question, methods, results, and conclusions.

Purpose of Correlational Research

The purpose of correlational research is to examine the relationship between two or more variables. Correlational research allows researchers to identify whether there is a relationship between variables, and if so, the strength and direction of that relationship. This information can be useful for predicting and explaining behavior, and for identifying potential risk factors or areas for intervention.

Correlational research can be used in a variety of fields, including psychology, education, medicine, business, and sociology. For example, in psychology, correlational research can be used to explore the relationship between personality traits and behavior, or between early life experiences and later mental health outcomes. In education, correlational research can be used to examine the relationship between teaching practices and student achievement. In medicine, correlational research can be used to investigate the relationship between lifestyle factors and disease outcomes.

Overall, the purpose of correlational research is to provide insight into the relationship between variables, which can be used to inform further research, interventions, or policy decisions.

When to Use Correlational Research

Here are some situations when correlational research can be particularly useful:

  • When experimental research is not possible or ethical: In some situations, it may not be possible or ethical to manipulate variables in an experimental design. In these cases, correlational research can be used to explore the relationship between variables without manipulating them.
  • When exploring new areas of research: Correlational research can be useful when exploring new areas of research or when researchers are unsure of the direction of the relationship between variables. Correlational research can help identify potential areas for further investigation.
  • When testing theories: Correlational research can be useful for testing theories about the relationship between variables. Researchers can use correlational research to examine the relationship between variables predicted by a theory, and to determine whether the theory is supported by the data.
  • When making predictions: Correlational research can be used to make predictions about future behavior or outcomes. For example, if there is a strong positive correlation between education level and income, one could predict that individuals with higher levels of education will have higher incomes.
  • When identifying risk factors: Correlational research can be useful for identifying potential risk factors for negative outcomes. For example, a study might find a positive correlation between drug use and depression, indicating that drug use could be a risk factor for depression.

Characteristics of Correlational Research

Here are some common characteristics of correlational research:

  • Examines the relationship between two or more variables: Correlational research is designed to examine the relationship between two or more variables. It seeks to determine if there is a relationship between the variables, and if so, the strength and direction of that relationship.
  • Non-experimental design: Correlational research is typically non-experimental in design, meaning that the researcher does not manipulate any variables. Instead, the researcher observes and measures the variables as they naturally occur.
  • Cannot establish causation: Correlational research cannot establish causation, meaning that it cannot determine whether one variable causes changes in another variable. Instead, it only provides information about the relationship between the variables.
  • Uses statistical analysis: Correlational research relies on statistical analysis to determine the strength and direction of the relationship between variables. This may include calculating correlation coefficients, regression analysis, or other statistical tests.
  • Observes real-world phenomena: Correlational research is often used to observe real-world phenomena, such as the relationship between education and income or the relationship between stress and physical health.
  • Can be conducted in a variety of fields: Correlational research can be conducted in a variety of fields, including psychology, sociology, education, and medicine.
  • Can be conducted using different methods: Correlational research can be conducted using a variety of methods, including surveys, observational studies, and archival studies.

Advantages of Correlational Research

There are several advantages of using correlational research in a study:

  • Allows for the exploration of relationships: Correlational research allows researchers to explore the relationships between variables in a natural setting without manipulating any variables. This can help identify possible relationships between variables that may not have been previously considered.
  • Useful for predicting behavior: Correlational research can be useful for predicting future behavior. If a strong correlation is found between two variables, researchers can use the value of one variable to predict the likely value of the other, without claiming that one causes the other.
  • Can be conducted in real-world settings: Correlational research can be conducted in real-world settings, which allows for the collection of data that is representative of real-world phenomena.
  • Can be less expensive and time-consuming than experimental research: Correlational research is often less expensive and time-consuming than experimental research, as it does not involve manipulating variables or creating controlled conditions.
  • Useful in identifying risk factors: Correlational research can be used to identify potential risk factors for negative outcomes. By identifying variables that are correlated with negative outcomes, researchers can develop interventions or policies to reduce the risk of negative outcomes.
  • Useful in exploring new areas of research: Correlational research can be useful in exploring new areas of research, particularly when researchers are unsure of the direction of the relationship between variables. By conducting correlational research, researchers can identify potential areas for further investigation.

Limitations of Correlational Research

Correlational research also has several limitations that should be taken into account:

  • Cannot establish causation: Correlational research cannot establish causation, meaning that it cannot determine whether one variable causes changes in another variable. This is because it is not possible to control all possible confounding variables that could affect the relationship between the variables being studied.
  • Directionality problem: The directionality problem refers to the difficulty of determining which variable is influencing the other. For example, a correlation may exist between happiness and social support, but it is not clear whether social support causes happiness, or whether happy people are more likely to have social support.
  • Third variable problem: The third variable problem refers to the possibility that a third variable, not included in the study, is responsible for the observed relationship between the two variables being studied.
  • Limited generalizability: Correlational research is often limited in terms of its generalizability to other populations or settings. This is because the sample studied may not be representative of the larger population, or because the variables studied may behave differently in different contexts.
  • Relies on self-reported data: Correlational research often relies on self-reported data, which can be subject to social desirability bias or other forms of response bias.
  • Limited in explaining complex behaviors: Correlational research is limited in explaining complex behaviors that are influenced by multiple factors, such as personality traits, situational factors, and social context.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


What is correlation analysis?

Last updated

11 May 2023

Reviewed by

Miroslav Damyanov

Correlation analysis is a staple of data analytics. It’s a commonly used method to measure the relationship between two variables. It helps researchers understand the extent to which changes to the value in one variable are associated with changes to the value in the other. 

Correlations are often misused and misunderstood, especially in the insight industry. Below is a guide to the basics and mechanics of correlation analysis.


Definition of correlation analysis

Correlation analysis, also known as bivariate analysis, is a statistical test primarily used to identify and explore linear relationships between two variables and then determine the strength and direction of that relationship. It’s mainly used to spot patterns within datasets.

It’s worth noting that correlation doesn't equate to causation. In essence, one cannot infer a cause-and-effect relationship between the two types of data with correlation analysis. However, you can determine the relationship's size, degree, and direction. 

Strength of the correlation

The degree of association in correlation analysis is measured by a correlation coefficient. The Pearson correlation, which is denoted by r, is the most commonly used coefficient. The correlation coefficient quantifies the degree of linear association between two variables and can take values between -1 and +1.

No correlation: This is when the value of r is zero.

Low degree: A small correlation is when r lies between 0 and ±0.29.

Moderate degree: If the value of the correlation coefficient is between ±0.30 and ±0.49, then there’s a medium correlation.

High degree: When the correlation coefficient takes a value between ±0.50 and ±1, it indicates a strong correlation.

Perfect: A perfect correlation occurs when the value of r is exactly ±1, meaning the data points fall on a straight line: as one variable increases, the other always increases (if positive) or decreases (if negative).
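
These bands can be written down as a small lookup, sketched below. The function name `describe_strength` is hypothetical, and placing the cut-offs at exactly 0.30 and 0.50 is an assumption made to close the small gaps between the ranges quoted above; the thresholds themselves are a rule of thumb, not a fixed standard.

```python
def describe_strength(r):
    """Map a correlation coefficient to the rule-of-thumb bands listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign gives the direction
    if size == 0:
        return "none"
    if size < 0.30:
        return "low"
    if size < 0.50:
        return "moderate"
    if size < 1.0:
        return "high"
    return "perfect"

for value in (0.0, 0.12, -0.35, 0.8, -1.0):
    print(value, describe_strength(value))
```

Because strength depends only on the absolute value, -0.35 and +0.35 fall in the same band.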

Direction of the correlation

You can also identify the direction of the linear relationship between two variables by the correlation coefficient's sign.

Positive correlation

A coefficient above 0 indicates a positive correlation, meaning both variables tend to increase together. Scores from +0.5 to +1 indicate a strong positive correlation.

Negative correlation

A coefficient below 0 indicates a negative correlation, meaning that as one variable increases, the other tends to decrease. Scores from -0.5 to -1 indicate a strong negative correlation.

No correlation

If the correlation coefficient is 0, there’s no linear relationship between the two variables being analyzed. It's worth noting that increasing the sample size gives a more precise estimate of the coefficient.

Significance of the correlation 

Once we learn about the strength and direction of the correlation, it’s critical to evaluate whether the observed correlation is likely to have occurred by chance or whether it’s a real relationship between the two variables. Therefore, we need to test the correlation for significance. The most common method for determining the significance of a correlation coefficient is by conducting a hypothesis test. 

The hypothesis test (t-test) helps us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero." We decide this based on the sample correlation coefficient ( r ) and the sample size (n). 

As with other hypothesis tests, the significance level is set first, generally at 5%. If the t-test yields a p-value below 5%, we can conclude that the correlation coefficient is significantly different from zero. Furthermore, we simply say that the correlation coefficient is "significant." Otherwise, we wouldn’t have enough evidence to conclude that there’s a true linear relationship between the two variables.

In general, the larger the correlation coefficient ( r ) and sample size (n), the more likely it is that the correlation is statistically significant. However, it's important to remember that a significant correlation doesn’t necessarily imply causation between the two variables. 
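
A minimal sketch of the test statistic follows, using the standard formula t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function name and the example values of r and n are hypothetical. To stay within the standard library, the sketch compares t against a critical value copied from a t-table (2.306 for 8 degrees of freedom, two-tailed, at the 5% level) rather than computing a p-value.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Two-tailed 5% critical value for n - 2 = 8 degrees of freedom (from a t-table)
T_CRITICAL = 2.306

n = 10
t_strong = correlation_t_statistic(0.70, n)
t_weaker = correlation_t_statistic(0.50, n)

strong_is_significant = abs(t_strong) > T_CRITICAL
weaker_is_significant = abs(t_weaker) > T_CRITICAL

print(f"r = 0.70: t = {t_strong:.2f}, significant: {strong_is_significant}")
print(f"r = 0.50: t = {t_weaker:.2f}, significant: {weaker_is_significant}")
```

With only ten observations, a coefficient of 0.50 does not clear the threshold here even though it falls in the "high" band, which shows why the sample size has to be considered alongside the coefficient.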

What factors affect a correlation analysis?

Below are the factors you must consider when planning a correlation analysis:

Performing a correlation analysis is only appropriate if there’s evidence of a linear relationship between the quantitative variables. You can use a scatter plot to assess linearity. If the points don’t roughly follow a straight line, a correlation analysis isn’t recommended.

Be sure to draw a scatter plot, since it helps you spot outliers, heteroscedasticity, and non-linear relationships at a glance.

Avoid analyzing correlations when the data are repeated measures of the same variable from the same individual, whether taken at the same or at different time points.

The sample size should be determined a priori.

Uses of correlation analysis

Correlation analysis is primarily used to quantify the degree to which two variables relate. By using correlation analysis, researchers evaluate the correlation coefficient, which tells them to what degree one variable changes when the other changes too. It gives researchers a measure of the linear relationship between two variables.

Correlation analysis is used by marketers to evaluate the efficiency of a marketing campaign by monitoring and analyzing customers' reactions to various marketing tactics. As such, they can better understand and serve their customers.

Another use of correlation analysis is among data scientists and experts tasked with data monitoring. They can use correlation analysis for root cause analysis and to minimize Time To Detection (TTD) and Time To Remediation (TTR).

Two anomalies or unusual events that happen simultaneously or at the same rate can point to the likely cause of an issue. As a result, users incur a lower cost from the issue if correlation analysis helps them understand and fix it sooner.

What is the business value of correlation analysis?

Correlation analysis offers several kinds of business value, including identifying potential inputs for more complex analyses and testing for future changes while holding other factors constant.

Additionally, businesses can use correlation analysis to understand the relationship between two variables. This type of analysis is easy to interpret and comprehend, as it focuses on how one variable varies in relation to another.

One of the primary business values of correlation analysis is its ability to identify hidden issues within a company. For example, if there’s a positive correlation between customers looking at reviews for a particular product and whether or not they purchase it, this could indicate a place where testing can provide more information. 

By testing whether increasing the number of people who look at positive product reviews leads to an increase in purchases, businesses can develop hypotheses to improve their products and services.

Correlation analysis can also help businesses diagnose problems with multiple regression models. For instance, if a multivariate or multiple regression model isn’t producing the expected results or if independent variables are not truly independent, correlation analysis can help discover these issues.

In digital environments, correlations can be especially helpful in fueling different hypotheses that can then be rapidly tested. This is because the testing can be low risk and not require a significant investment of time or money. 

With the abundance of data available to businesses, they must be careful in selecting the variables they’ll analyze. By doing so, they can uncover previously hidden relationships between variables and gain insights that can help them make data-driven decisions. 

Correlation ≠ causation

As previously stated, correlation doesn't strictly imply causation, even when you identify a significant relationship by correlation analysis techniques. You can’t determine the cause by the analysis.

The significant relationship implies that there’s much more to comprehend. Additionally, it implies that there are underlying and extraneous factors that you must further explore to look for a cause. Despite the possibility of a causal relationship existing, it would be irresponsible for researchers to utilize the correlation results as proof of such existence. 

Example of correlation analysis

A real-life example of correlation analysis is health improvement vs. medical dose reductions. Medical researchers can use a correlation study in clinical trials to better comprehend how a newly-developed drug relates to patient outcomes.

If patients' health tends to improve the more regularly they take the drug, there’s a positive correlation. If their health tends to deteriorate instead, the correlation is negative. If health shows no consistent change either way, there’s no correlation between the two variables (health and the drug).

What is the difference between correlation and correlation analysis?

Correlation shows us the direction and strength of a relationship between two variables. It’s expressed numerically by the correlation coefficient. Correlation analysis, on the other hand, is a statistical test that reveals the relationship between two variables/datasets.

What are correlation and regression?

Regression and correlation are the most popular methods used to examine the linear relationship between two quantitative variables. Correlation measures how strong the relationship is between a pair of variables, while regression is used to describe the relationship as an equation. 

What is the purpose of correlation?

Correlation analysis can help you to identify possible inputs for a more refined analysis. You can also use it to test for future changes while holding other things constant. The whole purpose of using correlations in research is to determine which variables are connected.


Correlational Research | Guide, Design & Examples

Published on 5 May 2022 by Pritha Bhandari . Revised on 5 December 2022.

A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them.

A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

Table of contents

  • Correlational vs experimental research
  • When to use correlational research
  • How to collect correlational data
  • How to analyse correlational data
  • Correlation and causation
  • Frequently asked questions about correlational research

Correlational and experimental research both use quantitative methods to investigate relationships between variables. But there are important differences in how data is collected and the types of conclusions you can draw.

Correlational research is ideal for gathering data quickly from natural settings. That helps you generalise your findings to real-life situations in an externally valid way.

There are a few situations where correlational research is an appropriate choice.

To investigate non-causal relationships

You want to find out if there is an association between two variables, but you don’t expect to find a causal relationship between them.

Correlational research can provide insights into complex real-world relationships, helping researchers develop theories and make predictions.

To explore causal relationships between variables

You think there is a causal relationship between two variables, but it is impractical, unethical, or too costly to conduct experimental research that manipulates one of the variables.

Correlational research can provide initial indications or additional support for theories about causal relationships.

To test new measurement tools

You have developed a new instrument for measuring your variable, and you need to test its reliability or validity.

Correlational research can be used to assess whether a tool consistently or accurately captures the concept it aims to measure.

There are many different methods you can use in correlational research. In the social and behavioural sciences, the most common data collection methods for this type of research include surveys, observations, and secondary data.

It’s important to carefully choose and plan your methods to ensure the reliability and validity of your results. You should carefully select a representative sample so that your data reflects the population you’re interested in without bias.

In survey research, you can use questionnaires to measure your variables of interest. You can conduct surveys online, by post, by phone, or in person.

Surveys are a quick, flexible way to collect standardised data from many participants, but it’s important to ensure that your questions are worded in an unbiased way and capture relevant insights.

Naturalistic observation

Naturalistic observation is a type of field research where you gather data about a behaviour or phenomenon in its natural environment.

This method often involves recording, counting, describing, and categorising actions and events. Naturalistic observation can include both qualitative and quantitative elements, but to assess correlation, you collect data that can be analysed quantitatively (e.g., frequencies, durations, scales, and amounts).

Naturalistic observation lets you easily generalise your results to real-world contexts, and you can study experiences that aren’t replicable in lab settings. But data analysis can be time-consuming and unpredictable, and researcher bias may skew the interpretations.

Secondary data

Instead of collecting original data, you can also use data that has already been collected for a different purpose, such as official records, polls, or previous studies.

Using secondary data is inexpensive and fast, because data collection is complete. However, the data may be unreliable, incomplete, or not entirely relevant, and you have no control over the reliability or validity of the data collection procedures.

After collecting data, you can statistically analyse the relationship between variables using correlation or regression analyses, or both. You can also visualise the relationships between variables with a scatterplot.

Different types of correlation coefficients and regression analyses are appropriate for your data based on their levels of measurement and distributions.

Correlation analysis

Using a correlation analysis, you can summarise the relationship between variables into a correlation coefficient: a single number that describes the strength and direction of the relationship between variables. With this number, you’ll quantify the degree of the relationship between variables.

The Pearson product-moment correlation coefficient, also known as Pearson’s r, is commonly used for assessing a linear relationship between two quantitative variables.

Correlation coefficients are usually found for two variables at a time, but you can use a multiple correlation coefficient for three or more variables.
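As a minimal sketch of how Pearson’s r is calculated, the function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the study-hours data are hypothetical and used only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (numerator of r).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 70]
r = pearson_r(hours, score)
print(f"Pearson's r = {r:.2f}")
```

The sign of the result gives the direction of the relationship, and its absolute value (between 0 and 1) gives the strength.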

Regression analysis

With a regression analysis, you can predict how much a change in one variable will be associated with a change in the other variable. The result is a regression equation that describes the line on a graph of your variables.

You can use this equation to predict the value of one variable based on the given value(s) of the other variable(s). It’s best to perform a regression analysis after testing for a correlation between your variables.
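The regression equation for one predictor can be sketched as follows, assuming an ordinary least-squares fit of a straight line. The helper name `fit_line` and the data are hypothetical; with real data you would normally use a statistics package.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: cross-products of deviations divided by squared deviations of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined when x has no variance")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 70]
slope, intercept = fit_line(hours, score)

# Predict the score for a value of the predictor inside the observed range.
predicted = intercept + slope * 4
print(f"score = {intercept:.2f} + {slope:.2f} * hours; prediction at 4 hours: {predicted:.2f}")
```

The slope is the predicted change in the outcome for a one-unit change in the predictor. It describes an association, not a causal effect.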

It’s important to remember that correlation does not imply causation. Just because you find a correlation between two things doesn’t mean you can conclude one of them causes the other, for a few reasons.

Directionality problem

If two variables are correlated, it could be because one of them is a cause and the other is an effect. But the correlational research design doesn’t allow you to infer which is which. To err on the side of caution, researchers don’t conclude causality from correlational studies.

Third variable problem

A confounding variable is a third variable that influences other variables to make them seem causally related even though they are not. Instead, there are separate causal links between the confounder and each variable.

In correlational research, there’s limited or no researcher control over extraneous variables. Even if you statistically control for some potential confounders, there may still be other hidden variables that disguise the relationship between your study variables.
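The third variable problem can be illustrated with a small simulation. In this sketch, a confounder `z` influences both `x` and `y`, which have no direct link to each other. The first-order partial correlation is used here as one common way to statistically control for a measured confounder; it is an illustrative choice, not a method the text prescribes, and all data are simulated.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after statistically controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated data: z drives both x and y; x and y never influence each other.
rng = random.Random(42)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw = pearson_r(x, y)
controlled = partial_r(x, y, z)
print(f"r(x, y) = {raw:.2f}; r(x, y | z) = {controlled:.2f}")
```

The raw correlation is clearly positive even though neither variable affects the other, while the partial correlation is close to zero. This only works for confounders you have measured; unmeasured ones remain a threat.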

Although a correlational study can’t demonstrate causation on its own, it can help you develop a causal hypothesis that’s tested in controlled experiments.

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.
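As one example of a coefficient suited to a different level of measurement, Spearman’s rank correlation is often used for ordinal data or for monotonic relationships that are not linear. It is not named in the text above, so treat this as an illustrative sketch: it ranks each variable (averaging tied ranks) and then applies Pearson’s formula to the ranks. The function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 10, 100, 1000, 10000]
print(f"Pearson's r = {pearson_r(x, y):.2f}, Spearman's rho = {spearman_rho(x, y):.2f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient reaches 1, while Pearson’s r is lower because the points do not fall on a straight line.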

Bhandari, P. (2022, December 05). Correlational Research | Guide, Design & Examples. Scribbr. Retrieved 14 May 2024, from https://www.scribbr.co.uk/research-methods/correlational-research-design/

7.2 Correlational Research

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of nonexperimental research.

What Is Correlational Research?

Correlational research is a type of nonexperimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are essentially two reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

The other reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, Allen Kanner and his colleagues thought that the number of “daily hassles” (e.g., rude salespeople, heavy traffic) that people experience affects the number of physical and psychological symptoms they have (Kanner, Coyne, Schaefer, & Lazarus, 1981). But because they could not manipulate the number of daily hassles their participants experienced, they had to settle for measuring the number of daily hassles—along with the number of symptoms—using self-report questionnaires. Although the strong positive relationship they found between these two variables is consistent with their idea that hassles cause symptoms, it is also consistent with the idea that symptoms cause hassles or that some third variable (e.g., neuroticism) causes both.

A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.
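When one measured variable is categorical and the other is quantitative, the relationship is usually summarized by comparing group means. The sketch below assumes a simple list of (group, score) records; the group labels, scores, and the helper name `group_means` are hypothetical and are not data from any study mentioned here.

```python
from statistics import mean

# Hypothetical records: (measured group membership, self-esteem score).
# Group membership is observed, not assigned by the researcher.
records = [
    ("group_1", 32), ("group_1", 28), ("group_1", 30), ("group_1", 34),
    ("group_2", 27), ("group_2", 25), ("group_2", 29), ("group_2", 23),
]


def group_means(records):
    """Mean score for each level of the categorical variable."""
    groups = {}
    for group, score in records:
        groups.setdefault(group, []).append(score)
    return {group: mean(scores) for group, scores in groups.items()}


means = group_means(records)
difference = means["group_1"] - means["group_2"]
print(means, "difference:", difference)
```

The same calculation would be used in a between-subjects experiment. What makes this correlational is only that group membership was measured rather than manipulated.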

Figure 7.2 “Results of a Hypothetical Study on Whether People Who Make Daily To-Do Lists Experience Less Stress Than People Who Do Not Make Such Lists” shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. It is how the study is conducted.

Figure 7.2 Results of a Hypothetical Study on Whether People Who Make Daily To-Do Lists Experience Less Stress Than People Who Do Not Make Such Lists

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. However, because some approaches to data collection are strongly associated with correlational research, it makes sense to discuss them here. The two we will focus on are naturalistic observation and archival data. A third, survey research, is discussed in its own chapter.

Naturalistic Observation

Naturalistic observation is an approach to data collection that involves observing people’s behavior in the environment in which it typically occurs. Thus naturalistic observation is a type of field research (as opposed to a type of laboratory research). It could involve observing shoppers in a grocery store, children on a school playground, or psychiatric inpatients in their wards. Researchers engaged in naturalistic observation usually make their observations as unobtrusively as possible so that participants are often not aware that they are being studied. Ethically, this is considered to be acceptable if the participants remain anonymous and the behavior occurs in a public setting where people would not normally have an expectation of privacy. Grocery shoppers putting items into their shopping carts, for example, are engaged in public behavior that is easily observable by store employees and other shoppers. For this reason, most researchers would consider it ethically acceptable to observe them for a study. On the other hand, one of the arguments against the ethicality of the naturalistic observation of “bathroom behavior” discussed earlier in the book is that people have a reasonable expectation of privacy even in a public restroom and that this expectation was violated.

Researchers Robert Levine and Ara Norenzayan used naturalistic observation to study differences in the “pace of life” across countries (Levine & Norenzayan, 1999). One of their measures involved observing pedestrians in a large city to see how long it took them to walk 60 feet. They found that people in some countries walked reliably faster than people in other countries. For example, people in the United States and Japan covered 60 feet in about 12 seconds on average, while people in Brazil and Romania took close to 17 seconds.

Because naturalistic observation takes place in the complex and even chaotic “real world,” there are two closely related issues that researchers must deal with before collecting data. The first is sampling. When, where, and under what conditions will the observations be made, and who exactly will be observed? Levine and Norenzayan described their sampling process as follows:

Male and female walking speed over a distance of 60 feet was measured in at least two locations in main downtown areas in each city. Measurements were taken during main business hours on clear summer days. All locations were flat, unobstructed, had broad sidewalks, and were sufficiently uncrowded to allow pedestrians to move at potentially maximum speeds. To control for the effects of socializing, only pedestrians walking alone were used. Children, individuals with obvious physical handicaps, and window-shoppers were not timed. Thirty-five men and 35 women were timed in most cities. (p. 186)

Precise specification of the sampling process in this way makes data collection manageable for the observers, and it also provides some control over important extraneous variables. For example, by making their observations on clear summer days in all countries, Levine and Norenzayan controlled for effects of the weather on people’s walking speeds.

The second issue is measurement. What specific behaviors will be observed? In Levine and Norenzayan’s study, measurement was relatively straightforward. They simply measured out a 60-foot distance along a city sidewalk and then used a stopwatch to time participants as they walked over that distance. Often, however, the behaviors of interest are not so obvious or objective. For example, researchers Robert Kraut and Robert Johnston wanted to study bowlers’ reactions to their shots, both when they were facing the pins and then when they turned toward their companions (Kraut & Johnston, 1979). But what “reactions” should they observe? Based on previous research and their own pilot testing, Kraut and Johnston created a list of reactions that included “closed smile,” “open smile,” “laugh,” “neutral face,” “look down,” “look away,” and “face cover” (covering one’s face with one’s hands). The observers committed this list to memory and then practiced by coding the reactions of bowlers who had been videotaped. During the actual study, the observers spoke into an audio recorder, describing the reactions they observed. Among the most interesting results of this study was that bowlers rarely smiled while they still faced the pins. They were much more likely to smile after they turned toward their companions, suggesting that smiling is not purely an expression of happiness but also a form of social communication.

Naturalistic observation has revealed that bowlers tend to smile when they turn away from the pins and toward their companions, suggesting that smiling is not purely an expression of happiness but also a form of social communication.

sieneke toering – bowling big lebowski style – CC BY-NC-ND 2.0.

When the observations require a judgment on the part of the observers, as in Kraut and Johnston’s study, this process is often described as coding. Coding generally requires clearly defining a set of target behaviors. The observers then categorize participants individually in terms of which behavior they have engaged in and the number of times they engaged in each behavior. The observers might even record the duration of each behavior. The target behaviors must be defined in such a way that different observers code them in the same way. This is the issue of interrater reliability. Researchers are expected to demonstrate the interrater reliability of their coding procedure by having multiple raters code the same behaviors independently and then showing that the different observers are in close agreement. Kraut and Johnston, for example, video recorded a subset of their participants’ reactions and had two observers independently code them. The two observers agreed on the reactions that were exhibited 97% of the time, indicating good interrater reliability.
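Percent agreement of the kind reported above can be computed as in this minimal sketch. The category labels are taken from the list of reactions described earlier, but the two sequences of codes are hypothetical, as is the function name `percent_agreement`.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of observations that two raters coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set of observations")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


# Hypothetical codes from two observers for the same ten recorded reactions.
rater_1 = ["open smile", "neutral face", "laugh", "look down", "closed smile",
           "neutral face", "look away", "open smile", "face cover", "neutral face"]
rater_2 = ["open smile", "neutral face", "laugh", "look down", "open smile",
           "neutral face", "look away", "open smile", "face cover", "neutral face"]

agreement = percent_agreement(rater_1, rater_2)
print(f"Raters agreed on {agreement:.0f}% of observations")
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best read alongside the number and frequency of the categories.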

Archival Data

Another approach to correlational research is the use of archival data, which are data that have already been collected for some other purpose. An example is a study by Brett Pelham and his colleagues on “implicit egotism,” the tendency for people to prefer people, places, and things that are similar to themselves (Pelham, Carvallo, & Jones, 2005). In one study, they examined Social Security records to show that women with the names Virginia, Georgia, Louise, and Florence were especially likely to have moved to the states of Virginia, Georgia, Louisiana, and Florida, respectively.

As with naturalistic observation, measurement can be more or less straightforward when working with archival data. For example, counting the number of people named Virginia who live in various states based on Social Security records is relatively straightforward. But consider a study by Christopher Peterson and his colleagues on the relationship between optimism and health using data that had been collected many years before for a study on adult development (Peterson, Seligman, & Vaillant, 1988). In the 1940s, healthy male college students had completed an open-ended questionnaire about difficult wartime experiences. In the late 1980s, Peterson and his colleagues reviewed the men’s questionnaire responses to obtain a measure of explanatory style—their habitual ways of explaining bad events that happen to them. More pessimistic people tend to blame themselves and expect long-term negative consequences that affect many aspects of their lives, while more optimistic people tend to blame outside forces and expect limited negative consequences. To obtain a measure of explanatory style for each participant, the researchers used a procedure in which all negative events mentioned in the questionnaire responses, and any causal explanations for them, were identified and written on index cards. These were given to a separate group of raters who rated each explanation in terms of three separate dimensions of optimism-pessimism. These ratings were then averaged to produce an explanatory style score for each participant. The researchers then assessed the statistical relationship between the men’s explanatory style as college students and archival measures of their health at approximately 60 years of age. The primary result was that the more optimistic the men were as college students, the healthier they were as older men. Pearson’s r was +.25.

This is an example of content analysis, a family of systematic approaches to measurement using complex archival data. Just as naturalistic observation requires specifying the behaviors of interest and then noting them as they occur, content analysis requires specifying keywords, phrases, or ideas and then finding all occurrences of them in the data. These occurrences can then be counted, timed (e.g., the amount of time devoted to entertainment topics on the nightly news show), or analyzed in a variety of other ways.
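The counting step of a content analysis can be sketched as follows, assuming the target categories are single keywords matched as whole words without regard to case. The sample text, the keywords, and the function name `count_keywords` are hypothetical. Real coding schemes, such as the rated explanations in the study above, usually need human judgment rather than keyword matching alone.

```python
import re
from collections import Counter


def count_keywords(text, keywords):
    """Count whole-word, case-insensitive occurrences of each single-word keyword."""
    # Split the text into lowercase words (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {keyword: counts[keyword.lower()] for keyword in keywords}


# Hypothetical open-ended response and keyword list.
response = ("It was my fault. I always fail, and it was my fault again. "
            "Bad luck played no part.")
result = count_keywords(response, ["fault", "always", "luck"])
print(result)
```

Counts like these can then be treated as a measured variable and related to other variables in the archive.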

Key Takeaways

  • Correlational research involves measuring two variables and assessing the relationship between them, with no manipulation of an independent variable.
  • Correlational research is not defined by where or how the data are collected. However, some approaches to data collection are strongly associated with correlational research. These include naturalistic observation (in which researchers observe people’s behavior in the context in which it normally occurs) and the use of archival data that were already collected for some other purpose.

Discussion: For each of the following, decide whether it is most likely that the study described is experimental or correlational and explain why.

  • An educational researcher compares the academic performance of students from the “rich” side of town with that of students from the “poor” side of town.
  • A cognitive psychologist compares the ability of people to recall words that they were instructed to “read” with their ability to recall words that they were instructed to “imagine.”
  • A manager studies the correlation between new employees’ college grade point averages and their first-year performance reports.
  • An automotive engineer installs different stick shifts in a new car prototype, each time asking several people to rate how comfortable the stick shift feels.
  • A food scientist studies the relationship between the temperature inside people’s refrigerators and the amount of bacteria on their food.
  • A social psychologist tells some research participants that they need to hurry over to the next building to complete a study. She tells others that they can take their time. Then she observes whether they stop to help a research assistant who is pretending to be hurt.

Kanner, A. D., Coyne, J. C., Schaefer, C., & Lazarus, R. S. (1981). Comparison of two modes of stress measurement: Daily hassles and uplifts versus major life events. Journal of Behavioral Medicine, 4, 1–39.

Kraut, R. E., & Johnston, R. E. (1979). Social and emotional messages of smiling: An ethological approach. Journal of Personality and Social Psychology, 37, 1539–1553.

Levine, R. V., & Norenzayan, A. (1999). The pace of life in 31 countries. Journal of Cross-Cultural Psychology, 30, 178–205.

Pelham, B. W., Carvallo, M., & Jones, J. T. (2005). Implicit egotism. Current Directions in Psychological Science, 14, 106–110.

Peterson, C., Seligman, M. E. P., & Vaillant, G. E. (1988). Pessimistic explanatory style is a risk factor for physical illness: A thirty-five year longitudinal study. Journal of Personality and Social Psychology, 55, 23–27.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

6.2 Correlational Research

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one or are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both of these goals. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables, and if there is a relationship between the variables, then the researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression).

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while I might be interested in the relationship between how frequently people use cannabis and their memory abilities, I cannot ethically manipulate how frequently people use cannabis. As such, I must rely on the correlational research strategy: I must simply measure how frequently people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether frequency of cannabis use is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall that there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased but often at the expense of external validity. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled, but they often have high external validity. Since nothing is manipulated or controlled by the experimenter, the results are more likely to reflect relationships that exist in the real world.

Finally, extending upon this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity, then the researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1]. These converging results provide strong evidence that there is a real relationship (indeed a causal relationship) between watching violent television and aggressive behavior.

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots. Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other. A negative relationship is one in which higher scores on one variable tend to be associated with lower scores on the other. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.

Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms. The circled point represents a person whose stress score was 10 and who had three physical symptoms. Pearson’s r for these data is +.51.

The strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s Correlation Coefficient (or Pearson’s r). As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s r is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ±.30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/, created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.

Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)
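To make the statistic concrete, here is a minimal Python sketch of Pearson's r computed from its usual definition (the summed cross-products of the deviations, divided by the square root of the product of the summed squared deviations). The six pairs of scores are made up for illustration and are not the data behind Figure 6.3. The helper names `pearson_r` and `size_label` are hypothetical, and treating the ±.10, ±.30, and ±.50 benchmarks as hard cutoffs is a simplifying assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by their individual spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two scores each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    # r is undefined (division by zero) if either variable never varies.
    return cross / math.sqrt(ss_x * ss_y)


def size_label(r):
    """Rough size label; using the benchmarks as cutoffs is an assumption."""
    size = abs(r)  # the sign carries direction only, not strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # hypothetical label for values below .10


# Made-up scores for six people (not the data from Figure 6.3).
stress = [2, 4, 6, 8, 10, 12]
symptoms = [1, 3, 2, 5, 3, 6]

r = pearson_r(stress, symptoms)
print(f"r = {r:+.2f} ({size_label(r)})")
```

Because `size_label` looks only at the absolute value, +.30 and −.30 receive the same label, which mirrors the point that the sign of r is unrelated to its strength.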

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.

Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression
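The following sketch illustrates why a curved relationship can yield a Pearson's r near zero. The data are invented: depression is assumed to depend only on how far a person's sleep is from eight hours, so the relationship is perfectly predictable but not linear. The variable names and the quadratic form are assumptions made for the illustration, not values from Figure 6.5.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Invented data: depression is lowest at eight hours of sleep and rises
# the further a person is from eight hours in either direction.
sleep_hours = [4, 5, 6, 7, 8, 9, 10, 11, 12]
depression = [(h - 8) ** 2 + 2 for h in sleep_hours]

# A straight line fits these points poorly, so r is essentially zero.
r_linear = pearson_r(sleep_hours, depression)

# Re-expressing sleep as squared distance from eight hours makes the
# relationship linear, and the same data now produce a very high r.
distance_from_8 = [(h - 8) ** 2 for h in sleep_hours]
r_curve = pearson_r(distance_from_8, depression)

print(f"r with raw sleep hours:       {r_linear:+.2f}")
print(f"r with distance from 8 hours: {r_curve:+.2f}")
```

The contrast between the two values is the reason for plotting the data first: a near-zero r by itself cannot distinguish "no relationship" from "a strong relationship that is not a straight line."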

The other common situation in which the value of Pearson’s r can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 6.6. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds (represented by the shaded area of Figure 6.6), then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.)

Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range. The overall correlation here is −.77, but the correlation for the 18- to 24-year-olds (in the blue box) is 0.
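Restriction of range can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: ages are drawn uniformly from 18 to 80, enjoyment falls by one point per year of age, and random noise with a standard deviation of 15 is added. None of these numbers come from Figure 6.6, so the output will not reproduce −.77 and 0 exactly.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed population: enjoyment declines with age, plus individual noise.
ages = [rng.uniform(18, 80) for _ in range(2000)]
enjoyment = [100 - age + rng.gauss(0, 15) for age in ages]

# Correlation across the full age range.
r_full = pearson_r(ages, enjoyment)

# Correlation when only 18- to 24-year-olds are sampled.
young = [(a, e) for a, e in zip(ages, enjoyment) if 18 <= a <= 24]
young_ages = [a for a, _ in young]
young_enjoyment = [e for _, e in young]
r_young = pearson_r(young_ages, young_enjoyment)

print(f"full range (n = {len(ages)}):  r = {r_full:+.2f}")
print(f"ages 18-24 (n = {len(young)}): r = {r_young:+.2f}")
```

The underlying relationship is identical in both samples; only the spread of ages differs. With so little variation in age, the noise dominates and the correlation shrinks toward zero.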

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the directionality problem. Two variables, X and Y, can be statistically related because X causes Y or because Y causes X. Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are, such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the third-variable problem. Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y. For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are a result of a third variable are often referred to as spurious correlations.
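The third-variable problem can be sketched with simulated data in which X and Y never influence each other and both are driven by Z. Everything here is an assumption made for illustration: the variable roles, the equal weights, and the use of a first-order partial correlation (a standard formula, though not one introduced in this chapter) to show what happens once Z is statistically held constant.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 5000

# Z (say, physical health) causes both X and Y; X and Y do not affect each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # e.g., amount of exercise
y = [zi + rng.gauss(0, 1) for zi in z]  # e.g., happiness

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)
r_yz = pearson_r(y, z)

# First-order partial correlation of X and Y with Z held constant.
r_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"r(X, Y)               = {r_xy:+.2f}")
print(f"r(X, Y) controlling Z = {r_xy_given_z:+.2f}")
```

X and Y are clearly correlated even though neither causes the other, and the relationship all but disappears once Z is taken into account. In real research the relevant third variable is often unmeasured or unknown, which is why this kind of statistical control is no substitute for an experiment.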

Some excellent and funny examples of spurious correlations can be found at http://www.tylervigen.com (Figure 6.7 provides one such example).

Figure 6.7 Example of a Spurious Correlation. Source: http://tylervigen.com/spurious-correlations (CC-BY 4.0)

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who determined how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in (because, again, it was the researcher who determined how much they exercised). Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.
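A short simulation can illustrate why random assignment removes the third-variable problem. In this sketch, which rests entirely on invented assumptions, physical health influences whether people choose to exercise, and a coin flip stands in for the researcher's random assignment (so the two groups are only approximately equal in size).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


rng = random.Random(2)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical third variable: each person's physical health.
health = [rng.gauss(0, 1) for _ in range(n)]

# Correlational design: people decide for themselves, and healthier
# people are assumed to be more likely to exercise (1 = exercises).
chose_exercise = [1 if h + rng.gauss(0, 1) > 0 else 0 for h in health]

# Experimental design: a coin flip, not the person, decides who exercises.
assigned_exercise = [1 if rng.random() < 0.5 else 0 for _ in range(n)]

r_chosen = pearson_r(chose_exercise, health)
r_assigned = pearson_r(assigned_exercise, health)

print(f"self-selected exercise vs. health:     r = {r_chosen:+.2f}")
print(f"randomly assigned exercise vs. health: r = {r_assigned:+.2f}")
```

When people select their own level of exercise, exercise is tangled up with health, so any difference in mood could be due to either. Under random assignment, exercise is essentially unrelated to health (and to any other preexisting characteristic), so a mood difference between groups cannot be attributed to it.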

Key Takeaways

  • Correlational research involves measuring two variables and assessing the relationship between them, with no manipulation of an independent variable.
  • Correlation does not imply causation. A statistical relationship between two variables,  X  and  Y , does not necessarily mean that  X  causes  Y . It is also possible that  Y  causes  X , or that a third variable,  Z , causes both  X  and  Y .
  • While correlational research cannot be used to establish causal relationships between variables, correlational research does allow researchers to achieve many other important objectives (establishing reliability and validity, providing converging evidence, describing relationships and making predictions).
  • Correlation coefficients can range from -1 to +1. The sign indicates the direction of the relationship between the variables and the numerical value indicates the strength of the relationship.

1. Practice: For each of the following, decide whether the study described is most likely experimental or correlational and explain why.

  • A cognitive psychologist compares the ability of people to recall words that they were instructed to “read” with their ability to recall words that they were instructed to “imagine.”
  • A manager studies the correlation between new employees’ college grade point averages and their first-year performance reports.
  • An automotive engineer installs different stick shifts in a new car prototype, each time asking several people to rate how comfortable the stick shift feels.
  • A food scientist studies the relationship between the temperature inside people’s refrigerators and the amount of bacteria on their food.
  • A social psychologist tells some research participants that they need to hurry over to the next building to complete a study. She tells others that they can take their time. Then she observes whether they stop to help a research assistant who is pretending to be hurt.

2. Practice: For each of the following statistical relationships, decide whether the directionality problem is present and think of at least one plausible third variable.

  • People who eat more lobster tend to live longer.
  • People who exercise more tend to weigh less.
  • College students who drink more alcohol tend to have poorer grades.

[1] Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage.
[2] Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562–1564.

Correlation Studies in Psychology Research

Determining the relationship between two or more variables.

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Emily is a board-certified science editor who has worked with top digital publishing brands like Voices for Biodiversity, Study.com, GoodTherapy, Vox, and Verywell.

A correlational study is a type of research design that looks at the relationships between two or more variables. Correlational studies are non-experimental, which means that the experimenter does not manipulate or control any of the variables.

A correlation refers to a relationship between two variables. Correlations can be strong or weak and positive or negative. Sometimes, there is no correlation.

There are three possible outcomes of a correlation study: a positive correlation, a negative correlation, or no correlation. Researchers can present the results using a numerical value called the correlation coefficient, a measure of the correlation strength. It can range from –1.00 (negative) to +1.00 (positive). A correlation coefficient of 0 indicates no correlation.

  • Positive correlations : Both variables increase or decrease at the same time. A correlation coefficient close to +1.00 indicates a strong positive correlation.
  • Negative correlations : As the amount of one variable increases, the other decreases (and vice versa). A correlation coefficient close to -1.00 indicates a strong negative correlation.
  • No correlation : There is no relationship between the two variables. A correlation coefficient of 0 indicates no correlation.

Characteristics of a Correlational Study

Correlational studies are often used in psychology, as well as other fields like medicine. Correlational research is a preliminary way to gather information about a topic. The method is also useful if researchers are unable to perform an experiment.

Researchers use correlations to see if a relationship between two or more variables exists, but the variables themselves are not under the control of the researchers.

While correlational research can demonstrate a relationship between variables, it cannot prove that changing one variable will change another. In other words, correlational studies cannot prove cause-and-effect relationships.

When you encounter research that refers to a "link" or an "association" between two things, they are most likely talking about a correlational study.

Types of Correlational Research

There are three types of correlational research: naturalistic observation, the survey method, and archival research. Each type has its own purpose, as well as its pros and cons.

Naturalistic Observation

The naturalistic observation method involves observing and recording variables of interest in a natural setting without interference or manipulation.  

Pros:

  • Can inspire ideas for further research
  • Option if lab experiment not available
  • Variables are viewed in natural setting

Cons:

  • Can be time-consuming and expensive
  • Extraneous variables can't be controlled
  • No scientific control of variables
  • Subjects might behave differently if aware of being observed

This method is well-suited to studies where researchers want to see how variables behave in their natural setting or state.   Inspiration can then be drawn from the observations to inform future avenues of research.

In some cases, it might be the only method available to researchers; for example, if lab experimentation would be precluded by access, resources, or ethics. It might be preferable to not being able to conduct research at all, but the method can be costly and usually takes a lot of time.  

Naturalistic observation presents several challenges for researchers. For one, it does not allow them to control or influence the variables in any way nor can they change any possible external variables.

The absence of interference also does not guarantee that researchers will get reliable data from watching the variables, or that the information they gather will be free from bias.

For example, study subjects might act differently if they know that they are being watched. The researchers might not be aware that the behavior that they are observing is not necessarily the subject's natural state (i.e., how they would act if they did not know they were being watched).

Researchers also need to be aware of their biases, which can affect the observation and interpretation of a subject's behavior.  

The Survey Method

Surveys and questionnaires are some of the most common methods used for psychological research. The survey method involves having a random sample of participants complete a survey, test, or questionnaire related to the variables of interest. Random sampling is vital to the generalizability of a survey's results.

Pros:

  • Cheap, easy, and fast
  • Can collect large amounts of data in a short amount of time

Cons:

  • Results can be affected by poor survey questions
  • Results can be affected by unrepresentative sample
  • Outcomes can be affected by participants

If researchers need to gather a large amount of data in a short period of time, a survey is likely to be the fastest, easiest, and cheapest option.  

It's also a flexible method because it lets researchers create data-gathering tools that will help ensure they get the information they need (survey responses) from all the sources they want to use (a random sample of participants taking the survey).

Survey data might be cost-efficient and easy to get, but it has its downsides. For one, the data is not always reliable, particularly if the survey questions are poorly written or the overall design or delivery is weak. Data is also affected by specific faults, such as unrepresented or underrepresented samples.

The use of surveys relies on participants to provide useful data. Researchers need to be aware of the specific factors related to the people taking the survey that will affect its outcome.

For example, some people might struggle to understand the questions. A person might answer a particular way to try to please the researchers or to try to control how the researchers perceive them (such as trying to make themselves "look better").

Sometimes, respondents might not even realize that their answers are incorrect or misleading because of mistaken memories.

Archival Research

Many areas of psychological research benefit from analyzing studies that were conducted long ago by other researchers, as well as reviewing historical records and case studies.

For example, in a study known as "The Irritable Heart," researchers used digitized records containing information on American Civil War veterans to learn more about post-traumatic stress disorder (PTSD).

Pros:

  • Large amount of data
  • Can be less expensive
  • Researchers cannot change participant behavior

Cons:

  • Can be unreliable
  • Information might be missing
  • No control over data collection methods

Using records, databases, and libraries that are publicly accessible or accessible through their institution can help researchers who might not have a lot of money to support their research efforts.

Free and low-cost resources are available to researchers at all levels through academic institutions, museums, and data repositories around the world.

Another potential benefit is that these sources often provide an enormous amount of data that was collected over a very long period of time, which can give researchers a way to view trends, relationships, and outcomes related to their research.

While the inability to change variables can be a disadvantage of some methods, it can be a benefit of archival research. That said, using historical records or information that was collected a long time ago also presents challenges. For one, important information might be missing or incomplete and some aspects of older studies might not be useful to researchers in a modern context.

A primary issue with archival research is reliability. When reviewing old research, little information might be available about who conducted the research, how a study was designed, who participated in the research, as well as how data was collected and interpreted.

Researchers can also be presented with ethical quandaries—for example, should modern researchers use data from studies that were conducted unethically or with questionable ethics?

Potential Pitfalls

You've probably heard the phrase, "correlation does not equal causation." This means that while correlational research can suggest that there is a relationship between two variables, it cannot prove that one variable will change another.

For example, researchers might perform a correlational study that suggests there is a relationship between academic success and a person's self-esteem. However, the study cannot show that academic success changes a person's self-esteem.

To determine why the relationship exists, researchers would need to consider and experiment with other variables, such as the subject's social relationships, cognitive abilities, personality, and socioeconomic status.

Frequently Asked Questions

The difference between a correlational study and an experimental study involves the manipulation of variables. Researchers do not manipulate variables in a correlational study, but they do control and systematically vary the independent variables in an experimental study. Correlational studies allow researchers to detect the presence and strength of a relationship between variables, while experimental studies allow researchers to look for cause-and-effect relationships.

If the study involves the systematic manipulation of the levels of a variable, it is an experimental study. If researchers are measuring what is already present without actually changing the variables, then it is a correlational study.

The variables in a correlational study are what the researcher measures. Once measured, researchers can then use statistical analysis to determine the existence, strength, and direction of the relationship. However, while correlational studies can show that variable X and variable Y have a relationship, this does not mean that X causes Y.

The goal of correlational research is often to look for relationships, describe these relationships, and then make predictions. Such research can also often serve as a jumping off point for future experimental research. 

Non-Experimental Research

29 Correlational Research

Learning objectives.

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables (binary or continuous) and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one or are not interested in causal relationships. Recall two goals of science are to describe and to predict and the correlational research strategy allows researchers to achieve both of these goals. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables and if there is a relationship between the variables then the researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression, which is discussed further in the section on Complex Correlation in this chapter).
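As a rough illustration of prediction, the sketch below fits a least-squares line to a handful of made-up scores and uses it to predict one variable from the other. The function names `fit_line` and `predict` and the data are hypothetical; this is simple one-predictor regression, offered as a minimal sketch and not as the full treatment of regression mentioned above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two scores each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cross / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted score on y for a given score on x."""
    return intercept + slope * x


# Made-up scores on two measured variables for six people.
predictor_scores = [2, 4, 6, 8, 10, 12]
outcome_scores = [1, 3, 2, 5, 3, 6]

slope, intercept = fit_line(predictor_scores, outcome_scores)
predicted = predict(9, slope, intercept)

print(f"predicted outcome = {intercept:.2f} + {slope:.2f} * predictor")
print(f"prediction for a predictor score of 9: {predicted:.2f}")
```

Note that a usable prediction requires only a statistical relationship. Nothing in the fitted line says that the predictor causes the outcome.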

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while a researcher might be interested in the relationship between how frequently people use cannabis and their memory abilities, they cannot ethically manipulate how frequently people use cannabis. As such, they must rely on the correlational research strategy: they must simply measure how frequently people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether frequency of cannabis use is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased but often at the expense of external validity as artificial conditions are introduced that do not exist in reality. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled but they often have high external validity. Since nothing is manipulated or controlled by the experimenter the results are more likely to reflect relationships that exist in the real world.

Finally, extending upon this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity then the researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1] .

Does Correlational Research Always Involve Quantitative Variables?

A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of daily hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.

Figure 6.2 shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. What defines a study is how the study is conducted.
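The point that the statistics do not define the design can be seen in a small sketch. Here a categorical variable (whether someone makes a daily to-do list) is coded 1 or 0 and correlated with stress using the ordinary Pearson formula. The eight scores are invented and are not the data from Figure 6.2. The identical calculation would be run whether participants had been randomly assigned to make lists or had simply been asked about their habits.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around each variable's mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Invented data: 1 = makes a daily to-do list, 0 = does not.
makes_list = [1, 1, 1, 1, 0, 0, 0, 0]
stress = [3, 4, 2, 5, 6, 7, 5, 8]

list_makers = [s for m, s in zip(makes_list, stress) if m == 1]
others = [s for m, s in zip(makes_list, stress) if m == 0]
mean_list_makers = sum(list_makers) / len(list_makers)
mean_others = sum(others) / len(others)

# With a 0/1 variable, Pearson's r summarizes the same group difference
# that a comparison of the two means describes.
r = pearson_r(makes_list, stress)

print(f"mean stress, list makers: {mean_list_makers:.1f}")
print(f"mean stress, others:      {mean_others:.1f}")
print(f"r = {r:+.2f}")
```

Nothing in this output reveals how the data were collected. Whether the negative relationship can be read causally depends on whether list-making was manipulated, not on the numbers themselves.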

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots. Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other. In other words, they move in the same direction, either both up or both down. A negative relationship is one in which higher scores on one variable tend to be associated with lower scores on the other. In other words, they move in opposite directions. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.

Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms

The strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s Correlation Coefficient (or Pearson’s r). As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s r is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ±.30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/, created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.
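The size guidelines above can be sketched as a small Python helper. This is only an illustration: the function name `describe_r` is hypothetical, and treating .10, .30, and .50 as lower cutoffs (with anything below .10 labeled "negligible") is an assumption made here, since the text only says values "near" those numbers.

```python
def describe_r(r):
    """Return (strength, direction) labels for a Pearson's r value.

    Strength depends only on the absolute value of r, so the sign
    never changes the strength label. The cutoffs are an assumption.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1.00 and +1.00")

    size = abs(r)
    if size >= 0.50:
        strength = "large"
    elif size >= 0.30:
        strength = "medium"
    elif size >= 0.10:
        strength = "small"
    else:
        strength = "negligible"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


# +.30 and -.30 get the same strength label but opposite directions.
print(describe_r(0.30))
print(describe_r(-0.30))
```

Because the helper looks only at the absolute value when choosing the strength label, it mirrors the point that the sign of r carries direction, not strength.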

Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.
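A short Python sketch can make this concrete. The sleep and depression numbers below are made up for illustration (they are not the data behind Figure 6.5), and `pearson_r` is a hypothetical helper written with the standard library only. Depression is lowest at eight hours of sleep and rises symmetrically on either side, so the relationship is perfectly predictable yet not linear.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations around the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: depression depends entirely on distance from 8 hours.
hours_of_sleep = [4, 5, 6, 7, 8, 9, 10, 11, 12]
depression = [(h - 8) ** 2 + 2 for h in hours_of_sleep]

r = pearson_r(hours_of_sleep, depression)
print(f"Pearson's r = {r:.2f}")  # essentially zero despite a strong curved pattern
```

Even though depression can be predicted exactly from sleep in this toy data set, Pearson’s r comes out at essentially zero, which is why plotting the data first matters.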

Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression

The other common situation in which the value of Pearson’s r can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 6.6. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds, represented by the shaded area of Figure 6.6, then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.)
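The effect of restriction of range can be sketched in Python with a tiny made-up data set. These numbers are hypothetical and are not the data plotted in Figure 6.6, and `pearson_r` is a helper defined here with the standard library only. The same function is applied once to everyone and once to the 18- to 24-year-olds alone.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations around the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages and hip hop enjoyment ratings (higher = more enjoyment).
ages = [18, 20, 22, 24, 30, 40, 50, 60, 70]
enjoyment = [8, 9, 9, 8, 7, 5, 4, 3, 1]

r_full = pearson_r(ages, enjoyment)

# Keep only participants aged 18 to 24 (the restricted range).
young = [(a, e) for a, e in zip(ages, enjoyment) if 18 <= a <= 24]
r_restricted = pearson_r([a for a, _ in young], [e for _, e in young])

print(f"All ages:      r = {r_full:.2f}")
print(f"Ages 18 to 24: r = {r_restricted:.2f}")
```

In this toy sample the correlation across all ages is strongly negative, while the correlation within the narrow age band is zero, even though both values come from the same underlying pattern.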

Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the directionality problem. Two variables, X and Y, can be statistically related because X causes Y or because Y causes X. Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are, such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the third-variable problem. Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y. For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are a result of a third variable are often referred to as spurious correlations.

Some excellent and amusing examples of spurious correlations can be found at http://www.tylervigen.com (Figure 6.7 provides one such example).

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who used random assignment to determine how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in. Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.

Media Attributions

  • Nicolas Cage and Pool Drownings © Tyler Vigen is licensed under a CC BY (Attribution) license
  • Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage.
  • Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562–1564.

Scatterplot: A graph that presents correlations between two quantitative variables, one on the x-axis and one on the y-axis. Scores are plotted at the intersection of the values on each axis.

Positive relationship: A relationship in which higher scores on one variable tend to be associated with higher scores on the other.

Negative relationship: A relationship in which higher scores on one variable tend to be associated with lower scores on the other.

Pearson’s r: A statistic that measures the strength of a correlation between quantitative variables.

Restriction of range: When one or both variables have a limited range in the sample relative to the population, making the value of the correlation coefficient misleading.

Directionality problem: The problem where two variables, X and Y, are statistically related either because X causes Y, or because Y causes X, and thus the causal direction of the effect cannot be known.

Third-variable problem: Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y.

Spurious correlations: Correlations that are a result not of the two variables being measured, but rather because of a third, unmeasured, variable that affects both of the measured variables.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Neag School of Education

Educational Research Basics by Del Siegle

Introduction to correlation research.

When are correlation methods used?

  • They are used to determine the extent to which two or more variables are related among a single group of people (although sometimes each pair of scores does not come from one person; the correlation between fathers’ and sons’ heights, for example, would not).
  • There is no attempt to manipulate the variables (random variables)

How is correlational research different from experimental research? In correlational research we do not (or at least try not to) influence any variables but only measure them and look for relations (correlations) between some set of variables, such as blood pressure and cholesterol level. In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables; for example, a researcher might artificially increase blood pressure and then record cholesterol level. Data analysis in experimental research also comes down to calculating “correlations” between variables, specifically, those manipulated and those affected by the manipulation. However, experimental data may potentially provide qualitatively better information: Only experimental data can conclusively demonstrate causal relations between variables. For example, if we found that whenever we change variable A then variable B changes, then we can conclude that “A influences B.” Data from correlational research can only be “interpreted” in causal terms based on some theories that we have, but correlational data cannot conclusively prove causality. Source: http://www.statsoft.com/textbook/stathome.html

Although a relationship between two variables does not prove that one caused the other, if there is no relationship between two variables then one cannot have caused the other.

Correlation research asks the question: What relationship exists?

  • A correlation has direction and can be either positive or negative (note exceptions listed later). With a positive correlation, individuals who score above (or below) the average (mean) on one measure tend to score similarly above (or below) the average on the other measure. The scatterplot of a positive correlation rises (from left to right). With negative relationships, an individual who scores above average on one measure tends to score below average on the other (or vice versa). The scatterplot of a negative correlation falls (from left to right).
  • A correlation can differ in the degree or strength of the relationship (with the Pearson product-moment correlation coefficient, that relationship is linear). Zero indicates no relationship between the two measures, and r = 1.00 or r = −1.00 indicates a perfect relationship. The strength can be anywhere between 0 and ±1.00. Note: The symbol r is used to represent the Pearson product-moment correlation coefficient for a sample. The Greek letter rho (ρ) is used for a population. The stronger the correlation (the closer the value of r comes to ±1.00), the more closely the scatterplot will follow a line.

When there is no relationship between the measures (variables), we say they are unrelated, uncorrelated, orthogonal, or independent .

Some Math for Bivariate Product Moment Correlation (not required for EPSY 5601): Multiply the z scores of each pair and add all of those products. Divide that by one less than the number of pairs of scores. (pretty easy)
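The recipe above can be sketched in a few lines of Python. The function name `pearson_r_from_z` and the sample numbers are made up for illustration. The sketch assumes the z scores are computed with the sample standard deviation (the n − 1 version, which is what `statistics.stdev` returns), since that is what makes dividing by one less than the number of pairs give Pearson’s r.

```python
from statistics import mean, stdev


def pearson_r_from_z(x, y):
    """Pearson's r as the sum of z-score products divided by (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")

    mean_x, mean_y = mean(x), mean(y)
    sd_x, sd_y = stdev(x), stdev(y)  # sample standard deviations (n - 1)

    # Step 1: convert every score to a z score.
    z_x = [(value - mean_x) / sd_x for value in x]
    z_y = [(value - mean_y) / sd_y for value in y]

    # Step 2: multiply the z scores of each pair and add the products.
    sum_of_products = sum(a * b for a, b in zip(z_x, z_y))

    # Step 3: divide by one less than the number of pairs.
    return sum_of_products / (len(x) - 1)


# Hypothetical heights and arm spans (in inches) for five students.
heights = [60, 62, 65, 68, 70]
arm_spans = [59, 63, 64, 69, 70]
print(round(pearson_r_from_z(heights, arm_spans), 3))
```

A quick sanity check is that any perfectly linear increasing pair of lists, such as [1, 2, 3] and [2, 4, 6], should return 1.0 (up to rounding error).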

[Image: formulas for the Pearson product-moment correlation coefficient]

Rather than calculating the correlation coefficient with either of the formulas shown above, you can simply follow these linked directions for using the function built into Microsoft’s Excel .

Some correlation questions elementary students can investigate are: What is the relationship between…

  • school attendance and grades in school?
  • hours spent each week doing homework and school grades?
  • length of arm span and height?
  • number of children in a family and the number of bedrooms in the house?

Correlations only describe the relationship; they do not prove cause and effect. Correlation is a necessary, but not a sufficient, condition for determining causality.

There are Three Requirements to Infer a Causal Relationship

  • A statistically significant relationship between the variables
  • The causal variable occurred prior to the other variable
  • There are no other factors that could account for the cause

(Correlation studies do not meet the last requirement and may not meet the second requirement. However, not having a relationship does mean that one variable did not cause the other.)

There is a strong relationship between the number of ice cream cones sold and the number of people who drown each month.  Just because there is a relationship (strong correlation) does not mean that one caused the other.

If there is a relationship between A (ice cream cone sales) and B (drowning) it could be because

  • A->B (Eating ice cream causes drowning)
  • A<-B (Drowning causes people to eat ice cream– perhaps the mourners are so upset that they buy ice cream cones to cheer themselves)
  • A<-C->B (Something else is related to both ice cream sales and the number of drownings– warm weather would be a good guess)

The point is that just because there is a correlation, you CANNOT say that one variable causes the other. On the other hand, if there is NO correlation, you can say that one DID NOT cause the other (assuming the measures are valid and reliable).
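The A<-C->B case can be illustrated with a small simulation in Python. Everything here is invented for the sketch: the data are random numbers in standardized units, the variable names simply echo the ice cream example, and the partial-correlation formula is a standard textbook formula brought in as one way to "remove" the third variable. In the simulated data, temperature drives both ice cream sales and drownings, and neither of those two affects the other.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's r computed from deviations around the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_ab, r_ac, r_bc):
    """Correlation between A and B after removing their correlation with C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# C causes both A and B; A and B never influence each other.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
drownings = [t + rng.gauss(0, 0.5) for t in temperature]

r_ab = pearson_r(ice_cream, drownings)
r_partial = partial_r(
    r_ab,
    pearson_r(ice_cream, temperature),
    pearson_r(drownings, temperature),
)

print(f"ice cream vs. drownings:     r = {r_ab:.2f}")
print(f"controlling for temperature: r = {r_partial:.2f}")
```

The raw correlation between the two simulated outcomes is strong, but it shrinks to roughly zero once temperature is taken into account, which is the pattern a third variable would produce.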

Format for correlations research questions and hypotheses:

Question: Is there a (statistically significant) relationship between height and arm span?
H0: There is no (statistically significant) relationship between height and arm span (H0: ρ = 0).
HA: There is a (statistically significant) relationship between height and arm span (HA: ρ ≠ 0).

Coefficient of Determination (Shared Variation)

One way researchers often express the strength of the relationship between two variables is by squaring their correlation coefficient. This squared correlation coefficient is called a COEFFICIENT OF DETERMINATION. The coefficient of determination is useful because it gives the proportion of the variance of one variable that is predictable from the other variable.
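As a quick worked illustration in Python (the function name is hypothetical and the r values are arbitrary examples), squaring r turns a correlation into a proportion of shared variance:

```python
def coefficient_of_determination(r):
    """Square a Pearson's r to get the proportion of shared variance."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1.00 and +1.00")
    return r ** 2


for r in (0.50, -0.50, 0.90):
    shared = coefficient_of_determination(r)
    print(f"r = {r:+.2f} -> r squared = {shared:.2f} ({shared:.0%} shared variance)")
```

Note that r = +.50 and r = −.50 give the same coefficient of determination, and that a correlation of .50 corresponds to only 25% shared variance, so r squared falls off quickly as r gets smaller.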

Factors which could limit a product-moment correlation coefficient ( PowerPoint demonstrating these factors )

  • Homogeneous group (the subjects are very similar on the variables)
  • Unreliable measurement instrument (your measurements can’t be trusted and bounce all over the place)
  • Nonlinear relationship (Pearson’s r is based on linear relationships…other formulas can be used in this case)
  • Ceiling or Floor with measurement (lots of scores clumped at the top or bottom…therefore no spread which creates a problem similar to the homogeneous group)

Assumptions one must meet in order to use the Pearson product-moment correlation

  • The measures are approximately normally distributed
  • The variance of the two measures is similar ( homoscedasticity ) — check with scatterplot
  • The relationship is linear — check with scatterplot
  • The sample represents the population
  • The variables are measured on an interval or ratio scale

There are different types of relationships: Linear – Nonlinear or Curvilinear – Non-monotonic (concave or cyclical). Different procedures are used to measure different types of relationships using different types of scales. The issue of measurement scales is very important for this class. Be sure that you understand them.

Predictor and Criterion Variables (NOT NEEDED FOR EPSY 5601)

  • Multiple Correlation- lots of predictors and one criterion ( R )
  • Partial Correlation- correlation of two variables after their correlation with other variables is removed
  • Serial or Autocorrelation- correlation of a set of numbers with itself (staggered by one position)
  • Canonical Correlation- lots of predictors and lots of criteria ( R c )

When using a critical value table for Pearson’s product-moment correlation , the value found through the intersection of degree of freedom ( n – 2) and the alpha level you are testing ( p = .05) is the minimum r value needed in order for the relationship to be above chance alone.
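The entries in such a table can be reproduced from a t table, because a critical t value with n − 2 degrees of freedom converts to a critical r through r = t / sqrt(t² + df). The Python sketch below assumes you look up the critical t yourself; the function names are hypothetical, and 2.306 is the standard two-tailed critical t for df = 8 at alpha = .05.

```python
import math


def critical_r(t_critical, n):
    """Convert a critical t value (df = n - 2) into the minimum r needed."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 pairs of scores")
    return t_critical / math.sqrt(t_critical ** 2 + df)


def exceeds_chance(r, t_critical, n):
    """True when |r| is at least as large as the critical r."""
    return abs(r) >= critical_r(t_critical, n)


# Example: 10 pairs of scores, so df = 8; two-tailed alpha = .05.
r_needed = critical_r(2.306, n=10)
print(f"minimum r for n = 10: {r_needed:.3f}")
print(exceeds_chance(0.70, 2.306, n=10))   # above the cutoff
print(exceeds_chance(-0.40, 2.306, n=10))  # below the cutoff
```

With 10 pairs of scores the cutoff works out to about .632, so a correlation has to be quite large to be distinguishable from chance in a small sample.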

The statistics package SPSS as well as Microsoft’s Excel can be used to calculate the correlation.

We will use Microsoft’s Excel .

Reading a Correlations Table in a Journal Article

Most research studies report the correlations among a set of variables. The results are presented in a table such as the one shown below.

Correlation table

The intersection of a row and column shows the correlation between the variable listed for the row and the variable listed for the column. For example, the intersection of the row mathematics and the column science shows that the correlation between mathematics and science was .874. The footnote states that the three *** after .874 indicate the relationship was statistically significant at p <.001.

Most tables do not report the perfect correlation along the diagonal that occurs when a variable is correlated with itself. In the example above, the diagonal was used to report the correlation of the four factors with a different variable. Because the correlation between reading and mathematics can be determined in the top section of the table, the correlation between those two variables is not repeated in the bottom half of the table. This is true for all of the relationships reported in the table.

Del Siegle, Ph.D. Neag School of Education – University of Connecticut www.delsiegle.com

Last updated 10/11/2015

  • Physical Chemistry
  • Polymer Chemistry
  • Study and Communication Skills in Chemistry
  • Theoretical Chemistry
  • Browse content in Computer Science
  • Artificial Intelligence
  • Computer Architecture and Logic Design
  • Game Studies
  • Human-Computer Interaction
  • Mathematical Theory of Computation
  • Programming Languages
  • Software Engineering
  • Systems Analysis and Design
  • Virtual Reality
  • Browse content in Computing
  • Business Applications
  • Computer Security
  • Computer Games
  • Computer Networking and Communications
  • Digital Lifestyle
  • Graphical and Digital Media Applications
  • Operating Systems
  • Browse content in Earth Sciences and Geography
  • Atmospheric Sciences
  • Environmental Geography
  • Geology and the Lithosphere
  • Maps and Map-making
  • Meteorology and Climatology
  • Oceanography and Hydrology
  • Palaeontology
  • Physical Geography and Topography
  • Regional Geography
  • Soil Science
  • Urban Geography
  • Browse content in Engineering and Technology
  • Agriculture and Farming
  • Biological Engineering
  • Civil Engineering, Surveying, and Building
  • Electronics and Communications Engineering
  • Energy Technology
  • Engineering (General)
  • Environmental Science, Engineering, and Technology
  • History of Engineering and Technology
  • Mechanical Engineering and Materials
  • Technology of Industrial Chemistry
  • Transport Technology and Trades
  • Browse content in Environmental Science
  • Applied Ecology (Environmental Science)
  • Conservation of the Environment (Environmental Science)
  • Environmental Sustainability
  • Environmentalist Thought and Ideology (Environmental Science)
  • Management of Land and Natural Resources (Environmental Science)
  • Natural Disasters (Environmental Science)
  • Nuclear Issues (Environmental Science)
  • Pollution and Threats to the Environment (Environmental Science)
  • Social Impact of Environmental Issues (Environmental Science)
  • History of Science and Technology
  • Browse content in Materials Science
  • Ceramics and Glasses
  • Composite Materials
  • Metals, Alloying, and Corrosion
  • Nanotechnology
  • Browse content in Mathematics
  • Applied Mathematics
  • Biomathematics and Statistics
  • History of Mathematics
  • Mathematical Education
  • Mathematical Finance
  • Mathematical Analysis
  • Numerical and Computational Mathematics
  • Probability and Statistics
  • Pure Mathematics
  • Browse content in Neuroscience
  • Cognition and Behavioural Neuroscience
  • Development of the Nervous System
  • Disorders of the Nervous System
  • History of Neuroscience
  • Invertebrate Neurobiology
  • Molecular and Cellular Systems
  • Neuroendocrinology and Autonomic Nervous System
  • Neuroscientific Techniques
  • Sensory and Motor Systems
  • Browse content in Physics
  • Astronomy and Astrophysics
  • Atomic, Molecular, and Optical Physics
  • Biological and Medical Physics
  • Classical Mechanics
  • Computational Physics
  • Condensed Matter Physics
  • Electromagnetism, Optics, and Acoustics
  • History of Physics
  • Mathematical and Statistical Physics
  • Measurement Science
  • Nuclear Physics
  • Particles and Fields
  • Plasma Physics
  • Quantum Physics
  • Relativity and Gravitation
  • Semiconductor and Mesoscopic Physics
  • Browse content in Psychology
  • Affective Sciences
  • Clinical Psychology
  • Cognitive Psychology
  • Cognitive Neuroscience
  • Criminal and Forensic Psychology
  • Developmental Psychology
  • Educational Psychology
  • Evolutionary Psychology
  • Health Psychology
  • History and Systems in Psychology
  • Music Psychology
  • Neuropsychology
  • Organizational Psychology
  • Psychological Assessment and Testing
  • Psychology of Human-Technology Interaction
  • Psychology Professional Development and Training
  • Research Methods in Psychology
  • Social Psychology
  • Browse content in Social Sciences
  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Strategy
  • Business Ethics
  • Business History
  • Business and Government
  • Business and Technology
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic Systems
  • Economic History
  • Economic Methodology
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Theory
  • Politics and Law
  • Public Administration
  • Public Policy
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

Design and Analysis for Quantitative Research in Music Education

6 Correlational Design and Analysis

  • Published: March 2018

Interests in how variables may relate to each other and how systems of relationships among variables may be at play often underlie the questions music education researchers pose. This chapter describes basic design and analysis considerations in research that involves the systematic investigation of whether and how variables are related; in other words, correlational research. The chapter poses correlational research as an extension of the book’s previous discussion of descriptive research. The chapter briefly describes the role of correlational studies in advancing theory, presents several issues to consider when designing studies, and provides an introduction to correlation as a statistical concept.

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide

  • QuestionPro

Correlational Research: What it is with Examples

Use correlational research method to conduct a correlational study and measure the statistical relationship between two variables. Learn more.

Our minds can do some brilliant things. For example, they can memorize the jingle of a pizza truck. The louder the jingle, the closer the pizza truck is to us. Who taught us that? Nobody! We relied on our understanding and came to a conclusion. We don’t stop there, do we? If there are multiple pizza trucks in the area and each one has a different jingle, we would memorize them all and relate each jingle to its pizza truck.

This is precisely what correlational research is: establishing a relationship between two variables, “jingle” and “distance of the truck” in this particular example. A correlational study looks for variables that seem to vary together. When you see one variable changing, you have a fair idea of how the other variable will change.

What is Correlational research?

Correlational research is a type of non-experimental research method in which a researcher measures two variables and assesses the statistical relationship between them, with little or no effort to control extraneous variables. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities.

Correlational Research Example

The correlation coefficient is a statistical measure of the strength and direction of the relationship between two variables. Its value lies between -1 and +1. When the correlation coefficient is close to +1, there is a positive correlation between the two variables. If the value is close to -1, there is a negative correlation between the two variables. When the value is close to zero, there is little or no linear relationship between the two variables.
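To make the coefficient concrete, here is a minimal sketch of how Pearson's correlation coefficient can be computed by hand in Python. The function name `pearson_r` and the study-hours data are assumptions made up for illustration; they do not come from any real study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: weekly study hours and exam scores for five students.
hours = [2, 4, 6, 8, 10]
scores = [55, 60, 70, 72, 85]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # close to +1, so a strong positive correlation
```

Because the scores rise almost in step with the hours, the result lands near +1. Reversing one of the lists would flip the sign without changing the strength.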

Let us take an example to understand correlational research.

Suppose, hypothetically, that a researcher is studying the correlation between cancer and marriage. In this study, there are two variables: disease and marriage. Let us say marriage has a negative association with cancer. This means that married people are less likely to develop cancer.

However, this doesn’t necessarily mean that marriage directly prevents cancer. In correlational research, it is not possible to establish what causes what. It is a misconception that a correlational study must involve two quantitative variables. In reality, two variables are measured, but neither is manipulated. This is true regardless of whether the variables are quantitative or categorical.
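When both variables are categorical with two levels each, as in the marriage and cancer example, one common way to express the association is the phi coefficient, which is Pearson's r applied to variables coded as 0 and 1. The sketch below is only an illustration: the function name `phi_coefficient` is an assumption, and the counts are invented, not real epidemiological data.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table of counts.

                  outcome = yes   outcome = no
    group = yes         a               b
    group = no          c               d
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom

# Invented counts, for illustration only:
# rows = married / not married, columns = diagnosed / not diagnosed.
phi = phi_coefficient(a=30, b=470, c=50, d=450)
print(f"phi = {phi:.3f}")  # negative: fewer diagnoses in the married group
```

A negative value here only says that the married group in this made-up table has proportionally fewer diagnoses. As the text stresses, it says nothing about why.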

Types of correlational research

Mainly three types of correlational research have been identified:

1. Positive correlation: A positive relationship between two variables is when an increase in one variable is accompanied by an increase in the other, and a decrease in one is accompanied by a decrease in the other. For example, the amount of money a person has might positively correlate with the number of cars the person owns.

2. Negative correlation: A negative correlation is quite literally the opposite of a positive relationship. If there is an increase in one variable, the second variable will show a decrease, and vice versa.

For example, education level might negatively correlate with the crime rate: as a country’s education level rises, its crime rate tends to fall. Please note that this doesn’t mean that a lack of education leads to crime. It may only mean that lack of education and crime share a common cause, such as poverty.

3. No correlation: In this third type, there is no correlation between the two variables. A change in one variable is not accompanied by any consistent change in the other. For example, being a millionaire and being happy might not be correlated: having more money doesn’t necessarily go with being happier.

Characteristics of correlational research

Correlational research has three main characteristics. They are: 

  • Non-experimental: The correlational study is non-experimental. The researcher does not manipulate variables to test a hypothesis, but only measures and observes the relationship between the variables without altering them or subjecting them to external conditioning.
  • Backward-looking: Correlational research only looks back at historical data and observes events in the past. Researchers use it to measure and spot historical patterns between two variables. A correlational study may show a positive relationship between two variables, but this can change in the future.
  • Dynamic: The patterns between two variables found in correlational research are not guaranteed to stay constant. Two variables that were negatively correlated in the past can be positively correlated in the future due to various factors.

Data collection

The distinctive feature of correlational research is that the researcher can’t manipulate either of the variables involved. It doesn’t matter how or where the variables are measured. A researcher could observe participants in a closed environment or a public setting.

Researchers use two data collection methods to collect information in correlational research.

01. Naturalistic observation

Naturalistic observation is a way of collecting data in which people’s behavior is observed in the natural environment in which they typically exist. This method is a type of field research. It could mean a researcher might be observing people in a grocery store, at the cinema, on a playground, or in similar places.

Researchers who use this type of data collection make observations as unobtrusively as possible, so that the participants are not aware that they are being observed; otherwise they might not behave naturally.

Ethically, this method is acceptable if the participants remain anonymous and the study is conducted in a public setting, a place where people would not normally expect complete privacy. Take the grocery store example mentioned previously: people can be observed picking an item from the aisle and putting it in their shopping bags. This is ethically acceptable, which is why most researchers choose public settings for recording their observations. This data collection method can be both qualitative and quantitative.

02. Archival data

Another approach to correlational data is the use of archival data. Archival data is information that has already been collected, often in the course of earlier research on similar questions. It is usually the product of someone else’s primary research.

In contrast to naturalistic observation, collecting information from archived data can be pretty straightforward. For example, counting the number of people named Richard in the various states of America based on social security records is relatively simple.

Correlation research: what is it and how can you use it.

If you want to find out if a new marketing campaign or product feature is connected to an increase in sales, correlation can help you determine if a relationship exists between those variables and whether there is a positive, negative or neutral impact.

What is correlation in research?

Correlation (often referred to as correlational study, correlation research, bivariate correlation or correlation analysis) is a core step in understanding your data (such as from survey research) or the relationship between variables in your dataset, typically expressed as x1 and x2.

If a correlation exists, one variable is correlated to another in a pairwise fashion.

Measuring correlation

To measure the degree to which any two variables are correlated, we use a correlation coefficient (of which there are many).

The most widely used is Pearson’s correlation coefficient (or Pearson’s r), a statistical value that is always between -1 and 1. Note: outliers can make coefficients look statistically significant without being meaningful or insightful.
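The note about outliers is easy to check with a small experiment. The sketch below uses a hand-written helper (`pearson_r`, an assumed name) and invented numbers to compare the coefficient before and after one extreme point is added.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: six points with only a weak, slightly negative trend.
x = [1, 2, 3, 4, 5, 6]
y = [4, 2, 5, 1, 3, 3]
r_without = pearson_r(x, y)

# The same data plus one extreme point far away from the rest.
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: {r_without:+.2f}")
print(f"with outlier:    {r_with:+.2f}")
```

With these numbers a single added point moves the coefficient from weakly negative to strongly positive, which is why it is worth looking at a scatterplot before trusting the number.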

Data points are plotted on a scatterplot and the shape of the data informs the researcher of the relationship between variables.

The flow of correlation

  • -1 indicates a perfect negative linear correlation
  • 0 indicates no linear correlation
  • 1 indicates a perfect positive linear correlation
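The three reference points on this scale can be reproduced with tiny made-up datasets. The sketch below (again using an assumed helper named `pearson_r`) also shows why the middle value is described as no linear correlation: a curved relationship can give 0 even when one variable fully determines the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]

rising = [2 * v + 1 for v in x]   # points on a straight upward line
falling = [-3 * v for v in x]     # points on a straight downward line
curved = [v ** 2 for v in x]      # a symmetric U shape, not a line

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)

print(f"rising line:  {r_rising:+.2f}")
print(f"falling line: {r_falling:+.2f}")
print(f"U shape:      {r_curved:+.2f}")
```

The U-shaped data has an obvious pattern, yet the coefficient is zero because the pattern is not a straight line.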

Negative correlation (or negative relationship)

A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. For example, as you spend more money (increase) you save less (decrease).

Positive correlation (or positive relationship)

For positive correlation, both variables either increase or decrease at the same time. Let’s take hours worked versus money earned (assuming no set limit on working hours). As hours worked increases, so too does money earned.

What is a correlation matrix?

Once you’ve calculated your correlation coefficients for different variables, you can build a correlation matrix to display them (or use Stats iQ which can produce one for you). A correlation matrix is a table that shows the correlation between every possible pair of variables. It’s an easy way to summarize large datasets and identify patterns across the relationships you are testing.
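For readers who want to see what such a table contains, here is a small plain-Python sketch that builds one. It is not how Stats iQ works internally; the helper names (`pearson_r`, `correlation_matrix`) and the marketing figures are assumptions invented for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict with r for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Invented weekly figures for three variables.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "site_visits": [120, 150, 210, 230, 300],
    "returns": [9, 7, 8, 4, 5],
}

matrix = correlation_matrix(data)

# Print the matrix as a simple aligned table.
names = list(data)
print(" " * 12 + "".join(f"{n:>12}" for n in names))
for a in names:
    print(f"{a:<12}" + "".join(f"{matrix[a][b]:>12.2f}" for b in names))
```

The diagonal is always 1 (each variable against itself) and the table is symmetric, so in practice only the values on one side of the diagonal need to be read.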

Relate capability in Stats iQ  

Relate explores the relationships between variables. When you select two variables and then select Relate, Stats iQ will choose the appropriate statistical test based on the structure of the data, run that test, then translate the results into a simple and clear explanation.

When you select three or more variables, Stats iQ will relate each variable to the one variable that has the key by it, then bring the strongest relationships to the top. You can select dozens of variables at a time, so you can sift through many relationships quickly.

Again, “Descriptive Frequencies” and “Bivariate Correlation” are basic steps that every data analyst should take before they move onto regression.

Note, a correlational analysis only provides information about variables at one specific point in time. The results could change if you repeat the study.

Furthermore, whilst a relationship may exist between variables, any change in one isn’t necessarily the cause of the change in the other. This brings us onto a basic rule and famous maxim: “Correlation does not imply causation.”

Correlation and causation

It’s a well-known saying that correlation doesn’t imply causation, but why?

Well, in a correlational study no variables are held constant, and this lack of control makes it impossible to determine cause and effect from a simple correlation study.

Correlation and causation can exist at the same time, but “causation” is a much higher standard. For example, you find that your child is standing by a table and there’s milk all over the place. So they spilled it? Not necessarily: the cat may have done it before you walked into the room.

Causation explicitly applies to time and prior relationships where an action causes an outcome. Put simply: it indicates that one event is the result of another.

Correlation, on the other hand, is simply a reflection of a relationship between two variables — when one changes, so does the other, but it’s not necessarily the cause. The only way to prove or demonstrate a causal relationship is through an appropriately designed and controlled experiment.

As such, there are two basic reasons why correlation doesn’t imply causation:

1. Directionality problem

The directionality problem arises when two variables are correlated and may well be causally related, but the correlation alone cannot tell us which variable is the cause and which is the effect. A change in one may result in a change in the other, or the reverse. Because correlation doesn’t imply causation, we cannot say with certainty that the change in one of the variables is the cause of the change in the other.

2. Latent variables

A latent variable is a variable that you can’t observe or measure directly, but that you can detect through its effects on other, observable variables. Consider the psychological construct of happiness or the idea of customer satisfaction: you can’t directly see these variables, but you can measure them indirectly using observed variables.

For example, cities with more grocery stores also tend to have higher crime rates. However, these two variables are only correlated because they have a high correlation with a third variable: population size.
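One standard way to check whether a third variable accounts for a correlation is the first-order partial correlation. The article does not mention this technique; it is swapped in here as an illustration. The three coefficients below are hypothetical, not measured values.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical coefficients, invented for illustration only.
r_stores_crime = 0.75       # grocery stores vs. crime rate
r_stores_population = 0.90  # grocery stores vs. population size
r_crime_population = 0.80   # crime rate vs. population size

adjusted = partial_correlation(
    r_stores_crime, r_stores_population, r_crime_population
)
print(f"raw r = {r_stores_crime:.2f}, controlling for population = {adjusted:.2f}")
```

With these made-up inputs the correlation between grocery stores and crime drops from 0.75 to roughly 0.11 once population size is held constant, which is the pattern you would expect if population drives both.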

Measuring latent variables

To measure latent variables, we use observed variables and then mathematically estimate the unseen variables. This involves using advanced statistical techniques like factor analysis, latent class analysis (LCA), structural equation modeling (SEM), and Rasch analysis. These techniques rely on the inter-correlations of variables.

The next step is multiple regression/correlation, then causal or predictive modeling. But more on these methods in another topic. So, why use correlation?

Why use correlation?

Correlation is an essential part of any research study as it helps you to understand the relationships between variables, and therefore form hypotheses as the next step of the process.

The advantages of using correlation in research are:

Results are likely to be more truthful to natural occurrences.

If no variables are influenced, then the variables are existing and interacting together as they would in ‘real life’, so the findings will be a true and accurate reflection of the variables.

It identifies variables with strong relationships

During statistical analysis of the data, correlational research will be able to indicate whether there is a positive or negative relationship, or no correlation at all, between the variables. This can be invaluable for research teams trying to identify the right variables to be concentrating future research on.

Saves time and money

It can be time-consuming and costly to set up experiment conditions to test whether two variables interact with each other in a cause-and-effect way. Correlational research provides a stepping-stone that shows researchers the potential of variables in their natural setting, and can bring to light patterns that might not otherwise have been identified.

You should always use correlation in research, but you cannot always make inferences, because:

There is less external validity

Research findings may not be repeatable and may fail to provide conclusive results, because the observations were made in a natural setting where the variables were not isolated and may have been influenced by other factors.

Having a strong correlation does not imply causation

While two variables may be strongly connected, there cannot be a clear assessment of the cause-and-effect to provide a conclusion.

There is little control over the variables

It’s not possible to isolate the variables to confirm that only the two variables are being explored. There is always the possibility of a third variable.

No guarantee of the results not changing

If results are gathered that a researcher wants to replicate, the method of correlational research is backwards-looking, so there is no guarantee that the variable results won’t change in the future.

Use an intelligent statistical tool to streamline the entire process

By using a survey software technology platform to do your correlation analysis and research, you can save time analyzing your data yourself, and instead use the tool to conduct start-to-finish correlation analysis across the creation, data collection, analysis and reporting stages.

Qualtrics’ survey software streamlines your data collection methods and correlations, making it easy to access results, measure data trends, and uncover insights without the complexity or need to jump between systems.

What makes Qualtrics so different from other survey providers is that you can consult with trained research professionals, and it includes high-tech statistical software like Qualtrics Stats iQ ™. This can handle complicated analyses using these methods:

  • Regression analysis – This is vital in correlational research as it measures the degree of influence of independent variables on a dependent variable (the relationship between two variables).
  • Analysis of Variance (ANOVA) test – Commonly used with a regression study to find out what effect independent variables have on the dependent variable. It can compare multiple groups simultaneously to see if there is a relationship between them.
  • Conjoint analysis – Asks people to make trade-offs when making decisions, then analyses the results to give the most popular outcome. Helps you understand why people make the complex choices they do.
  • T-Test – Helps you compare whether two data groups have different mean values and allows the user to interpret whether differences are meaningful or merely coincidental.
  • Crosstab analysis – Used in quantitative market research to analyze categorical data – that is, variables that are different and mutually exclusive, and allows you to compare the relationship between two variables in contingency tables.

If you want to learn how the system is set up for conducting and analyzing correlational research, try out a Qualtrics survey software demo to see how it works.

Correlation in Psychology: Meaning, Types, Examples & coefficient

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

Correlation means association – more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.
  • A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.

  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of a negative correlation would be the height above sea level and temperature. As you climb the mountain (increase in height), it gets colder (decrease in temperature).

  • A zero correlation exists when there is no relationship between two variables. For example, there is no relationship between the amount of tea drunk and the level of intelligence.

Scatter Plots

A correlation can be expressed visually. This is done by drawing a scatter plot (also known as a scattergram, scatter graph, scatter chart, or scatter diagram).

A scatter plot is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots) for each pair of scores.

A scatter plot indicates the strength and direction of the correlation between the co-variables.

Types of Correlations: Positive, Negative, and Zero

When you draw a scatter plot, it doesn’t matter which variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations, we always deal with paired scores, so the values of the two variables taken together will be used to make the diagram.

Decide which variable goes on each axis and then simply put a cross at the point where the two values coincide.

Uses of Correlations

Prediction

  • If there is a relationship between two variables, we can make predictions about one from another.

Validity

  • Concurrent validity (correlation between a new measure and an established measure).

Reliability

  • Test-retest reliability (are measures consistent?).
  • Inter-rater reliability (are observers consistent?).

Theory verification

  • Predictive validity.

Correlation Coefficients

Instead of drawing a scatter plot, a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

Correlation Coefficient Interpretation

The correlation coefficient ( r ) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.
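The article does not spell out how the coefficient is calculated. For reference, the standard definition of Pearson’s r for n paired scores (x_i, y_i), with sample means x̄ and ȳ, is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to sit above or below their means together and negative when they move in opposite directions. The denominator rescales the result so that it always falls between -1 and +1.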

There is no rule for determining what correlation size is considered strong, moderate, or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, we should expect the correlation coefficients to be lower (e.g., above 0.4 to be relatively strong). When we are studying things that are easier to measure, such as socioeconomic status, we expect higher correlations (e.g., above 0.75 to be relatively strong).

In studies of things that are difficult to measure, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong; correlations between 0.2 and 0.4 are moderate, and those below 0.2 are considered weak.

When we are studying things that are more easily countable, we expect higher correlations. For example, with demographic data, we generally consider correlations above 0.75 to be relatively strong; correlations between 0.45 and 0.75 are moderate, and those below 0.45 are considered weak.
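The two rules of thumb above can be written down as a small lookup. This is only a sketch: the function name `describe_strength` and the category labels are hypothetical, and applying the cut-offs to the absolute value of r (so that negative coefficients are treated the same way) is an assumption the article does not state explicitly.

```python
# Cut-offs quoted in the text: (weak below this, strong above this).
THRESHOLDS = {
    "hard_to_measure": (0.2, 0.4),
    "demographic": (0.45, 0.75),
}


def describe_strength(r: float, data_kind: str) -> str:
    """Label a correlation coefficient using the rules of thumb above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no correlation"
    weak_limit, strong_limit = THRESHOLDS[data_kind]
    size = abs(r)  # assumption: strength depends on magnitude only
    if size > strong_limit:
        strength = "strong"
    elif size >= weak_limit:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_strength(0.5, "hard_to_measure"))
print(describe_strength(0.5, "demographic"))
```

The same coefficient of 0.5 is labelled differently depending on the kind of data, which is the point the text makes: there is no universal cut-off.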

Correlation vs. Causation

Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable and controls the environment in order that extraneous variables may be eliminated.

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable , is actually causing the systematic movement in our variables of interest.

Correlation does not prove causation, as a third variable may be involved. For example, being a patient in a hospital is correlated with dying, but this does not mean that one event causes the other, as a third variable might be involved (such as diet and level of exercise).

“Correlation is not causation” means that just because two variables are related it does not necessarily mean that one causes the other.

A correlation identifies variables and looks for a relationship between them. An experiment tests the effect that an independent variable has upon a dependent variable but a correlation looks for a relationship between two variables.

This means that the experiment can predict cause and effect (causation) but a correlation can only predict a relationship, as another extraneous variable may be involved that is not known about.

Strengths

1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

2. Correlation allows the researcher to clearly and easily see if there is a relationship between variables. This can then be displayed in a graphical form.

Limitations

1. Correlation is not and cannot be taken to imply causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

For example, suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence.

It could be that the cause of both these is a third (extraneous) variable – for example, growing up in a violent home – and that both the watching of T.V. and the violent behavior are outcomes of this.

2. Correlation does not allow us to go beyond the given data. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6).

It would not be legitimate to infer from this that spending 6 hours on homework would likely generate 12 G.C.S.E. passes.
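The homework example can be sketched in code. The data points below are hypothetical, chosen only to match the ranges quoted above (half an hour to 3 hours, 1 to 6 passes), and the helper names `fit_line` and `predict_within_range` are invented for this illustration. The idea is simply to refuse a prediction when the input lies outside the observed range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_within_range(x, xs, slope, intercept):
    """Return a prediction, or None if x is outside the observed data."""
    if not min(xs) <= x <= max(xs):
        return None  # extrapolation: the data say nothing about this value
    return slope * x + intercept


# Hypothetical observations covering 0.5 to 3 hours of homework.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
passes = [1, 2, 3, 4, 5, 6]

slope, intercept = fit_line(hours, passes)
print(predict_within_range(2.0, hours, slope, intercept))  # inside the range
print(predict_within_range(6.0, hours, slope, intercept))  # outside the range
```

Blindly applying the fitted line to 6 hours would give the 12 passes mentioned above; the guard returns None instead, because nothing in the data supports that figure.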

How do you know if a study is correlational?

A study is considered correlational if it examines the relationship between two or more variables without manipulating them. In other words, the study does not involve the manipulation of an independent variable to see how it affects a dependent variable.

One way to identify a correlational study is to look for language that suggests a relationship between variables rather than cause and effect.

For example, the study may use phrases like “associated with,” “related to,” or “predicts” when describing the variables being studied.

Another way to identify a correlational study is to look for information about how the variables were measured. Correlational studies typically involve measuring variables using self-report surveys, questionnaires, or other measures of naturally occurring behavior.

Finally, a correlational study may include statistical analyses such as correlation coefficients or regression analyses to examine the strength and direction of the relationship between variables.

Why is a correlational study used?

Correlational studies are particularly useful when it is not possible or ethical to manipulate one of the variables.

For example, it would not be ethical to manipulate someone’s age or gender. However, researchers may still want to understand how these variables relate to outcomes such as health or behavior.

Additionally, correlational studies can be used to generate hypotheses and guide further research.

If a correlational study finds a significant relationship between two variables, this can suggest a possible causal relationship that can be further explored in future research.

What is the goal of correlational research?

The ultimate goal of correlational research is to increase our understanding of how different variables are related and to identify patterns in those relationships.

This information can then be used to generate hypotheses and guide further research aimed at establishing causality.

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Lau F, Kuziemsky C, editors. Handbook of eHealth Evaluation: An Evidence-based Approach [Internet]. Victoria (BC): University of Victoria; 2017 Feb 27.

Chapter 12. Methods for Correlational Studies

Francis Lau

12.1. Introduction

Correlational studies aim to find out if there are differences in the characteristics of a population depending on whether or not its subjects have been exposed to an event of interest in the naturalistic setting. In eHealth, correlational studies are often used to determine whether the use of an eHealth system is associated with a particular set of user characteristics and/or quality of care patterns (Friedman & Wyatt, 2006). An example is a computerized provider order entry (CPOE) study to differentiate the background, usage and performance between clinical users and non-users of the CPOE system after its implementation in a hospital.

Correlational studies are different from comparative studies in that the evaluator does not control the allocation of subjects into comparison groups or assignment of the intervention to specific groups. Instead, the evaluator defines a set of variables including an outcome of interest then tests for hypothesized relations among these variables. The outcome is known as the dependent variable and the variables being tested for association are the independent variables. Correlational studies are similar to comparative studies in that they take on an objectivist view where the variables can be defined, measured and analyzed for the presence of hypothesized relations. As such, correlational studies face the same challenges as comparative studies in terms of their internal and external validity. Of particular importance are the issues of design choices, selection bias, confounders, and reporting consistency.

In this chapter we describe the basic types of correlational studies seen in the eHealth literature and their methodological considerations. Also included are three case examples to show how these studies are done.

12.2. Types of Correlational Studies

Correlational studies, better known as observational studies in epidemiology, are used to examine event exposure, disease prevalence and risk factors in a population (Elwood, 2007). In eHealth, the exposure typically refers to the use of an eHealth system by a population of subjects in a given setting. These subjects may be patients, providers or organizations identified through a set of variables that are thought to differ in their measured values depending on whether or not the subjects were “exposed” to the eHealth system.

There are three basic types of correlational studies that are used in eHealth evaluation: cohort, cross-sectional, and case-control studies (Vandenbroucke et al., 2014). These are described below.

  • Cohort studies – A sample of subjects is observed over time where those exposed and not exposed to the eHealth system are compared for differences in one or more predefined outcomes, such as adverse event rates. Cohort studies may be prospective in nature where subjects are followed for a time period into the future or retrospective for a period into the past. The comparisons are typically made at the beginning of the study as baseline measures, then repeated over time at predefined intervals for differences and trends. Some cohort studies involve only a single group of subjects. Their focus is to describe the characteristics of subjects based on a set of variables, such as the pattern of EHR use by providers and their quality of care in an organization over a given time period.
  • Cross-sectional studies – These are considered a type of cohort study where only one comparison is made between exposed and unexposed subjects. They provide a snapshot of the outcome and the associated characteristics of the cohort at a specific point in time.
  • Case-control studies – Subjects in a sample that are exposed to the eHealth system are matched with those not exposed but otherwise similar in composition, then compared for differences in some predefined outcomes. Case-control studies are retrospective in nature where subjects already exposed to the event are selected then matched with unexposed subjects, using historical cases to ensure they have similar characteristics.

A cross-sectional survey is a type of cross-sectional study where the data source is drawn from postal questionnaires and interviews. This topic will be covered in the chapter on methods for survey studies.

12.3. Methodological Considerations

While correlational studies are considered less rigorous than RCTs, they are the preferred designs when it is neither feasible nor ethical to conduct experimental trials. Key methodological issues arise in terms of: (a) design options, (b) biases and confounders, (c) controlling for confounding effects, (d) adherence to good practices, and (e) reporting consistency. These issues are discussed below.

12.3.1. Design Options

There are growing populations with multiple chronic conditions and healthcare interventions. They have made it difficult to design RCTs with sufficient sample size and long-term follow-up to account for all the variability this phenomenon entails. Also RCTs are intended to test the efficacy of an intervention in a restricted sample of subjects under ideal settings. They have limited generalizability to the population at large in routine settings (Fleurence, Naci, & Jansen, 2010). As such, correlational studies, especially those involving the use of routinely collected EHR data from the general population, have become viable alternatives to RCTs. There are advantages and disadvantages to each of the three design options presented above. They are listed below.

  • Cohort studies – These studies typically follow the cohorts over time, which allow one to examine causal relationships between exposure and one or more outcomes. They also allow one to measure change in exposure and outcomes over time. However, these studies can be costly and time-consuming to conduct if the outcomes are rare or occur in the future. With prospective cohorts they can be prone to dropout. With retrospective cohorts accurate historical records are required which may not be available or complete (Levin, 2003a).
  • Case-control studies – These studies are suited to examine infrequent or rare outcomes since they are selected at the outset to ensure sufficient cases. Yet the selection of exposed and matching cases can be problematic, as not all relevant characteristics are known. Moreover, the cases may not be representative of the population of interest. The focus on exposed cases that occur infrequently may overestimate their risks (Levin, 2003b).
  • Cross-sectional studies – These studies are easier and quicker to conduct than others as they involve a one-time effort over a short period using a sample from the population of interest. They can be used to generate hypotheses and examine multiple outcomes and characteristics at the same time with no loss to follow-up. On the other hand, these studies only give a snapshot of the situation at one time point, making it difficult for causal inference of the exposure and outcomes. The results might be different had another time period been chosen (Levin, 2006).

12.3.2. Biases and Confounders

Shamliyan, Kane, and Dickinson (2010) conducted a systematic review on tools used to assess the quality of observational studies. Despite the large number of quality scales and checklists found in the literature, they concluded that the universal concerns are in the areas of selection bias, confounding, and misclassification. These concerns, also mentioned by Vandenbroucke and colleagues (2014) in their reporting guidelines for observational studies, are summarized below.

  • Selection bias – When subjects are selected through their exposure to the event rather than by random or concealed allocation, there is a risk that the subjects are not comparable due to the presence of systematic differences in their baseline characteristics. For example, a correlational study that examines the association between EHR use and quality of care may have younger providers with more computer savvy in the exposed group because they use EHR more and with more facility than those in the unexposed group. It is also possible to have sicker patients in the exposed group since they require more frequent EHR use than unexposed patients who may be healthier and have less need for the EHR. This is sometimes referred to as response bias, where the characteristics of subjects who agreed to be in the study are different from those who declined to take part.
  • Confounding – Extraneous factors that influence the outcome but are also associated with the exposure are said to have a confounding effect. One such type is confounding by indication where sicker patients are both more likely to receive treatments and also more likely to have adverse outcomes. For example, a study of CDS alerts and adverse drug events may find a positive but spurious association due to the inclusion of sicker patients with multiple conditions and medications, which increases their chance of adverse events regardless of CDS alerts.
  • Misclassification – When there are systematic differences in the completeness or accuracy of the data recorded on the subjects, there is a risk of misclassification in their exposures or outcomes. This is also known as information or detection bias. An example is where sicker patients may have more complete ehr data because they received more tests, treatments and outcome tracking than those who are healthier and require less attention. As such, the exposure and outcomes of sicker patients may be overestimated.

It is important to note that bias and confounding are not synonymous. Bias is caused by finding the wrong association from flawed information or subject selection. Confounding is factually correct with respect to the relationship found, but is incorrect in its interpretation due to an extraneous factor that is associated with both the exposure and outcome.

12.3.3. Controlling for Confounding Effects

There are three common methods to control for confounding effects. These are by matching, stratification, and modelling. They are described below (Higgins & Green, 2011).

  • Matching – The selection of subjects with similar characteristics so that they are comparable; the matching can be done at the individual subject level where each exposed subject is matched with one or more unexposed subjects as controls. It can also be done at the group level with equal numbers of exposed and unexposed subjects. Another way to match subjects is by propensity score, that is, a measure derived from a set of characteristics in the subjects. An example is the retrospective cohort study by Zhou, Leith, Li, and Tom (2015) to examine the association between caregiver PHR use and healthcare utilization by pediatric patients. In that study, a propensity score-matching algorithm was used to match PHR-registered children to non-registered children. The matching model used registration as the outcome variable and all child and caregiver characteristics as the independent variables.
  • Stratification – Subjects are categorized into subgroups based on a set of characteristics such as age and sex then analyzed for the effect within each subgroup. An example is the retrospective cohort study by Staes et al. (2008), examining the impact of computerized alerts on the quality of outpatient lab monitoring for transplant patients. In that study, the before/after comparison of the timeliness of reporting and clinician responses was stratified by the type of test (creatinine, cyclosporine A, and tacrolimus) and report source (hospital laboratory or other labs).
  • Modelling – The use of statistical models to compute adjusted effects while accounting for relevant characteristics such as age and sex differences among subjects. An example is the retrospective cohort study by Beck and colleagues (2012) to compare documentation consistency and care plan improvement before and after the implementation of an electronic asthma-specific history and physical template. In that study, before/after group characteristics were compared for differences using t-tests for continuous variables and χ² statistics for categorical variables. Logistic regression was used to adjust for group differences in age, gender, insurance, albuterol use at admission, and previous hospitalization.
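To make the stratification idea concrete, the sketch below compares a crude (pooled) risk difference with the risk difference inside each stratum. The counts are entirely hypothetical, constructed so that sicker patients are both more likely to be exposed and more likely to have an adverse event; they do not come from any of the cited studies, and the function names are invented for this illustration.

```python
# counts[stratum][group] = (patients, adverse_events); hypothetical numbers.
counts = {
    "sicker": {"exposed": (80, 16), "unexposed": (20, 4)},
    "healthier": {"exposed": (20, 1), "unexposed": (80, 4)},
}


def risk(patients, events):
    """Proportion of patients who had the adverse event."""
    return events / patients


def crude_risk_difference(counts):
    """Risk difference after pooling all strata together."""
    totals = {"exposed": [0, 0], "unexposed": [0, 0]}
    for groups in counts.values():
        for group, (patients, events) in groups.items():
            totals[group][0] += patients
            totals[group][1] += events
    return risk(*totals["exposed"]) - risk(*totals["unexposed"])


def stratum_risk_differences(counts):
    """Risk difference (exposed minus unexposed) within each stratum."""
    return {
        stratum: risk(*groups["exposed"]) - risk(*groups["unexposed"])
        for stratum, groups in counts.items()
    }


crude = crude_risk_difference(counts)
by_stratum = stratum_risk_differences(counts)
print(f"crude difference: {crude:+.2f}")
for stratum, difference in by_stratum.items():
    print(f"{stratum}: {difference:+.2f}")
```

In this constructed example the pooled data suggest that exposure raises the adverse event risk by nine percentage points, yet within each stratum the exposed and unexposed groups have identical risk. The crude association is produced entirely by the uneven mix of sicker and healthier patients.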

12.3.4. Adherence to Good Practices in Prospective Observational Studies

The ISPOR Good Research Practices Task Force published a set of recommendations in designing, conducting and reporting prospective observational studies for comparative effectiveness research (Berger et al., 2012) that are relevant to eHealth evaluation. Their key recommendations are listed below.

  • Key policy questions should be defined to allow inferences to be drawn.
  • Hypothesis testing protocol design to include the hypothesis/questions, treatment groups and outcomes, measured and unmeasured confounders, primary analyses, and required sample size.
  • Rationale for prospective observational study design over others (e.g., RCT) is based on question, feasibility, intervention characteristics and ability to answer the question versus cost and timeliness.
  • Study design choice is able to address potential biases and confounders through the use of inception cohorts, multiple comparator groups, matching designs and unaffected outcomes.
  • Explanation of study design and analytic choices is transparent.
  • Study execution is carried out in ways that ensure relevance and reasonable follow-up is not different from the usual practice.
  • Study registration takes place on publicly available sites prior to its initiation.

12.3.5. The Need for Reporting Consistency

Vandenbroucke et al. (2014) published an expanded version of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement to improve the reporting of observational studies that can be applied in eHealth evaluation. It is made up of 22 items, of which 18 are common to cohort, case-control and cross-sectional studies, with four being specific to each of the three designs. The 22 reporting items are listed below (for details refer to the cited reference).

  • Title and abstract – one item that covers the type of design used, and a summary of what was done and found.
  • Introduction – two items on study background/rationale, objectives and/or hypotheses.
  • Methods – nine items on design, setting, participants, variables, data sources/measurement, bias, study size, quantitative variables and statistical methods used.
  • Results – five items on participants, descriptive, outcome data, main results and other analyses.
  • Discussion – four items on key results, limitations, interpretation and generalizability.
  • Other information – one item on funding source.

The four items specific to study design relate to the reporting of participants, statistical methods, descriptive results and outcome data. They are briefly described below for the three types of designs.

  • Cohort studies – Participant eligibility criteria and sources, methods of selection, follow-up and handling dropouts, description of follow-up time and duration, and number of outcome events or summary measures over time. For matched studies include matching criteria and number of exposed and unexposed subjects.
  • Cross-sectional studies – Participant eligibility criteria, sources and methods of selection, analytical methods accounting for sampling strategy as needed, and number of outcome events or summary measures.
  • Case-control studies – Participant eligibility criteria, sources and methods of case/control selection with rationale for choices, methods of matching cases/controls, and number of exposures by category or summary measures of exposures. For matched studies include matching criteria and number of controls per case.

12.4. Case Examples

12.4.1. Cohort Study of Automated Immunosuppressive Care

Park and colleagues (2010) conducted a retrospective cohort study to examine the association between the use of a clinical decision support (CDS) system in post-liver transplant immunosuppressive care and the rates of rejection episode and drug toxicity. The study is summarized below.

  • Setting – A liver transplant program in the United States that had implemented an automated CDS system to manage immunosuppressive therapy for its post-liver transplant recipients after discharge. The system consolidated all clinical information to expedite immunosuppressive review, ordering, and follow-up with recipients. Prior to automation, a paper charting system was used that involved manually tracking lab tests, transcribing results into a paper spreadsheet, finding physicians to review results and orders, and contacting recipients to notify them of changes.
  • Subjects – The study population included recipients of liver transplants between 2004 and 2008 who received outpatient immunosuppressive therapy that included tacrolimus medications.
  • Design – A retrospective cohort study with a before/after design to compare recipients managed by the paper charting system against those managed by the CDS system for up to one year after discharge.
  • Measures – The outcome variables were the percentages of recipients with at least one rejection and/or tacrolimus toxicity episode during the one-year follow-up period. The independent variables included recipient, intraoperative, donor and postoperative characteristics, and use of paper charting or CDS. Examples of recipient variables were age, gender, body mass index, presence of diabetes and hypertension, and pre-transplant lab results. Examples of intraoperative data were blood type match, type of transplant and volume of blood transfused. Examples of donor data included percentage of fat in the liver. Examples of post-transplantation data included the type of immunosuppressive induction therapy and the management method.
  • Analysis – Mean, standard deviation and t-tests were computed for continuous variables after checking for normal distribution. Percentages and Fisher’s exact test were computed for categorical variables. Autoregressive integrated moving average analysis was done to determine change in outcomes over time. Logistic regression with variables thought to be clinically relevant was used to identify significant univariable and multivariable factors associated with the outcomes. P values of less than 0.05 were considered significant.
  • Findings – Overall, the CDS system was associated with significantly fewer episodes of rejection and tacrolimus toxicity. The integrated moving average analysis showed a significant decrease in outcome rates after the CDS system was implemented compared with paper charting. Multivariable analysis showed the CDS system had lower odds of a rejection episode than paper charting (OR 0.20; p < 0.01) and lower odds of tacrolimus toxicity (OR 0.5; p < 0.01). Other significant non-system related factors included the use of specific drugs, the percentage of fat in the donor liver and the volume of packed red cells transfused.

12.4.2. Cross-sectional Analysis of EHR Documentation and Care Quality

Linder, Schnipper, and Middleton (2012) conducted a cross-sectional study to examine the association between the type of EHR documentation used by physicians and the quality of care provided. The study is summarized below.

  • Setting – An integrated primary care practice-based research network affiliated with an academic centre in the United States. The network uses an in-house EHR system with decision support for preventive services, chronic care management, and medication monitoring and alerts. The EHR data include problem and medication lists, coded allergies and lab tests.
  • Subjects – Physicians and patients from 10 primary care practices that were part of an RCT to examine the use of a decision support tool to manage patients with coronary artery disease and diabetes (CAD/DM). Eligible patients were those with CAD/DM in their EHR problem list prior to the RCT start date.
  • Design – A nine-month retrospective cross-sectional analysis of EHR data collected from the RCT. Three physician documentation styles were defined based on 188,554 visit notes in the EHR: (a) dictation, (b) structured documentation, and (c) free text note. Physicians were divided into three groups based on their predominant style, defined as more than 25% of their notes composed by a given method.
  • Measures – The outcome variables were 15 EHR-based CAD/DM quality measures assessed 30 days after primary care visits. They covered quality of documentation, medication use, lab testing, physiologic measures, and vaccinations. Measures collected prior to the day of visit were eligible and considered fulfilled with the presence of coded EHR data on vital signs, medications, allergies, problem lists, lab tests, and vaccinations. Independent variables on physicians and patients were included as covariates. For physicians, they included age, gender, training level, proportion of CAD/DM patients in their panel, total patient visits, and self-reported experience with the EHR. For patients, they included socio-demographic factors, the number of clinic visits and hospitalizations, the number of problems and medications in the EHR, and whether their physician was in the intervention group.
  • Analysis – Baseline characteristics of physicians and patients were compared using descriptive statistics. Continuous variables were compared using ANOVA. For categorical variables, Fisher’s exact test was used for physician variables and the χ² test for patient variables. Multivariate logistic regression models were used for each quality measure to adjust for patient and physician clustering and potential confounders. The Bonferroni procedure was used to account for multiple comparisons for the 15 quality measures.
  • Findings – During the study period, 234 physicians documented 18,569 visits from 7,000 CAD/DM patients. Of these physicians, 146 (62%) typed free-text notes, 68 (29%) used structured documentation, and 20 (9%) dictated notes. After adjusting for cluster effect, physicians who dictated their notes had the worst quality of care in all 15 measures. In particular, physicians who dictated notes were significantly worse in three of 15 measures (antiplatelet medication, tobacco use, diabetic eye exam); physicians who used structured documentation were better in three measures (blood pressure, body mass, diabetic foot exam); and those who used free-text were better in one measure (influenza vaccination). In summary, physicians who dictated notes had worse quality of care than those with structured documentation.
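
The Bonferroni procedure mentioned in the analysis above divides the significance level by the number of comparisons. The following Python sketch shows the arithmetic for 15 tests; the function name `bonferroni_flags` and the p-values are made up for illustration and are not results from the study.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return the per-test threshold and which p-values remain significant."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    threshold = alpha / len(p_values)  # Bonferroni: alpha divided by number of tests
    return threshold, [p < threshold for p in p_values]

# Made-up p-values for 15 hypothetical quality measures (not from any study).
p_values = [0.001, 0.002, 0.004, 0.01, 0.03, 0.04, 0.06, 0.10,
            0.15, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90]

threshold, flags = bonferroni_flags(p_values)
print(f"per-test threshold: {threshold:.4f}")
print(f"significant after correction: {sum(flags)} of {len(flags)}")
```

With 15 comparisons the per-test threshold drops from 0.05 to roughly 0.0033, so several results that would pass an unadjusted test no longer count as significant.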

12.4.3. Case-control Comparison of Internet Portal Use

Nielsen, Halamka, and Kinkel (2012) conducted a case-control study to evaluate whether there was an association between active Internet patient portal use by Multiple Sclerosis (MS) patients and medical resource utilization. Patient predictors and barriers to portal use were also identified. The study is summarized below.

  • Setting – An academic MS centre in the United States with an in-house Internet patient portal site that was accessed by MS patients to schedule clinic appointments, request prescription refills and referrals, view test results, upload personal health information, and communicate with providers via secure e-mails.
  • Subjects – 240 adult MS patients actively followed during 2008 and 2009 were randomly selected from the EHR; 120 of these patients had submitted at least one message during that period and were defined as portal users. Another 120 patients who did not enrol in the portal or send any message were selected as non-users for comparison.
  • Design – A retrospective case-control study facilitated through a chart review comparing portal users against non-users from the same period. Patient demographic and clinical information was extracted from the EHR, while portal usage data, including feature access type and frequency and e-mail message content, were provided by IT staff.
  • Measures – Patient variables included age, gender, race, insurance type, employment status, number of medical problems, disease duration, psychiatric history, number of medications, and physical disability scores. Provider variables included prescription type and frequency. Portal usage variables included feature access type and frequency for test results, appointments, prescription requests and logins, and categorized messaging contents.
  • Analysis – Comparisons of patient demographic, clinical and medical resource utilization data between users and non-users were made using descriptive statistics, the Wilcoxon rank sum test, Fisher’s exact test and the χ² test. Multivariate logistic regression was used to identify patient predictors and barriers to portal use. Provider prescribing habits against patient’s psychiatric history and portal use were examined by two-way analysis of variance. All statistical tests used a p value of 0.05 with no adjustment made for multiple comparisons. A multivariate logistic regression model was created to predict portal use based on patient demographics, clinical condition, socio-economic status, and physical disability metrics.
  • Findings – Portal users were mostly young professionals with little physical disability. The most frequently used feature was secure patient-provider messaging, often for medication requests or refills, and self-reported side effects. Predictors and barriers of portal use were the number of medications prescribed (OR 1.69, p < 0.0001), Caucasian ethnicity (OR 5.04, p = 0.007), arm and hand disability (OR 0.23, p = 0.01), and impaired vision (OR 0.31, p = 0.01). For medical resource utilization, portal users had more frequent clinic visits, medication use and prescriptions from centre staff providers. Patients with a history of psychiatric disease were prescribed more MS medications than those without any history (p < 0.0001). In summary, MS patients used the Internet more than the general population, but physical disability limited their access and needs to be addressed.

12.4.4. Limitations

A general limitation of a correlational study is that it can determine an association between exposure and outcomes but cannot establish causation. The more specific limitations of the three case examples cited by the authors are listed below.

  • Automated immunosuppressive care – Baseline differences existed between groups with unknown effects; possible other unmeasured confounders; possible Hawthorne effects from focus on immunosuppressive care.
  • EHR documentation and care quality – Small sample size; only three documentation styles were considered (e.g., scribe and voice recognition software were excluded) and unsure if they were stable during study period; quality measures specific to CAD/DM conditions only; complex methods of adjusting for clustering and confounding that did not account for unmeasured confounders; the level of physician training (e.g., attending versus residents) not adjusted.
  • Internet portal use – Small sample size not representative of the study population; referral centre site could over-represent complex patients requiring advanced care; all patients had health insurance.

12.5. Summary

In this chapter we described cohort, case-control and cross-sectional studies as three types of correlational studies used in eHealth evaluation. The methodological issues addressed include bias and confounding, controlling for confounders, adherence to good practices and consistency in reporting. Three case examples were included to show how eHealth correlational studies are done.

1 ISPOR – International Society for Pharmacoeconomics and Outcomes Research

  • Beck A. F., Sauers H. S., Kahn R. S., Yau C., Weiser J., Simmons J.M. Improved documentation and care planning with an asthma-specific history and physical. Hospital Pediatrics. 2012; 2 (4):194–201. [ PubMed : 24313025 ]
  • Berger M. L., Dreyer N., Anderson F., Towse A., Sedrakyan A., Normand S.L. Prospective observational studies to address comparative effectiveness: The ISPOR good research practices task force report. Value in Health. 2012; 15 (2):217–230. Retrieved from http://www.sciencedirect.com/science/article/pii/S1098301512000071. [ PubMed : 22433752 ]
  • Elwood M. Critical appraisal of epidemiological studies and clinical studies. 3rd ed. Oxford: Oxford University Press; 2007.
  • Fleurence R. L., Naci H., Jansen J.P. The critical role of observational evidence in comparative effectiveness research. Health Affairs. 2010; 29 (10):1826–1833. [ PubMed : 20921482 ]
  • Friedman C. P., Wyatt J.C. Evaluation methods in biomedical informatics. 2nd ed. New York: Springer Science + Business Media, Inc; 2006.
  • Higgins J. P. T., Green S., editors. Cochrane handbook for systematic reviews of interventions. London: The Cochrane Collaboration; 2011. (Version 5.1.0, updated March 2011) Retrieved from http://handbook.cochrane.org/
  • Levin K. A. Study design IV: Cohort studies. Evidence-based Dentistry. 2003a; 7 :51–52. [ PubMed : 16858385 ]
  • Levin K. A. Study design V: Case-control studies. Evidence-based Dentistry. 2003b; 7 :83–84. [ PubMed : 17003803 ]
  • Levin K. A. Study design III: Cross-sectional studies. Evidence-based Dentistry. 2006; 7 :24–25. [ PubMed : 16557257 ]
  • Linder J. A., Schnipper J. L., Middleton B. Method of electronic health record documentation and quality of primary care. Journal of the American Medical Informatics Association. 2012; 19 (6):1019–1024. [ PMC free article : PMC3534457 ] [ PubMed : 22610494 ]
  • Nielsen A. S., Halamka J. D., Kinkel R.P. Internet portal use in an academic multiple sclerosis center. Journal of the American Medical Informatics Association. 2012; 19 (1):128–133. [ PMC free article : PMC3240754 ] [ PubMed : 21571744 ]
  • Park E. S., Peccoud M. R., Wicks K. A., Halldorson J. B., Carithers R. L. Jr., Reyes J. D., Perkins J.D. Use of an automated clinical management system improves outpatient immunosuppressive care following liver transplantation. Journal of the American Medical Informatics Association. 2010; 17 (4):396–402. [ PMC free article : PMC2995663 ] [ PubMed : 20595306 ]
  • Shamliyan T., Kane R. L., Dickinson S. A systematic review of tools used to assess the quality of observational studies that examine incidence or prevalence and risk factors for diseases. Journal of Clinical Epidemiology. 2010; 63 (10):1061–1070. [ PubMed : 20728045 ]
  • Staes C. J., Evans R. S., Rocha B. H. S. C., Sorensen J. B., Huff S. M., Arata J., Narus S.P. Computerized alerts improve outpatient laboratory monitoring of transplant patients. Journal of the American Medical Informatics Association. 2008; 15 (3):324–332. [ PMC free article : PMC2410008 ] [ PubMed : 18308982 ]
  • Vandenbroucke J. P., von Elm E., Altman D. G., Gotzsche P. C., Mulrow C. D., Pocock S. J., Egger M. for the STROBE Initiative. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. International Journal of Surgery. 2014; 12 (12):1500–1524. Retrieved from http://www.sciencedirect.com/science/article/pii/S174391911400212X. [ PubMed : 25046751 ]
  • Zhou Y. Y., Leith W. M., Li H., Tom J.O. Personal health record use for children and health care utilization: propensity score-matched cohort analysis. Journal of the American Medical Informatics Association. 2015; 22 (4):748–754. [ PubMed : 25656517 ]

This publication is licensed under a Creative Commons License, Attribution-Noncommercial 4.0 International License (CC BY-NC 4.0): see https://creativecommons.org/licenses/by-nc/4.0/

  • Cite this Page Lau F. Chapter 12 Methods for Correlational Studies. In: Lau F, Kuziemsky C, editors. Handbook of eHealth Evaluation: An Evidence-based Approach [Internet]. Victoria (BC): University of Victoria; 2017 Feb 27.

What is Correlational Research? Types and Characteristics

  • Hrithik Saini
  • Jun 13, 2022

The human mind is remarkably good at sifting through seemingly unconnected events and linking them to the matter at hand. Correlational research puts this ability to work.

We make informal correlations every day; consider how you come to associate your phone ringing at a particular time with the arrival of the delivery driver. It is important to understand both the forms that correlation can take and how to study them.

What is Correlational Research?

Correlational research is a type of study that examines two variables in order to establish a statistically meaningful relationship between them. Its goal is to find variables that are related to the extent that a change in one is accompanied by a change in the other.

In its most basic form, correlational research aims to determine whether two variables are related and, if so, how. A variable is a characteristic of interest that can take on different values. In correlational research the variables are naturally occurring: the researcher does not manipulate them in any way.

It is crucial to keep in mind that correlation does not imply causation. A correlation between two variables does not mean that one causes the other, for the reasons below.

The Issue of Directionality

It's possible that two variables are connected because one is a cause and the other is an effect. However, a correlational study design cannot tell you which is which. To be safe, researchers do not draw conclusions about causality from correlational studies.

Problem with the Third Variable

A confounding variable is a third variable that affects both of the variables under study, making them appear causally connected when they are not. Instead, each variable has its own causal link to the confounder.

Extraneous variables are controlled to a limited extent, or not at all, in correlational research. Even if certain possible confounding variables are statistically controlled for, there may still be additional hidden factors that distort the relationship between your research variables.
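
The third-variable problem can be made concrete with a small Python sketch. It builds two variables, x and y, that each depend on a shared confounder z but not on each other, and then uses the standard first-order partial correlation formula to control for z. The data are synthetic and the helper names (`pearson_r`, `partial_r`) are chosen for this illustration only.

```python
import math
from itertools import product

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic data: every combination of z, noise_x, noise_y in {-1, +1},
# so the three components are mutually uncorrelated by construction.
rows = list(product([-1, 1], repeat=3))
z = [row[0] for row in rows]
x = [row[0] + row[1] for row in rows]  # x = z + its own noise
y = [row[0] + row[2] for row in rows]  # y = z + its own noise

r_xy = pearson_r(x, y)
r_controlled = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))
print(f"raw correlation of x and y: {r_xy:.2f}")
print(f"after controlling for z:    {r_controlled:.2f}")
```

Here x and y are clearly correlated even though neither influences the other; once the confounder is accounted for, the association disappears. In real studies the confounder is often unmeasured, which is why this adjustment is not always possible.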

Types of Correlational Research

Positive, negative, and zero correlational research are the three forms of correlational study. Each has its own characteristics.

Types of Correlational Research:

  1. Positive Correlational Analysis (PCA)
  2. Negative Correlational Analysis (NCRA)
  3. Zero Correlational Analysis (ZCA)

Positive Correlational Analysis (PCA)

Positive correlational research involves two variables that move in the same direction: an increase in one is accompanied by an increase in the other, and a decrease by a decrease. For instance, a rise in employee wages tends to go together with a rise in the price of the product, and vice versa.

Negative Correlational Analysis (NCRA)

Negative correlational research involves two variables that move in opposite directions: an increase in one is accompanied by a decrease in the other. If the price of goods or services rises, demand for them tends to fall, and vice versa; this is an example of a negative correlation.

Zero Correlational Analysis (ZCA)

Zero correlational research involves two variables that are not statistically related.

A change in one of the variables is not accompanied by a corresponding or opposite change in the other. Zero correlational research accommodates variables whose statistical relationship is unclear. For example, wealth and patience can be variables in a zero correlational study because they are statistically independent.

When Should Correlational Research Be Used?

Correlational research is a great way to quickly collect data from natural settings. This allows you to generalize your results to real-life situations in an externally valid way.

There are a few instances where correlational research is the best option.

To look into non-causal connections.

You want to see if there's a link between two variables, but you don't expect to uncover a cause-and-effect relationship. Correlational research can help researchers develop theories and make predictions by providing insights into complex real-world relationships.

To explore causal relationships between variables.

You believe there is a causal link between two variables, but an experiment that manipulates one of them would be impractical, unethical, or too expensive. Correlational research can provide preliminary evidence or further support for theories about causal relationships.

To put new measuring instruments to the test

You've created a new tool for measuring your variable and want to see if it's reliable or valid. Correlational research can be done to see if an instrument consistently and accurately measures the concept it's supposed to.

Best Ways to Examine Correlational Data

After gathering data, you can use correlation analysis, regression analysis, or both to statistically assess the relationship between variables. A scatter plot can also be used to visualize the relationship.

Different types of correlation coefficients and regression analyses are appropriate depending on the levels of measurement and distributions of your data.

Analyzing Correlations

Using a correlation analysis, you can summarize the relationship between variables in a correlation coefficient: a single number that indicates the direction and strength of the association between them. With this number, you can quantify how strongly the variables are related.

For assessing a linear relationship between two quantitative variables, Pearson's product-moment correlation coefficient, generally known as Pearson's r, is widely used.

Correlation coefficients are typically calculated for two variables at a time, but a multiple correlation coefficient can be calculated for three or more variables.
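
As a rough illustration of how Pearson's r is computed, here is a short Python sketch that applies the standard formula (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the study-hours data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_products = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_sq_x = sum(a * a for a in dev_x)
    sum_sq_y = sum(b * b for b in dev_y)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)
print(f"Pearson's r = {r:.3f}")
```

A value close to +1, as in this made-up data set, indicates a strong positive linear association. Statistical packages provide the same calculation along with a significance test, so hand-rolled code like this is mainly useful for seeing what the coefficient measures.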

Analysis of Regression

With regression analysis, you can predict how much a change in one variable will be associated with a change in another. The result is a regression equation that describes the line on a graph of your variables.

This equation can be used to estimate the value of one variable given the value(s) of the other variable(s). It is best to run a regression analysis after you have checked for a correlation between your variables.
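
A minimal Python sketch of simple linear regression by ordinary least squares follows. It fits a line of the form y = intercept + slope × x and uses it for a prediction. The function name `fit_line` and the data are hypothetical and chosen only to illustrate the idea.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("cannot fit a line when x is constant")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # estimated score for 6 hours of study
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

The slope is the estimated change in the outcome for a one-unit change in the predictor. As with the correlation itself, the fitted line describes an association in the observed data and does not by itself show that studying longer causes higher scores.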

Characteristics of Correlational Research

There are three important tenets of correlational research. They are as follows:

Non-Experimental

Correlational research is a non-experimental method. This means that researchers do not manipulate variables in order to support or refute a hypothesis. The researcher only measures and observes the relationship between variables, without changing or modifying them in any way.

Backward-Looking

Correlational research looks only backward, at historical data and past events. Researchers use it to measure and identify historical patterns between two variables. A correlational study may reveal a positive relationship between two variables, but that relationship might change in the future.

Dynamic

The patterns found between two variables in correlational research are never static; they change continually. Depending on a variety of factors, two variables that had a negative correlation in the past may well have a positive correlation in the future.

Examples of Correlational Research

Correlational research examples abound, covering a variety of scenarios in which a correlational study may be used to discover a statistical trend between the variables examined. Here are three correlational research examples:

  • You want to know whether wealthy people are less patient. Based on your personal experience, you feel that affluent individuals are impatient. However, you want to find a statistical pattern that supports or refutes your hypothesis. In this scenario, correlational research can be used to find a trend that connects the two variables.
  • You want to know if there's a link between how much money individuals make and how many children they have. You believe that a person's income has little bearing on the number of children they have. However, a correlational study of the two variables can show whether or not there is a statistical relationship between them.
  • You suspect that domestic abuse causes brain hemorrhage. You can't run an experiment, since it's unacceptable to subject individuals to domestic abuse on purpose. You can, nevertheless, do a correlational study to see whether victims of domestic abuse experience more brain hemorrhaging than non-victims.

What is the Correlation Coefficient?

In correlational research, a correlation coefficient indicates whether the relationship between two variables is positive, negative, or non-existent. It is commonly denoted by the letter r and ranges from -1.0 to +1.0.

Pearson's correlation coefficient (or Pearson's r) is a statistic used to measure the strength of a relationship between variables. A value of +1.0 indicates a perfect positive correlation, a value of -1.0 indicates a perfect negative correlation, and a value of 0.0 indicates no linear correlation.

It's necessary to keep in mind that a correlation coefficient represents only the linear relationship between the variables; it does not distinguish between dependent and independent variables.
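
The following Python sketch illustrates the three reference values of r and the point that the coefficient captures only linear relationships. The data are synthetic, and the function name `pearson_r` is chosen for this example; the last case shows two variables that are perfectly related by a curve yet have a coefficient of zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # straight line going up
falling = [-x for x in xs]         # straight line going down
curved = [x ** 2 for x in xs]      # symmetric curve, fully determined by x

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)

print(f"rising line:  r = {r_rising:+.2f}")
print(f"falling line: r = {r_falling:+.2f}")
print(f"curve:        r = {r_curved:+.2f}")
```

A coefficient near zero therefore rules out a linear trend, not every kind of relationship, which is one reason a scatter plot is worth inspecting alongside the number.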

Advantages and Disadvantages of Correlational Research

Advantages of Correlational Research:

Correlational research can be conducted to identify the relationship between two variables when conducting an experiment is not feasible. When researching humans, for example, an experiment might be considered unsafe or unethical; in that case, correlational research is the best alternative.

Correlational research lets you quickly identify the statistical relationship between two variables.

Correlational research takes less time and costs less money to conduct than experimental research. When working with a small number of researchers and limited funds, or when the number of variables in the study is kept to a minimum, this becomes a significant benefit.

Correlational research allows researchers to collect data quickly using a variety of methods, such as a short survey. Because a short survey does not need to be administered by the researcher in person, the study can be run by a small team.

Disadvantages of Correlational Research:

Correlational research is limited because a simple correlation describes the statistical relationship between only two variables at a time. It cannot, in its basic form, establish a relationship among more than two variables.

It does not account for cause and effect, because it does not show which of the two variables is responsible for the observed pattern. Finding a positive correlation between education and vegetarianism, for example, does not tell you whether being educated leads to becoming a vegetarian or whether vegetarianism leads to more education.

Although there are plausible explanations for both, causality cannot be established until additional research is conducted. A third, unidentified variable might also be responsible for both. Living in Detroit, for example, might contribute to both education and vegetarianism.

To discover the relationship between variables, correlational research relies on past statistical patterns. As a result, its findings cannot be fully relied on for future research.

The researcher has no control over the variables in a correlational study. Unlike experimental research, correlational research only allows the researcher to observe the variables for patterns in the data, without introducing any intervention.

Correlational research yields a limited amount of information. It demonstrates only the association between variables; it does not imply causality.

Correlational research allows researchers to find a statistical pattern connecting two apparently unrelated variables, and it serves as a starting point for many other types of research. It helps you to relate two variables by observing them in their natural setting.

Unlike experimental research, correlational research does not focus on the causal factor affecting two variables, which makes its findings subject to continual change. It is, however, faster, simpler, less costly, and more convenient than experimental research.


What is Canonical Correlation Analysis?

Canonical Correlation Analysis (CCA) is a multivariate statistical technique used to probe the relationships between two sets of variables measured on the same subjects. It is particularly applicable in circumstances where multiple regression would be appropriate, but there are multiple intercorrelated outcome variables. CCA identifies and quantifies the associations between these two variable groups. It computes pairs of canonical variates, which are linear combinations of the variables within each group, chosen so that each pair is maximally correlated across the groups while being uncorrelated with the earlier pairs.

Understanding Canonical Correlation Analysis

Canonical Correlation Analysis is a statistical technique used to analyze the relationship between two sets of variables. It seeks to find linear combinations of the variables in each set that are maximally correlated with each other. The goal of CCA is to identify patterns of association between the two sets of variables.

In CCA, the two sets of variables are often referred to as X and Y. The technique calculates canonical variables (also known as canonical variates) for each set, which are linear combinations of the original variables. These canonical variables are chosen to maximize the correlation between the two sets.

CCA is commonly used in fields such as psychology, sociology, biology, and economics to explore relationships between different sets of variables and to uncover underlying patterns in the data.

Mathematical Concept of Canonical Correlation

The goal of CCA is to find linear combinations of the variables in each set, called canonical variables, such that the correlation between the two sets of canonical variables is maximized.

Let’s consider two sets of variables, X and Y , with p and q variables respectively. The canonical variables for X and Y are denoted as U and V respectively. The canonical correlation between U and V is denoted as 𝜌, and the objective of CCA is to find U and V such that 𝜌 is maximized.

Mathematically, the canonical variables U and V are defined as linear combinations of the original variables:

[Tex]U = a_1 X_1 + a_2 X_2 + \ldots + a_p X_p [/Tex]

[Tex]V = b_1 Y_1 + b_2 Y_2 + \ldots + b_q Y_q[/Tex]

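where [Tex]a_1, a_2, \ldots, a_p[/Tex] and [Tex]b_1, b_2, \ldots, b_q[/Tex] are the coefficients that maximize the canonical correlation [Tex]\rho[/Tex]. These coefficients are chosen such that the correlation between U and V is maximized, subject to the constraints that [Tex]Var(U) = Var(V) = 1[/Tex].

The first canonical correlation [Tex]\rho[/Tex] is given by:

[Tex]\rho = \sqrt{\lambda_1} [/Tex]

where [Tex]\lambda_1[/Tex] is the largest eigenvalue of [Tex]\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}[/Tex], with [Tex]\Sigma_{XX}[/Tex] and [Tex]\Sigma_{YY}[/Tex] the covariance matrices of X and Y and [Tex]\Sigma_{XY} = \Sigma_{YX}^T[/Tex] their cross-covariance matrix.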

In summary, CCA aims to find linear combinations of variables in two sets such that the correlation between these combinations is maximized. It is a useful technique for identifying relationships between sets of variables and is widely used in various fields such as psychology, economics, and biology.

Example of Canonical Correlation Analysis

[Tex]X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]] [/Tex]

[Tex]Y = [[-1, -2], [-3, -4], [-5, -6], [-7, -8]][/Tex]

Step 1: Mean Centering. Calculate the mean of each variable in X and Y, and subtract the means from the respective variables to center the data:

[Tex]X' = X - mean(X)[/Tex]

[Tex]Y' = Y - mean(Y)[/Tex]

The column means are (5.5, 6.5, 7.5) for X and (-4, -5) for Y, so:

[Tex]X' = [[-4.5, -4.5, -4.5], [-1.5, -1.5, -1.5], [1.5, 1.5, 1.5], [4.5, 4.5, 4.5]][/Tex]

[Tex]Y' = [[3, 3], [1, 1], [-1, -1], [-3, -3]][/Tex]

Step 2: Covariance Matrices. Calculate the cross-covariance matrix between X' and Y', together with the covariance matrix within each set:

[Tex]Cov(X', Y') = (X'^T Y') / (n - 1)[/Tex]

[Tex]Cov(X', Y') = [[-10, -10], [-10, -10], [-10, -10]][/Tex]

Computed the same way, every entry of Cov(X', X') is 15 and every entry of Cov(Y', Y') is 20/3.

Step 3: Whitening and Singular Value Decomposition (SVD). Scale the cross-covariance matrix by the inverse square roots of the within-set covariance matrices, then perform SVD on the result to obtain the matrices U, S, and V:

[Tex]K = Cov(X', X')^{-1/2} \, Cov(X', Y') \, Cov(Y', Y')^{-1/2}[/Tex]

[Tex]U, S, V = svd(K)[/Tex]

In this example the columns of X (and of Y) are shifted copies of one another, so both within-set covariance matrices are singular and the inverse square roots have to be taken as pseudo-inverses.

Step 4: Canonical Correlation Coefficients. The canonical correlation coefficients (ρ) are the singular values of K, or equivalently the square roots of the eigenvalues of the product of K and its transpose:

[Tex]ρ = sqrt(eigenvalues(K K^T))[/Tex]

Because X and Y are perfectly linearly related here, the first canonical correlation equals 1.

Python Implementation Of Canonical Correlation

  • First, we import NumPy as np and define two arrays, X and Y, representing two sets of variables.
  • Next, we center the data by subtracting the mean of each variable from the respective variables in X and Y.
  • We calculate the within-set covariance matrices Sxx and Syy and the cross-covariance matrix Sxy from the centered data.
  • Then, we whiten the cross-covariance matrix with the (pseudo-)inverse square roots of Sxx and Syy and perform singular value decomposition (SVD) on it.
  • Finally, we read the canonical correlation coefficients off as the singular values (s) obtained from SVD.

import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([[-1, -2], [-3, -4], [-5, -6], [-7, -8]])

# Mean centering
X_centered = X - X.mean(axis=0)
Y_centered = Y - Y.mean(axis=0)
n = X.shape[0]

# Within-set covariance matrices and cross-covariance matrix
Sxx = X_centered.T @ X_centered / (n - 1)
Syy = Y_centered.T @ Y_centered / (n - 1)
Sxy = X_centered.T @ Y_centered / (n - 1)

def inv_sqrt(S, tol=1e-10):
    # Pseudo-inverse square root; needed because Sxx and Syy are singular here
    w, v = np.linalg.eigh(S)
    keep = w > tol
    return (v[:, keep] / np.sqrt(w[keep])) @ v[:, keep].T

# Singular value decomposition of the whitened cross-covariance matrix
U, s, Vt = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy))

# The singular values are the canonical correlation coefficients
canonical_corr = np.round(s, 6)
print("Canonical Correlation Coefficients:", canonical_corr)

Canonical Correlation Coefficients: [1. 0.]

Thus, CCA is a powerful multivariate statistical technique that can help you explore the relationships between two sets of variables. While it has its limitations, it can provide valuable insights into the structure of your data. By understanding the principles and procedures of CCA, you can effectively use this technique in your research.

Interpreting CCA Results

  • Interpreting the results of CCA involves examining the canonical correlations, the canonical variates, and the loadings of the variables on the canonical variates.
  • The canonical correlations indicate the strength of the relationship between the two sets of variables. A high canonical correlation suggests a strong relationship between the two sets of variables.
  • The canonical variates are the vectors that best represent the relationship between the two sets of variables. They are interpreted in a similar way to factors in factor analysis.
  • The loadings of the variables on the canonical variates indicate the contribution of each variable to the canonical variate. They are interpreted in a similar way to factor loadings in factor analysis.
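The quantities named above can be computed directly. The following is a minimal sketch on synthetic data, assuming full-rank covariance matrices so that a Cholesky factor can be used for whitening; the helper names `cca` and `loadings` and the simulated data are illustrative assumptions, not part of any library.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations and canonical variates (full-rank data assumed)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Inverse Cholesky factors whiten each set of variables
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Sxx))
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Syy))
    U, s, Vt = np.linalg.svd(Lx_inv @ Sxy @ Ly_inv.T, full_matrices=False)
    # Project the centered data onto the canonical weight vectors
    return s, Xc @ Lx_inv.T @ U, Yc @ Ly_inv.T @ Vt.T

def loadings(data, variates):
    """Correlation of every original variable with every canonical variate."""
    p, k = data.shape[1], variates.shape[1]
    return np.array([[np.corrcoef(data[:, j], variates[:, i])[0, 1]
                      for i in range(k)] for j in range(p)])

# Synthetic data: only the first column of X and of Y share a common signal z
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.5 * rng.normal(size=500),
                     rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([z + 0.5 * rng.normal(size=500), rng.normal(size=500)])

rho, U_scores, V_scores = cca(X, Y)
x_load = loadings(X, U_scores)

print("canonical correlations:", np.round(rho, 3))
print("loadings of X on its canonical variates:\n", np.round(x_load, 3))
```

In this setup the first canonical correlation is large and the second is small, and the first column of X carries the largest loading on the first canonical variate, which matches how the data were generated. The signs of the variates and loadings are arbitrary.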

Application of Canonical Correlation

Some applications of Canonical Correlation are:

  • Psychology: CCA can be used to explore the relationship between personality traits and job performance, or to understand the relationship between mental health factors and academic achievement.
  • Economics: CCA can help analyze the relationship between various economic indicators (like GDP, inflation, etc.) and social indicators (like education levels, healthcare access, etc.) to understand their interdependencies.
  • Medicine: In medical research, CCA can be applied to study the relationship between genetic factors and disease outcomes, or to explore the relationship between different treatment methods and patient outcomes.
  • Ecology: CCA is useful for studying the relationship between environmental variables (like temperature, humidity, etc.) and biological variables (like species diversity, population sizes, etc.) to understand ecological processes.
  • Neuroscience: CCA can be used to analyze brain imaging data (like fMRI or EEG) to understand the relationship between brain activity patterns and cognitive processes.
  • Marketing and Customer Relationship Management: CCA can help identify the underlying factors that drive customer behavior and preferences, which can be useful for targeted marketing strategies.
  • Social Sciences: CCA can be used to explore the relationship between different social factors (like income, education, etc.) and outcomes (like happiness, well-being, etc.) to understand societal trends.
  • Climate Science: CCA can be applied to study the relationship between climate variables (like temperature, precipitation, etc.) and their impacts on ecosystems and human populations.

Advantages of Canonical Correlation

  • Identifying Relationships: CCA can reveal underlying relationships between two sets of variables, even when the variables within each set are highly correlated.
  • Dimensionality Reduction: CCA can reduce the dimensionality of the data by identifying the most important linear combinations of variables in each set.
  • Interpretability: The results of CCA are often easy to interpret, as each pair of canonical variables is the most correlated pair of linear combinations of the two sets.
  • Multivariate Analysis: CCA allows for the analysis of multiple variables simultaneously, making it suitable for studying complex relationships.
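  • Flexibility: CCA does not require normally distributed data when it is used descriptively, although significance tests for the canonical correlations assume multivariate normality and estimates from small samples can be unstable.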

Limitations of Canonical Correlation

  • Linear Relationships: CCA assumes that the relationships between variables are linear, which may not always be the case in real-world data.
  • Sensitivity to Outliers: CCA can be sensitive to outliers, which can affect the estimation of the canonical correlations and vectors.
  • Interpretation of Canonical Variables: While the canonical variables are easy to interpret, interpreting the original variables in terms of these canonical variables can be challenging.
  • Well-Conditioned Covariances: CCA requires the covariance matrix within each set of variables to be invertible, so strong multicollinearity within a set makes the estimates unstable.
  • Large Sample Size Requirement: CCA may require a relatively large sample size which is not possible every time.


  • Open access
  • Published: 07 May 2024

Causal association between low vitamin D and polycystic ovary syndrome: a bidirectional mendelian randomization study

  • Bingrui Gao 1 ,
  • Chenxi Zhang 1 ,
  • Deping Wang 1 , 2 ,
  • Bojuan Li 1 ,
  • Zhongyan Shan 1 ,
  • Weiping Teng 1 &
  • Jing Li   ORCID: orcid.org/0000-0002-3681-4095 1  

Journal of Ovarian Research, volume 17, Article number: 95 (2024)


Recent studies have revealed the correlation between serum vitamin D (VD) level and polycystic ovary syndrome (PCOS), but the causality and specific mechanisms remain uncertain.

We aimed to investigate the cause-effect relationship between serum VD and PCOS, and the role of testosterone in the related pathological mechanisms.

We assessed the causality between serum VD and PCOS by using genome-wide association studies (GWAS) data in a bidirectional two-sample Mendelian randomization (TS-MR) analysis. Subsequently, a MR mediation analysis was conducted to examine the mediating action of testosterone in the causality between serum VD and PCOS. Ultimately, we integrated GWAS data with cis-expression quantitative loci (cis-eQTLs) data for gene annotation, and used the potentially related genes for functional enrichment analysis to assess the involvement of testosterone and the potential mechanisms.

TS-MR analysis showed that individuals with lower level of serum VD were more likely to develop PCOS (OR = 0.750, 95% CI: 0.587–0.959, P  = 0.022). MR mediation analysis uncovered indirect causal effect of serum VD level on the risk of PCOS via testosterone (OR = 0.983, 95% CI: 0.968–0.998, P  = 0.025). Functional enrichment analysis showed that several pathways may be involved in the VD-testosterone-PCOS axis, such as steroid hormone biosynthesis and autophagy process.

Our findings suggest that genetically predicted lower serum VD level may cause a higher risk of developing PCOS, which may be mediated by increased testosterone production.

Introduction

Vitamin D (VD) is an essential fat-soluble steroid hormone that is necessary for calcium-phosphate metabolism, bone homeostasis, cell differentiation, and immune system function. The prevalence of VD deficiency (VDD) in the population has gradually increased over the past few decades. VDD is associated with various diseases, including cardiovascular disease, inflammation, dyslipidemia, weight gain, and infectious diseases [1, 2]. Furthermore, a growing number of studies have indicated a potential link between serum VD status and women's reproductive health. Firstly, the biological function of VD is mediated via intracellular VD receptors (VDRs), which are distributed among various tissues, including the hypothalamus, pituitary, endometrium, and ovary [3, 4]. Secondly, VD participates in regulating genes associated with ovarian and placental functions [5, 6]. Taken together, this evidence suggests that serum VD plays a potentially significant role in female reproductive health.

Polycystic ovary syndrome (PCOS) is the most common endocrine disorder that affects women of reproductive age, with a global incidence ranging from 20% to 25% [7, 8]. PCOS affects endometrial function and oocyte competence [9, 10], which leads to reproductive dysfunction in PCOS patients, including infertility, miscarriage, and pregnancy complications [11, 12, 13]. However, the exact pathogenesis of PCOS remains unclear. Prior observational studies have reported a correlation between serum VD and the risk of PCOS. A recent study revealed that serum VD concentrations were lower in women diagnosed with PCOS than in body mass index (BMI)-matched controls, suggesting that PCOS is correlated with reduced VD level regardless of BMI [14]. However, these studies can only show that the two are correlated; they cannot clarify the causality between them. In addition, hyperandrogenemia is one of the diagnostic criteria for PCOS and affects 60–80% of patients [15]. Females are actually more sensitive to testosterone, even though it is known as a male hormone [16]. Growing evidence shows that testosterone may play an important role in the relationship between serum VD level and the risk of PCOS. Hahn et al. illustrated an association between serum VD level and the severity of hirsutism in individuals with PCOS [17]. The research conducted by Latic et al. indicates a negative correlation between serum VD level and testosterone production in patients with PCOS [18]. However, a study by Mesinovic et al. suggested no discernible correlation between serum VD level and androgen production in individuals with PCOS [19]. Moreover, a large observational study by Gallea et al. found associations between serum VD levels, insulin, and body weight among PCOS patients, but not specifically with hyperandrogenemia [20]. These differing results may arise because observational studies are susceptible to confounding factors and various biases [21]. Therefore, owing to the limitations of the study methodology, it is not clear whether testosterone production mediates the relationship between serum VD level and the risk of PCOS.

In recent years, Mendelian randomization (MR) analysis has been widely used as an epidemiological method in medical research. Firstly, MR analysis can minimize the impact of confounding factors and various biases on the results by simulating randomized controlled trials (RCTs) at the genetic level. Secondly, MR analysis can help assess causality and reduce the impact of reverse causality on the results of the study [22].

Thus, in this study, we used bidirectional two-sample MR (TS-MR) analysis to investigate the cause-effect relationship between serum VD level and the risk of PCOS. Secondly, we performed mediation MR analysis to test the mediating role of testosterone production between serum VD level and the risk of PCOS. Finally, we used bioinformatics analysis to assess the possible biological functions and molecular mechanisms between them.

Materials and methods

Study design of the Mendelian randomization study

Our study explored the causal effect of serum VD level (the exposure) on the risk of developing PCOS (the outcome), and the effect of testosterone as a mediator between VD and PCOS, through bidirectional TS-MR analysis, multivariable MR (MVMR), and mediation MR analysis (Fig. 1). To ensure the study's validity, the study needed to meet the following three crucial assumptions [23] (Fig. 1C): 1) the correlation assumption: instrumental variables (IVs) must be robustly correlated with the exposure factors; 2) the independence assumption: IVs are not associated with potential confounders of the exposure or the outcome; and 3) the exclusion restriction assumption: IVs do not influence the outcome variables through other pathways besides the exposure factors. This study followed the STROBE-MR [24] checklist (Table S1).

Fig. 1 Flowchart of the study. (A) Flowchart of the MR study; (B) Flowchart of the bioinformatics study; (C) Diagram of the MR assumptions of the association between VD and PCOS; (D) Illustrative diagram for the mediation MR analysis framework. Abbreviations: MR, Mendelian randomization; TS-MR, two-sample MR; VD, vitamin D; PCOS, polycystic ovary syndrome; IVW, inverse variance weighted; BMI, body mass index; FBG, fasting glucose; FI, fasting insulin; MVMR, multivariable MR; BT, bioavailable testosterone; SNPs, single-nucleotide polymorphisms

Data source and IVs selection of mendelian randomization study

We obtained data associated with VD from a large genome-wide association study (GWAS) conducted by Revez et al. in 2020, which identified 143 loci among 417,580 participants [25]. We accessed the summary data related to PCOS from a meta-analysis of FinnGen and the Estonian Biobank (EstBB), which included 3609 cases and 229,788 controls [7]. Summary data related to bioavailable testosterone (BT) were obtained from the UK Biobank (UKB). Data on serum fasting glucose (FBG) levels were obtained from a UKB GWAS we conducted in 340,002 British participants [26]. Summary data on circulating concentrations of fasting insulin (FI) were obtained from the MAGIC GWAS, which included 151,013 participants [27]. Pooled data related to BMI were acquired from a GWAS meta-analysis within the GIANT consortium, encompassing 681,275 participants [28]. Details of the GWAS databases are summarized in Table S2.

In the bidirectional TS-MR analysis, single-nucleotide polymorphisms (SNPs) with genome-wide significance (P < 5 × 10⁻⁸) were first selected. These SNPs were matched against the SNP-outcome GWAS database, and SNPs that could not be matched were excluded. To minimize the effects of linkage disequilibrium, we conducted a clumping process with an r² threshold of 0.001 and a clumping window of 10,000 kb, and excluded SNPs in linkage disequilibrium. Subsequently, we performed MR-PRESSO analysis to test for significant horizontal pleiotropy and to exclude outlier SNPs [29]. To ensure that the IVs were not affected by confounding variables, we searched PhenoScanner V2 [30] and deleted obesity-related SNPs associated with BMI and waist circumference (WC). Finally, 88 SNPs (VD on PCOS) and 2 SNPs (PCOS on VD) were used as IVs in the primary bidirectional TS-MR study, respectively. All SNPs exhibited an F statistic greater than 10. The variance explained by each SNP (R²) was calculated using the widely accepted formula [31, 32]. We used the same method to screen the SNPs required in the MR mediation analysis. All IV SNPs are summarized in Tables S3–S7.
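The instrument-strength check can be sketched as follows. The text does not spell out the formula, so this sketch assumes one commonly used form, R² = 2 × EAF × (1 − EAF) × β² for a standardized exposure and F = R²(N − 2)/(1 − R²); the SNP names, effect sizes, and allele frequencies are hypothetical and only the sample size is taken from the text.

```python
def variance_explained(beta, eaf):
    """Assumed formula: share of exposure variance explained by one SNP."""
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2

def f_statistic(r2, n):
    """Assumed single-instrument F statistic for sample size n."""
    return r2 * (n - 2) / (1.0 - r2)

N = 417580  # exposure GWAS sample size quoted in the text

# (name, beta on standardized exposure, effect allele frequency): hypothetical
snps = [
    ("rs_hypothetical_1", 0.030, 0.35),
    ("rs_hypothetical_2", 0.004, 0.10),
]

strong = []
for name, beta, eaf in snps:
    r2 = variance_explained(beta, eaf)
    f_stat = f_statistic(r2, N)
    print(f"{name}: R2 = {r2:.6f}, F = {f_stat:.1f}")
    if f_stat > 10:  # conventional threshold for a non-weak instrument
        strong.append(name)

print("instruments kept:", strong)
```

Only SNPs whose F statistic exceeds 10 would be retained as instruments under this rule.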

Statistical analysis of the Mendelian randomization study

Initially, the primary analysis aimed to explore the causal relationship between VD and PCOS using bidirectional TS-MR analysis. We used Cochran's Q test to assess heterogeneity [33]; if there was no heterogeneity, we used the fixed-effects inverse variance weighted (IVW) method, and otherwise the random-effects IVW method [34]. Furthermore, considering that obesity, abnormal insulin levels, and abnormal glucose values are common in patients with PCOS, we adjusted for genetically predicted BMI, FBG, and FI by MVMR to explore the direct causal effect between VD and PCOS and to make the results more robust.
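A minimal sketch of the IVW estimate and Cochran's Q is given below. It assumes per-SNP Wald ratios with first-order weights and a multiplicative random-effects correction, which is one common convention and not necessarily the exact implementation used in the study; all SNP-level numbers are hypothetical.

```python
import math

def ivw(beta_exp, beta_out, se_out):
    """Fixed-effects IVW estimate, its SE, Cochran's Q, and a random-effects SE."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]    # Wald ratios
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    w_sum = sum(weights)
    est = sum(w * r for w, r in zip(weights, ratios)) / w_sum
    se_fixed = math.sqrt(1.0 / w_sum)
    q = sum(w * (r - est) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # Multiplicative random effects: inflate the SE only when Q exceeds its df
    se_random = se_fixed * math.sqrt(max(1.0, q / df))
    return est, se_fixed, q, df, se_random

# Hypothetical SNP-exposure and SNP-outcome associations
beta_exp = [0.10, 0.20, 0.15, 0.05]
beta_out = [-0.028, -0.062, -0.045, -0.016]
se_out = [0.01, 0.01, 0.01, 0.01]

est, se_fixed, q, df, se_random = ivw(beta_exp, beta_out, se_out)
print(f"IVW log OR = {est:.4f} (OR = {math.exp(est):.3f})")
print(f"Cochran's Q = {q:.4f} on {df} df")
print(f"SE fixed = {se_fixed:.4f}, SE random = {se_random:.4f}")
```

The P value for Q would come from a chi-square distribution with `df` degrees of freedom; when Q is small relative to its degrees of freedom, as in this toy input, the fixed-effects and random-effects standard errors coincide.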

Secondly, a stepwise MR analysis approach was used to examine whether there exist mediation effects of BT between VD and PCOS. To assess the direct causal effect between VD, BT, and PCOS, we performed an MVMR analysis using the MVMR R package [ 35 ]. Conditional F statistics were calculated for assessing the strength of the genetic instruments in MVMR analysis [ 36 ]. The product of the coefficients method [ 37 ] and the multivariate delta method [ 38 ] were used to calculate the indirect effects of VD on PCOS via mediator.
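The product of coefficients calculation can be sketched as follows. This is a first-order delta-method approximation that assumes the two step estimates are independent, which is simpler than a full multivariate delta method; the input estimates and standard errors are hypothetical.

```python
import math

def indirect_effect(theta_xm, se_xm, theta_my, se_my):
    """Product-of-coefficients indirect effect with a first-order delta-method SE.

    theta_xm: effect of exposure on mediator; theta_my: effect of mediator on outcome.
    Assumes the two estimates are independent (an assumption of this sketch).
    """
    effect = theta_xm * theta_my
    se = math.sqrt(theta_xm ** 2 * se_my ** 2 + theta_my ** 2 * se_xm ** 2)
    return effect, se

# Hypothetical step estimates on the beta / log-odds scale
effect, se = indirect_effect(theta_xm=-0.05, se_xm=0.02, theta_my=0.30, se_my=0.10)
low, high = effect - 1.96 * se, effect + 1.96 * se

print(f"indirect effect (log OR) = {effect:.4f}, SE = {se:.4f}")
print(f"OR = {math.exp(effect):.3f} "
      f"(95% CI: {math.exp(low):.3f} to {math.exp(high):.3f})")
```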

Sensitivity analysis of mendelian randomization study

The following tests were used as sensitivity analyses to assess the robustness of MR effect estimates to invalid genetic variants. Firstly, we conducted MR-Egger regression [39, 40], weighted median [41], and weighted mode [42] methods. MR-Egger regression can detect and adjust for horizontal pleiotropy, mainly through its intercept test [39, 40]. The weighted median can yield consistent estimates even when up to half of the information comes from invalid IVs [43]. The weighted mode groups SNPs into subsets with similar causal effects and computes the causal estimate from the subset with the largest number of SNPs [42]. Secondly, the leave-one-out (LOO) analysis tests whether the results are driven by a single SNP [44]. Thirdly, as described above, we performed MR-PRESSO analysis [29] to identify potential horizontal pleiotropic outliers among the IVs that could bias the results, and we searched the PhenoScanner database for obesity-related SNPs associated with BMI and WC and removed them [45].
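The leave-one-out idea can be sketched in a few lines. The example below re-estimates a simple fixed-effects IVW slope with each SNP removed in turn; the SNP-level numbers are hypothetical and the last SNP is deliberately made an outlier.

```python
def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effects IVW estimate from Wald ratios with first-order weights."""
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def leave_one_out(beta_exp, beta_out, se_out):
    """IVW estimate recomputed with each SNP dropped in turn."""
    estimates = []
    for i in range(len(beta_exp)):
        keep = [j for j in range(len(beta_exp)) if j != i]
        estimates.append(ivw_estimate([beta_exp[j] for j in keep],
                                      [beta_out[j] for j in keep],
                                      [se_out[j] for j in keep]))
    return estimates

# Hypothetical data: the fifth SNP points in the opposite direction
beta_exp = [0.10, 0.20, 0.15, 0.05, 0.12]
beta_out = [-0.03, -0.06, -0.045, -0.015, 0.06]
se_out = [0.01] * 5

full = ivw_estimate(beta_exp, beta_out, se_out)
loo = leave_one_out(beta_exp, beta_out, se_out)

print(f"all SNPs: {full:.3f}")
for i, est in enumerate(loo):
    print(f"without SNP {i + 1}: {est:.3f}")
```

Dropping the outlying SNP shifts the estimate far more than dropping any other, which is the pattern a leave-one-out plot is meant to expose.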

All analyses were conducted using R version 4.2.0 (R Foundation for Statistical Computing, Vienna, Austria). P values below 0.05 were considered significant.

Bioinformatical analysis

We used the largest whole blood expression quantitative trait loci (eQTL) dataset from the eQTLGen consortium, which includes cis-eQTL data for 19,250 genes expressed in whole blood from 31,684 individuals [46]. We combined the SNP data of VD-PCOS (n-SNP = 90) and VD-BT (n-SNP = 88) with the cis-eQTL data for gene annotation, respectively. Genes with P < 5 × 10⁻⁸ and FDR < 0.05 were screened as potentially relevant genes for VD-PCOS and VD-BT.

Subsequently, we used these potentially relevant genes for bioinformatics analyses, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. GO analyses [47], covering biological process (BP), molecular function (MF), and cellular component (CC), are commonly used for large-scale functional enrichment studies. KEGG is a database that stores information about genomes, biological pathways, diseases, and drugs. We used the clusterProfiler, org.Hs.eg.db, and enrichplot packages in R to perform GO and KEGG enrichment analyses of the potentially relevant genes. GO entries and KEGG pathways with P < 0.05 were considered significant.
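Over-representation analyses of this kind are usually based on a hypergeometric test. The sketch below illustrates that test in Python and is not the clusterProfiler code path; the background size, pathway size, gene-list size, and overlap are hypothetical.

```python
from math import comb

def enrichment_p(overlap, list_size, pathway_size, background):
    """Upper-tail hypergeometric P value: chance of seeing at least `overlap`
    pathway genes in a random gene list of `list_size` drawn from `background`."""
    total = comb(background, list_size)
    upper = min(pathway_size, list_size)
    tail = sum(comb(pathway_size, i) * comb(background - pathway_size, list_size - i)
               for i in range(overlap, upper + 1))
    return tail / total

# Hypothetical numbers: 5 of 150 listed genes fall in a 100-gene pathway,
# against a background of 20,000 annotated genes
p = enrichment_p(overlap=5, list_size=150, pathway_size=100, background=20000)
expected = 150 * 100 / 20000

print(f"expected overlap by chance: {expected:.2f}")
print(f"enrichment P value: {p:.5f}")
```

With fewer than one overlapping gene expected by chance, an overlap of five gives a small P value, so the hypothetical pathway would pass a P < 0.05 threshold.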

Causal effect between serum vitamin D and polycystic ovary syndrome

In our bidirectional TS-MR analysis, the numbers of IVs for VD on PCOS and PCOS on VD were 90 and 2, respectively. The F-statistic values for each SNP were greater than 10 (Table S3), indicating that the results were unlikely to be affected by weak instrument bias. The fixed-effects IVW method (Cochran's Q statistic = 81.42, P = 0.704) indicated that genetically predicted higher level of VD led to a lower risk of developing PCOS after excluding obesity-associated SNPs (n = 90 SNPs, OR = 0.750, 95% CI: 0.587–0.959, P = 0.022) (Table 1). The MR-Egger, weighted median, and weighted mode methods all yielded estimates of similar magnitude and direction to the IVW method (Table 1). The scatter plot shows the protective effect of the individual SNPs on PCOS (Fig. S1). Since the MR-Egger P-intercept was greater than 0.05 (Table S8) and the funnel plot (Fig. S2) was roughly symmetrical, there was no indication of horizontal pleiotropy. The results of the LOO analyses indicated that no single SNP drove the main MR results (Fig. S3). The MR-PRESSO test did not show any outlier SNPs. In contrast, the results of the reverse TS-MR showed that genetically predicted risk of developing PCOS did not affect the VD level (fixed-IVW: n = 2 SNPs, OR = 1.004, 95% CI: 0.987–1.022, P = 0.640) (Table 1).
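As a consistency check, the reported P value can be approximately recovered from the odds ratio and its confidence interval. The sketch below assumes a normal approximation on the log-odds scale; the function name is illustrative, and the inputs are the rounded values quoted above.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log OR, SE, z, and two-sided P from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p

beta, se, z, p = p_from_or_ci(0.750, 0.587, 0.959)
print(f"log OR = {beta:.4f}, SE = {se:.4f}, z = {z:.3f}, P = {p:.3f}")
```

The result agrees with the reported P = 0.022 to within rounding of the published estimates.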

We subsequently explored the direct effect of the serum VD level on PCOS by MVMR methods, and the results of both Model 1 (adjusted BMI) and Model 2 (adjusted BMI, FBG, and FI) showed that the negative correlation between serum VD level and the risk of PCOS remained similar (Table  2 ). This confirms the robustness of the TS-MR results.

Mendelian randomization mediation analysis

After excluding the outlier SNPs and obesity-related SNPs, MVMR analysis (adjusted BT) revealed direct causal effects of serum VD level (OR: 0.735, 95% CI: 0.552–0.978; P  = 0.035) on the risk of developing PCOS (Table  3 , Fig.  1 D). In the following steps of the MR mediation analysis, we found strong evidence for a causal effect of serum VD level (β: − 0.053, P  = 0.026) on BT (Table  3 ). In addition to this, we also found a causal relationship between BT and PCOS (OR: 1.378, 95% CI: 1.123–1.691; P  = 0.002) (Table  3 ).

Taken together, we found a potential mediation pathway between VD and PCOS: an indirect causal effect of VD on PCOS risk via BT (θ3 × θ4) (OR: 0.983, 95% CI: 0.968–0.998; P = 0.025) (Table 3). The pathway mediated 5.96% of the total causal effect of VD on PCOS risk. Detailed estimates of direct and indirect causal effects can be found in Table 3.
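The indirect effect can be approximately reconstructed from the point estimates quoted in the text. The sketch below assumes that the total effect is the TS-MR estimate (OR = 0.750) and that the proportion mediated is the ratio of the indirect to the total effect on the log-odds scale; the variable names are illustrative.

```python
import math

theta3 = -0.053                 # VD -> BT, beta quoted in the text
theta4 = math.log(1.378)        # BT -> PCOS, log of the quoted OR
total = math.log(0.750)         # VD -> PCOS total effect, log of the TS-MR OR

indirect = theta3 * theta4      # product of coefficients
or_indirect = math.exp(indirect)
proportion = indirect / total   # share of the total effect carried by BT

print(f"indirect effect: log OR = {indirect:.4f}, OR = {or_indirect:.3f}")
print(f"proportion mediated = {100 * proportion:.1f}%")
```

This reproduces the reported indirect OR of 0.983 and gives a proportion mediated of roughly 5.9%, close to the reported 5.96%; the small gap presumably reflects rounding in the published estimates.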

Bioinformatics study

The results of the MR study suggested that reduced VD level may lead to the development of PCOS, and BT is a mediator between VD and PCOS, meaning that VD can ultimately influence the development of PCOS by affecting the production of testosterone. On the basis of the above studies, we collected IVs of VD-PCOS ( n -SNPs = 90) and VD-BT ( n -SNPs = 88) respectively, and combined these IVs with cis-eQTLs data for gene annotation respectively. Ultimately, 147 (VD-PCOS) and 164 (VD-BT) potentially relevant genes were annotated (Table S 9 - 10 ), respectively. We then used these genes to perform GO and KEGG analyses.

Firstly, the potentially relevant genes of VD-PCOS were analyzed for enrichment. The results of GO analysis suggested that these genes were mainly related to androgen metabolic process, superoxide metabolic process, cell body membrane, and steroid dehydrogenase activity (Fig. 2A). The KEGG analysis was mainly enriched in the process of autophagy, steroid biosynthesis, cytochrome P450 metabolic process, and vitamin digestion and absorption process (Fig. 2C). Subsequently, potentially relevant genes associated with VD-BT were analyzed for enrichment. The results of GO analysis suggested that these genes were mainly associated with steroid metabolism, superoxide metabolism, autophagosome membrane, nuclear androgen receptor binding, and vitamin transmembrane transporter activity (Fig. 2B), and the KEGG analysis was mainly enriched for autophagy, steroid biosynthesis, vitamin digestion and absorption, and cholesterol metabolism process (Fig. 2D). All information from the enrichment analysis is shown in the additional file (Tables S11–S12).

Fig. 2 Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis of potentially relevant genes. (A) The GO enrichment analysis for potentially relevant genes related to VD and PCOS; (B) The GO enrichment analysis for potentially relevant genes related to VD and BT; (C) The KEGG pathway analysis for potentially relevant genes related to VD and PCOS; (D) The KEGG pathway analysis for potentially relevant genes related to VD and BT. Abbreviations: VD, vitamin D; PCOS, polycystic ovary syndrome; BT, bioavailable testosterone; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes

In our bidirectional TS-MR analysis, we found that higher serum VD level was causally associated with a lower risk of developing PCOS (OR = 0.750, 95% CI: 0.587–0.959, P = 0.022), whereas there was little evidence for a causal effect of the risk of PCOS on serum VD level. Furthermore, our MR mediation analysis indicated that testosterone can act as one of the mediating factors in the causal relationship between VD and PCOS (OR = 0.983, 95% CI: 0.968–0.998, P = 0.025). The mediating effect of testosterone was 5.96%. Ultimately, we utilized potentially relevant genes for GO and KEGG enrichment analysis to assess the involvement of testosterone and the potential biological and molecular mechanisms between them.

VD, a lipid-soluble vitamin, plays a pivotal role in numerous biological processes. It is primarily synthesized endogenously through exposure to sunlight and is also acquired, to a lesser extent, from dietary sources [48]. VDD is considered a globally prevalent nutritional deficiency, with various studies reporting prevalence rates of 58–91% among infertile women [49]. A cross-sectional study of 625 women diagnosed with PCOS and 217 control subjects found that Chinese women with PCOS had notably lower VD levels than healthy controls [50]. A large observational study by Krul-Poel et al. similarly demonstrated significantly lower VD levels among women in the PCOS group [51]. Recent research has shown that women with PCOS have lower serum concentrations of VD than BMI-matched controls, which implies that the VD level is linked to PCOS irrespective of BMI [14]. In line with these observational studies, our research indicated that a higher serum VD level is a protective factor against PCOS. To eliminate the influence of obesity as a potential confounder, we excluded obesity-related SNPs from our TS-MR analysis. Subsequently, in our MVMR analyses, we adjusted for genetically predicted BMI, FBG, and FI to explore the direct causal relationship between VD and PCOS. These stringent measures enhance the credibility and robustness of our findings.

The precise mechanism through which serum VD acts on PCOS remains elusive. Hyperandrogenemia is a pivotal diagnostic criterion for PCOS. Numerous studies have explored the correlation between serum VD and hyperandrogenemia in PCOS, yet their conclusions have not reached a consensus. A study by Latic et al. revealed a negative correlation between serum VD level and testosterone in PCOS patients. Additionally, Menichini et al. demonstrated a positive impact of VD supplementation (4000 IU) on total testosterone [52]. However, a study by Mesinovic et al. suggested no discernible correlation between serum VD and androgens in individuals with PCOS [19]. Moreover, a large observational study by Gallea et al. showed associations between serum VD level, insulin, and body weight among PCOS patients, but not with hyperandrogenemia [20]. The inconsistencies in these findings might stem from differences in race, sample size, season, and the lifestyles of the included subjects. Our study, by employing Mendelian randomization, mitigated the impact of sample size, seasonal fluctuations, and diverse lifestyles on the outcomes. Furthermore, our research focused solely on individuals of European ancestry, and we excluded BMI-related SNPs when selecting instrumental variables, thereby reducing the potential confounding effect of BMI on the results. These measures support the robustness and reliability of our findings. Our results suggest that testosterone acts as a mediator between serum VD and PCOS, implying that serum VD may contribute to the development of PCOS by influencing testosterone production.

The mechanism by which serum VD ultimately contributes to the development of PCOS by affecting testosterone remains unclear, but a possible explanation has been proposed. Serum VD heightens the activity of aromatase within the ovary, thereby promoting the conversion of androgens to estrogens and ultimately reducing androgen production [53]. Kinuta et al. demonstrated a marked reduction in aromatase activity in the ovaries of VDR knockout mice compared with the control group [54]. In addition, we performed bioinformatics analysis to explore further possible biological mechanisms. First, the GO and KEGG analyses of potentially relevant genes for VD-PCOS showed that the steroid biosynthetic process, the androgen metabolic process, and nuclear androgen receptor binding were possible biological mechanisms underlying the causal relationship between serum VD level and PCOS. These results are consistent with those of our bidirectional TS-MR analysis and again indicate that serum VD can ultimately influence the development of PCOS by modulating testosterone production. Subsequently, we subjected the potentially relevant genes for VD-BT to bioinformatics analysis. The results suggested that autophagy and superoxide metabolism might be the biological mechanisms linking serum VD and testosterone.

Very few studies have linked autophagy to PCOS, and their results suggest that the development of PCOS is closely related to the process of autophagy [55]. Texada et al. showed that autophagy can regulate steroid production by modulating cholesterol transport in endocrine cells [56]. In addition, the role of VD-mediated autophagy in disease has been extensively studied, and a basic study by Hu et al. showed that VD can regulate autophagy through VD receptors in gastric epithelial cells, which ultimately affects the pathogenic effects of H. pylori [57]. However, whether VD-mediated autophagy ultimately leads to PCOS remains unknown. The results of our bioinformatics analysis suggest that autophagy is most likely one of the important mechanisms underlying the relationship between VD and PCOS.

Our study has shown that a lower serum VD level is causally associated with a higher risk of PCOS. PCOS can impair oocyte competence and endometrial function [9, 10] and can also lead to adverse reproductive outcomes such as infertility, miscarriage, and premature delivery [12, 13]. VDD has been found to decrease the rates of ovulation and successful pregnancy in PCOS patients, leading to fewer live births [58]. In addition, serum VD level has been reported to be an independent predictor of live birth in PCOS patients receiving ovulation induction [59]. Yasmine et al. reported that the endometrial thickness of PCOS patients may be improved after VD administration [60]. A recent meta-analysis showed that VD supplementation in women with PCOS could decrease the rates of early miscarriage and premature delivery [53]. The nuclear receptor of VD (VDR) and the 1,25(OH)2D3 membrane-binding protein are expressed in both ovarian granulosa and theca cells [61, 62]. It has been found that VD can regulate the expression of enzymes in the VDR and ovary, ultimately regulating ovarian function [63]. One study showed that VDR mRNA expression was significantly lower in granulosa cells of women with PCOS [64], which may make PCOS patients more sensitive to VDD. Based on the above studies and our own, serum VD level should be monitored in the female population, especially in women of reproductive age, and timely VD administration in PCOS patients would help to improve their reproductive function and pregnancy outcomes.

Our research has several strengths. First, this study supports a direct causal effect of serum VD level on the risk of PCOS through the use of TS-MR analysis. This method avoids limitations commonly found in observational studies, thereby strengthening the reliability and validity of our findings. Second, we identified the mediating role of testosterone in the relationship between serum VD and PCOS via MR mediation analysis, laying the groundwork for subsequent mechanistic studies. Finally, this is the first study to combine MR and bioinformatics analyses to explore the causal relationship and potential functional mechanisms between serum VD level, testosterone, and the risk of PCOS, which distinguishes it from other studies. Nonetheless, this study also has limitations. First, our study could not capture dietary and sun exposure information that may affect serum VD level. Second, the use of exclusively European data in an MR analysis reduces the impact of ethnicity bias on the outcomes, but the results may not be generalizable to other ethnic populations. Finally, the absence of relevant data prevented us from separately exploring the relationship of serum vitamin D2 and D3 with the risk of PCOS, which warrants further investigation.

Conclusions

In conclusion, our study confirms the causal association between a lower serum VD level and a higher risk of PCOS. Furthermore, testosterone may act as a mediator between serum VD and PCOS. These findings emphasize the clinical importance of testing serum VD level and of timely VD supplementation as possible primary prevention and treatment of PCOS.

Availability of data and materials

No datasets were generated or analysed during the current study.

Abbreviations

Polycystic ovary syndrome

Genome-wide association studies

Two-sample Mendelian randomization

Cis-expression quantitative loci

VD deficiency

VD receptors

Body mass index

Mendelian randomization

Multivariable MR

Instrumental variables

Bioavailable testosterone

Fasting glucose

Fasting insulin

Single-nucleotide polymorphisms

Waist circumference

Inverse variance weighted

Leave one out

Gene ontology

Kyoto Encyclopedia of Genes and Genomes

Biological process

Molecular function

Cellular composition

References

Holick MF. The vitamin D deficiency pandemic: Approaches for diagnosis, treatment and prevention. Rev Endocr Metab Disord. 2017;18:153–65.

Autier P, Boniol M, Pizot C, Mullie P. Vitamin D status and ill health: a systematic review. Lancet Diabetes Endocrinol. 2014;2:76–89.

Lerchbaum E, Obermayer-Pietsch B. Vitamin D and fertility: a systematic review. Eur J Endocrinol. 2012;166:765–78.

Irani M, Merhi Z. Role of vitamin D in ovarian physiology and its implication in reproduction: a systematic review. Fertil Steril. 2014;102:460-468.e3.

Parikh G, Varadinova M, Suwandhi P, Araki T, Rosenwaks Z, Poretsky L, et al. Vitamin D regulates steroidogenesis and insulin-like growth factor binding protein-1 (IGFBP-1) production in human ovarian cells. Horm Metab Res. 2010;42:754–7.

Du H, Daftary GS, Lalwani SI, Taylor HS. Direct regulation of HOXA10 by 1,25-(OH)2D3 in human myelomonocytic cells and human endometrial stromal cells. Mol Endocrinol. 2005;19:2222–33.

Tyrmi JS, Arffman RK, Pujol-Gualdo N, Kurra V, Morin-Papunen L, Sliz E, et al. Leveraging Northern European population history: novel low-frequency variants for polycystic ovary syndrome. Hum Reprod. 2022;37:352–65.

Bruni V, Capozzi A, Lello S. The Role of Genetics, Epigenetics and Lifestyle in Polycystic Ovary Syndrome Development: the State of the Art. Reprod Sci. 2022;29:668–79.

Palomba S, Daolio J, La Sala GB. Oocyte Competence in Women with Polycystic Ovary Syndrome. Trends Endocrinol Metab. 2017;28:186–98.

Palomba S, Piltonen TT, Giudice LC. Endometrial function in women with polycystic ovary syndrome: a comprehensive review. Hum Reprod Update. 2021;27:584–618.

Norman RJ, Dewailly D, Legro RS, Hickey TE. Polycystic ovary syndrome. Lancet. 2007;370:685–97.

Palomba S. Is fertility reduced in ovulatory women with polycystic ovary syndrome? An opinion paper. Hum Reprod. 2021;36:2421–8.

Palomba S, De Wilde MA, Falbo A, Koster MPH, La Sala GB, Fauser BCJM. Pregnancy complications in women with polycystic ovary syndrome. Hum Reprod Update. 2015;21:575–92.

Bacopoulou F, Kolias E, Efthymiou V, Antonopoulos CN, Charmandari E. Vitamin D predictors in polycystic ovary syndrome: a meta-analysis. Eur J Clin Invest. 2017;47:746–55.

Lejman-Larysz K, Golara A, Baranowska M, Kozłowski M, Guzik P, Szydłowska I, et al. Influence of Vitamin D on the Incidence of Metabolic Syndrome and Hormonal Balance in Patients with Polycystic Ovary Syndrome. Nutrients. 2023;15:2952.

Durdiakova J, Ostatnikova D, Celec P. Testosterone and its metabolites–modulators of brain functions. Acta Neurobiol Exp (Warsz). 2011;71:434–54.

Hahn S, Haselhorst U, Tan S, Quadbeck B, Schmidt M, Roesler S, et al. Low serum 25-hydroxyvitamin D concentrations are associated with insulin resistance and obesity in women with polycystic ovary syndrome. Exp Clin Endocrinol Diabetes. 2006;114:577–83.

Latic N, Erben RG. Vitamin D and Cardiovascular Disease, with Emphasis on Hypertension, Atherosclerosis, and Heart Failure. Int J Mol Sci. 2020;21:6483.

Mesinovic J, Teede HJ, Shorakae S, Lambert GW, Lambert EA, Naderpoor N, et al. The Relationship between Vitamin D Metabolites and Androgens in Women with Polycystic Ovary Syndrome. Nutrients. 2020;12:1219.

Gallea M, Granzotto M, Azzolini S, Faggian D, Mozzanega B, Vettor R, et al. Insulin and body weight but not hyperandrogenism seem involved in seasonal serum 25-OH-vitamin D3 levels in subjects affected by PCOS. Gynecol Endocrinol. 2014;30:739–45.

Smith GD, Lawlor DA, Harbord R, Timpson N, Day I, Ebrahim S. Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. Cardon L, editor. PLoS Med. 2007;4:e352.

Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23:R89-98.

EPIC-InterAct Consortium, Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol. 2015;30:543–52.

Skrivankova VW, Richmond RC, Woolf BAR, Yarmolinsky J, Davies NM, Swanson SA, et al. Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization: The STROBE-MR Statement. JAMA. 2021;326:1614.

Revez JA, Lin T, Qiao Z, Xue A, Holtz Y, Zhu Z, et al. Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration. Nat Commun. 2020;11:1647.

Mbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki JA, Ziyatdinov A, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet. 2021;53:1097–103.

Chen J, Spracklen CN, Marenne G, Varshney A, Corbin LJ, Luan J, et al. The trans-ancestral genomic architecture of glycemic traits. Nat Genet. 2021;53:840–60.

Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼ 700000 individuals of European ancestry. Hum Mol Genet. 2018;27:3641–9.

Verbanck M, Chen C-Y, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50:693–8.

Mortada I. Hyperuricemia, Type 2 Diabetes Mellitus, and Hypertension: an Emerging Association. Curr Hypertens Rep. 2017;19:69.

Choi HK, McCormick N, Lu N, Rai SK, Yokose C, Zhang Y. Population Impact Attributable to Modifiable Risk Factors for Hyperuricemia. Arthritis Rheumatol. 2020;72:157–65.

Nakamura K, Sakurai M, Miura K, Morikawa Y, Yoshita K, Ishizaki M, et al. Alcohol intake and the risk of hyperuricaemia: a 6-year prospective study in Japanese men. Nutr Metab Cardiovasc Dis. 2012;22:989–96.

Greco MFD, Minelli C, Sheehan NA, Thompson JR. Detecting pleiotropy in Mendelian randomisation studies with summary data and a continuous outcome. Stat Med. 2015;34:2926–40.

Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408.

Sanderson E, Spiller W, Bowden J. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization. Stat Med. 2021;40:5434–52.

Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol. 2019;48:713–27.

VanderWeele TJ. Mediation Analysis: A Practitioner’s Guide. Annu Rev Public Health. 2016;37:17–32.

MacKinnon DP, Fairchild AJ, Fritz MS. Mediation analysis. Annu Rev Psychol. 2007;58:593–614.

Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–25.

Burgess S, Thompson SG. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur J Epidemiol. 2017;32:377–89.

Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol. 2016;40:304–14.

Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46:1985–98.

Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator - PubMed. Available from: https://pubmed.ncbi.nlm.nih.gov/27061298/ . [cited 2023 Feb 15].

Burgess S, Bowden J, Fall T, Ingelsson E, Thompson SG. Sensitivity Analyses for Robust Causal Inference from Mendelian Randomization Analyses with Multiple Genetic Variants. Epidemiology. 2017;28:30–42.

Kamat MA, Blackshaw JA, Young R, Surendran P, Burgess S, Danesh J, et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics. 2019;35:4851–3.

Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B, et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet. 2021;53:1300–10.

The Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015;43:D1049–56.

Yuan C, Qian ZR, Babic A, Morales-Oyarvide V, Rubinson DA, Kraft P, et al. Prediagnostic Plasma 25-Hydroxyvitamin D and Pancreatic Cancer Survival. J Clin Oncol. 2016;34:2899–905.

Cunningham TK, Allgar V, Dargham SR, Kilpatrick E, Sathyapalan T, Maguiness S, et al. Association of Vitamin D Metabolites With Embryo Development and Fertilization in Women With and Without PCOS Undergoing Subfertility Treatment. Front Endocrinol. 2019;10:13.

Shan C, Zhu Y, Yu J, Zhang Y, Wang Y, Lu N, et al. Low Serum 25-Hydroxyvitamin D Levels Are Associated With Hyperandrogenemia in Polycystic Ovary Syndrome: A Cross-Sectional Study. Front Endocrinol. 2022;13:894935.

Krul-Poel YHM, Koenders PP, Steegers-Theunissen RP, Ten Boekel E, Wee MMT, Louwers Y, et al. Vitamin D and metabolic disturbances in polycystic ovary syndrome (PCOS): a cross-sectional study. Narayanan R, editor. PLOS One. 2018;13:e0204748.

Menichini D, Facchinetti F. Effects of vitamin D supplementation in women with polycystic ovary syndrome: a review. Gynecol Endocrinol. 2020;36:1–5.

Yang M, Shen X, Lu D, Peng J, Zhou S, Xu L, et al. Effects of vitamin D supplementation on ovulation and pregnancy in women with polycystic ovary syndrome: a systematic review and meta-analysis. Front Endocrinol. 2023;14:1148556.

Kinuta K, Tanaka H, Moriwake T, Aya K, Kato S, Seino Y. Vitamin D is an important factor in estrogen biosynthesis of both female and male gonads. Endocrinology. 2000;141:1317–24.

Kumariya S, Ubba V, Jha RK, Gayen JR. Autophagy in ovary and polycystic ovary syndrome: role, dispute and future perspective. Autophagy. 2021;17:2706–33.

Texada MJ, Malita A, Rewitz K. Autophagy regulates steroid production by mediating cholesterol trafficking in endocrine cells. Autophagy. 2019;15:1478–80.

Hu W, Zhang L, Li MX, Shen J, Liu XD, Xiao ZG, et al. Vitamin D3 activates the autolysosomal degradation function against Helicobacter pylori through the PDIA3 receptor in gastric epithelial cells. Autophagy. 2019;15:707–25.

Butts SF, Seifer DB, Koelper N, Senapati S, Sammel MD, Hoofnagle AN, et al. Vitamin D Deficiency Is Associated With Poor Ovarian Stimulation Outcome in PCOS but Not Unexplained Infertility. J Clin Endocrinol Metab. 2019;104:369–78.

Pal L, Zhang H, Williams J, Santoro NF, Diamond MP, Schlaff WD, et al. Vitamin D Status Relates to Reproductive Outcome in Women With Polycystic Ovary Syndrome: Secondary Analysis of a Multicenter Randomized Controlled Trial. J Clin Endocrinol Metab. 2016;101:3027–35.

Abuzeid Y. Impact of Vitamin D Deficiency on Reproductive Outcome in infertile anovulatory women with polycystic ovary syndrome: a systematic literature review. Curr Dev Nutr. 2020;4:nzaa067_001.

Li S, Qi J, Sun Y, Gao X, Ma J, Zhao S. An integrated RNA-Seq and network study reveals that valproate inhibited progesterone production in human granulosa cells. J Steroid Biochem Mol Biol. 2021;214:105991.

Hrabia A, Kamińska K, Socha M, Grzesiak M. Vitamin D3 Receptors and Metabolic Enzymes in Hen Reproductive Tissues. Int J Mol Sci. 2023;24:17074.

Xu J, Lawson MS, Xu F, Du Y, Tkachenko OY, Bishop CV, et al. Vitamin D3 Regulates Follicular Development and Intrafollicular Vitamin D Biosynthesis and Signaling in the Primate Ovary. Front Physiol. 2018;9:1600.

Aghadavod E, Mollaei H, Nouri M, Hamishehkar H. Evaluation of Relationship between Body Mass Index with Vitamin D Receptor Gene Expression and Vitamin D Levels of Follicular Fluid in Overweight Patients with Polycystic Ovary Syndrome. Int J Fertil Steril. 2017;11:105–11.

Acknowledgements

We would like to express our sincere gratitude to the compilers of the GWAS summary dataset for their management of the data collection and data resources.

This work was supported by the General Program of National Natural Science Foundation of China (grant number No.81771741), Distinguished Professor at Educational Department of Liaoning Province (grant number No. [2014]187) to JL.

Author information

Authors and affiliations.

Department of Endocrinology and Metabolism, The Institute of Endocrinology, NHC Key Laboratory of Diagnosis and Treatment of Thyroid Diseases, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning, 110000, P.R. China

Bingrui Gao, Chenxi Zhang, Deping Wang, Bojuan Li, Zhongyan Shan, Weiping Teng & Jing Li

Department of Endocrinology and Metabolism, Hongqi Hospital Affiliated to Mudanjiang Medical College, Mudanjiang, Heilongjiang, 157011, P.R. China

Deping Wang

Contributions

Designed the study: Jing Li, Bingrui Gao; Collected data: Bingrui Gao, Chenxi Zhang; Performed statistical analyses: Bingrui Gao, Deping Wang, Bojuan Li; Drafted the manuscript: Bingrui Gao; Supervised the study and reviewed the manuscript: Jing Li, Zhongyan Shan, Weiping Teng.

Corresponding author

Correspondence to Jing Li .

Ethics declarations

Ethics approval and consent to participate.

Our analysis used publicly available genome-wide association study (GWAS) summary statistics. No new data were collected, and no new ethical approval was required.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. STROBE-MR Checklist; Table S2. Key characteristics of participating studies; Table S3. GWAS significant SNPs used as genetic instruments for VD level on PCOS; Table S4. GWAS significant SNPs used as genetic instruments for PCOS on VD level; Table S5. GWAS significant SNPs used as genetic instruments for VD level on BT; Table S6. GWAS significant SNPs used as genetic instruments for BT on PCOS; Table S7. GWAS significant SNPs used as genetic instruments for BT and VD level on PCOS; Table S8. Heterogeneity and directional pleiotropy test using MR-Egger intercepts; Table S9. Potentially relevant genes corresponding to IVs associated with VD and PCOS; Table S10. Potentially relevant genes corresponding to IVs associated with VD and PCOS; Table S11. GO and KEGG enrichment analysis for potentially relevant genes related to VD and PCOS; Table S12. GO and KEGG enrichment analysis for potentially relevant genes related to VD and BT; Figure S1. Scatter plot of the MR estimates for the association of VD level with PCOS; Figure S2. Funnel plot reveals overall heterogeneity of the impact of VD on PCOS; Figure S3. Leave-one-out analysis of the impact of VD on PCOS.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Gao, B., Zhang, C., Wang, D. et al. Causal association between low vitamin D and polycystic ovary syndrome: a bidirectional mendelian randomization study. J Ovarian Res 17, 95 (2024). https://doi.org/10.1186/s13048-024-01420-5

Received: 23 February 2024

Accepted: 20 April 2024

Published: 07 May 2024

DOI: https://doi.org/10.1186/s13048-024-01420-5

  • Testosterone

Journal of Ovarian Research

ISSN: 1757-2215

Digital Inclusive Finance, Spatial Spillover Effects and Relative Rural Poverty Alleviation: Evidence from China

  • Published: 16 May 2024

Panpan Pei, Shunyi Zhang & Guangxia Zhou

Governing relative rural poverty is both the key to and the main difficulty in eliminating poverty and achieving common prosperity in China. With the rapid development of the digital economy, digital inclusive finance is playing an increasingly fundamental role in poverty alleviation. Whether and how digital inclusive finance, as an important new financial form, affects relative rural poverty is not yet known. Based on new economic geography, this paper empirically tests the direct and spatial impacts of digital financial inclusion on relative rural poverty alleviation by constructing spatial econometric models and using panel data from 31 provinces (autonomous regions and municipalities directly under the central government) in China from 2012 to 2019. The study finds a significant positive spatial correlation in relative rural poverty, and the development of digital inclusive finance has a significant inhibitory effect on relative rural poverty. Meanwhile, the development of digital inclusive finance in a given province also has a negative spatial spillover effect on relative rural poverty in surrounding areas. It is therefore necessary to boost the development of digital inclusive finance, improve the coordination of inclusive finance between regions, and promote inter-regional economic cooperation in the future.

Poverty alleviation remains a challenge worldwide, especially in developing countries. Digital inclusive finance, a new form of inclusive finance and digital economy widely applied in China, could play an increasingly fundamental role in poverty alleviation in rural areas. In this study, a spatial autoregressive (SAR) econometric model is constructed based on new economic geography, and the digital financial inclusion index is integrated with provincial-level macroeconomic data for China from 2012 to 2019. The direct and spatial impacts of digital inclusive finance on poverty reduction in rural areas are assessed using this model. The results show that digital inclusive finance can significantly reduce relative poverty in rural areas in China. More importantly, there is a significant positive spatial correlation in relative rural poverty, and digital inclusive finance has a negative spatial spillover effect on relative rural poverty. These findings are supported by a series of endogeneity and robustness tests, such as substituting the measure of relative poverty, replacing models, and using alternative specifications. Recommendations for poverty alleviation are proposed based on the results. This paper complements the active research field on financial development and income inequality. Our findings offer insights into the development of inclusive financial policies for relative rural poverty alleviation in other countries, especially developing countries with backgrounds similar to China's.
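The "positive spatial correlation" mentioned in the abstract can be made concrete with a small example. The abstract does not say which statistic the authors used, so the sketch below assumes Moran's I, one commonly used measure of global spatial autocorrelation. The function name, the four-region layout, and all values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for region values and an n x n spatial weight matrix.

    weights[i][j] > 0 means regions i and j are treated as neighbours;
    the diagonal is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var_sum = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var_sum)


# Hypothetical example: four regions in a row, each adjacent to the next.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 2.0, 3.0, 4.0], adjacency)    # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], adjacency)  # dissimilar neighbours
print(f"clustered: {clustered:.3f}, alternating: {alternating:.3f}")
```

A positive value indicates that neighbouring regions tend to have similar values, which is the pattern the abstract reports for relative rural poverty, while a negative value indicates that neighbours tend to differ.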

Data Availability

The data are available upon request.

References  

Arestis, P., & Caner, A. (2009). Financial liberalization and the geography of poverty. Cambridge Journal of Regions, Economy and Society, 2 (2), 229–244. https://doi.org/10.1093/cjres/rsp012

Banerjee, A. V., & Newman, A. F. (1993). Occupational choice and the process of development. Journal of Political Economy, 101 , 274–298.

Beck, T., Asli, D. K., & Peria, M. M. (2007). Reaching out: Access to and use of banking services across countries. Journal of Financial Economics, 85 (1), 234–266. https://doi.org/10.1016/j.jfineco.2006.07.002

Beck, T., Levine, R., & Levkov, A. (2010). Big bad banks? The winners and losers from bank deregulation in the United States. The Journal of Finance, 65 (5), 1637–1667. https://doi.org/10.1111/j.1540-6261.2010.01589.x

Cai, S. P., & Li, L. (2018). The research on the spatial difference and agglomeration effects of the rural inclusive finance in China. The Theory and Practice of Finance and Economics, 39 (3), 24–30.

Cai, H. Y., & Yang, C. (2021). Digital financial inclusion, credit availability and China’s relative poverty alleviation. The Theory and Practice of Finance and Economics, 42 (4), 24–30.

Chen, Y. T., & Ali, M. (2023). A bibliometric review of financial inclusion in Asia. In C. H. Leong, M. Ali, S. A. Raza, C. H. Puah, & I. H. Eksi (Eds.), Financial inclusion across Asia: Bringing opportunities for businesses (pp. 117–134). Emerald Publishing Limited. https://doi.org/10.1108/978-1-83753-304-620231010

Chen, X., & Chen, X. (2018). The special spillover effects of inclusive finance digitization on narrowing urban-rural income gap. Commercial Research, 8 , 167–176.

Chen, H. L., & Chen, X. K. (2021). “Trickle-down” or “polarization”: The improvement effect of digital inclusive finance on rural relative poverty. Journal of Yunnan University of Finance and Economics, 37 (7), 15–26.

Chen, P., & Wang, S. H. (2022). Digital inclusive finance, digital divide and multidimensional relative poverty—From the perspective of aging. Exploration of Economic Issues, 10 , 173–190.

Chen, B. O., & Zhao, C. K. (2021). Poverty reduction in rural China: Does the digital finance matter? PLoS ONE, 16 (12), e0261214. https://doi.org/10.1371/journal.pone.0261214

Cheng, Y. (2023). Research on the impact of the development of digital financial inclusion on multidimensional poverty. Frontiers in Business, Economics and Management, 7 (3), 42–45. https://doi.org/10.54097/fbem.v7i3.5275

Cheng, M. W., Li, L. L., & Zeng, Y. M. (2022). The characteristics and driving factors of spatial poverty in the old revolutionary base area from the perspective of spatial heterogeneity. Journal of Agrotechnical Economics, 4 , 4–17.

Chi, Z. H., & Yang, Y. Y. (2012). The poverty line: A survey. Economic Theory and Business Management, 7 , 56–64.

Claessens, S., & Perotti, E. (2007). Finance and inequality: Channels and evidence. Journal of Comparative Economics, 35 (4), 748–773. https://doi.org/10.1016/j.jce.2007.07.002

Collins, D., Morduch, J., Rutherford, S., & Ruthven, O. (2009). Portfolios of the poor: How the world’s poor live on $2 a day . Princeton University Press.

Corrado, G., & Corrado, L. (2017). Inclusive finance for inclusive growth and development. Current Opinion in Environmental Sustainability, 24 , 19–23. https://doi.org/10.1016/j.cosust.2017.01.013

Feng, S. L., & Zhang, Z. (2024). Research on the impact of digital financial inclusion on multidimensional poverty vulnerability. Economic and Management Review, 40 (1), 44–57.

Foster, J. E. (1998). Absolute versus relative poverty. The American Economic Review, 88 (2), 335–341.

Fuchs, V. R. (1967). Redefining poverty and redistributing income. The Public Interest, 14 (8), 88–95.

Fujita, M., Krugman, P. R., & Venables, A. J. (1999). The spatial economy: Cities, regions and international trade . The MIT Press.

Book   Google Scholar  

Gabor, D., & Brooks, S. (2017). The digital revolution in financial inclusion: International development in the fintech era. New Political Economy, 22 (4), 423–436. https://doi.org/10.1080/13563467.2017.1259298

Galbraith, J. K. (1958). The affluent society . Penguin Books.

Galor, O., & Zeira, J. (1993). Income distribution and macroeconomics. The Review of Economic Studies, 60 (1), 35–52. https://doi.org/10.2307/2297811

Gao, Y. D., Wen, T., & Wang, X. H. (2013). A spatial econometric study on the poverty alleviation effect of China’s fiscal and financial agricultural support policies. Economic Science, 1 , 36–46.

Greenwood, J., & Jovanovic, B. (1990). Financial development, growth, and the distribution of income. Journal of Political Economy, 98 (5), 1076–1107. https://doi.org/10.1086/261720

Guo, F., Wang, J. Y., Wang, F., Kong, T., Zhang, X., & Cheng, Z. Y. (2020). Measuring China’s digital financial inclusion: Index complication and spatial characteristics. Economics (quarterly), 19 (4), 1401–1418.

He, X. S., & Kong, R. (2017). Mechanism analysis and empirical test of inclusive financial system alleviating rural poverty. Journal of Northwest A & F University (social Science Edition), 17 (3), 76–83.

He, Z. Y., Zhang, X., & Wan, G. H. (2020). Digital finance, digital divide, and multidimensional poverty. Statistical Research, 37 (10), 79–89.

Hu, L., Yao, S. Q., Yang, C. Y., & Ji, L. H. (2021). Is digital inclusive finance conducive to alleviating relative poverty? Journal of Finance and Economics, 47 (12), 93–107.

Jalan, J., & Ravallion, M. (2002). Geographic poverty traps? A micro model of consumption growth in rural China. Journal of Applied Econometrics, 17 (4), 329–346. https://doi.org/10.1002/jae.645

Jalilian, H., & Kirkpatrick, C. (2005). Does financial development contribute to poverty reduction. JOurnal of Development Studies, 41 (4), 636–656. https://doi.org/10.1080/00220380500092754

Khan, S. U., & Sloboda, B. W. (2023). Spatial analysis of multidimensional poverty in Pakistan: Do income and poverty score of neighboring regions matter? GeoJournal, 88 (3), 2823–2849. https://doi.org/10.1007/s10708-022-10781-7

Lal, T. (2018). Impact of financial inclusion on poverty alleviation through cooperative banks. International Journal of Social Economics, 45 (5), 808–828. https://doi.org/10.1108/IJSE-05-2017-0194

LeSage, J. P., & Pace, R. K. (2009). Introduction to spatial econometrics . CRC Press.

Lewis, W. A. (1966). Development planning: The essentials of economic policy . Harper & Row Publishers.

Li, J. W. (2017). The development of inclusive finance and the adjustment of imbalance in urban and rural income distribution: An empirical study based on spatial econometric models. Studies of International Finance, 10 , 14–23.

Li, J. J., & Han, X. (2019). The effect of financial inclusion on income distribution and poverty alleviation: Policy framework selection for efficiency and equity. Financial Research, 3 , 129–148.

Li, Y., Yu, X. T., & Li, F. (2021). The standard definition and scale measurement of China’s relative poverty. Chinese Rural Economy, 1 , 31–48.

Li, Y. Q., Xin, L. Q., & Zhao, M. X. (2021). Spatial spillover and threshold characteristics of fiscal support for agriculture and financial assistance for agriculture in promoting farmers’ income growth. Inquiry into Economic Issues, 10 , 65–71.

Liu, L. J., & Guo, L. (2023). Digital financial inclusion, income inequality, and vulnerability to relative poverty. Social Indicators Research, 170 , 1155–1181. https://doi.org/10.1007/s11205-023-03245-z

Liu, Y., & Yan, H. (2021). County financial agglomeration, agricultural mechanization and the growth of farmers’ income-empirical evidence based on the 105 counties in Henan Province. Journal of Agrotechnical Economics, 12 , 60–75.

Liu, Z. Q., & Zhang, T. (2021). The impact of digital inclusive finance on farmers’ income and its spatial spillover effects. Contemporary Economic Research, 12 , 93–102.

Liu, D., Fang, R., & Tang, Y. M. (2019). Spatial spillover effect of digital inclusive finance on farmers’ off-farm income. Financial Economics Research, 34 (3), 57–66.

Liu, M., Ge, Y., Hu, S., & Hao, H. (2023). The spatial effects of regional poverty: Spatial dependence, spatial heterogeneity and scale effects. ISPRS International Journal of Geo-Information, 12 (12), 501. https://doi.org/10.3390/ijgi12120501

Lyigun, M. F., & Oween, A. L. (2004). Income inequality, financial development and macroeconomic fluctuation. The Economic Journal, 114 (4), 352–376. https://doi.org/10.1111/j.1468-0297.2004.00212.x

Maure, N., & Haber, S. (2003). Bank concentration, related lending and economic performance: Evidence from Mexico . Stanford University Mimeo.

Maurer, N., & Haber, S. (2007). Related lending and economic performance: Evidence from Mexico. The Journal of Economic History, 67 (3), 551–581. https://doi.org/10.1017/S002205070700023X

McKinnon, R. I. (1973). Money and capital in economic development . Brookings Institution Press.

Mohan, R. (2006). Economic growth, financial deepening and financial inclusion  (pp. 1305–1320). Reserve Bank of India Bulletin.

Munyegera, G. K., & Matsumoto, T. (2016). Mobile money, remittances, and household welfare: Panel evidence from rural Uganda. World Development, 79 , 127–137. https://doi.org/10.1016/j.worlddev.2015.11.006

Mushtaq, R., & Bruneau, C. (2019). Microfinance, financial inclusion and ICT: Implications for poverty and inequality. Technology in Society, 59 , 101154. https://doi.org/10.1016/j.techsoc.2019.101154

Neaime, S., & Gaysset, I. (2018). Financial inclusion and stability in MENA: Evidence from poverty and inequality. Finance Research Letters, 24 (C), 230–237. https://doi.org/10.1016/j.frl.2017.09.007

Omar, M. A., & Inaba, K. (2020). Does financial inclusion reduce poverty and income inequality in developing countries? A panel data analysis. Journal of Economic Structures, 9 (1), 37. https://doi.org/10.1186/s40008-020-00214-4

Ozili, P. K. (2018). Impact of digital finance on financial inclusion and stability. Borsa Istanbul Review, 18 (4), 329–340. https://doi.org/10.1016/j.bir.2017.12.003

Qi, H. Q., & Zhang, J. X. (2023). Rural inclusive finance development and the sustainability of poverty reduction: Dual perspectives based on multidimensional relative poverty and poverty vulnerability. Exploration of Economic Issues, 7 , 158–175.

Ravallion, M., & Chen, S. (2011). Weakly relative poverty. Review of Economics and Statistics, 93 (4), 1251–1261. https://doi.org/10.1162/REST_a_00127

Ravallion, M., Datt, G., & Walle, D. (1991). Quantifying absolute poverty in the developing world. Review of Income and Wealth, 37 (4), 345–361. https://doi.org/10.1111/j.1475-4991.1991.tb00378.x

Rowntree, B. S. (1901). Poverty: A study of town life. Macmillan.

Ruan, J., Wang, J. T., & Yang, X. (2024). Absolute to relative — The research of relative poverty measurement from the perspective of sharing. Journal of Applied Statistics and Management, 43 (02), 331–342.

Samat, N., Rashid, S. M. R., & Elhadary, Y. A. (2018). Analyzing spatial distribution of poverty incidence in northern region of Peninsular Malaysia. Asian Social Science, 14 (12), 86–96. https://doi.org/10.5539/ass.v14n12p86

Sarma, M., & Pais, J. (2011). Financial inclusion and development. Journal of International Development, 23 (5), 613–628. https://doi.org/10.1002/jid.1698

Schultz, T. W. (1964). Transforming traditional agriculture . Yale University Press.

Sen, A. (1976). Poverty: An ordinal approach to measurement. Econometrica, 44 (2), 219–231. https://doi.org/10.2307/1912718

Sen, A. (1983). Poverty and famines: An essay on entitlement and deprivation . Oxford University Press.

Sen, A. (1985). Rights and capabilities, morality and objectivity . Routledge.

Sen, A. (1993). Capability and well-being . Oxford University Press.

Shaw, E. S. (1973). Financial deepening in economics development . Oxford University Press.

Shen, Y. Y., & Li, S. (2020). How to determine the standards of relative poverty after 2020? —With discussion on the feasibility of “urban–rural Coordination” in relative poverty. Journal of South China Normal University (social Science Edition), 2 , 91–101.

Stiglitz, J. E., & Weiss, A. (1981). Credit rationing in markets with imperfect information. The American Economic Review, 71 (3), 393–410.

Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A., & Williams, R. M. (1949). The American soldier: Adjustment during army life . Princeton University Press.

Sun, J. W., & Xia, T. (2019). China’s poverty alleviation strategy and the delineation of the relative poverty line after 2020: An analysis based on theory, policy and empirical data. Chinese Rural Economy, 10 , 98–113.

Tan, Y. Z., & Peng, Q. R. (2018). Inclusive financial development and poverty alleviation: Direct impact and spatial spillover effect. Contemporary Finance & Economics, 3 , 56–67.

Tang, K., Li, Z., & He, C. (2023). Spatial distribution pattern and influencing factors of relative poverty in rural China. Innovation and Green Development, 2 (1), 100030. https://doi.org/10.1016/j.igd.2022.100030

Tay, L. Y., Tai, H. T., & Tan, G. S. (2022). Digital financial inclusion: A gateway to sustainable development. Heliyon, 8 (6), e09766. https://doi.org/10.1016/j.heliyon.2022.e09766

Todaro, M. P. (1969). A model of labor migration and urban unemployment in less developed countries. The American Economic Review, 59 (1), 138–148. https://doi.org/10.2307/1802787

Townsend, P. (1970). Measures and explanations of poverty in high income and low income countries: The problems of operationalizing the concepts of development, class and poverty. In P. Townsend (Eds.), The concept of poverty—Working papers on methods of investigation and lifestyles on the poor in different countries (pp. 1–45). Heinemann Educational Books.

Townsend, P. (1979). Poverty in the UK: A survey of household resources and standards of living . University of California Press.

Vaziri, M., Acheampong, M., Downs, J., & Majid, M. R. (2019). Poverty as a function of space: Understanding the spatial configuration of poverty in Malaysia for sustainable development goal number one. GeoJournal, 84 (5), 1317–1336. https://doi.org/10.1007/s10708-018-9926-8

Wang, X., & Fu, Y. (2022). Digital financial inclusion and vulnerability to poverty: Evidence from Chinese rural households. China Agricultural Economic Review, 14 (1), 64–83. https://doi.org/10.1108/CAER-08-2020-0189

Wang, S. G., & Yin, H. D. (2013). Assets and long-term poverty: Empirical study on 2SLS based on panel data. Guizhou Social Sciences, 9 , 50–58.

Wang, W., & Zhu, Y. M. (2018). Inclusive finance and county capital outflow: Poverty alleviation or worsen—Evidence from 592 poverty counties in China. Economic Theory and Business Management, 10 , 98–108.

Wang, H., Zhao, Q., Bai, Y., Zhang, L., & Yu, X. (2020). Poverty and subjective poverty in rural China. Social Indicators Research, 150 (1), 219–242.

Wang, X. G., Wu, F., & Liu, T. (2021). Spatial econometric research on the impact of rural financial development on high-quality agricultural development. Shandong Social Sciences, 10 , 84–91.

Wang, X., Huang, Y. P., Gou, Q., & Qiu, H. (2022). How digital technologies change financial institutions: China’s practice and international implications. International Economic Review, 1 , 70–85.

Xie, S. F., Jin, C. M., Song, T., & Feng, C. X. (2023). Research on the long tail mechanism of digital finance alleviating the relative poverty of rural households. PLoS ONE, 18 (4), e0284988. https://doi.org/10.1371/journal.pone.0284988

Xing, Y. (2021). “Dividend” and “gap” of rural digital inclusive finance. Economist, 2 , 102–111.

Yang, C. H., & Guo, J. T. (2024). A study on the impact mechanism and spatial effect of digital inclusive finance on rural relative poverty in China. On Economic Problems, 3 , 61–68.

Yang, Y., & Ma, X. (2012). An empirical study on the relationship between floating population and urban relative poverty. Guizhou Social Sciences, 10 , 125–128.

Yang, X. Y., Ye, J. S., Zhuo, Z. Y., & Li, J. (2023). Digital inclusive finance, multidimensional relative poverty and spatial effects. Finance & Economics, 5 , 48–61.

Yin, Z. C., Geng, Z. Y., & Pan, B. X. (2019). Financial exclusion and Chinese family poverty: An empirical study based on CHFS data. Research on Financial Issues, 10 , 60–68.

Zhang, X., Wan, G. H., Zhang, J. J., & He, Z. Y. (2019). Digital economy, financial inclusion, and inclusive growth. Economic Research Journal, 54 (8), 71–86.

Zheng, Z. Q. (2020). Digital inclusive finance, spatial spillover and rural poverty alleviation. Journal of Southwest Jiaotong University (social Sciences), 21 (2), 108–118.

Zhou, G. X. (2022). City scale and migrant workers’ wage premium: Effect and mechanism. Journal of Agrotechnical Economics, 12 , 71–87.

Zhou, L., & Wang, H. L. (2021). An approach to study the poverty reduction effect of digital inclusive finance from a multidimensional perspective based on clustering algorithms. Scientific Programming, 2021 , 4645596. https://doi.org/10.1155/2021/4645596

Zhou, Y. S., & Wang, J. Y. (2024). How digital finance improves the urban-rural financial dual structure in China: From the perspective of formal credit rationing for households. Economic Review, 1 , 72–89.

Zhou, L., Liao, J. L., & Zhang, H. (2021). Digital financial inclusion, credit availability and household poverty: Evidence from a micro-level survey. Economic Science, 17 (1), 145–157.

Download references

Acknowledgements

This study was supported by the Humanities and Social Sciences Research Planning Fund of the Ministry of Education (No. 20YJA790096), the Jinan Philosophy and Social Science Project (No. JNSK22B18), and the First Batch of Talent Research Project "Research on the Impact of Digital Inclusive Finance on Relative Poverty of Rural Floating Population" by Qilu University of Technology (Shandong Academy of Sciences) in 2023 (No. 2023RCKY259).

Author information

Authors and affiliations.

School of Finance, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China

Panpan Pei, Shunyi Zhang & Guangxia Zhou


Contributions

All authors have read and agreed to the published version of the manuscript.

Panpan Pei: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing-original draft, Writing-review & editing.

Shunyi Zhang: Validation, Formal analysis, Data curation, Writing-original draft, Writing-review & editing, Visualization.

Guangxia Zhou: Conceptualization, Methodology, Validation, Resources, Writing-review & editing, Supervision, Project administration, Funding acquisition.

Corresponding author

Correspondence to Guangxia Zhou.

Ethics declarations

Conflict of interest.

All authors declare no conflicts of interest to this research work.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Pei, P., Zhang, S. & Zhou, G. Digital Inclusive Finance, Spatial Spillover Effects and Relative Rural Poverty Alleviation: Evidence from China. Appl. Spatial Analysis (2024). https://doi.org/10.1007/s12061-024-09580-z


Received: 01 April 2023

Accepted: 12 April 2024

Published: 16 May 2024

DOI: https://doi.org/10.1007/s12061-024-09580-z


  • Digital inclusive finance
  • Relative rural poverty
  • Spatial spillover effect
  • Spatial heterogeneity

The independent source for health policy research, polling, and news.

Do States with Easier Access to Guns have More Suicide Deaths by Firearm?

Heather Saunders Published: Jul 18, 2022

Nearly half a million lives (480,622) were lost to suicide from 2010 to 2020. During the same period, the suicide death rate increased by 12%, and as of 2009, suicide deaths outnumbered deaths from motor vehicle accidents. Suicides are most prevalent among people who live in rural areas, males, American Indian or Alaska Native people, and White people, but they are rising fastest among some people of color, younger individuals, and people who live in rural areas. On July 16, 2022, the federally mandated crisis number, 988, became available to all landline and cell phone users, providing a single three-digit number to access a network of over 200 local and state-funded crisis centers. While the overall number of suicide deaths decreased slightly from 47,511 to 45,979 between 2019 and 2020, suicides involving firearms increased over the same period (from 23,941 to 24,292). The recent mass shootings in Uvalde and Buffalo have catalyzed discussion around mental health and gun policy. In the same week that the federal Bipartisan Safer Communities Act was signed, strengthening background checks for young adults, adding incentives for red flag laws, and reducing access to guns for individuals with a domestic violence history, the Supreme Court struck down New York’s “proper cause” requirement for concealed carry allowances. In this issue brief, we use the Centers for Disease Control and Prevention (CDC) WONDER database and the State Firearm Law Database to examine the association between suicide deaths by firearm and the number of state-level firearm law provisions.

Suicides account for over half of all firearm deaths (54%), and over half of all suicides involve a firearm (53%). Though mass shootings are more widely covered, data reveal that suicides are a more common cause of firearm-related deaths than homicides. In 2020, a little more than half (54%) of all firearm-related deaths were suicides, 43% were homicides, and 2% were accidental discharges or of undetermined cause. This represents a slight decrease from 2018 and 2019, when suicides by firearm accounted for over 60% of all firearm deaths. Looking at suicides, we find that guns were involved in 53% of suicides in 2020, a majority of all suicides.

Variation in state-level suicide rates is largely driven by rates of suicide by firearm. Suicides involving firearms vary from the lowest rate of 1.8 per 100,000 in New Jersey and Massachusetts to a high of 20.9 per 100,000 in Wyoming, representing an absolute difference of 19.1. In contrast, the rate of suicide by other means is more stable across states, ranging from a low of 4.6 in Mississippi to a high of 11.4 in South Dakota, representing an absolute difference of 6.8.

There is a wide range of firearm law provisions across states, with Idaho having the fewest at just one and California having the most at 111. Because there is no comprehensive national firearm registry and very few state registries, it is difficult to track gun ownership in the US, so estimates of gun ownership rely on survey data or on measures closely related to gun ownership, such as the number of firearm laws. The State Firearm Law Database is a catalog of the presence or absence of 134 firearm law provisions across all 50 states; this analysis uses firearm laws present in 2019. Even though state laws vary widely in detail and number, there are some common themes across states. Many states restrict firearm access for those considered high-risk, including people with felony convictions (37 states), people with domestic violence misdemeanors (31 states), and those deemed by a court to be a danger (28 states). A number of states regulate concealed carry permits; for example, 37 require background checks for applicants and 28 require authorities to revoke concealed carry permits under certain conditions, though some concealed carry laws may be subject to change given the recent Supreme Court decision. Other major categories of gun laws include dealer regulations, ammunition regulations, and child access prevention, among others. In 2019, the average number of firearm law provisions per state was 29, ranging from one provision in Idaho to 111 in California (Appendix Table 1).

More than twice as many suicides by firearm occur in states with the fewest gun laws, relative to states with the most laws. We grouped states into three categories according to the number of firearm law provisions. States with the lowest number of gun law provisions (17 states) had an average of six provisions and were placed in the “least” category; states with a moderate number of laws (16 states) had an average of 19 provisions and were placed in the “moderate” category; and states with the most firearm laws (17 states) had an average of 61 provisions and were placed in the “most” category. Using CDC WONDER underlying cause of death data, we calculated the age-adjusted rate of suicide by firearm for each category of states. We find that the rate of suicide by firearm is highest in states with the fewest gun laws (10.8 per 100,000), lower in states with a moderate number of gun laws (8.4 per 100,000), and lowest in states with the most gun laws (4.9 per 100,000) (Figure 3). The analysis is not designed to demonstrate a causal relationship between gun laws and suicides by firearm, and it is possible that other factors explain the relationship.
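The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six states, their provision counts, deaths, and populations are made-up placeholder values, the function names are hypothetical, and the sketch computes a pooled crude rate per 100,000 rather than the age-adjusted rate that the brief takes from CDC WONDER.

```python
def group_by_provisions(provisions, sizes):
    """Rank states by provision count and cut them into least/moderate/most groups."""
    ranked = sorted(provisions, key=provisions.get)
    groups, start = {}, 0
    for label, size in zip(("least", "moderate", "most"), sizes):
        groups[label] = ranked[start:start + size]
        start += size
    return groups


def pooled_rate_per_100k(states, deaths, population):
    """Crude rate for a group: total deaths over total population, per 100,000."""
    total_deaths = sum(deaths[s] for s in states)
    total_population = sum(population[s] for s in states)
    return 100_000 * total_deaths / total_population


# Hypothetical toy data, NOT taken from the brief or from CDC WONDER.
provisions = {"A": 2, "B": 5, "C": 15, "D": 22, "E": 60, "F": 100}
deaths = {"A": 110, "B": 100, "C": 85, "D": 80, "E": 50, "F": 45}
population = {state: 1_000_000 for state in provisions}

# The brief splits 50 states 17/16/17; the toy example splits 6 states 2/2/2.
groups = group_by_provisions(provisions, sizes=(2, 2, 2))
rates = {label: pooled_rate_per_100k(members, deaths, population)
         for label, members in groups.items()}

for label, rate in rates.items():
    print(f"{label:>8}: {rate:.2f} per 100,000 ({', '.join(groups[label])})")
```

With real data, the three group sizes would be `(17, 16, 17)` and the rates would need age adjustment against a standard population to be comparable with the figures reported above.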

Firearms are the most lethal method of suicide attempts, and about half of suicide attempts take place within 10 minutes of the current suicide thought, so having access to firearms is a suicide risk factor. The availability of firearms has been linked to suicides in a number of peer-reviewed studies. In one such study, researchers examined the association between firearm availability and suicide while also accounting for the potential confounding influence of state-level suicidal behaviors (as measured by suicide attempts). Researchers found that higher rates of gun ownership were associated with increased suicide by firearm deaths, but not with other types of suicide. Taking a look at suicide deaths starting from the date of a handgun purchase and comparing them to people who did not purchase handguns, another study found that people who purchased handguns were more likely to die from suicide by firearm than those who did not, with men 8 times more likely and women 35 times more likely compared to non-owners.

Non-firearm suicide rates are relatively stable across states, suggesting that other types of suicide are not more likely in areas where guns are harder to access. To examine whether non-firearm suicides are higher in states where guns are more difficult to access, we used the state-level firearm law provision groups described above and calculated the age-adjusted rate for each group (states with the least, moderate, and the most firearm law provisions). The results of this analysis provide insight into whether other factors may be contributing to the relationship between gun laws and firearm suicides, such as whether people in states that lack easy access to firearms have higher suicide rates by other means. The rate of non-firearm suicides is relatively stable across all groups, ranging from a low of 6.5 per 100,000 in states with the most firearm laws to a high of 6.9 per 100,000 in states with the fewest firearm laws. The absolute difference of 0.4 is statistically significant, but small. Non-firearm suicides remain relatively stable across groups, suggesting that other types of suicide are not more likely in areas where guns are harder to get (Figure 3). Though we do not observe an increase in suicide deaths by other means in states with less access to guns, there may still be differences across states that could explain these findings.

If the suicide rate by firearm in all states were similar to the rate in the states with the most gun laws, approximately 6,800 lives may have been saved in 2020, a reduction of about 15% of all suicide-related deaths. Applying the crude rate of 5.3 per 100,000 to the total population in 2020, we estimate that nearly 6,800 suicide deaths may have been averted if rates of suicide by firearm were similar to those in states with the most gun control laws.
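The arithmetic behind this estimate can be reproduced roughly as follows. The sketch uses the 2020 death counts quoted earlier in the brief and the crude rate of 5.3 per 100,000; the 2020 U.S. population of about 329.5 million is an assumption that the brief does not state, so the result is only an approximate check rather than the authors' exact calculation.

```python
# Figures quoted in the brief for 2020.
FIREARM_SUICIDES_2020 = 24_292
ALL_SUICIDES_2020 = 45_979
TARGET_RATE_PER_100K = 5.3  # crude rate in states with the most firearm laws

# Assumption: approximate 2020 U.S. population; not given in the brief.
ASSUMED_US_POPULATION_2020 = 329_500_000


def deaths_at_rate(rate_per_100k, population):
    """Number of deaths implied by a rate per 100,000 applied to a population."""
    return rate_per_100k * population / 100_000


expected_firearm_suicides = deaths_at_rate(TARGET_RATE_PER_100K,
                                           ASSUMED_US_POPULATION_2020)
averted = FIREARM_SUICIDES_2020 - expected_firearm_suicides
share_of_all_suicides = averted / ALL_SUICIDES_2020

print(f"Expected firearm suicides at the lower rate: {expected_firearm_suicides:,.0f}")
print(f"Deaths potentially averted: {averted:,.0f}")
print(f"Share of all suicide deaths: {share_of_all_suicides:.1%}")
```

Under the assumed population, the difference comes out close to the 6,800 deaths and roughly 15% share reported above; a different population denominator would shift the result slightly.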

Recent federal legislation strengthens some gun control measures, but it may take several years to impact firearm mortality. The recently passed federal legislation, the Bipartisan Safer Communities Act, emphasizes strengthening some measures of gun control, including background checks for young adults and reducing gun access for those who have a history of domestic violence, among other provisions. Also included in the legislation are additional funds for mental health services in schools and for child and family mental health services. Despite federal movement toward strengthening gun control, a recent Supreme Court decision struck down state legislation that placed additional restrictions on concealed carry permits. It is not known how the Supreme Court’s decision will impact the frequency of concealed carry firearms and the rate of firearm mortality. More firearm regulations are associated with fewer homicides and suicides, but the newly passed federal gun laws may take several years to reduce firearm mortality.

If you or someone you know is considering suicide, contact the National Suicide Prevention Lifeline at the new three-digit dialing code 988 or 1-800-273-8255 (En Español: 1-888-628-9454; Deaf and Hard of Hearing: 1-800-799-4889).

This work was supported in part by Well Being Trust. KFF maintains full editorial control over all of its policy analysis, polling, and journalism activities.

  • Mental Health
  • Gun Violence
  • State Level


  • Open access
  • Published: 13 May 2024

Long-term weight loss effects of semaglutide in obesity without diabetes in the SELECT trial

  • Donna H. Ryan 1 ,
  • Ildiko Lingvay   ORCID: orcid.org/0000-0001-7006-7401 2 ,
  • John Deanfield 3 ,
  • Steven E. Kahn 4 ,
  • Eric Barros   ORCID: orcid.org/0000-0001-6613-4181 5 ,
  • Bartolome Burguera 6 ,
  • Helen M. Colhoun   ORCID: orcid.org/0000-0002-8345-3288 7 ,
  • Cintia Cercato   ORCID: orcid.org/0000-0002-6181-4951 8 ,
  • Dror Dicker 9 ,
  • Deborah B. Horn 10 ,
  • G. Kees Hovingh 5 ,
  • Ole Kleist Jeppesen 5 ,
  • Alexander Kokkinos 11 ,
  • A. Michael Lincoff   ORCID: orcid.org/0000-0001-8175-2121 12 ,
  • Sebastian M. Meyhöfer 13 ,
  • Tugce Kalayci Oral 5 ,
  • Jorge Plutzky   ORCID: orcid.org/0000-0002-7194-9876 14 ,
  • André P. van Beek   ORCID: orcid.org/0000-0002-0335-8177 15 ,
  • John P. H. Wilding   ORCID: orcid.org/0000-0003-2839-8404 16 &
  • Robert F. Kushner 17  

Nature Medicine (2024)


  • Health care
  • Medical research

In the SELECT cardiovascular outcomes trial, semaglutide showed a 20% reduction in major adverse cardiovascular events in 17,604 adults with preexisting cardiovascular disease, overweight or obesity, without diabetes. Here in this prespecified analysis, we examined effects of semaglutide on weight and anthropometric outcomes, safety and tolerability by baseline body mass index (BMI). In patients treated with semaglutide, weight loss continued over 65 weeks and was sustained for up to 4 years. At 208 weeks, semaglutide was associated with mean reduction in weight (−10.2%), waist circumference (−7.7 cm) and waist-to-height ratio (−6.9%) versus placebo (−1.5%, −1.3 cm and −1.0%, respectively; P < 0.0001 for all comparisons versus placebo). Clinically meaningful weight loss occurred in both sexes and all races, body sizes and regions. Semaglutide was associated with fewer serious adverse events. For each BMI category (<30, 30 to <35, 35 to <40 and ≥40 kg m⁻²) there were lower rates (events per 100 years of observation) of serious adverse events with semaglutide (43.23, 43.54, 51.07 and 47.06 for semaglutide and 50.48, 49.66, 52.73 and 60.85 for placebo). Semaglutide was associated with increased rates of trial product discontinuation. Discontinuations increased as BMI class decreased. In SELECT, at 208 weeks, semaglutide produced clinically significant weight loss and improvements in anthropometric measurements versus placebo. Weight loss was sustained over 4 years. ClinicalTrials.gov identifier: NCT03574597.
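To make the between-group comparisons in the abstract easier to read, the short sketch below subtracts the placebo value from the semaglutide value for each reported outcome. It is plain subtraction of the means and rates quoted above, with hypothetical variable and function names, and it should not be read as the trial's model-based estimated treatment differences.

```python
# Week-208 mean changes quoted in the abstract: (semaglutide, placebo).
mean_change = {
    "weight_pct": (-10.2, -1.5),
    "waist_cm": (-7.7, -1.3),
    "waist_to_height_pct": (-6.9, -1.0),
}

# Serious adverse events per 100 years of observation, by baseline BMI class (kg/m^2).
sae_rate = {
    "<30": (43.23, 50.48),
    "30 to <35": (43.54, 49.66),
    "35 to <40": (51.07, 52.73),
    ">=40": (47.06, 60.85),
}


def differences(table, digits):
    """Return semaglutide minus placebo for each row, rounded for display."""
    return {key: round(sema - placebo, digits)
            for key, (sema, placebo) in table.items()}


change_diff = differences(mean_change, 1)
sae_diff = differences(sae_rate, 2)

for outcome, diff in change_diff.items():
    print(f"{outcome}: {diff:+.1f} versus placebo")
for bmi_class, diff in sae_diff.items():
    print(f"BMI {bmi_class}: {diff:+.2f} serious adverse events per 100 years")
```

Negative values mean a larger reduction, or a lower event rate, with semaglutide than with placebo.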


The worldwide obesity prevalence, defined by body mass index (BMI) ≥30 kg m − 2 , has nearly tripled since 1975 (ref. 1 ). BMI is a good surveillance measure for population changes over time, given its strong correlation with body fat amount on a population level, but it may not accurately indicate the amount or location of body fat at the individual level 2 . In fact, the World Health Organization defines clinical obesity as ‘abnormal or excessive fat accumulation that may impair health’ 1 . Excess abnormal body fat, especially visceral adiposity and ectopic fat, is a driver of cardiovascular (CV) disease (CVD) 3 , 4 , 5 , and contributes to the global chronic disease burden of diabetes, chronic kidney disease, cancer and other chronic conditions 6 , 7 .

Remediating the adverse health effects of excess abnormal body fat through weight loss is a priority in addressing the global chronic disease burden. Improvements in CV risk factors, glycemia and quality-of-life measures including personal well-being and physical functioning generally begin with modest weight loss of 5%, whereas greater weight loss is associated with more improvement in these measures 8 , 9 , 10 . Producing and sustaining durable and clinically significant weight loss with lifestyle intervention alone has been challenging 11 . However, weight-management medications that modify appetite can make attaining and sustaining clinically meaningful weight loss of ≥10% more likely 12 . Recently, weight-management medications, particularly those comprising glucagon-like peptide-1 receptor agonists, that help people achieve greater and more sustainable weight loss have been developed 13 . Once-weekly subcutaneous semaglutide 2.4 mg, a glucagon-like peptide-1 receptor agonist, is approved for chronic weight management 14 , 15 , 16 and at doses of up to 2.0 mg is approved for type 2 diabetes treatment 17 , 18 , 19 . In patients with type 2 diabetes and high CV risk, semaglutide at doses of 0.5 mg and 1.0 mg has been shown to significantly lower the risk of CV events 20 . The SELECT trial (Semaglutide Effects on Heart Disease and Stroke in Patients with Overweight or Obesity) studied patients with established CVD and overweight or obesity but without diabetes. In SELECT, semaglutide was associated with a 20% reduction in major adverse CV events (hazard ratio 0.80, 95% confidence interval (CI) 0.72 to 0.90; P  < 0.001) 21 . Data derived from the SELECT trial offer the opportunity to evaluate the weight loss efficacy, in a geographically and racially diverse population, of semaglutide compared with placebo over 208 weeks when both are given in addition to standard-of-care recommendations for secondary CVD prevention (but without a focus on targeting weight loss). 
Furthermore, the data allow examination of changes in anthropometric measures such as BMI, waist circumference (WC) and waist-to-height ratio (WHtR) as surrogates for body fat amount and location 22 , 23 . The diverse population can also be evaluated for changes in sex- and race-specific ‘cutoff points’ for BMI and WC, which have been identified as anthropometric measures that predict cardiometabolic risk 8 , 22 , 23 .

This prespecified analysis of the SELECT trial investigated weight loss and changes in anthropometric indices in patients with established CVD and overweight or obesity without diabetes, who met inclusion and exclusion criteria, within a range of baseline categories for glycemia, renal function and body anthropometric measures.

Study population

The SELECT study enrolled 17,604 patients (72.3% male) from 41 countries between October 2018 and March 2021, with a mean (s.d.) age of 61.6 (8.9) years and BMI of 33.3 (5.0) kg m⁻² (ref. 21). The baseline characteristics of the population have been reported 24. Supplementary Table 1 outlines SELECT patients according to baseline BMI categories. Of note, the proportion of Asian individuals was higher in the lower BMI categories (14.5% for <30 kg m⁻² (overweight) and 7.4% for 30 to <35 kg m⁻² (class I obesity)) than in the higher BMI categories (3.8% for 35 to <40 kg m⁻² (class II obesity) and 2.2% for ≥40 kg m⁻² (class III obesity)). The proportion of women rose as BMI category increased: in the class III BMI category, 45.5% were female, compared with 20.8%, 25.7% and 33.0% in the overweight, class I and class II categories, respectively. Lower BMI categories were associated with a higher proportion of patients with normoglycemia and glycated hemoglobin <5.7%. Although the proportions of patients with high cholesterol and history of smoking were similar across BMI categories, the proportion of patients with high-sensitivity C-reactive protein ≥2.0 mg dl⁻¹ increased as the BMI category increased. A high-sensitivity C-reactive protein >2.0 mg dl⁻¹ was present in 36.4% of patients in the overweight BMI category, with a progressive increase to 43.3%, 57.3% and 72.0% for patients in the class I, II and III obesity categories, respectively.

Weight and anthropometric outcomes

Percentage weight loss.

The average percentage weight-loss trajectories with semaglutide and placebo over 4 years of observation are shown in Fig. 1a (ref. 21 ). For those in the semaglutide group, the weight-loss trajectory continued to week 65 and then was sustained for the study period through week 208 (−10.2% for the semaglutide group, −1.5% for the placebo group; treatment difference −8.7%; 95% CI −9.42 to −7.88; P  < 0.0001). To estimate the treatment effect while on medication, we performed a first on-treatment analysis (observation period until the first time being off treatment for >35 days). At week 208, mean weight loss in the semaglutide group analyzed as first on-treatment was −11.7% compared with −1.5% for the placebo group (Fig. 1b ; treatment difference −10.2%; 95% CI −11.0 to −9.42; P  < 0.0001).

Fig. 1

a,b, Observed data from the in-trial period (a) and first on-treatment (b). The symbols are the observed means, and error bars are ±s.e.m. Numbers shown below each panel represent the number of patients contributing to the means. Analysis of covariance with treatment and baseline values was used to estimate the treatment difference. Exact P values are 1.323762 × 10⁻⁹⁴ and 9.80035 × 10⁻¹⁰⁰ for a and b, respectively. P values are two-sided and are not adjusted for multiplicity. ETD, estimated treatment difference; sema, semaglutide.
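
The caption above names analysis of covariance (ANCOVA) with treatment and baseline value as the model behind the estimated treatment difference. As a rough illustration only, the sketch below computes the baseline-adjusted difference for a two-arm comparison with a single covariate, using the closed form that ordinary least squares gives for this model. The function name and the toy data are hypothetical; the trial analysis was run in SAS with multiple imputation for missing data, which is not reproduced here.

```python
from statistics import fmean

def ancova_treatment_difference(baseline, outcome, treated):
    """Baseline-adjusted difference (treated minus control) for the model
    outcome ~ intercept + treatment + baseline, fitted by least squares.

    With one covariate, the treatment coefficient equals the raw difference
    in mean outcome minus the pooled within-arm slope times the difference
    in mean baseline.
    """
    arms = {0: ([], []), 1: ([], [])}
    for x, y, t in zip(baseline, outcome, treated):
        arms[int(t)][0].append(x)
        arms[int(t)][1].append(y)

    sxy = sxx = 0.0
    means = {}
    for t, (xs, ys) in arms.items():
        mx, my = fmean(xs), fmean(ys)
        means[t] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)

    slope = sxy / sxx  # pooled within-arm slope of outcome on baseline
    (mx0, my0), (mx1, my1) = means[0], means[1]
    return (my1 - my0) - slope * (mx1 - mx0)

# Toy, noise-free data (hypothetical): outcome = 2 + 3*treated + 0.5*baseline
baseline = [80.0, 90.0, 100.0, 85.0, 95.0, 110.0]
treated = [0, 0, 0, 1, 1, 1]
outcome = [2 + 3 * t + 0.5 * x for x, t in zip(baseline, treated)]
etd = ancova_treatment_difference(baseline, outcome, treated)
```

Because the toy outcome is an exact linear function of treatment and baseline, the adjusted difference recovers the treatment coefficient of 3 even though the two arms have different mean baselines.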

Categorical weight loss and individual body weight change

Among in-trial (intention-to-treat principle) patients at week 104, weight loss of ≥5%, ≥10%, ≥15%, ≥20% and ≥25% was achieved by 67.8%, 44.2%, 22.9%, 11.0% and 4.9%, respectively, of those treated with semaglutide compared with 21.3%, 6.9%, 1.7%, 0.6% and 0.1% of those receiving placebo (Fig. 2a ). Individual weight changes at 104 weeks for the in-trial populations for semaglutide and placebo are depicted in Fig. 2b and Fig. 2c , respectively. These waterfall plots show the variation in weight-loss response that occurs with semaglutide and placebo and show that weight loss is more prominent with semaglutide than placebo.
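
The categorical endpoints above are nested thresholds applied to each patient's percentage change in body weight. A minimal sketch of that tabulation follows; the function name and the eight example values are hypothetical and are not trial data.

```python
def proportion_achieving(pct_changes, thresholds=(5, 10, 15, 20, 25)):
    """Percentage of patients whose change from baseline is a loss of at
    least each threshold (a change of -10.0 counts toward >=5% and >=10%)."""
    n = len(pct_changes)
    return {
        thr: 100.0 * sum(1 for c in pct_changes if c <= -thr) / n
        for thr in thresholds
    }

# Hypothetical percentage changes in body weight for eight patients
changes = [-26.0, -12.5, -10.0, -7.2, -4.9, -1.0, 0.5, 3.0]
result = proportion_achieving(changes)
```

Since the categories are cumulative, the proportions can only stay level or fall as the threshold rises, which is the pattern reported for both arms.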

Fig. 2

a, Categorical weight loss from baseline at week 104 for semaglutide and placebo. Data from the in-trial period. Bars depict the proportion (%) of patients receiving semaglutide or placebo who achieved ≥5%, ≥10%, ≥15%, ≥20% and ≥25% weight loss. b,c, Percentage change in body weight for individual patients from baseline to week 104 for semaglutide (b) and placebo (c). Each patient’s percentage change in body weight is plotted as a single bar.

Change in WC

WC change from baseline to 104 weeks has been reported previously in the primary outcome paper 21 . The trajectory of WC change mirrored that of the change in body weight. At week 208, average reduction in WC was −7.7 cm with semaglutide versus −1.3 cm with placebo, with a treatment difference of −6.4 cm (95% CI −7.18 to −5.61; P  < 0.0001) 21 .

WC cutoff points

We analyzed achievement of sex- and race-specific cutoff points for WC separately for BMI <35 kg m⁻² and ≥35 kg m⁻², because for BMI >35 kg m⁻², WC is technically more difficult to measure and, thus, less accurate as a risk predictor 4,25,26. Within the SELECT population with BMI <35 kg m⁻² at baseline, 15.0% and 14.3% of the semaglutide and placebo groups, respectively, were below the sex- and race-specific WC cutoff points. At week 104, 41.2% of the semaglutide group fell below the sex- and race-specific cutoff points, compared with only 18.0% of the placebo group (Fig. 3).

Fig. 3

WC cutoff points: Asian women <80 cm, non-Asian women <88 cm, Asian men <88 cm, non-Asian men <102 cm.

Waist-to-height ratio

At baseline, mean WHtR was 0.66 for the study population. The lowest tertile of the SELECT population at baseline had a mean WHtR <0.62, which is higher than the cutoff point of 0.5 used to indicate increased cardiometabolic risk 27, suggesting that the trial population had high WCs. At week 208, in the group randomized to semaglutide, there was a relative reduction of 6.9% in WHtR compared with 1.0% in the placebo group (treatment difference −5.87 percentage points; 95% CI −6.56 to −5.17; P < 0.0001).
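
To make the WHtR figures concrete, the sketch below computes the ratio and its relative change for a single hypothetical adult. The height and baseline WC are assumptions chosen so that the baseline ratio equals the reported mean of 0.66; applying the reported mean WC reduction of 7.7 cm to one person is purely illustrative.

```python
WHTR_RISK_CUTOFF = 0.5  # cutoff for increased cardiometabolic risk cited in the text

def waist_to_height_ratio(wc_cm, height_cm):
    """WHtR is dimensionless: waist circumference divided by height."""
    return wc_cm / height_cm

def relative_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

height_cm = 170.0                      # hypothetical
wc_baseline_cm = 112.2                 # hypothetical, gives WHtR = 0.66
wc_followup_cm = wc_baseline_cm - 7.7  # reported mean WC reduction, applied to one person

whtr_baseline = waist_to_height_ratio(wc_baseline_cm, height_cm)
whtr_followup = waist_to_height_ratio(wc_followup_cm, height_cm)
whtr_change_pct = relative_change_pct(whtr_baseline, whtr_followup)
still_above_cutoff = whtr_followup >= WHTR_RISK_CUTOFF
```

With height fixed, the relative change in WHtR equals the relative change in WC, so for adults the two measures track each other closely.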

BMI category change

At week 104, 52.4% of patients treated with semaglutide achieved improvement in BMI category compared with 15.7% of those receiving placebo. Proportions of patients in the BMI categories at baseline and week 104 are shown in Fig. 4 , which depicts in-trial patients receiving semaglutide and placebo. The BMI category change reflects the superior weight loss with semaglutide, which resulted in fewer patients being in the higher BMI categories after 104 weeks. In the semaglutide group, 12.0% of patients achieved a BMI <25 kg m − 2 , which is considered the healthy BMI category, compared with 1.2% for placebo; per study inclusion criteria, no patients were in this category at baseline. The proportion of patients with obesity (BMI ≥30 kg m − 2 ) fell from 71.0% to 43.3% in the semaglutide group versus 71.9% to 67.9% in the placebo group.

Fig. 4

In the semaglutide group, 12.0% of patients achieved normal weight status at week 104 (from 0% at baseline), compared with 1.2% (from 0% at baseline) for placebo. BMI classes: healthy (BMI <25 kg m⁻²), overweight (25 to <30 kg m⁻²), class I obesity (30 to <35 kg m⁻²), class II obesity (35 to <40 kg m⁻²) and class III obesity (BMI ≥40 kg m⁻²).

Weight and anthropometric outcomes by subgroups

The forest plot illustrated in Fig. 5 displays mean body weight percentage change from baseline to week 104 for semaglutide relative to placebo in prespecified subgroups. Similar relationships are depicted for WC changes in prespecified subgroups shown in Extended Data Fig. 1. The effect of semaglutide (versus placebo) on mean percentage body weight loss as well as reduction in WC was found to be heterogeneous across several population subgroups. Women had a greater difference in mean weight loss with semaglutide versus placebo than men (−11.1% (95% CI −11.56 to −10.66) in women versus −7.5% (95% CI −7.78 to −7.23) in men; P < 0.0001). There was a linear relationship between age category and degree of mean weight loss, with younger age being associated with progressively greater mean weight loss, but the actual mean difference by age group is small. Similarly, BMI category had small, although statistically significant, associations. Those with WHtR less than the median experienced slightly lower mean body weight change than those above the median, with estimated treatment differences of −8.04% (95% CI −8.37 to −7.70) and −8.99% (95% CI −9.33 to −8.65), respectively (P < 0.0001). Patients from Asia and of Asian race experienced slightly lower mean weight loss (estimated treatment difference with semaglutide for Asian race −7.27% (95% CI −8.09 to −6.46; P = 0.0147) and for Asia −7.30% (95% CI −7.97 to −6.62; P = 0.0016)). 
There was no difference in weight loss with semaglutide associated with ethnicity (estimated treatment difference for Hispanic −8.53% (95% CI −9.28 to −7.76) or non-Hispanic −8.52% (95% CI −8.77 to −8.26); P = 0.9769), glycemic status (estimated treatment difference for prediabetes −8.53% (95% CI −8.83 to −8.24) or normoglycemia −8.48% (95% CI −8.88 to −8.07); P = 0.8188) or renal function (estimated treatment difference for estimated glomerular filtration rate (eGFR) <60 or ≥60 ml min⁻¹ 1.73 m⁻² of −8.50% (95% CI −9.23 to −7.76) and −8.52% (95% CI −8.77 to −8.26), respectively; P = 0.9519).
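
The subgroup P values above come from treatment-by-subgroup interaction terms in the trial's models. For a reader working only from the published estimates, a rough check is to recover each standard error from its 95% CI and compare two subgroup estimates with a z-test. The sketch below does this under stated assumptions (normal approximation, independent subgroups); it is not the authors' interaction model, and the helper names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def compare_subgroups(est_a, ci_a, est_b, ci_b):
    """Two-sided z-test for the difference between two independent estimates."""
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    z = (est_a - est_b) / math.hypot(se_a, se_b)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Estimated treatment differences (%) and 95% CIs quoted in the text
z_egfr, p_egfr = compare_subgroups(-8.50, (-9.23, -7.76), -8.52, (-8.77, -8.26))
z_sex, p_sex = compare_subgroups(-11.1, (-11.56, -10.66), -7.5, (-7.78, -7.23))
```

The approximation gives a large P value for the eGFR comparison and a very small one for the comparison by sex, in line with the direction of the reported interaction tests, although the exact values differ because the published tests use the fitted model.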

Fig. 5

Data from the in-trial period. N = 17,604. P values represent test of no interaction effect. P values are two-sided and are not adjusted for multiplicity. The dots show estimated treatment differences, and the error bars show 95% CIs. Details of the statistical models are available in Methods. ETD, estimated treatment difference; HbA1c, glycated hemoglobin; MI, myocardial infarction; PAD, peripheral artery disease; sema, semaglutide.

Safety and tolerability according to baseline BMI category

We reported in the primary outcome paper of the SELECT trial that adverse events (AEs) leading to permanent discontinuation of the trial product occurred in 1,461 patients (16.6%) in the semaglutide group and 718 patients (8.2%) in the placebo group (P < 0.001) 21. For this analysis, we evaluated the cumulative incidence of AEs leading to trial product discontinuation by treatment assignment and by BMI category (Fig. 6). With death modeled as a competing risk, we tracked the proportion of in-trial patients for whom drug was withdrawn or interrupted for the first time (Fig. 6, left) or cumulative discontinuations (Fig. 6, right). Both panels of Fig. 6 depict a graded increase, as BMI class decreases, in the proportion discontinuing semaglutide, but not placebo: discontinuation rates are higher in the lower BMI classes in the semaglutide group but not in the placebo group.

Fig. 6

Data are in-trial from the full analysis set. sema, semaglutide.
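
The text states that discontinuation was analyzed as a cumulative incidence with death modeled as a competing risk, but it does not name the estimator. A common nonparametric choice is the Aalen-Johansen estimator; the sketch below is a minimal version under that assumption. The event coding, function name and example data are hypothetical.

```python
def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence of `cause`.

    events: 0 = censored, 1 = event of interest (for example, discontinuation),
            2 = competing event (for example, death).
    Returns a list of (time, cumulative incidence) at each time with any event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0  # probability of having had no event of any type so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_cause = d_any = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_here += 1
            d_any += data[i][1] != 0
            d_cause += data[i][1] == cause
            i += 1
        if d_any:
            cif += event_free * d_cause / at_risk
            event_free *= 1 - d_any / at_risk
            curve.append((t, cif))
        at_risk -= n_here
    return curve

# Hypothetical follow-up times (weeks) and event codes for six patients
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 2, 1, 0, 1]
curve = cumulative_incidence(times, events)
```

Unlike one minus a Kaplan-Meier estimate that treats deaths as censored, this estimator does not count patients who died as still able to discontinue, so the cumulative incidences of the two event types cannot sum to more than 1.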

We reported in the primary SELECT analysis that serious adverse events (SAEs) were reported by 2,941 patients (33.4%) in the semaglutide arm and by 3,204 patients (36.4%) in the placebo arm (P < 0.001) 21. For this study, we analyzed SAE rates by person-years of treatment exposure for BMI classes (<30 kg m⁻², 30 to <35 kg m⁻², 35 to <40 kg m⁻² and ≥40 kg m⁻²) and provide these data in Supplementary Table 2. We also provide an analysis of the most common categories of SAEs. Semaglutide was associated with lower SAE rates, primarily driven by CV events and infections. Within each BMI class, there were fewer SAEs in the group receiving semaglutide compared with placebo. Rates (events per 100 years of observation) of SAEs were 43.23, 43.54, 51.07 and 47.06 for semaglutide and 50.48, 49.66, 52.73 and 60.85 for placebo, with no evidence of heterogeneity. There was no detectable difference in hepatobiliary or gastrointestinal SAEs comparing semaglutide with placebo in any of the four BMI classes we evaluated.
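
The rates above are exposure-adjusted: events divided by time observed, scaled to 100 years. The sketch below shows that calculation and then derives simple rate differences and rate ratios from the eight published rates. The function and variable names are hypothetical, and the derived differences and ratios are arithmetic on the reported figures, not results stated in the paper.

```python
def rate_per_100_years(events, years_observed):
    """Exposure-adjusted event rate per 100 years of observation."""
    return 100.0 * events / years_observed

bmi_classes = ["<30", "30 to <35", "35 to <40", ">=40"]
sema_rates = [43.23, 43.54, 51.07, 47.06]     # reported, events per 100 years
placebo_rates = [50.48, 49.66, 52.73, 60.85]  # reported, events per 100 years

comparison = {
    bmi_class: {
        "difference": sema - placebo,  # negative means fewer events with semaglutide
        "ratio": sema / placebo,
    }
    for bmi_class, sema, placebo in zip(bmi_classes, sema_rates, placebo_rates)
}
```

All four ratios fall below 1, which is the sense in which SAE rates were lower with semaglutide in every BMI class.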

The analyses of weight effects of the SELECT study presented here reveal that patients assigned to once-weekly subcutaneous semaglutide 2.4 mg lost significantly more weight than those receiving placebo. The weight-loss trajectory with semaglutide occurred over 65 weeks and was sustained up to 4 years. Likewise, there were similar improvements in the semaglutide group for anthropometrics (WC and WHtR). The weight loss was associated with a greater proportion of patients receiving semaglutide achieving improvement in BMI category, reaching a healthy BMI (<25 kg m⁻²) and falling below the sex- and race-specific WC cutoff point above which cardiometabolic risk is increased 22,23. Furthermore, both sexes, all races, all body sizes and those from all geographic regions were able to achieve clinically meaningful weight loss. There was no evidence of increased SAEs based on BMI categories, although lower BMI category was associated with increased rates of trial product discontinuation, probably reflecting exposure to a higher level of drug in lower BMI categories. These data, representing the longest clinical trial of the effects of semaglutide versus placebo on weight, establish the safety and durability of semaglutide effects on weight loss and maintenance in a geographically and racially diverse population of adult men and women with overweight and obesity but not diabetes. Weight loss of this degree in such a diverse population suggests that it may be possible to impact the public health burden of the multiple morbidities associated with obesity. Although our trial focused on CV events, many chronic diseases would benefit from effective weight management 28.

There were variations in the weight-loss response. Individual changes in body weight with semaglutide and placebo were striking; still, 67.8% achieved 5% or more weight loss and 44.2% achieved 10% or more weight loss with semaglutide at 2 years, compared with 21.3% and 6.9%, respectively, for those receiving placebo. Our first on-treatment analysis demonstrated that those on-drug lost more weight than those in-trial, confirming the effect of drug exposure. With semaglutide, lower BMI was associated with less percentage weight loss, and women lost more weight on average than men (−11.1% versus −7.5% treatment difference from placebo); however, in all cases, clinically meaningful mean weight loss was achieved. Although Asian patients lost less weight on average than patients of other races (treatment difference of −7.3% versus placebo), Asian patients were more likely to be in the lowest BMI category (<30 kg m⁻²), which is known to be associated with less weight loss, as discussed below. Clinically meaningful weight loss was evident in the semaglutide group within a broad range of baseline categories for glycemia and body anthropometrics. Interestingly, at 2 years, a significant proportion of the semaglutide-treated group fell below the sex- and race-specific WC cutoff points, especially in those with BMI <35 kg m⁻², and a notable proportion (12.0%) fell below the BMI cutoff point of 25 kg m⁻², which is deemed a healthy BMI in those without unintentional weight loss. As more robust weight loss is possible with newer medications, achieving and maintaining these cutoff point targets may become important benchmarks for tracking responses.

The overall safety profile did not reveal any new signals from prior studies, and there were no BMI category-related associations with AE reporting. The analysis did reveal that tolerability may differ among specific BMI classes, since more discontinuations occurred with semaglutide among lower BMI classes. Potential contributors may include a possibility of higher drug exposure in lower BMI classes, although other explanations, including differences in motivation and cultural mores regarding body size, cannot be excluded.

Is the weight loss in SELECT less than expected based on prior studies with the drug? In STEP 1, a large phase 3 study of once-weekly subcutaneous semaglutide 2.4 mg in individuals without diabetes but with BMI >30 kg m − 2 or 27 kg m − 2 with at least one obesity-related comorbidity, the mean weight loss was −14.9% at week 68, compared with −2.4% with placebo 14 . Several reasons may explain the observation that the mean treatment difference was −12.5% in STEP 1 and −8.7% in SELECT. First, SELECT was designed as a CV outcomes trial and not a weight-loss trial, and weight loss was only a supportive secondary endpoint in the trial design. Patients in STEP 1 were desirous of weight loss as a reason for study participation and received structured lifestyle intervention (which included a −500 kcal per day diet with 150 min per week of physical activity). In the SELECT trial, patients did not enroll for the specific purpose of weight loss and received standard of care covering management of CV risk factors, including medical treatment and healthy lifestyle counseling, but without a specific focus on weight loss. Second, the respective study populations were quite different, with STEP 1 including a younger, healthier population with more women (73.1% of the semaglutide arm in STEP 1 versus 27.7% in SELECT) and higher mean BMI (37.8 kg m − 2 versus 33.3 kg m − 2 , respectively) 14 , 21 . Third, major differences existed between the respective trial protocols. Patients in the semaglutide treatment arm of STEP 1 were more likely to be exposed to the medication at the full dose of 2.4 mg than those in SELECT. In SELECT, investigators were allowed to slow, decrease or pause treatment. By 104 weeks, approximately 77% of SELECT patients on dose were receiving the target semaglutide 2.4 mg weekly dose, which is lower than the corresponding proportion of patients in STEP 1 (89.6% were receiving the target dose at week 68) 14 , 21 . 
Indeed, in our first on-treatment analysis at week 208, weight loss was greater (−11.7% for semaglutide) compared with the in-trial analysis (−10.2% for semaglutide). Taken together, all these issues make less weight loss an expected finding in SELECT, compared with STEP 1.

The SELECT study has some limitations. First, SELECT was not a primary prevention trial, and the data should not be extrapolated to all individuals with overweight and obesity to prevent major adverse CV events. Although the data set is rich in numbers and diversity, it does not have the numbers of individuals in racial subgroups that may have revealed potential differential effects. SELECT also did not include individuals who have excess abnormal body fat but a BMI <27 kg m − 2 . Not all individuals with increased CV risk have BMI ≥27 kg m − 2 . Thus, the study did not include Asian patients who qualify for treatment with obesity medications at lower BMI and WC cutoff points according to guidelines in their countries 29 . We observed that Asian patients were less likely to be in the higher BMI categories of SELECT and that the population of those with BMI <30 kg m − 2 had a higher percentage of Asian race. Asian individuals would probably benefit from weight loss and medication approaches undertaken at lower BMI levels in the secondary prevention of CVD. Future studies should evaluate CV risk reduction in Asian individuals with high CV risk and BMI <27 kg m − 2 . Another limitation is the lack of information on body composition, beyond the anthropometric measures we used. It would be meaningful to have quantitation of fat mass, lean mass and muscle mass, especially given the wide range of body size in the SELECT population.

An interesting observation from these SELECT weight-loss data is that when BMI is <30 kg m⁻², weight loss on a percentage basis is less than that observed across higher classes of BMI severity. Furthermore, once BMI exceeds 30 kg m⁻², weight loss amounts are more similar for class I, II and III obesity. This was also observed in Look AHEAD, a lifestyle intervention study for weight loss 30. The proportion (percentage) of weight loss seems to be less, on average, in the BMI <30 kg m⁻² category relative to higher BMI categories, despite patients receiving the same treatment and even potentially higher exposure to the drug for weight loss 30. Weight loss cannot continue indefinitely. There is a plateau of weight that occurs after weight loss with all treatments for weight management. This plateau has been termed the ‘set point’ or ‘settling point’, a body weight that is in harmony with the genetic and environmental determinants of body weight and adiposity 31. Perhaps persons with BMI <30 kg m⁻² are closer to their settling point and have less weight to lose to reach it. Furthermore, the cardiometabolic benefits of weight loss are driven by reduction in the abnormal ectopic and visceral depots of fat, not by reduction of subcutaneous fat stores in the hips and thighs. The phenotype of cardiometabolic disease but lower BMI (<30 kg m⁻²) may be one where reduction of excess abnormal and dysfunctional body fat does not require as much body mass reduction to achieve health improvement. We suspect this may be the case and suggest further studies to explore this aspect of weight-loss physiology.

In conclusion, this analysis of the SELECT study supports the broad use of once-weekly subcutaneous semaglutide 2.4 mg as an aid to CV event reduction in individuals with overweight or obesity without diabetes but with preexisting CVD. Semaglutide 2.4 mg safely and effectively produced clinically significant weight loss in all subgroups based on age, sex, race, glycemia, renal function and anthropometric categories. Furthermore, the weight loss was sustained over 4 years during the trial.

Methods

Trial design and participants

The current work complies with all relevant ethical regulations and reports a prespecified analysis of the randomized, double-blind, placebo-controlled SELECT trial ( NCT03574597 ), details of which have been reported in papers describing study design and rationale 32 , baseline characteristics 24 and the primary outcome 21 . SELECT evaluated once-weekly subcutaneous semaglutide 2.4 mg versus placebo to reduce the risk of major adverse cardiac events (a composite endpoint comprising CV death, nonfatal myocardial infarction or nonfatal stroke) in individuals with established CVD and overweight or obesity, without diabetes. The protocol for SELECT was approved by national and institutional regulatory and ethical authorities in each participating country. All patients provided written informed consent before beginning any trial-specific activity. Eligible patients were aged ≥45 years, with a BMI of ≥27 kg m − 2 and established CVD defined as at least one of the following: prior myocardial infarction, prior ischemic or hemorrhagic stroke, or symptomatic peripheral artery disease. Additional inclusion and exclusion criteria can be found elsewhere 32 .

Human participants research

The trial protocol was designed by the trial sponsor, Novo Nordisk, and the academic Steering Committee. A global expert panel of physician leaders in participating countries advised on regional operational issues. National and institutional regulatory and ethical authorities approved the protocol, and all patients provided written informed consent.

Study intervention and patient management

Patients were randomly assigned in a double-blind manner and 1:1 ratio to receive once-weekly subcutaneous semaglutide 2.4 mg or placebo. The starting dose was 0.24 mg once weekly, with dose increases every 4 weeks (to doses of 0.5, 1.0, 1.7 and 2.4 mg per week) until the target dose of 2.4 mg was reached after 16 weeks. Patients who were unable to tolerate dose escalation due to AEs could be managed by extension of dose-escalation intervals, treatment pauses or maintenance at doses below the 2.4 mg per week target dose. Investigators were allowed to reduce the dose of study product if tolerability issues arose. Investigators were provided with guidelines for, and encouraged to follow, evidence-based recommendations for medical treatment and lifestyle counseling to optimize management of underlying CVD as part of the standard of care. The lifestyle counseling was not targeted at weight loss. Additional intervention descriptions are available 32 .
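
The nominal escalation schedule described above can be written as a small lookup. The sketch below assumes weeks are counted from 0 at the first dose and that the patient follows the schedule without the extensions, pauses or dose reductions the protocol allowed; the names are hypothetical.

```python
ESCALATION_DOSES_MG = (0.24, 0.5, 1.0, 1.7, 2.4)  # weekly doses named in the text
STEP_WEEKS = 4                                    # dose increases every 4 weeks

def scheduled_dose_mg(week):
    """Nominal weekly dose for a given week (week 0 = first dose)."""
    if week < 0:
        raise ValueError("week must be non-negative")
    step = min(week // STEP_WEEKS, len(ESCALATION_DOSES_MG) - 1)
    return ESCALATION_DOSES_MG[step]

schedule = {week: scheduled_dose_mg(week) for week in (0, 4, 8, 12, 16, 104)}
```

Under these assumptions the target dose of 2.4 mg is first given at week 16, matching the statement that it is reached after 16 weeks.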

Sex, race, body weight, height and WC measurements

Sex and race were self-reported. Body weight was measured without shoes and only wearing light clothing; it was measured on a digital scale and recorded in kilograms or pounds (one decimal with a precision of 0.1 kg or lb), with preference for using the same scale throughout the trial. The scale was calibrated yearly as a minimum unless the manufacturer certified that calibration of the weight scales was valid for the lifetime of the scale. Height was measured without shoes in centimeters or inches (one decimal with a precision of 0.1 cm or inches). At screening, BMI was calculated by the electronic case report form. WC was defined as the abdominal circumference located midway between the lower rib margin and the iliac crest. Measures were obtained in a standing position with a nonstretchable measuring tape and to the nearest centimeter or inch. The patient was asked to breathe normally. The tape touched the skin but did not compress soft tissue, and twists in the tape were avoided.
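
Because weight and height could be recorded in either metric or imperial units, any BMI calculation first needs a common unit. The text does not describe how the electronic case report form did this, so the sketch below is only an assumed implementation using the exact definitions of the pound and the inch; the function name and example values are hypothetical.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound
CM_PER_INCH = 2.54      # exact definition of the inch

def bmi(weight, height, weight_unit="kg", height_unit="cm"):
    """Body mass index in kg per square meter from weight (kg or lb) and height (cm or in)."""
    if weight_unit not in ("kg", "lb") or height_unit not in ("cm", "in"):
        raise ValueError("unsupported unit")
    kg = weight * KG_PER_LB if weight_unit == "lb" else weight
    cm = height * CM_PER_INCH if height_unit == "in" else height
    meters = cm / 100.0
    return kg / (meters * meters)

bmi_metric = bmi(96.2, 170.0)                  # hypothetical patient
bmi_imperial = bmi(212.1, 66.9, "lb", "in")    # roughly the same patient in lb and in
```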

The following endpoints relevant to this paper were assessed from randomization (week 0) to years 2, 3 and 4: change in body weight (%); proportion achieving weight loss ≥5%, ≥10%, ≥15% and ≥20%; change in WC (cm); and percentage change in WHtR (cm cm⁻¹). Improvement in BMI category (defined as being in a lower BMI class) was assessed at week 104 compared with baseline according to BMI classes: healthy (BMI <25 kg m⁻²), overweight (25 to <30 kg m⁻²), class I obesity (30 to <35 kg m⁻²), class II obesity (35 to <40 kg m⁻²) and class III obesity (≥40 kg m⁻²). The proportions of individuals with BMI <35 or ≥35 kg m⁻² who achieved WC values below the sex- and race-specific cutoff points (values at or above the cutoff indicate increased metabolic risk) were evaluated at week 104. The WC cutoff points were as follows: Asian women <80 cm, non-Asian women <88 cm, Asian men <88 cm and non-Asian men <102 cm.
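
The two categorical definitions in this paragraph, BMI class and the sex- and race-specific WC cutoff, translate directly into lookups. The sketch below is one way to encode them; the function names, the string labels for sex and the boolean flag for Asian race are hypothetical conventions, not the trial's data format.

```python
# Upper bounds (exclusive) of each BMI class, in kg per square meter
BMI_UPPER_BOUNDS = (25.0, 30.0, 35.0, 40.0)
BMI_CLASS_NAMES = (
    "healthy", "overweight", "class I obesity",
    "class II obesity", "class III obesity",
)

# WC cutoff points in cm, keyed by (sex, is_asian)
WC_CUTOFF_CM = {
    ("female", True): 80.0,
    ("female", False): 88.0,
    ("male", True): 88.0,
    ("male", False): 102.0,
}

def bmi_class_index(bmi):
    """Index into BMI_CLASS_NAMES (0 = healthy, 4 = class III obesity)."""
    for index, upper in enumerate(BMI_UPPER_BOUNDS):
        if bmi < upper:
            return index
    return len(BMI_UPPER_BOUNDS)

def improved_bmi_category(bmi_baseline, bmi_week_104):
    """True if the patient is in a lower BMI class at week 104 than at baseline."""
    return bmi_class_index(bmi_week_104) < bmi_class_index(bmi_baseline)

def below_wc_cutoff(wc_cm, sex, is_asian):
    """True if WC is below the sex- and race-specific cutoff point."""
    return wc_cm < WC_CUTOFF_CM[(sex, is_asian)]
```

For example, a fall in BMI from 33.3 to 29.9 counts as an improvement in category (class I obesity to overweight), whereas a fall from 33.3 to 30.0 does not, because 30.0 still lies in class I.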

Overall, 97.1% of the semaglutide group and 96.8% of the placebo group completed the trial. During the study, 30.6% of those assigned to semaglutide did not complete drug treatment, compared with 27.0% for placebo.

Statistical analysis

The statistical analyses for the in-trial period were based on the intention-to-treat principle and included all randomized patients irrespective of adherence to semaglutide or placebo or changes to background medications. Continuous endpoints were analyzed using an analysis of covariance model with treatment as a fixed factor and baseline value of the endpoint as a covariate. Missing data at the landmark visit (for example, week 104) were imputed using a multiple imputation model. Imputation was done separately for each treatment arm, with baseline value as a covariate, and the model was fitted to patients having an observed data point (irrespective of adherence to randomized treatment) at week 104. The fitted model was used to impute values for all patients with missing data at week 104 to create 500 complete data sets. Rubin’s rules were used to combine the results. Estimated means are provided with s.e.m., and estimated treatment differences are provided with 95% CI. Binary endpoints were analyzed using logistic regression with treatment as a factor and baseline value as a covariate, where missing data were imputed by first using multiple imputation as described above and then categorizing the imputed data according to the endpoint, for example, body weight percentage change at week 104 of <0%. Subgroup analyses for continuous and binary endpoints also included the subgroup and interaction between treatment and subgroup as fixed factors. Because some patients in both arms continued to be followed but were off treatment, we also analyzed weight loss by first on-treatment group (observation period until first time being off treatment for >35 days) to provide a more realistic picture of weight loss in those adhering to treatment. CIs were not adjusted for multiplicity and should therefore not be used to infer definitive treatment effects. All statistical analyses were performed with SAS software, version 9.4 TS1M5 (SAS Institute).
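
Rubin's rules, mentioned above, combine the estimate and variance obtained from each imputed data set into a single estimate and a total variance that reflects both sampling and imputation uncertainty. The sketch below shows the standard pooling formulas on three made-up results; the function name and the numbers are hypothetical, and the trial combined 500 data sets in SAS.

```python
import math
from statistics import fmean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, total variance, pooled standard error), where
    total variance = within + (1 + 1/m) * between for m imputations.
    """
    m = len(estimates)
    pooled = fmean(estimates)
    within = fmean(variances)      # average sampling variance
    between = variance(estimates)  # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total, math.sqrt(total)

# Made-up treatment differences and squared standard errors from three imputations
pooled, total_var, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Confidence intervals built from the pooled standard error normally use a t reference distribution with Rubin's degrees of freedom, which this sketch leaves out.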

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Data will be shared with bona fide researchers who submit a research proposal approved by the independent review board. Individual patient data will be shared in data sets in a deidentified and anonymized format. Information about data access request proposals can be found at https://www.novonordisk-trials.com/.

References

Acknowledgements

Editorial support was provided by Richard Ogilvy-Stewart of Apollo, OPEN Health Communications, and funded by Novo Nordisk A/S, in accordance with Good Publication Practice guidelines (www.ismpp.org/gpp-2022).

Author information

Authors and affiliations.

Pennington Biomedical Research Center, Baton Rouge, LA, USA

Donna H. Ryan

Department of Internal Medicine/Endocrinology and Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA

Ildiko Lingvay

Institute of Cardiovascular Science, University College London, London, UK

John Deanfield

VA Puget Sound Health Care System and University of Washington, Seattle, WA, USA

Steven E. Kahn

Novo Nordisk A/S, Søborg, Denmark

Eric Barros, G. Kees Hovingh, Ole Kleist Jeppesen & Tugce Kalayci Oral

Endocrinology and Metabolism Institute, Cleveland Clinic, Cleveland, OH, USA

Bartolome Burguera

Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK

Helen M. Colhoun

Obesity Unit, Department of Endocrinology, Hospital das Clínicas, University of São Paulo, São Paulo, Brazil

Cintia Cercato

Internal Medicine Department D, Hasharon Hospital-Rabin Medical Center, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

Dror Dicker

Center for Obesity Medicine and Metabolic Performance, Department of Surgery, University of Texas McGovern Medical School, Houston, TX, USA

Deborah B. Horn

First Department of Propaedeutic Internal Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece

Alexander Kokkinos

Department of Cardiovascular Medicine, Cleveland Clinic, and Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, USA

A. Michael Lincoff

Institute of Endocrinology & Diabetes, University of Lübeck, Lübeck, Germany

Sebastian M. Meyhöfer

Cardiovascular Division, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA

Jorge Plutzky

University of Groningen, University Medical Center Groningen, Department of Endocrinology, Groningen, the Netherlands

André P. van Beek

Department of Cardiovascular and Metabolic Medicine, University of Liverpool, Liverpool, UK

John P. H. Wilding

Northwestern University Feinberg School of Medicine, Chicago, IL, USA

Robert F. Kushner

Contributions

D.H.R., I.L. and S.E.K. contributed to the study design. D.B.H., I.L., D.D., A.K., S.M.M., A.P.v.B., C.C. and J.P.H.W. were study investigators. D.B.H., I.L., D.D., A.K., S.M.M., A.P.v.B., C.C. and J.P.H.W. enrolled patients. D.H.R. was responsible for data analysis and manuscript preparation. All authors contributed to data interpretation, review, revisions and final approval of the manuscript.

Corresponding author

Correspondence to Donna H. Ryan.

Ethics declarations

Competing interests.

D.H.R. declares having received consulting honoraria from Altimmune, Amgen, Biohaven, Boehringer Ingelheim, Calibrate, Carmot Therapeutics, CinRx, Eli Lilly, Epitomee, Gila Therapeutics, IFA Celtics, Novo Nordisk, Pfizer, Rhythm, Scientific Intake, Wondr Health and Zealand Pharma; she declares she received stock options from Calibrate, Epitomee, Scientific Intake and Xeno Bioscience. I.L. declares having received research funding (paid to institution) from Novo Nordisk, Sanofi, Mylan and Boehringer Ingelheim. I.L. received advisory/consulting fees and/or other support from Altimmune, AstraZeneca, Bayer, Biomea, Boehringer Ingelheim, Carmot Therapeutics, Cytoki Pharma, Eli Lilly, Intercept, Janssen/Johnson & Johnson, Mannkind, Mediflix, Merck, Metsera, Novo Nordisk, Pharmaventures, Pfizer, Regeneron, Sanofi, Shionogi, Structure Therapeutics, Target RWE, Terns Pharmaceuticals, The Comm Group, Valeritas, WebMD and Zealand Pharma. J.D. declares having received consulting honoraria from Amgen, Boehringer Ingelheim, Merck, Pfizer, Aegerion, Novartis, Sanofi, Takeda, Novo Nordisk and Bayer, and research grants from British Heart Foundation, MRC (UK), NIHR, PHE, MSD, Pfizer, Aegerion, Colgate and Roche. S.E.K. declares having received consulting honoraria from ANI Pharmaceuticals, Boehringer Ingelheim, Eli Lilly, Merck, Novo Nordisk and Oramed, and stock options from AltPep. B.B. declares having received honoraria related to participation on this trial and has no financial conflicts related to this publication. H.M.C. declares being a stockholder and serving on an advisory panel for Bayer; receiving research grants from Chief Scientist Office, Diabetes UK, European Commission, IQVIA, Juvenile Diabetes Research Foundation and Medical Research Council; serving on an advisory board and speaker’s bureau for Novo Nordisk; and holding stock in Roche Pharmaceuticals. C.C. declares having received consulting honoraria from Novo Nordisk, Eli Lilly, Merck, Brace Pharma and Eurofarma. D.D. declares having received consulting honoraria from Novo Nordisk, Eli Lilly, Boehringer Ingelheim and AstraZeneca, and received research grants through his affiliation from Novo Nordisk, Eli Lilly, Boehringer Ingelheim and Rhythm. D.B.H. declares having received research grants through her academic affiliation from Novo Nordisk and Eli Lilly, and advisory/consulting honoraria from Novo Nordisk, Eli Lilly and Gelesis. A.K. declares having received research grants through his affiliation from Novo Nordisk and Pharmaserve Lilly, and consulting honoraria from Pharmaserve Lilly, Sanofi-Aventis, Novo Nordisk, MSD, AstraZeneca, ELPEN Pharma, Boehringer Ingelheim, Galenica Pharma, Epsilon Health and WinMedica. A.M.L. declares having received honoraria from Novo Nordisk, Eli Lilly, Akebia Therapeutics, Ardelyx, Becton Dickinson, Endologix, FibroGen, GSK, Medtronic, Neovasc, Provention Bio, ReCor, BrainStorm Cell Therapeutics, Alnylam and Intarcia for consulting activities, and research funding to his institution from AbbVie, Esperion, AstraZeneca, CSL Behring, Novartis and Eli Lilly. S.M.M. declares having received consulting honoraria from Amgen, AstraZeneca, Bayer, Boehringer Ingelheim, Daichii-Sankyo, esanum, Gilead, Ipsen, Eli Lilly, Novartis, Novo Nordisk, Sandoz and Sanofi; he declares he received research grants from AstraZeneca, Eli Lilly and Novo Nordisk. J.P. declares having received consulting honoraria from Altimmune, Amgen, Esperion, Merck, MJH Life Sciences, Novartis and Novo Nordisk; he has received a grant, paid to his institution, from Boehringer Ingelheim and holds the position of Director, Preventive Cardiology, at Brigham and Women’s Hospital. A.P.v.B. is contracted via the University of Groningen (no personal payment) to undertake consultancy for Novo Nordisk, Eli Lilly and Boehringer Ingelheim. J.P.H.W. is contracted via the University of Liverpool (no personal payment) to undertake consultancy for Altimmune, AstraZeneca, Boehringer Ingelheim, Cytoki, Eli Lilly, Napp, Novo Nordisk, Menarini, Pfizer, Rhythm Pharmaceuticals, Sanofi, Saniona, Tern Pharmaceuticals, Shionogi and Ysopia. J.P.H.W. also declares personal honoraria/lecture fees from AstraZeneca, Boehringer Ingelheim, Medscape, Napp, Menarini, Novo Nordisk and Rhythm. R.F.K. declares having received consulting honoraria from Novo Nordisk, Weight Watchers, Eli Lilly, Boehringer Ingelheim, Pfizer, Structure and Altimmune. E.B., G.K.H., O.K.J. and T.K.O. are employees of Novo Nordisk A/S.

Peer review

Peer review information.

Nature Medicine thanks Christiana Kartsonaki, Peter Rossing, Naveed Sattar and Vikas Sridhar for their contribution to the peer review of this work. Primary Handling Editor: Sonia Muliyil, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Effect of semaglutide treatment or placebo on waist circumference from baseline to week 104 by subgroups.

Data from the in-trial period. N = 17,604. P values represent test of no interaction effect. P values are two-sided and not adjusted for multiplicity. The dots show estimated treatment differences and the error bars show 95% confidence intervals. Details of the statistical models are available in Methods. BMI, body mass index; CI, confidence interval; CV, cardiovascular; CVD, cardiovascular disease; eGFR, estimated glomerular filtration rate; ETD, estimated treatment difference; HbA1c, glycated hemoglobin; MI, myocardial infarction; PAD, peripheral artery disease; sema, semaglutide.

Supplementary information

Reporting summary, supplementary tables 1 and 2.

Supplementary Table 1. Baseline characteristics by BMI class. Data are represented as number and percentage of patients. Renal function categories were based on the eGFR as per Chronic Kidney Disease Epidemiology Collaboration. Albuminuria categories were based on UACR. Smoking was defined as smoking at least one cigarette or equivalent daily. The category ‘Other’ for CV inclusion criteria includes patients where it is unknown if the patient fulfilled only one or several criteria and patients who were randomized in error and did not fulfill any criteria. Supplementary Table 2. SAEs according to baseline BMI category. P value: two-sided P value from Fisher’s exact test for test of no difference.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Ryan, D.H., Lingvay, I., Deanfield, J. et al. Long-term weight loss effects of semaglutide in obesity without diabetes in the SELECT trial. Nat Med (2024). https://doi.org/10.1038/s41591-024-02996-7

Received: 01 March 2024

Accepted: 12 April 2024

Published: 13 May 2024

DOI: https://doi.org/10.1038/s41591-024-02996-7

what is a research correlational analysis

medRxiv

A systematic analysis of the contribution of genetics to multimorbidity and comparisons with primary care data

Authors listed with ORCID records: Louise M Allan, Frank Dudbridge, Luke C Pilling and João Delgado.

Background Multimorbidity, the presence of two or more conditions in one person, is increasingly prevalent. Yet shared biological mechanisms of specific pairs of conditions often remain poorly understood. We address this gap by integrating large-scale primary care and genetic data to elucidate potential causes of multimorbidity.

Methods We defined chronic, common, and heritable conditions in individuals aged ≥65 years, using two large representative healthcare databases [CPRD (UK) N=2,425,014 and SIDIAP (Spain) N=1,053,640], and estimated heritability using the same definitions in UK Biobank (N=451,197). We used logistic regression models to estimate the co-occurrence of pairs of conditions in the primary care data.

Linkage disequilibrium score regression was used to estimate genetic similarity between pairs of conditions. Meta-analyses were conducted across healthcare databases, and up to three sources of genetic data, for each condition pair. We classified pairs of conditions as across or within-domain based on the international classification of disease.
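
As a rough illustration of how co-occurrence can be expressed as an odds ratio, the Python sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table of counts. This matches a logistic regression with one binary predictor and no covariates. The abstract does not list the covariates used in the study's models, so this is not a reproduction of them, and the function name `odds_ratio` and the counts are hypothetical.

```python
import math


def odds_ratio(both, only_a, only_b, neither):
    """Unadjusted odds ratio for co-occurrence of conditions A and B, with a Wald 95% CI."""
    cells = (both, only_a, only_b, neither)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((both * neither) / (only_a * only_b))
    se = math.sqrt(sum(1 / c for c in cells))  # standard error of the log odds ratio
    z = 1.959964                               # two-sided 95% normal quantile
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: people with both conditions, A only, B only, and neither.
estimate, lower, upper = odds_ratio(120, 880, 400, 8600)
print(f"OR = {estimate:.2f} [{lower:.2f}:{upper:.2f}]")
```

An odds ratio above 1 means the two conditions occur together more often than expected if they were independent, and an odds ratio below 1 means they occur together less often.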

Findings We identified N=72 chronic conditions, with 43·6% of 2546 pairs showing higher co-occurrence than expected and evidence of shared genetics. Notably, across-domain pairs like iron deficiency anaemia and peripheral arterial disease exhibited substantial shared genetics (genetic correlation Rg=0·45 [95% confidence interval 0·27:0·64]). N=33 pairs displayed negative genetic correlations, such as skin cancer and rheumatoid arthritis (Rg=-0·14 [-0·21:-0·06]), indicating potential protective mechanisms. Discordance between genetic and primary care data was also observed, e.g., abdominal aortic aneurysm and bladder cancer co-occurred but were not genetically correlated (odds ratio [OR]=2·23 [2·09:2·37], Rg=0·04 [-0·20:0·28]) and schizophrenia and fibromyalgia were less likely to co-occur but were positively genetically correlated (OR=0·84 [0·75:0·94], Rg=0·20 [0·11:0·29]).

Interpretation Most pairs of chronic conditions show evidence of shared genetics and co-occurrence in primary care, suggesting shared mechanisms. The identified shared mechanisms, negative correlations and discordance between genetic and observational data provide a foundation for future research on prevention and treatment of multimorbidity.

Funding UK Medical Research Council [MR/W014548/1].

Competing Interest Statement

ARL is now an employee of AstraZeneca and has interests in the company. The work undertaken here was prior to his appointment. SK's group has received funding support from Amgen BioPharma outside of this work. JB is a part time employee of Novo Nordisk Research Centre Oxford, limited, unrelated to this work. TF has consulted for several pharmaceutical companies. All other authors have no disclosures to declare.

Funding Statement

This work was supported by the UK Medical Research Council [grant number MR/W014548/1]. This study was supported by the National Institute for Health and Care Research (NIHR) Exeter Biomedical Research Centre (BRC), the NIHR Leicester BRC, the NIHR Oxford BRC, the NIHR Peninsula Applied Research Collaboration, and the NIHR HealthTech Research Centre. KB is partly funded by the NIHR Applied Research Collaboration South-West Peninsula. JM is funded by an NIHR Advanced Fellowship (NIHR302270). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. CV acknowledges research funding by a "Contratos para la intensificacion de la actividad investigadora en el Sistema Nacional de Salud" contract (INT23/00040) from the Spanish Ministry of Science and Innovation.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study was approved by the relevant ethics committees: SIDIAP Scientific and Ethical Committees (19/518-P) on 18/12/2019. The SIDIAP database is based on opt-out presumed consent. If a patient decides to opt out, their routine data are excluded from the database. CPRD ISAC committee protocol number 23_003109. The Northwest Multi-Centre Research Ethics Committee approved the collection and use of UK Biobank data for health-related research (Research Ethics Committee reference 11/NW/0382). UKB was granted under Application Number 9072.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

* = joint first authors

# = joint senior authors

Data Availability

We cannot make individual-level data available. Researchers can apply to UK Biobank ( https://www.ukbiobank.ac.uk/enable-your-research/ ), CPRD ( https://www.cprd.com/research-applications ), and SIDIAP ( https://www.sidiap.org/index.php/en/solicituds-en ). We have made our diagnostic code lists, code and results available on our GitHub ( https://github.com/GEMINI-multimorbidity/ ) site and Shiny website ( https://gemini-multimorbidity.shinyapps.io/atlas/ ). GWAS summary statistics will be available following acceptance at the GWAS Catalog ( https://www.ebi.ac.uk/gwas/home ).

COMMENTS

  1. Correlational Research

    A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them. A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

  2. Correlation Analysis

    Correlation analysis is a statistical method used to evaluate the strength and direction of the relationship between two or more variables. The correlation coefficient ranges from -1 to 1. A correlation coefficient of 1 indicates a perfect positive correlation. This means that as one variable increases, the other variable also increases.

  3. Correlational Research

    Correlational research design is a type of nonexperimental research that is used to examine the relationship between two or more variables. ... Uses statistical analysis: Correlational research relies on statistical analysis to determine the strength and direction of the relationship between variables. This may include ...

  4. What Is Correlation Analysis: Comprehensive Guide

    Correlation analysis is a staple of data analytics. It's a commonly used method to measure the relationship between two variables. It helps researchers understand the extent to which changes to the value in one variable are associated with changes to the value in the other. This analysis often applies to quantitative data collected through ...

  5. Correlational Study Overview & Examples

    A correlational study is a non-experimental study design that evaluates only the correlation between variables. The researchers record measurements but do not control or manipulate the variables. Correlational research is a form of observational study. A correlation indicates that as the value of one variable increases, the other tends to change in a ...

  6. Correlational Research

    Correlational research can provide insights into complex real-world relationships, helping researchers develop theories and make predictions. ... Correlation analysis. Using a correlation analysis, you can summarise the relationship between variables into a correlation coefficient: a single number that describes the strength and direction of ...

  7. 7.2 Correlational Research

    Correlational research is a type of nonexperimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. ... This is an example of content analysis —a family of systematic approaches to measurement using ...

  8. 6.2 Correlational Research

    Correlational research is a type of non-experimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables ...

  9. Correlational Research

    Correlational research cannot directly establish causal relationships between research variables. The results of correlational research can lead to various possibilities: (1) Variable X is the cause, and variable Y is the effect; (2) variable Y is the cause, and variable X is the effect; (3) there is no causal relationship between variable X and variable Y; both are simultaneously influenced ...

  10. Correlation Studies in Psychology Research

    A correlational study is a type of research design that looks at the relationships between two or more variables. Correlational studies are non-experimental, which means that the experimenter does not manipulate or control any of the variables. A correlation refers to a relationship between two variables. Correlations can be strong or weak and ...

  11. Correlational Research

    Correlational research is a type of non-experimental research in which the researcher measures two variables (binary or continuous) and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical ...

  12. Introduction to Correlation Research

    A correlation has direction and can be either positive or negative (note exceptions listed later). With a positive correlation, individuals who score above (or below) the average (mean) on one measure tend to score similarly above (or below) the average on the other measure. The scatterplot of a positive correlation rises (from left to right).

  13. 6 Correlational Design and Analysis

    Correlational research is a natural extension of descriptive inquiry. The obvious difference between the two approaches being that we are now interested in going beyond describing the tendencies and variation within any single variable (or set of variables) to describing how two or more variables may be related.

  14. Correlational Research: What it is with Examples

    Correlational research is a type of non-experimental research method in which a researcher measures two variables and assesses the statistical relationship between them without controlling for extraneous variables. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical ...

  15. Correlation in Statistics: Correlation Analysis Explained

    Step 1: Type your data into a worksheet in Excel. The best format is two columns. Place your x-values in column A and your y-values in column B. Step 2: Click the "Data" tab and then click "Data Analysis." Step 3: Click "Correlation" and then click "OK." Step 4: Type the location for your x-y variables in the Input Range box.

  16. Correlation Research: What It Is & How to Use It

    Correlation (often referred to as correlational study, correlation research, bivariate correlation or correlation analysis) is a core step in understanding your data (such as from survey research) or the relationship between variables in your dataset, typically expressed as x1 and x2. If a correlation exists, one variable is correlated to ...

  17. Correlation: Meaning, Types, Examples & Coefficient

    Types. A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.

  18. Understanding the Correlation Coefficient: A Complete Guide

    Correlation (Pearson, Kendall, Spearman): Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. In terms of the strength of relationship, the value of the correlation coefficient varies between +1 and -1. A value of ±1 indicates a perfect degree of association ...
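To make the difference between two of the named coefficients concrete, here is a minimal sketch that computes Spearman's coefficient as the Pearson coefficient of the ranks, with tied values given their average rank. The helper names (`pearson_r`, `average_ranks`, `spearman_rho`) and the data are assumptions for illustration, not code from the listed guide; Kendall's coefficient is not covered here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r:    {r:.3f}")
print(f"Spearman rho: {rho:.3f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient reaches the upper bound of +1, while Pearson's coefficient stays below it because the points do not fall on a straight line.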

  19. Conducting correlation analysis: important limitations and pitfalls

    The correlation coefficient is easy to calculate and provides a measure of the strength of linear association in the data. However, it also has important limitations and pitfalls, both when studying the association between two variables and when studying agreement between methods. These limitations and pitfalls should be taken into account when ...
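One such pitfall can be shown with a small sketch: because the coefficient measures linear association, a strong but symmetric curved relationship can produce a value of zero. The data below are invented for illustration, and `pearson_r` is a helper written for this example rather than anything from the cited article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# y is completely determined by x, but the relationship is a parabola.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r_curve = pearson_r(x, y)
print(f"r for y = x^2 on symmetric x: {r_curve:.3f}")
```

The positive and negative products of deviations cancel on either side of zero, so the coefficient reports no linear association even though y depends entirely on x. Plotting the data before interpreting the coefficient is a simple guard against this.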

  20. Chapter 12 Methods for Correlational Studies

    Correlational studies aim to find out if there are differences in the characteristics of a population depending on whether or not its subjects have been exposed to an event of interest in the naturalistic setting. In eHealth, correlational studies are often used to determine whether the use of an eHealth system is associated with a particular set of user characteristics and/or quality of care ...

  21. What is Correlational Research? Types and Characteristics

    What is Correlational Research? Correlational research is a method of study that examines two variables in order to identify a statistically meaningful relationship between them. The goal of correlational research is to find variables that are related closely enough that a change in one is accompanied by a change in the other.

  22. What is Canonical Correlation Analysis?

    Canonical Correlation Analysis (CCA) is an advanced statistical technique used to probe the relationships between two sets of multivariate variables on the same subjects. It is particularly applicable in circumstances where multiple regression would be appropriate, but there are multiple intercorrelated outcome variables.

  23. Correlation vs Causation: Key Differences in Analysis

    Correlation refers to a relationship or connection between two variables where if one changes, the other tends to change as well. However, this doesn't mean one causes the other to change.
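The point can be illustrated with a small simulation, sketched here under stated assumptions: a hypothetical third variable `z` drives both `x` and `y`, which never influence each other. The variable names, the seed, and the noise model are all invented for the example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# z is a common cause; x and y each equal z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)

# Removing the shared cause leaves only the independent noise terms.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(f"r(x, y):                  {r_xy:.3f}")
print(f"r(x, y) with z removed:   {r_resid:.3f}")
```

Here `x` and `y` are clearly correlated, yet changing one would do nothing to the other; the association disappears once the common cause is accounted for. In real data the confounder is usually unknown or unmeasured, which is why a correlation alone does not establish causation.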

  24. Causal association between low vitamin D and polycystic ovary syndrome

    Recent studies have revealed the correlation between serum vitamin D (VD) level and polycystic ovary syndrome (PCOS), but the causality and specific mechanisms remain uncertain. We aimed to investigate the cause-effect relationship between serum VD and PCOS, and the role of testosterone in the related pathological mechanisms. We assessed the causality between serum VD and PCOS by using genome ...

  25. Digital Inclusive Finance, Spatial Spillover Effects and ...

    The study found that there is a significant positive spatial correlation in relative rural poverty; the development of digital inclusive finance has a significant inhibitory effect on relative rural poverty. ... Based on the above analysis, this paper proposes the research Hypotheses 2: H2: Digital inclusive finance has a positive spatial ...

  26. Do States with Easier Access to Guns have More Suicide Deaths by ...

    The analysis is not designed to necessarily demonstrate a causal relationship between gun laws and suicides by firearm, and it is possible that there are other factors that explain the ...

  27. Long-term weight loss effects of semaglutide in obesity without

    A prespecified analysis of the SELECT trial revealed that patients assigned to once-weekly subcutaneous semaglutide 2.4 mg lost significantly more weight than those receiving placebo and showed ...

  28. A systematic analysis of the contribution of genetics to multimorbidity

    Background Multimorbidity, the presence of two or more conditions in one person, is increasingly prevalent. Yet shared biological mechanisms of specific pairs of conditions often remain poorly understood. We address this gap by integrating large-scale primary care and genetic data to elucidate potential causes of multimorbidity. Methods We defined chronic, common, and heritable conditions in ...