Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction, achieved through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third and last is data analysis itself, which researchers do in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that "data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research."

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but the answer to that question. But what if there is no question to ask? It is still possible to explore data without a problem – we call it 'data mining', which often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience's vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes data analysis tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data has the quality of describing things once a specific value is assigned to it. For analysis, you need to organize these values, processed and presented in a given context, to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and similar measures all come under this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups; an item included in categorical data cannot belong to more than one group. Example: a person describing their lifestyle, marital status, smoking habit, or drinking habit in a survey response provides categorical data. A chi-square test is a standard method used to analyze this data (see the sketch after this list).
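
To make the categorical example concrete, here is a minimal sketch of a chi-square test of independence using scipy; the survey counts and category labels are invented for illustration.

```python
# Chi-square test of independence on invented survey counts:
# does smoking habit depend on marital status?
from scipy.stats import chi2_contingency

observed = [
    [45, 15],  # single:  [non-smoker, smoker]
    [60, 10],  # married: [non-smoker, smoker]
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
# A small p-value (e.g. < 0.05) suggests the two variables are related.
```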


Data analysis in qualitative research

Finding patterns in the qualitative data

Qualitative data analysis works a little differently from numerical data analysis, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process; hence it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers read the available data and look for repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find "food" and "hunger" are the most commonly used words and will highlight them for further analysis.
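
As a minimal sketch of this word-frequency approach (the responses and stop-word list are invented; a real analysis would use a fuller stop-word list):

```python
# Count the most common words across open-ended responses.
from collections import Counter
import re

responses = [
    "Food insecurity and hunger are the biggest problems here",
    "Access to food is limited, and hunger affects many families",
    "Clean water and food shortages remain pressing issues",
]

stop_words = {"and", "are", "the", "to", "is", "a", "here", "many"}
words = []
for text in responses:
    words.extend(w for w in re.findall(r"[a-z']+", text.lower())
                 if w not in stop_words)

print(Counter(words).most_common(3))
# e.g. [('food', 3), ('hunger', 2), ...]
```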


The keyword-in-context technique is another widely used word-based method. Here, the researcher tries to understand a concept by analyzing the context in which participants use a particular keyword.

For example, researchers studying the concept of 'diabetes' amongst respondents might analyze the context of when and how each respondent used or referred to the word 'diabetes.'
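
The same idea can be sketched in a few lines of Python: pull out a window of words around each occurrence of the keyword so the surrounding context can be read and coded. The transcript text below is invented.

```python
# Keyword-in-context: show a window of words around each keyword hit.
def keyword_in_context(text, keyword, window=4):
    tokens = text.lower().split()
    hits = []
    for i, token in enumerate(tokens):
        if keyword in token:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(f"...{left} [{token}] {right}...")
    return hits

transcript = ("I was diagnosed with diabetes last year and since then "
              "managing diabetes has changed how I shop for food")
for line in keyword_in_context(transcript, "diabetes"):
    print(line)
```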

The scrutiny-based technique is another highly recommended text analysis method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique; it examines how specific texts are similar to or different from each other.

For example, to study the "importance of a resident doctor in a company," the collected data can be divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast works especially well for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations in enormous datasets.


Methods used for data analysis in qualitative research

There are several techniques to analyze data in qualitative research; here are some commonly used methods:

  • Content Analysis: This is the most widely accepted and frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources, such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this particular method considers the social context within which the communication between researcher and respondent takes place. Discourse analysis also considers the respondent's lifestyle and day-to-day environment when deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, grounded theory is a strong choice for analyzing qualitative data. Grounded theory is applied to data about a host of similar cases occurring in different settings. Researchers using this method may alter explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to check whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interview setting, that the interviewer asked all the questions devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein researchers confirm that the provided data is free of such errors. They conduct necessary consistency checks and outlier checks to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish respondents based on their age; it then becomes easier to analyze small data buckets rather than deal with the massive data pile. A minimal sketch of such coding follows.
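
Here is a minimal coding sketch with pandas; the ages, bracket boundaries, and column names are invented for illustration.

```python
# Code respondents into age brackets so responses can be analyzed
# per bucket instead of per individual.
import pandas as pd

df = pd.DataFrame({"age": [19, 24, 37, 45, 52, 61, 70]})  # sample rows

bins = [0, 25, 40, 55, 120]
labels = ["25 and under", "26-40", "41-55", "56+"]
df["age_bracket"] = pd.cut(df["age"], bins=bins, labels=labels)

print(df["age_bracket"].value_counts().sort_index())
```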


Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers can use different research and data analysis methods to derive meaningful insights. Statistical analysis is the most favored approach for numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential: categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The approach is classified into two groups: 'descriptive statistics', used to describe data, and 'inferential statistics', which helps in comparing and generalizing from the data.

Descriptive statistics

This method is used to describe the basic features of the many types of data encountered in research. It presents the data in such a meaningful way that patterns in the data start making sense. However, descriptive analysis does not go beyond summarizing the data; conclusions are still based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods; a short code sketch follows the lists below.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to describe the central point of a distribution.
  • Researchers use this method when they want to showcase the most common or average response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range equals the difference between the highest and lowest observed scores.
  • Variance and standard deviation measure how far observed scores deviate from the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is; it helps them identify how far the data is dispersed and how that affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores, helping researchers identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.
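
As a minimal sketch, the four families of measures above can be computed with Python's standard library on an invented set of scores:

```python
# Frequency, central tendency, dispersion, and position measures
# for a small, invented set of test scores.
import statistics as st

scores = [55, 62, 62, 70, 74, 78, 81, 85, 90, 96]

print("count:", len(scores))                    # frequency
print("mean:", st.mean(scores))                 # central tendency
print("median:", st.median(scores))
print("mode:", st.mode(scores))
print("range:", max(scores) - min(scores))      # dispersion
print("std dev:", round(st.stdev(scores), 1))
print("quartiles:", st.quantiles(scores, n=4))  # position
```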

In quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are not sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think about which method of research and data analysis best suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students' average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it: for example, when you want to compare average voting in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100-odd audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie. A sketch of this kind of estimate follows.
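
To make the movie-theater example concrete, here is a sketch that estimates the population proportion from the sample with a 95% confidence interval (normal approximation; the sample numbers are invented):

```python
# Infer the share of the whole audience that likes the movie
# from a sample of 100 viewers.
import math

n = 100      # sample size
liked = 85   # sampled viewers who liked the movie

p_hat = liked / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
margin = 1.96 * se                       # 95% confidence, normal approx.

print(f"estimate: {p_hat:.0%} ± {margin:.1%}")
# -> consistent with "about 80-90% of people like the movie"
```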

Here are two significant areas of inferential statistics.

  • Estimating parameters: This takes statistics from the sample research data and uses them to say something about the population parameter.
  • Hypothesis tests: These sample research data to answer survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but want to understand the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns; a two-dimensional cross-tabulation supports seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strength of the relationship between two variables, researchers rarely look beyond regression analysis, the primary and most commonly used method, which is also a type of predictive analysis (see the sketch after this list). In this method, you have an essential factor called the dependent variable and one or more independent variables, and you work out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner.
  • Frequency tables: This statistical procedure summarizes how often each value of a variable occurs, making it easy to see the distribution of responses at a glance.
  • Analysis of variance (ANOVA): This statistical procedure is used for testing the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are treated as synonymous.
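
As a minimal sketch of correlation and simple regression with scipy (the study-hours and exam-score data are invented):

```python
# Correlation and simple linear regression between two invented
# variables: hours studied (independent) and exam score (dependent).
from scipy import stats

hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 64, 70, 74, 79, 83]

r, p = stats.pearsonr(hours, score)
print(f"correlation r={r:.2f} (p={p:.4f})")

fit = stats.linregress(hours, score)
print(f"score ≈ {fit.slope:.1f} * hours + {fit.intercept:.1f}")
```
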
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and they should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps in designing the survey questionnaire, selecting data collection methods, and choosing samples.


  • The primary aim of data research and analysis is to derive ultimate insights that are unbiased. Any mistake in collecting data, selecting an analysis method, or choosing an audience sample with a biased mind will lead to a biased inference.
  • No amount of sophistication in research and data analysis can rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, and the development of graphical representations.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them with a medium to collect data by creating appealing surveys.


What is Data Analysis?

According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" ( Responsible Conduct in Data Management ). Important components of data analysis include searching for patterns, remaining unbiased in drawing inference from data, practicing responsible  data management , and maintaining "honest and accurate analysis" ( Responsible Conduct in Data Management ). 

In order to understand data analysis further, it can be helpful to take a step back and ask the question "What is data?" Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "The recorded factual material commonly accepted in the scientific community as necessary to validate research findings" ( OMB Circular 110 ). This broad definition can include information in many formats.

Some examples of types of data are as follows:

  • Photographs 
  • Hand-written notes from field observation
  • Machine learning training data sets
  • Ethnographic interview transcripts
  • Sheet music
  • Scripts for plays and musicals 
  • Observations from laboratory experiments ( CMU Data 101 )

Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis. 

Data analysis falls within the larger research data lifecycle.

[Figure: the research data lifecycle ( University of Virginia )]

Why Analyze Data?

Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data. 

What are the Types of Data Analysis?

Data analysis can be quantitative, qualitative, or mixed methods. 

Quantitative research typically involves numbers and "close-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). Quantitative research tests variables against objective theories, usually measured and collected on instruments and analyzed using statistical procedures ( Creswell & Creswell, 2018 , p. 4). Quantitative analysis usually uses deductive reasoning. 

Qualitative  research typically involves words and "open-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" ( 2018 , p. 4). Thus, qualitative analysis usually invokes inductive reasoning. 

Mixed methods  research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" ( Creswell & Creswell, 2018 , p. 4). 


Data Analysis – Process, Methods and Types

Definition:

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The following is a step-by-step guide to the data analysis process:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.
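
A minimal sketch of this step with pandas; the columns, rows, and cleaning rules are invented for illustration.

```python
# Clean a small, invented survey extract: trim stray whitespace,
# drop duplicate rows, and fill in a missing value.
import pandas as pd

df = pd.DataFrame({
    "name": ["  Ana ", "Ben", "Ben", "Cleo"],
    "age":  [34, 29, 29, None],
})

df["name"] = df["name"].str.strip()               # remove stray spaces
df = df.drop_duplicates()                         # remove duplicates
df["age"] = df["age"].fillna(df["age"].median())  # fill missing values

print(df)
```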

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.

Diagnostic Analysis

This type of analysis involves identifying and diagnosing problems or issues within a dataset. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.
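
As a toy illustration of supervised learning with scikit-learn (the features, labels, and new-customer values are invented; a real project would also hold out test data and evaluate the model):

```python
# Fit a toy classifier: predict churn (1) vs. stay (0) from two
# invented features, then score a new customer.
from sklearn.linear_model import LogisticRegression

X = [[1, 20], [2, 15], [8, 2],
     [9, 1], [3, 12], [7, 3]]   # [years as customer, support tickets]
y = [0, 0, 1, 1, 0, 1]          # churned?

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict([[6, 4]]))        # predicted class
print(model.predict_proba([[6, 4]]))  # class probabilities
```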

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
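
A minimal smoothing sketch with pandas on an invented monthly sales series:

```python
# Smooth an invented monthly sales series with a 3-month rolling mean
# so the underlying trend is easier to see.
import pandas as pd

sales = pd.Series(
    [100, 120, 90, 130, 150, 140, 170, 160, 190],
    index=pd.date_range("2023-01-01", periods=9, freq="MS"),
)

trend = sales.rolling(window=3).mean()
print(pd.DataFrame({"sales": sales, "3-month trend": trend}))
```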

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.
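
As a minimal bar-chart sketch with matplotlib (the categories and counts are invented):

```python
# Plot invented survey counts as a bar chart.
import matplotlib.pyplot as plt

channels = ["Email", "Phone", "Chat", "In person"]
counts = [42, 18, 31, 9]

plt.bar(channels, counts)
plt.title("Preferred support channel (invented data, n=100)")
plt.ylabel("Respondents")
plt.tight_layout()
plt.show()
```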

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL : A programming language used to manage and manipulate relational databases.
  • R : An open-source programming language and software environment for statistical computing and graphics.
  • Python : A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau : A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS : A statistical analysis software used for data management, analysis, and reporting.
  • SPSS : A statistical analysis software used for data analysis, reporting, and modeling.
  • Matlab : A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner : A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business : Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare : Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education : Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance : Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government : Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports : Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing : Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science : Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving : When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization : Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation : Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment : Data analysis can help you assess and mitigate risks, whether it is financial, operational, or related to safety.
  • Market research : Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring : Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis : Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management : Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective : Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic : Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate : Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant : Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive : Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely : Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible : Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable : Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns : Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management : Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error : Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost : Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming : Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.


Data Literacy for Researchers

  • What is data literacy
  • How to use this guide
  • Guide credits

  • Research & Instruction Librarian
  • Finding Data
  • Analyzing and Visualizing Data
  • Communicating Data
  • UCLA Data Research Support

For the purposes of this guide, data literacy is defined as the ability to ethically find, analyze, and communicate data in the research process.

This guide is organized by three stages of the research data life cycle:

1. Finding Data

2. Analyzing and Visualizing Data

3. Communicating Data

These stages are common to data-driven research across disciplines, though the definition of data itself can vary by discipline. Check out WI+RE's Intro to Data Literacy tutorial to explore different examples of disciplinary data.

For more information about the stages in the research data life cycle, see the  UCLA Library Data Literacy Core Competencies .

This guide was created by Data Literacy Specialist Jennifer Cao, in consultation with UCLA Library staff Ashley Peterson and Ibraheem Ali.

Research Methods Guide: Data Analysis


Tools for Analyzing Survey Data

  • R (open source)
  • Stata 
  • DataCracker (free up to 100 responses per survey)
  • SurveyMonkey (free up to 100 responses per survey)

Tools for Analyzing Interview Data

  • AQUAD (open source)
  • NVivo 

Data Analysis and Presentation Techniques that Apply to both Survey and Interview Research

  • Create documentation of the data and the process of data collection.
  • Analyze the data rather than just describing it - use it to tell a story that focuses on answering the research question.
  • Use charts or tables to help the reader understand the data and then highlight the most interesting findings.
  • Don’t get bogged down in the detail - tell the reader about the main themes as they relate to the research question, rather than reporting everything that survey respondents or interviewees said.
  • State that ‘most people said …’ or ‘few people felt …’ rather than giving the number of people who said a particular thing.
  • Use brief quotes where these illustrate a particular point really well.
  • Respect confidentiality - you could attribute a quote to 'a faculty member', ‘a student’, or 'a customer' rather than ‘Dr. Nicholls.'

Survey Data Analysis

  • If you used an online survey, the software will automatically collate the data – you will just need to download the data, for example as a spreadsheet.
  • If you used a paper questionnaire, you will need to manually transfer the responses from the questionnaires into a spreadsheet.  Put each question number as a column heading, and use one row for each person’s answers.  Then assign each possible answer a number or ‘code’.
  • When all the data is present and correct, calculate how many people selected each response (see the sketch after this list).
  • Once you have these counts, you can set up tables and/or graphs to display the data.
  • In addition to descriptive statistics that characterize findings from your survey, you can use statistical and analytical reporting techniques if needed.
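
As a sketch of the tallying step above with pandas (the question column and answer codes are invented):

```python
# Tally coded answers for one survey question and show percentages.
# Invented codes: 1 = Yes, 2 = No, 3 = Unsure.
import pandas as pd

df = pd.DataFrame({"q1": [1, 2, 1, 3, 1, 2, 1, 1, 3, 2]})

counts = df["q1"].value_counts().sort_index()
percent = (counts / len(df) * 100).round(1)

print(pd.DataFrame({"count": counts, "percent": percent}))
```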

Interview Data Analysis

  • Data Reduction and Organization: Try not to feel overwhelmed by the quantity of information collected from interviews – a one-hour interview can generate 20 to 25 pages of single-spaced text. Once you start organizing your fieldwork notes around themes, you can easily identify which parts of your data to use for further analysis. Helpful questions to ask about each contact or interviewee include:
  • What were the main issues or themes that struck you in this contact / interviewee?
  • Was there anything else that struck you as salient, interesting, illuminating, or important in this contact / interviewee?
  • What information did you get (or fail to get) on each of the target questions you had for this contact / interviewee?
  • Connection of the data: You can connect data around themes and concepts - then you can show how one concept may influence another.
  • Examination of Relationships: Examining relationships is the centerpiece of the analytic process, because it allows you to move from simple description of the people and settings to explanations of why things happened as they did with those people in that setting.

Your Modern Business Guide To Data Analysis Methods And Techniques


Table of Contents

1) What Is Data Analysis?

2) Why Is Data Analysis Important?

3) What Is The Data Analysis Process?

4) Types Of Data Analysis Methods

5) Top Data Analysis Techniques To Apply

6) Quality Criteria For Data Analysis

7) Data Analysis Limitations & Barriers

8) Data Analysis Skills

9) Data Analysis In The Big Data Environment

In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.

Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery , improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.

With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.

In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis. 

To put all of that into perspective, we will answer a host of important analytical questions, explore analytical methods and techniques, and demonstrate how to perform analysis in the real world with a 17-step blueprint for success.

What Is Data Analysis?

Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

All these various methods are largely based on two core areas: quantitative and qualitative research.

To explain the key differences between qualitative and quantitative research, here's a video for your viewing pleasure:

[Video: qualitative vs. quantitative research]

Gaining a better understanding of different techniques and methods in quantitative research as well as qualitative insights will give your analyzing efforts a more clearly defined direction, so it’s worth taking the time to allow this particular knowledge to sink in. Additionally, you will be able to create a comprehensive analytical report that will skyrocket your analysis.

Apart from qualitative and quantitative categories, there are also other types of data that you should be aware of before diving into complex data analysis processes. These categories include:

  • Big data: Refers to massive data sets that need to be analyzed using advanced software to reveal patterns and trends. It is considered to be one of the best analytical assets as it provides larger volumes of data at a faster rate. 
  • Metadata: Putting it simply, metadata is data that provides insights about other data. It summarizes key information about specific data that makes it easier to find and reuse for later purposes. 
  • Real time data: As its name suggests, real time data is presented as soon as it is acquired. From an organizational perspective, this is the most valuable data as it can help you make important decisions based on the latest developments. Our guide on real time analytics will tell you more about the topic. 
  • Machine data: This is more complex data that is generated solely by a machine such as phones, computers, or even websites and embedded systems, without previous human interaction.

Why Is Data Analysis Important?

Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.

  • Informed decision-making : From a management perspective, you can benefit from analyzing your data as it helps you make decisions based on facts and not simple intuition. For instance, you can understand where to invest your capital, detect growth opportunities, predict your income, or tackle uncommon situations before they become problems. Through this, you can extract relevant insights from all areas in your organization, and with the help of dashboard software , present the data in a professional and interactive way to different stakeholders.
  • Reduce costs : Another great benefit is to reduce costs. With the help of advanced technologies such as predictive analytics, businesses can spot improvement opportunities, trends, and patterns in their data and plan their strategies accordingly. In time, this will help you save money and resources on implementing the wrong strategies. And not just that, by predicting different scenarios such as sales and demand you can also anticipate production and supply. 
  • Target customers better : Customers are arguably the most crucial element in any business. By using analytics to get a 360° vision of all aspects related to your customers, you can understand which channels they use to communicate with you, their demographics, interests, habits, purchasing behaviors, and more. In the long run, it will drive success to your marketing strategies, allow you to identify new potential customers, and avoid wasting resources on targeting the wrong people or sending the wrong message. You can also track customer satisfaction by analyzing your client’s reviews or your customer service department’s performance.

What Is The Data Analysis Process?


When we talk about analyzing data, there is an order to follow to extract the needed conclusions. The analysis process consists of 5 key stages. We will cover each of them in more detail later in the post, but to provide the context needed to understand what comes next, here is a rundown of the 5 essential steps of data analysis.

  • Identify: Before you get your hands dirty with data, you first need to identify why you need it in the first place. The identification is the stage in which you establish the questions you will need to answer. For example, what is the customer's perception of our brand? Or what type of packaging is more engaging to our potential customers? Once the questions are outlined you are ready for the next step. 
  • Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of data you will use and how you will use them. The collection of data can come in different forms such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others.  An important note here is that the way you collect the data will be different in a quantitative and qualitative scenario. 
  • Clean: Once you have the necessary data it is time to clean it and leave it ready for analysis. Not all the data you collect will be useful; when collecting large amounts of data in different formats, it is very likely that you will find yourself with duplicate or badly formatted data. To avoid this, before you start working with your data, make sure to erase any white spaces, duplicate records, or formatting errors. This way you avoid hurting your analysis with bad-quality data.
  • Analyze : With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you answer the questions you first thought of in the identify stage. Various technologies in the market assist researchers and average users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others. 
  • Interpret: Last but not least you have one of the most important steps: it is time to interpret your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would understand if your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage, you can also find some limitations and work on them. 

Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.

17 Essential Types Of Data Analysis Methods

Before diving into the 17 essential types of methods, it is important that we quickly go over the main analysis categories. Moving from descriptive up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.

a) Descriptive analysis - What happened.

The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question of what happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.

Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. Although it is relevant to mention that this analysis on its own will not allow you to predict future outcomes or tell you the answer to questions like why something happened, it will leave your data organized and ready to conduct further investigations.

b) Exploratory analysis - How to explore data relationships.

As its name suggests, the main aim of exploratory analysis is to explore. Prior to it, there is still no notion of the relationship between the data and the variables. Once the data is investigated, exploratory analysis helps you find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.

c) Diagnostic analysis - Why it happened.

Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.

Designed to provide direct and actionable answers to specific questions, this is one of the most important methods in research, and it also serves key organizational functions, e.g. in retail analytics.

d) Predictive analysis - What will happen.

The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analysis, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causal relationships in your data.

With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.

e) Prescriptive analysis - How to make it happen.

Another of the most effective types of analysis methods in research, prescriptive data techniques cross over from predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.

By drilling down into prescriptive analysis, you will play an active role in the data consumption process by taking well-arranged sets of visual data and using them as a powerful fix to emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, logistics analytics , and others.

Top 17 data analysis methods

As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches. 

Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world: 

A. Quantitative Methods 

To put it simply, quantitative analysis refers to all methods that use numerical data or data that can be turned into numbers (e.g. categorical variables like gender, age, etc.) to extract valuable insights. It is used to draw conclusions about relationships and differences, and to test hypotheses. Below we discuss some of the key quantitative methods. 

1. Cluster analysis

The action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.

Let's look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best-personalized service, but let's face it, with a large customer base, it is practically impossible to do that. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
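
As a minimal sketch of the idea, the snippet below clusters invented customer records with scikit-learn's KMeans; the feature names and values are purely illustrative:

```python
# A minimal customer-clustering sketch with scikit-learn's KMeans.
# The features [age, annual_spend, visits_per_month] are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = np.array([
    [25, 1200, 8],
    [62, 300, 1],
    [34, 2500, 12],
    [58, 450, 2],
    [29, 1800, 10],
])

# Scale features so that spend does not dominate the distance metric
scaled = StandardScaler().fit_transform(customers)

# Group customers into two clusters; in practice, choose k with e.g. the elbow method
model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(scaled)
print(model.labels_)  # cluster assignment for each customer
```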

2. Cohort analysis

This type of data analysis approach uses historical data to examine and compare the behavior of a particular segment of users, which can then be grouped with others that have similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.

Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.  
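
Before turning to a dedicated tool, a simple cohort table can also be built in pandas. The snippet below is a minimal sketch with invented user activity data:

```python
# A simplified cohort table: group users by signup month and count how many
# were still active in later months. All column names and values are invented.
import pandas as pd

events = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 3, 3, 3],
    "signup_month": ["2024-01", "2024-01", "2024-02", "2024-02", "2024-01", "2024-01", "2024-01"],
    "active_month": ["2024-01", "2024-02", "2024-02", "2024-03", "2024-01", "2024-02", "2024-03"],
})

# Rows = signup cohort, columns = month of activity, values = distinct active users
cohort = events.pivot_table(index="signup_month",
                            columns="active_month",
                            values="user_id",
                            aggfunc="nunique")
print(cohort)
```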

A useful tool for getting started with cohort analysis is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide . In the image below, you can see an example of how a cohort is visualized in this tool. The segments (device traffic) are divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.

Cohort analysis chart example from Google Analytics

3. Regression analysis

Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.

Let's break it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed or whether any new ones appeared during 2020. For example, you couldn't sell as much in your physical store due to COVID lockdowns, so your sales could have either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.
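
As a minimal sketch of multiple regression in Python with scikit-learn (the variables marketing_spend and store_traffic, and all values, are invented):

```python
# Fit a multiple regression and inspect how each independent variable
# relates to the dependent variable (sales). Data is purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10, 500], [12, 550], [8, 400], [15, 700], [11, 520]])  # [marketing_spend, store_traffic]
y = np.array([100, 115, 80, 150, 108])                                # dependent variable: sales

model = LinearRegression().fit(X, y)
print(model.coef_)                  # effect of each independent variable on sales
print(model.predict([[13, 600]]))  # anticipate sales under new conditions
```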

If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.

4. Neural networks

The neural network forms the basis for the intelligent algorithms of machine learning. It is a form of analytics that attempts, with minimal intervention, to understand how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.
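
As a toy illustration (not any specific vendor tool), here is a minimal feed-forward network for prediction using scikit-learn's MLPRegressor; the history and column meanings are invented:

```python
# A tiny neural network sketch for forecasting. Real setups would use far
# more data and tuning; this only shows the mechanics.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical history: [month_index, marketing_spend] -> revenue
X = np.array([[1, 10], [2, 12], [3, 9], [4, 15], [5, 14], [6, 16]])
y = np.array([100, 120, 95, 150, 145, 160])

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(X, y)
print(net.predict([[7, 17]]))  # forecast revenue for the next period
```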

A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist. 

Here is an example of how you can use the predictive analysis tool from datapine:

Example on how to use predictive analytics tool from datapine


5. Factor analysis

Factor analysis, also called “dimension reduction,” is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, making it an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. In this case, factor analysis comes into the picture by summarizing all of these variables into homogenous groups, for example, by grouping the variables color, materials, quality, and trends into a broader latent variable of design.
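
A minimal Python sketch of factor analysis, using scikit-learn's FactorAnalysis on invented survey ratings, looks like this:

```python
# Reduce several correlated survey ratings to two latent factors.
# The respondents and ratings below are invented for illustration.
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Rows = respondents; columns = ratings for color, materials, comfort, trendiness
ratings = np.array([
    [5, 4, 2, 5],
    [4, 5, 1, 4],
    [2, 1, 5, 2],
    [1, 2, 4, 1],
    [5, 5, 2, 4],
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
print(fa.components_)  # loadings: how strongly each rating maps onto each latent factor
```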

If you want to start analyzing data using factor analysis we recommend you take a look at this practical guide from UCLA.

6. Data mining

A method of data analysis that is the umbrella term for engineering metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge.  When considering how to analyze data, adopting a data mining mindset is essential to success - as such, it’s an area that is worth exploring in greater detail.

An excellent use case of data mining is datapine intelligent data alerts . With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you’re monitoring supply chain KPIs , you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down deep into the issue and fix it swiftly and effectively.
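
This is not datapine's actual implementation, but to make the concept concrete, here is a bare-bones Python sketch of the idea behind such an alert: flag metrics that fall outside predefined ranges (the metrics and ranges are invented):

```python
# A minimal rule-based data alert: compare today's metrics against
# predefined acceptable ranges and flag anything outside them.
daily_metrics = {"orders": 82, "sessions": 4100, "revenue": 9500}
alert_ranges = {"orders": (100, 300), "sessions": (3000, 8000), "revenue": (8000, 20000)}

for metric, value in daily_metrics.items():
    low, high = alert_ranges[metric]
    if not low <= value <= high:
        print(f"ALERT: {metric} = {value} is outside the expected range [{low}, {high}]")
```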

In the following picture, you can see how the intelligent alarms from datapine work. By setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if the goal was not completed or if it exceeded expectations.

Example on how to use intelligent alerts from datapine

7. Time series analysis

As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Analysts use this method to monitor data points at regular intervals rather than intermittently. That said, time series analysis is not only about collecting data over time; it also allows researchers to understand whether variables changed during the study, how the different variables depend on one another, and how the end result was reached. 

In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events. 

A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.  
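
As a minimal sketch of this kind of seasonality check, the statsmodels snippet below splits a synthetic monthly sales series (values invented to show a summer peak) into trend and seasonal components:

```python
# Decompose monthly sales into trend and seasonal components.
# The sales numbers are synthetic, repeated over three years.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

sales = pd.Series(
    [90, 95, 100, 120, 150, 200, 210, 190, 140, 110, 95, 92] * 3,  # three years of monthly data
    index=pd.date_range("2021-01-01", periods=36, freq="MS"),
)

result = seasonal_decompose(sales, model="additive", period=12)
print(result.seasonal.head(12))  # the recurring monthly (seasonal) effect
```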

8. Decision Trees 

The decision tree analysis aims to act as a support tool to make smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful for analyzing quantitative data, and they improve decision-making by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision you need to make and branches out based on the different outcomes and consequences of each choice. Each outcome will outline its own consequences, costs, and gains, and at the end of the analysis, you can compare each of them and make the smartest decision. 
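
Here is a minimal scikit-learn sketch of the flowchart logic just described; the project features ([estimated_cost, months_needed]) and payoff labels are invented:

```python
# Learn a small decision tree that classifies projects as paying off (1) or not (0).
# Training data is invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[50, 3], [200, 12], [80, 4], [300, 18], [60, 2], [250, 15]]  # [estimated_cost, months_needed]
y = [1, 0, 1, 0, 1, 0]                                            # 1 = project paid off

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["estimated_cost", "months_needed"]))  # the learned flowchart
```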

Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide if you want to update your software app or build a new app entirely.  Here you would compare the total costs, the time needed to be invested, potential revenue, and any other factor that might affect your decision.  In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.

9. Conjoint analysis 

Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more features-focused, and others might have a sustainability focus. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more. 

A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments. 
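
Full conjoint designs are more involved, but as a simplified, hedged illustration, part-worth utilities can be approximated by regressing preference ratings on dummy-coded attributes. All attribute levels and ratings below are invented:

```python
# A simplified conjoint-style estimation: regress ratings on dummy-coded
# product attributes to see which attributes raise or lower preference.
import pandas as pd
from sklearn.linear_model import LinearRegression

profiles = pd.DataFrame({
    "gluten_free":    [1, 0, 1, 0, 1, 0],
    "sugary_topping": [0, 1, 1, 0, 0, 1],
    "rating":         [9, 4, 6, 7, 8, 3],  # respondent preference for each product profile
})

X = profiles[["gluten_free", "sugary_topping"]]
model = LinearRegression().fit(X, profiles["rating"])
# Positive coefficients = attributes that raise preference; negative = lower it
print(dict(zip(X.columns, model.coef_)))
```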

10. Correspondence Analysis

Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic. 

This method starts by calculating an “expected value,” which is done by multiplying the row total by the column total and dividing by the grand total of the table. The “expected value” is then subtracted from the original (observed) value, resulting in a “residual number,” which is what allows you to extract conclusions about relationships and distribution. The results of this analysis are later displayed using a map that represents the relationship between the different values. The closer two values are on the map, the stronger the relationship. Let’s put it into perspective with an example. 

Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of. 
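
To ground the calculation described above, here is a minimal NumPy sketch computing expected values and residuals for a small, invented brand-vs-attribute contingency table:

```python
# Expected value per cell = row total * column total / grand total.
# Residual = observed - expected. All counts below are invented.
import numpy as np

# Rows: brands A, B; columns: mention counts for durability, innovation
observed = np.array([[20, 40],
                     [50, 10]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

expected = row_totals @ col_totals / grand_total
residuals = observed - expected
print(residuals)  # brand A: negative for durability, positive for innovation
```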

11. Multidimensional Scaling (MDS)

MDS is a method used to observe the similarities or disparities between objects which can be colors, brands, people, geographical coordinates, and more. The objects are plotted using an “MDS map” that positions similar objects together and disparate ones far apart. The (dis)similarities between objects are represented using one or more dimensions that can be observed using a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you can use 1 for “don’t believe in the vaccine at all” and 10 for “firmly believe in the vaccine,” with 2 to 9 for responses in between. When analyzing an MDS map, the only thing that matters is the distance between the objects; the orientation of the dimensions is arbitrary and has no meaning. 

Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how it is positioned compared to competitors, it can define 2-3 dimensions such as taste, ingredients, shopping experience, or more, and do a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading. 

Another business example is in procurement when deciding on different suppliers. Decision makers can generate an MDS map to see how the different prices, delivery times, technical services, and more of the different suppliers differ and pick the one that suits their needs the best. 
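
As a minimal sketch, assuming you already have a pairwise dissimilarity matrix (the values below are invented), scikit-learn's MDS can produce the 2-D map described above:

```python
# Embed four hypothetical brands in 2-D from a hand-made dissimilarity matrix.
import numpy as np
from sklearn.manifold import MDS

# Symmetric dissimilarities between four brands (0 = identical)
dissimilarity = np.array([
    [0.0, 0.2, 0.8, 0.7],
    [0.2, 0.0, 0.9, 0.6],
    [0.8, 0.9, 0.0, 0.3],
    [0.7, 0.6, 0.3, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)
print(coords)  # 2-D positions: nearby points = similar brands
```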

A final example comes from a research paper, "An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data". The researchers chose a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews. They used 36 sentiment words and distributed them based on their emotional distance, as we can see in the image below, where the words "outraged" and "sweet" sit on opposite sides of the map, marking the distance between the two emotions very clearly.

Example of multidimensional scaling analysis

Aside from being a valuable technique to analyze dissimilarities, MDS also serves as a dimension-reduction technique for large dimensional data. 

B. Qualitative Methods

Qualitative data analysis methods are defined as the analysis of non-numerical data gathered through methods of observation such as interviews, focus groups, questionnaires, and more. As opposed to quantitative methods, qualitative data is more subjective and highly valuable in analyzing customer retention and product development.

12. Text analysis

Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.

Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions of a text, for example, whether it's positive, negative, or neutral, and then give it a score depending on certain factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic check out this insightful article .
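
As one hedged example of how sentiment scoring can be done in Python, NLTK's VADER analyzer assigns a compound score to each text; the review texts below are invented, and commercial tools typically use more sophisticated models:

```python
# Score invented reviews with NLTK's VADER sentiment model.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()
for review in ["Absolutely love this product!", "Terrible support, never again."]:
    scores = analyzer.polarity_scores(review)
    print(review, "->", scores["compound"])  # > 0 positive, < 0 negative
```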

By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next. 

13. Content Analysis

This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.

There are two types of content analysis. The first one is the conceptual analysis which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second one is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context. 
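
A conceptual content analysis can be as simple as counting concept frequencies. Here is a bare-bones Python sketch; the reviews and target concepts are invented:

```python
# Count how often target concepts appear across a set of invented reviews.
from collections import Counter
import re

reviews = [
    "The packaging is great but delivery was slow.",
    "Fast delivery, love the packaging!",
    "Delivery took forever.",
]
concepts = ["packaging", "delivery"]

words = Counter(re.findall(r"[a-z]+", " ".join(reviews).lower()))
print({c: words[c] for c in concepts})  # {'packaging': 2, 'delivery': 3}
```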

Content analysis is often used by marketers to measure brand reputation and customer behavior, for example, by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential out of this analysis method, it is necessary to have a clearly defined research question. 

14. Thematic Analysis

Very similar to content analysis, thematic analysis also helps identify and interpret patterns in qualitative data, with the main difference being that content analysis can also be applied to quantitative data. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people's views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service. 

Thematic analysis is a very subjective technique that relies on the researcher's judgment. Therefore, to avoid biases, it follows six steps: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways, and it can be hard to select which data to emphasize. 

15. Narrative Analysis 

A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others. 

From a business perspective, narrative analysis can be useful to analyze customer behaviors and feelings towards a specific product, service, feature, or others. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.  

The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study. 

16. Discourse Analysis

Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It mixes the analysis of languages and situations together. This means that the way the content is constructed and the meaning behind it is significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches you need to consider different context elements such as the politician's background, the current political context of the country, the audience to which the speech is directed, and so on. 

From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice. 

17. Grounded Theory Analysis

Traditionally, researchers decide on a method and hypothesis and start to collect data to prove that hypothesis. Grounded theory is the only method here that doesn't require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you can go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it is not necessary to finish collecting the data before starting to analyze it; researchers often begin finding valuable insights while they are still gathering the data. 

All of these elements make grounded theory a very valuable method, as theories are fully backed by data instead of initial assumptions. It is a great technique for analyzing poorly researched topics or finding the causes behind specific company outcomes. For example, product managers and marketers might use grounded theory to find the causes of high levels of customer churn and look into customer surveys and reviews to develop new theories about the causes. 

How To Analyze Data? Top 17 Data Analysis Techniques To Apply

17 top data analysis techniques by datapine

Now that we’ve answered the questions “what is data analysis?” and “why is it important?”, and covered the different data analysis types, it’s time to dig deeper into how to perform your analysis by working through these 17 essential techniques.

1. Collaborate your needs

Before you begin analyzing or drilling down into any techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.

2. Establish your questions

Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.

To make sure your data works for you, you have to ask the right data analysis questions .

3. Data democratization

After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.

Data democratization is an action that aims to connect data from various sources efficiently and quickly so that anyone in your organization can access it at any given moment. You can extract data in text, images, videos, numbers, or any other format, and then perform cross-database analysis to achieve more advanced insights that you can share with the rest of the company interactively.  

Once you have decided on your most valuable sources, you need to take all of this into a structured format to start collecting your insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them at your will. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.

data connectors from datapine

4. Think of governance 

When collecting data in a business or research context you always need to think about security and privacy. With data breaches becoming a topic of concern for businesses, the need to protect your client's or subject’s sensitive information becomes critical. 

To ensure that all this is taken care of, you need to think of a data governance strategy. According to Gartner , this concept refers to “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It ensures that clear roles are in place for who can access the information and how they can access it. In time, this not only ensures that sensitive information is protected but also allows for efficient analysis as a whole. 

5. Clean your data

After harvesting from so many sources you will be left with a vast amount of information that can be overwhelming to deal with. At the same time, you can be faced with incorrect data that can be misleading to your analysis. The smartest thing you can do to avoid dealing with this in the future is to clean the data. This is fundamental before visualizing it, as it will ensure that the insights you extract from it are correct.

There are many things that you need to look for in the cleaning process. The most important one is to eliminate any duplicate observations; these usually appear when using multiple internal and external sources of information. You can also add any missing codes, fix empty fields, and eliminate incorrectly formatted data.

Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors. 
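
To make these cleaning steps concrete, here is a small pandas sketch, assuming tabular data with invented column names:

```python
# Trim whitespace, coerce badly formatted numbers, and drop duplicates/empties.
import pandas as pd

df = pd.DataFrame({
    "email": [" ana@mail.com", "ana@mail.com ", "luis@mail.com", None],
    "amount": ["100", "100", "250.5", "n/a"],
})

df["email"] = df["email"].str.strip()                        # remove stray whitespace
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # invalid entries become NaN
df = df.drop_duplicates().dropna()                           # drop duplicates and empty fields
print(df)
```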

Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.

6. Set your KPIs

Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

KPIs are critical to both qualitative and quantitative analysis research. This is one of the primary methods of data analysis you certainly shouldn’t overlook.

To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI: transportation-related costs. If you want to see more, explore our collection of key performance indicator examples .

Transportation costs logistics KPIs

7. Omit useless data

Having bestowed your data analysis tools and techniques with true purpose and defined your mission, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for chopping out any information you deem to be useless.

Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.

8. Build a data management roadmap

While, at this point, this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data management roadmap will help your data analysis methods and techniques become successful on a more sustainable basis. These roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.

Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – one of the most powerful types of data analysis methods available today.

9. Integrate technology

There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that will offer you actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard . A data methodology you can count on.

By integrating the right technology within your data analysis methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

For a look at the power of software for the purpose of analysis and to enhance your methods of analyzing, glance over our selection of dashboard examples .

10. Answer your questions

By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.

11. Visualize your data

Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.

The purpose of analyzing is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard .

An executive dashboard example showcasing high-level marketing KPIs such as cost per lead, MQL, SQL, and cost per customer.

This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMO) an overview of relevant metrics to help them understand if they achieved their monthly goals.

In detail, this example generated with a modern dashboard creator displays interactive charts for monthly revenues, costs, net income, and net income per customer; all of them are compared with the previous month so that you can understand how the data fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to visualize the whole picture and extract relevant insights or trends for your marketing reports .

The CMO dashboard is perfect for c-level management as it can help them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can benefit the company exponentially.

12. Be careful with the interpretation

We already dedicated an entire post to data interpretation as it is a fundamental part of the process of data analysis. It gives meaning to the analytical information and aims to drive a concise conclusion from the analysis results. Since most of the time companies are dealing with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations. 

To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:

  • Correlation vs. causation: The human brain is wired to find patterns. This behavior leads to one of the most common mistakes when performing interpretation: confusing correlation with causation. Although these two aspects can exist simultaneously, it is not correct to assume that because two things happened together, one provoked the other. A piece of advice to avoid falling into this mistake: never trust intuition alone; trust the data. If there is no objective evidence of causation, then always stick to correlation. 
  • Confirmation bias: This phenomenon describes the tendency to select and interpret only the data necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's not done on purpose, confirmation bias can represent a real problem, as excluding relevant information can lead to false conclusions and, therefore, bad business decisions. To avoid it, always try to disprove your hypothesis instead of proving it, share your analysis with other team members, and avoid drawing any conclusions before the entire analytical project is finalized.
  • Statistical significance: To put it in short words, statistical significance helps analysts understand if a result is actually accurate or if it happened because of a sampling error or pure chance. The level of statistical significance needed might depend on the sample size and the industry being analyzed. In any case, ignoring the significance of a result when it might influence decision-making can be a huge mistake; a minimal sketch of a significance test follows this list.
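
Here is the minimal significance-test sketch referenced above, using SciPy's independent-samples t-test on invented conversion figures for two campaign variants:

```python
# Two-sample t-test: are the two variants' conversion rates plausibly the same?
from scipy import stats

variant_a = [2.1, 2.4, 1.9, 2.6, 2.2, 2.5]
variant_b = [2.9, 3.1, 2.7, 3.3, 3.0, 2.8]

t_stat, p_value = stats.ttest_ind(variant_a, variant_b)
print(p_value)  # below the chosen threshold (commonly 0.05) => unlikely to be chance
```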

13. Build a narrative

Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.

The human brain responds incredibly well to strong stories or narratives. Once you’ve cleansed, shaped, and visualized your most invaluable data using various BI dashboard tools , you should strive to tell a story - one with a clear-cut beginning, middle, and end.

By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.

14. Consider autonomous technology

Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.

Gartner predicts that by the end of this year, 80% of emerging technologies will be developed with AI foundations. This is a testament to the ever-growing power and value of autonomous technologies.

At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.

15. Share the load

If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.

Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, no matter if you need to monitor recruitment metrics or generate reports that need to be sent across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.

Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.

16. Data analysis tools

In order to perform high-quality analysis of data, it is fundamental to use tools and software that will ensure the best results. Here we leave you a small summary of four fundamental categories of data analysis tools for your organization.

  • Business Intelligence: BI tools allow you to process significant amounts of data from several sources in any format. Through this, you can not only analyze and monitor your data to extract relevant insights but also create interactive reports and dashboards to visualize your KPIs and use them for your company's good. datapine is an amazing online BI software that is focused on delivering powerful online analysis features that are accessible to beginner and advanced users. Like this, it offers a full-service solution that includes cutting-edge analysis of data, KPIs visualization, live dashboards, reporting, and artificial intelligence technologies to predict trends and minimize risk.
  • Statistical analysis: These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool to perform this type of analysis is R-Studio as it offers a powerful data modeling and hypothesis testing feature that can cover both academic and general data analysis. This tool is one of the favorite ones in the industry, due to its capability for data cleaning, data reduction, and performing advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach it can help your company find relevant insights to drive better decisions. SPSS also works as a cloud service that enables you to run it anywhere.
  • SQL Consoles: SQL is a programming language often used to handle structured data in relational databases. Tools like these are popular among data scientists as they are extremely effective in unlocking these databases' value. Undoubtedly, one of the most used SQL software in the market is MySQL Workbench . This tool offers several features such as a visual tool for database modeling and monitoring, complete SQL optimization, administration tools, and visual performance dashboards to keep track of KPIs.
  • Data Visualization: These tools are used to represent your data through charts, graphs, and maps that allow you to find patterns and trends in the data. datapine's already mentioned BI platform also offers a wealth of powerful online data visualization tools with several benefits. Some of them include: delivering compelling data-driven presentations to share with your entire company, the ability to see your data online with any device wherever you are, an interactive dashboard design feature that enables you to showcase your results in an interactive and understandable way, and to perform online self-service reports that can be used simultaneously with several other people to enhance team productivity.

17. Refine your process constantly 

Last is a step that might seem obvious to some people, but it can be easily ignored if you think you are done. Once you have extracted the needed results, you should always take a retrospective look at your project and think about what you can improve. As you saw throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving. 

Quality Criteria For Data Analysis

So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of some science quality criteria. Here we will go into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these steps in a business context, as they will allow you to assess the quality of your results in the correct way. Let’s dig in. 

  • Internal validity: The results of a survey are internally valid if they measure what they are supposed to measure and thus provide credible results. In other words, internal validity measures the trustworthiness of the results and how they can be affected by factors such as the research design, operational definitions, how the variables are measured, and more. For instance, imagine you are doing an interview to ask people if they brush their teeth twice a day. While most of them will answer yes, you can still notice that their answers correspond to what is socially acceptable, which is to brush your teeth at least twice a day. In this case, you can’t be 100% sure if respondents actually brush their teeth twice a day or if they just say that they do; therefore, the internal validity of this interview is very low. 
  • External validity: Essentially, external validity refers to the extent to which the results of your research can be applied to a broader context. It basically aims to prove that the findings of a study can be applied in the real world. If the research can be applied to other settings, individuals, and times, then the external validity is high. 
  • Reliability: If your research is reliable, it means that it can be reproduced. If your measurement were repeated under the same conditions, it would produce similar results. This means that your measuring instrument consistently produces reliable results. For example, imagine a doctor building a symptoms questionnaire to detect a specific disease in a patient. Then, various other doctors use this questionnaire but end up diagnosing the same patient with a different condition. This means the questionnaire is not reliable in detecting the initial disease. Another important note here is that in order for your research to be reliable, it also needs to be objective. If the results of a study are the same, independent of who assesses them or interprets them, the study can be considered reliable. Let’s see the objectivity criteria in more detail now. 
  • Objectivity: In data science, objectivity means that the researcher needs to stay fully objective when it comes to their analysis. The results of a study need to be based on objective criteria and not on the beliefs, personality, or values of the researcher. Objectivity needs to be ensured when you are gathering the data; for example, when interviewing individuals, the questions need to be asked in a way that doesn't influence the results. Paired with this, objectivity also needs to be thought of when interpreting the data. If different researchers reach the same conclusions, then the study is objective. For this last point, you can set predefined criteria to interpret the results to ensure all researchers follow the same steps. 

The discussed quality criteria cover mostly potential influences in a quantitative context. Analysis in qualitative research has by default additional subjective influences that must be controlled in a different way. Therefore, there are other quality criteria for this kind of research such as credibility, transferability, dependability, and confirmability. You can see each of them more in detail on this resource . 

Data Analysis Limitations & Barriers

Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization it doesn't come without limitations. In this section, we will discuss some of the main barriers you might encounter when conducting an analysis. Let’s see them more in detail. 

  • Lack of clear goals: No matter how good your data or analysis might be, if you don’t have clear goals or a hypothesis, the process might be worthless. While we mentioned some methods that don’t require a predefined hypothesis, it is always better to enter the analytical process with some clear guidelines about what you are expecting to get out of it, especially in a business context in which data is utilized to support important strategic decisions. 
  • Objectivity: Arguably one of the biggest barriers when it comes to data analysis in research is to stay objective. When trying to prove a hypothesis, researchers might find themselves, intentionally or unintentionally, directing the results toward an outcome that they want. To avoid this, always question your assumptions and avoid confusing facts with opinions. You can also show your findings to a research partner or external person to confirm that your results are objective. 
  • Data representation: A fundamental part of the analytical procedure is the way you represent your data. You can use various graphs and charts to represent your findings, but not all of them will work for all purposes. Choosing the wrong visual can not only damage your analysis but can mislead your audience, therefore, it is important to understand when to use each type of data depending on your analytical goals. Our complete guide on the types of graphs and charts lists 20 different visuals with examples of when to use them. 
  • Flawed correlation: Misleading statistics can significantly damage your research. We’ve already pointed out a few interpretation issues previously in the post, but it is an important barrier that we can't avoid addressing here as well. Flawed correlations occur when two variables appear related to each other but they are not. Confusing correlations with causation can lead to a wrong interpretation of results which can lead to building wrong strategies and loss of resources, therefore, it is very important to identify the different interpretation mistakes and avoid them. 
  • Sample size: A very common barrier to a reliable and efficient analysis process is the sample size. In order for the results to be trustworthy, the sample size should be representative of what you are analyzing. For example, imagine you have a company of 1,000 employees and you ask 50 of them “do you like working here?”; 49 say yes, which is 98%. Now, imagine you ask all 1,000 employees and 980 say yes, which is also 98%. Claiming that 98% of employees like working at the company when the sample size was only 50 is not a representative or trustworthy conclusion. The significance of the results is far more accurate when surveying a bigger sample size.   
  • Privacy concerns: In some cases, data collection can be subjected to privacy regulations. Businesses gather all kinds of information from their customers from purchasing behaviors to addresses and phone numbers. If this falls into the wrong hands due to a breach, it can affect the security and confidentiality of your clients. To avoid this issue, you need to collect only the data that is needed for your research and, if you are using sensitive facts, make it anonymous so customers are protected. The misuse of customer data can severely damage a business's reputation, so it is important to keep an eye on privacy. 
  • Lack of communication between teams: When it comes to performing data analysis on a business level, it is very likely that each department and team will have different goals and strategies. However, they are all working for the same common goal of helping the business run smoothly and keep growing. When teams are not connected and communicating with each other, it can directly affect the way general strategies are built. To avoid these issues, tools such as data dashboards enable teams to stay connected through data in a visually appealing way. 
  • Innumeracy: Businesses are working with data more and more every day. While there are many BI tools available to perform effective analysis, data literacy is still a constant barrier. Not all employees know how to apply analysis techniques or extract insights from them. To prevent this from happening, you can implement different training opportunities that will prepare every relevant user to deal with data. 

Key Data Analysis Skills

As you've learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skills. That said, thanks to the rise of self-service tools, the process is far more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data; we list the most important ones below.

  • Critical and statistical thinking: To successfully analyze data you need to be creative and think outside the box. Yes, that might sound like a strange statement considering that data is often tied to facts. However, a great level of critical thinking is required to uncover connections, come up with a valuable hypothesis, and extract conclusions that go a step further than the surface. This, of course, needs to be complemented by statistical thinking and an understanding of numbers. 
  • Data cleaning: Anyone who has ever worked with data before will tell you that the cleaning and preparation process accounts for 80% of a data analyst's work; therefore, the skill is fundamental. But not just that: failing to clean the data adequately can also significantly damage the analysis, which can lead to poor decision-making in a business scenario. While there are multiple tools that automate the cleaning process and eliminate the possibility of human error, it is still a valuable skill to master. 
  • Data visualization: Visuals make the information easier to understand and analyze, not only for professional users but especially for non-technical ones. Having the necessary skills to not only choose the right chart type but know when to apply it correctly is key. This also means being able to design visually compelling charts that make the data exploration process more efficient. 
  • SQL: The Structured Query Language or SQL is a programming language used to communicate with databases. It is fundamental knowledge as it enables you to update, manipulate, and organize data from relational databases which are the most common databases used by companies. It is fairly easy to learn and one of the most valuable skills when it comes to data analysis. 
  • Communication skills: This is a skill that is especially valuable in a business environment. Being able to clearly communicate analytical outcomes to colleagues is incredibly important, especially when the information you are trying to convey is complex for non-technical people. This applies to in-person communication as well as written format, for example, when generating a dashboard or report. While this might be considered a “soft” skill compared to the other ones we mentioned, it should not be ignored as you most likely will need to share analytical findings with others no matter the context. 

Data Analysis In The Big Data Environment

Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

To inspire your efforts and put the importance of big data into context, here are some insights that you should know:

  • By 2026 the industry of big data is expected to be worth approximately $273.4 billion.
  • 94% of enterprises say that analyzing data is important for their growth and digital transformation. 
  • Companies that exploit the full potential of their data can increase their operating margins by 60% .
  • We already told you the benefits of Artificial Intelligence through this article. This industry's financial impact is expected to grow up to $40 billion by 2025.

Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.

Key Takeaways From Data Analysis 

As we reach the end of our data analysis journey, we leave a small summary of the main methods and techniques to perform excellent analysis and grow your business.

17 Essential Types of Data Analysis Methods:

  • Cluster analysis
  • Cohort analysis
  • Regression analysis
  • Factor analysis
  • Neural Networks
  • Data Mining
  • Text analysis
  • Time series analysis
  • Decision trees
  • Conjoint analysis 
  • Correspondence Analysis
  • Multidimensional Scaling 
  • Content analysis 
  • Thematic analysis
  • Narrative analysis 
  • Grounded theory analysis
  • Discourse analysis 

Top 17 Data Analysis Techniques:

  • Collaborate your needs
  • Establish your questions
  • Data democratization
  • Think of data governance 
  • Clean your data
  • Set your KPIs
  • Omit useless data
  • Build a data management roadmap
  • Integrate technology
  • Answer your questions
  • Visualize your data
  • Interpretation of data
  • Consider autonomous technology
  • Build a narrative
  • Share the load
  • Data Analysis tools
  • Refine your process constantly 

We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and making your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.

Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting .

And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial .

Research Process Guide

Step 9: Analyzing Data

Once you collect the data, you need to analyze the data. Depending on your methodology and your research questions, you will determine how you will analyze the data.

What is Data Analysis?

For most researchers, data analysis involves a continuous review of the data. Analysis for both quantitative and qualitative (numerical and non-numerical) data requires the researcher to repeatedly revisit the data while examining (Kumar, 2015):

  • The relationship between data and abstract concepts.
  • The relationship between description and interpretation.
  • The data through inductive and deductive reasoning.

Regardless of your methodology, these are the 4 steps in the data analysis process:

  • Describe the data clearly.
  • Identify what is typical and atypical among the data.
  • Uncover relationships and other patterns within the data.
  • Answer research questions or test hypotheses.

Quantitative data analysis

The first thing you want to do is lay out a step-by-step procedure for the analysis process. For example, Gall et al. (2006) outlined the steps used in a pretest-posttest design with matched participants in the experimental and control groups:

  • Administer measures of the dependent variables to research participants.
  • Assign participants to matched pairs on the basis of their scores from step 1.
  • Randomly assign one member of each pair to the experimental group and the other to the control group.
  • Administer the experimental “treatment” to the experimental group.
  • Administer the measures of the dependent variables to the experimental and control groups.
  • Compare the performance of the experimental and control groups on the posttest using tests of statistical significance.

Then you want to tell the reader about the kinds of statistical tests that will be implemented on the dataset (Creswell & Creswell, 2018):

  • Report descriptive statistics, including frequencies (i.e., how many male, female, and non-binary participants?), means (i.e., what is the mean age?), and standard deviation values for the primary outcome measures. Standard deviation is formally defined as the average distance of scores from the mean. (A short sketch of computing these statistics follows this list.)
  • Indicate the inferential statistical tests used to examine the hypotheses of your study. For an experimental design with categorical variables, you might use t-tests, univariate analysis of variance (ANOVA), analysis of covariance (ANCOVA), or multivariate analysis of variance (MANOVA). Several such tests, categorized by level of measurement, are listed further below.
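
As a minimal sketch of the descriptive side, the Python standard library can report frequencies, means, and standard deviations; the participant values below are invented for illustration.

    import statistics
    from collections import Counter

    # Hypothetical participant data.
    genders = ["female", "male", "female", "non-binary", "female", "male"]
    ages = [21, 24, 22, 30, 27, 25]

    print(Counter(genders))        # frequencies for a categorical variable
    print(statistics.mean(ages))   # mean age
    print(statistics.stdev(ages))  # sample standard deviation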

Kinds of statistical analysis:

Using software like SPSS, you can conduct statistical tests to examine your hypotheses and research questions (Bryman & Cramer, 2009; Ong & Puteh, 2017; Kumar, 2015); a short code sketch follows this list:

  • Frequency distribution
  • Proportions/percentage values
  • Percentile rank
  • Spearman rank order correlation
  • Mann-Whitney test
  • Standard deviation
  • Pearson’s product-moment correlation
  • Inferential procedures (t-tests, ANOVA)
  • Geometric mean
  • Percentage variance
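
While the guide references SPSS, the same tests can be run in other tools. Here is a minimal sketch using Python’s SciPy library; the two groups’ posttest scores are invented for illustration.

    from scipy import stats

    # Hypothetical posttest scores for two groups.
    experimental = [78, 85, 90, 74, 88, 92, 81]
    control = [70, 75, 80, 68, 77, 73, 79]

    # Independent-samples t-test (parametric).
    t_stat, p_value = stats.ttest_ind(experimental, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

    # Mann-Whitney U test (non-parametric alternative).
    u_stat, p_value = stats.mannwhitneyu(experimental, control)
    print(f"U = {u_stat:.2f}, p = {p_value:.3f}")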

Qualitative Data Analysis

Qualitative data analysis, unlike quantitative analysis, does not require hypothesis testing; rather, it deals with non-numerical data, in the form of words. Qualitative analysis is inductive: at its root, it is about creating a theory or understanding through analysis and interpretation (Miles et al., 2018; Bogdan & Biklen, 2007). Traditional qualitative researchers have historically completed their data analysis by hand; there is something about getting your hands dirty with the data. That said, there are other options for qualitative data transcription and analysis, and there are many methods of coding. Below is a review of the general process and common coding techniques.

All qualitative data analysis involves the same four essential steps:

  • Raw data management - "data cleaning"
  • Data coding cycle I - "chunking," "coding"
  • Data coding cycle II - "coding," "clustering"
  • Data interpretation - "telling the story," "making sense of the data for others"

Transcription - Raw data management

Qualitative data usually involves transcribing interviews or focus group data that has been previously recorded with participant consent.

Data Coding - Cycle I

There are several steps in the first cycle of coding. The first thing you need to do is to immerse yourself in the transcript data. Read the data. Read it again. Then, read the data again. Do this several times, and as you do so, you will start to get a sense of the data as a whole. Start annotating in the margins, “chunking” data into categories that make sense to you. This step is your very first preliminary pass at coding. As you do the “chunking,” read over the chunks and see if you start to identify patterns or contradictions. Some really excellent guides and resources for you as you begin your coding process are:

The Coding Manual for Qualitative Researchers, 4th edition - Johnny Saldaña (2021)

Qualitative Data Analysis: A Methods Sourcebook, 4th edition - Matthew Miles, Michael Huberman, & Johnny Saldaña (2020)

Both of these books are held by Kean University's Library.

Data coding - Cycle II

During your second cycle (and third, if need be) of coding, you start clustering chunks of data that have similarities. As you do this, you are reading over the chunks of data, refining your code book, and narrowing down the scope of each code. You will go through 2 or 3 cycles of narrowing down codes, grouping them together, and winnowing down the data. You will most likely move from 25-30 codes to grouping them in clusters to develop themes. These themes are the core of your data analysis. You will end up with 5-7 central themes that tell the story across the data (Saldaña, 2021).

Kinds of Coding

As you work your way through the data analysis, you will be going through three different kinds of coding as you progress (Miles et al., 2018; Bogdan & Biklen, 2007; Creswell & Creswell, 2018):

  • Open Coding - assigning a word or phrase that accurately describes a chunk of data. You do open coding line by line in the transcriptions of all interview data. This is the first coding step.
  • Axial Coding - looking for categories across data sets. Axial coding takes place after open coding, mostly in the second or third cycle of coding. Remember: you cannot categorize something as a theme unless it cuts across data sets.
  • Cluster Coding - reviewing and coding chunks of data that share similarities over several cycles, reducing codes by removing redundancies. Here you are refining your code book to develop themes across the data. (A small sketch of tallying codes follows this list.)
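
If you keep your codes in a spreadsheet or text file, the winnowing described above can be supported by a simple tally of how often each code appears across transcripts. The following is a minimal sketch; the codes are invented for illustration.

    from collections import Counter

    # Hypothetical open codes assigned to data chunks across transcripts.
    codes = [
        "connection", "bullying", "distraction", "connection",
        "fake image", "connection", "bullying", "voice",
    ]

    # Codes that recur across data sets are candidates for themes.
    for code, count in Counter(codes).most_common():
        print(f"{code}: {count}")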

Bogdan, R., & Biklen, S. K. (2007). Qualitative research for education. (5th ed.). Allyn & Bacon.

Bryman, A., & Cramer, D. (2009). Quantitative data analysis with SPSS 14, 15 & 16: A guide for social scientists. Routledge/Taylor & Francis Group.

Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed methods approaches. Sage.

Gall, M. D., Borg, W. R., & Gall, J. P. (2006). The methods of quantitative and qualitative research in education sciences and psychology.  (A. R. Nasr, M. Abolghasemi, K. H. Bagheri, M. J. Pakseresht, Z. Khosravi, M. Shahani Yeilagh, Trans.). (2nd ed.). Samt Publications.

Kumar, S. (2015). IRS introduction to research in special and inclusive education. [PowerPoint slides 4, 5, 37, 38, 39,43]. Informační systém Masarykovy univerzity. https://is.muni.cz/el/1441/podzim2015/SP_IRS/

Miles, M. B., Huberman, A. M., & Saldaña, J. (2018). Qualitative data analysis: A methods sourcebook. Sage.

Ong, M. H. A., & Puteh, F. (2017). Quantitative data analysis: Choosing between SPSS, PLS, and AMOS in social science research. International Interdisciplinary Journal of Scientific Research, 3(1), 14-25.

Saldaña, J. (2021). The coding manual for qualitative researchers (4th ed.). Sage.

Research Data Management Self-Assessment

Analyzing Data

There is more to analyzing your data than running statistical tests, summarizing comparisons, and creating visualizations. Analyzing your data also involves ensuring that a future researcher (who may or may not be you) can understand and potentially replicate your analyses.

What does it mean to analyze data?

The methods you use to draw conclusions from your data will, of course, depend on your research questions, your field of research, and the tools you have available to you. However, here are two factors to consider when analyzing your data.

Requirements and How to Meet Them

Your research group, field of research, or institution may have a set of standards or best practices related to how data should be analyzed and how analytical outputs should be managed. These may be as simple as a set of guidelines about how procedures, parameters, or protocols should be documented in a lab notebook.

If you are unsure about the specific requirements that apply to you, try to think about documenting your analyses and managing your outputs as “showing your work”. For additional assistance, contact the LRDS team at  [email protected] .

Things to Think About

  • Properly documenting and managing your analyses is important for reasons related to research transparency and reproducibility. However, they will also help prevent you from wasting time and losing data.
  • While many best practice recommendations apply mostly to analyses underlying a scholarly publication, you should apply the same procedures to all of the analyses you conduct - no matter the outcome.

Put your best data forward

Analyzing data

ANALYZING: HOW DO YOU DOCUMENT YOUR DATA ANALYSIS?

  • I often have to redo my analyses or examine their products to determine what procedures or parameters were applied.
  • After I finish my analysis, I document the specific parameters, procedures, and protocols applied.
  • I create some formal plans about how I will manage my data, but I generally don’t refer back to them.
  • I have ensured that the specifics of my analysis workflow and decision-making process can be put into action by others.

DOCUMENTING ANALYSIS DECISIONS

You should be as transparent as possible about how and why you conducted your specific analyses.

MANAGING ANALYTICAL OUTPUTS

If your analyses generate additional outputs (documents, images, etc.), you should organize and save them as if they were any other research product.

6 How to Analyze Data in a Primary Research Study

Melody Denny and Lindsay Clark

This chapter introduces students to the idea of working with primary research data grounded in qualitative inquiry, closed- and open-ended methods, and research ethics (Driscoll; Mackey and Gass; Morse; Scott and Garner). [1] We know this can seem intimidating to students, so we will walk them through the process of analyzing primary research, using information from public datasets including the Pew Research Center. Using sample data on teen social media use, we share our processes for analyzing sample data to demonstrate different approaches for analyzing primary research data (Charmaz; Creswell; Merriam and Tisdale; Saldaña). We also include links to additional public data sets, chapter discussion prompts, and sample activities for students to apply these strategies.

At this point in your education, you are familiar with what is known as secondary research or what many students think of as library research. Secondary research makes use of sources most often found in the library or, these days, online (books, journal articles, magazines, and many others). There’s another kind of research that you may or may not be familiar with: primary research. The Purdue OWL defines primary research as “any type of research you collect yourself” and lists examples as interviews, observations, and surveys (“What is Primary Research”).

Primary research is typically divided into two main types—quantitative and qualitative research. These two methods (or a mix of these) are used by many fields of study, so providing a singular definition for these is a bit tricky. Sheard explains that “quantitative research…deals with data that are numerical or that can be converted into numbers. The basic methods used to investigate numerical data are called ‘statistics’” (429). Guest, et al. explain that qualitative research is “information that is difficult to obtain through more quantitatively-oriented methods of data collection” and is used more “to answer the whys and hows of human behavior, opinion, and experience” (1).

This chapter focuses on qualitative methods that explore people’s behaviors, interpretations, and opinions. Primary research allows you to be a creator of research rather than only a reader and reporter of it. Primary research provides opportunities to collect information based on your specific research questions and to generate new knowledge from those questions to share with others. Generally, primary research tends to follow these steps:

  • Develop a research question. Secondary research often uses this as a starting point as well. With primary research, however, rather than relying only on library research to answer your research question, you’ll also collect data yourself to answer the question you developed. Data, in this case, is the information you collect yourself through methods such as interviews, surveys, and observations.
  • Decide on a research method. According to Scott and Garner, “A research method is a recognized way of collecting or producing [primary data], such as a survey, interview, or content analysis of documents” (8). In other words, the method is how you obtain the data.
  • Collect data. Merriam and Tisdale clarify what it means to collect data: “data collection is about asking, watching, and reviewing” (105-106). Primary research might include asking questions via surveys or interviews, watching or observing interactions or events, and examining documents or other texts.
  • Analyze data. Once data is collected, it must then be analyzed. “Data analysis is the process of making sense out of the data… Basically, data analysis is the process used to answer your research question(s)” (Merriam and Tisdale 202). It’s worth noting that many researchers collect data and analyze at the same time, so while these may seem like different steps in the process, they actually overlap.
  • Report findings. Once the researcher has spent time understanding and interpreting the data, they are then ready to write about their research, often called “findings.” You may also see this referred to as “results.”

While the entire research process is discussed, this chapter focuses on the analysis stage of the process (step 4). Depending on where you are in the research process, you may need to spend more time on step 1, 2, or 3 and review Driscoll’s “Introduction to Primary Research” (Volume 2 of Writing Spaces).

Primary research can seem daunting, and some students might think that they can’t do primary research, that this type of research is for professionals and scholars, but that’s simply not true. It’s true that primary research data can be difficult to collect and even more difficult to analyze, but the findings are typically very revealing. This chapter and the examples included break down this research process and demonstrate how general curiosity can lead to exciting chances to learn and share information that is relevant and interesting. The goal of this chapter is to provide you with some information about data analysis and walk you through some activities to prepare you for your own data analysis. The next section discusses analyzing data from closed-ended methods and open-ended methods.

Data from Primary Research

As stated above, this chapter doesn’t focus on methods, but before moving on to analysis, it’s important to clarify a few things related to methods as they are directly connected to analyzing data. As a quick reminder, a research method is how researchers collect their data, such as through surveys, interviews, or textual analysis. No matter which method is used, researchers need to think about the types of questions to ask to answer their overall research question. Generally, there are two types of questions to consider: closed-ended and open-ended. The next section provides examples of the data you might receive from asking closed-ended and open-ended questions and options for analyzing and presenting that data.

Data from Closed-Ended Methods

The data that is generated by closed-ended questions on methods such as surveys and polls is often easier to organize. Because the way respondents could answer those questions is limited to specific answers (Yes/No, numbered scales, multiple choice), the data can be analyzed by each question or by looking at the responses individually or as a whole. Though there are several approaches to analyzing the data that comes from closed-ended questions, this section will introduce you to a few different ways to make sense of this kind of data.

Closed-ended questions are those that have limited answers, like multiple choice or check-all-that-apply questions. These questions mean that respondents can provide only the answers given or they may select an “other” option. An example of a closed-ended question could be “Do you use YouTube? Yes, No, Sometimes.” Closed-ended questions have their perks because they (mostly) keep participants from misinterpreting the question or providing unhelpful responses. They also make data analysis a bit easier.

If you were to ask the “Yes, No, Sometimes” question about YouTube to 20 of your closest friends, you may get responses like Yes = 18, No = 1, and Sometimes = 1. But, if you were to ask a more detailed question like “Which of the following social media platforms do you use?” and provide respondents with a check-all-that-apply option, like “Facebook, YouTube, Twitter, Instagram, Snapchat, Reddit, and Tumblr,” you would get a very different set of data. This data might look like Facebook = 17, YouTube = 18, Twitter = 12, Instagram = 20, Snapchat = 15, Reddit = 8, and Tumblr = 3. The big takeaway here is that how you ask the question determines the type of data you collect.

Analyzing Closed-Ended Data

Now that you have data, it’s time to think about analyzing and presenting that data. Luckily, the Pew Research Center conducted a similar study that can be used as an example. The Pew Research Center is a “nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research” (“About Pew Research Center”). The information provided below comes from their public dataset “Teens, Social Media, and Technology, 2018” (Anderson and Jiang). This example is used to show how you might analyze this type of data once collected and what that data might look like. “Teens, Social Media, and Technology 2018” reported responses to questions related to which online platforms teens use and which they use most often. In figure 1 below, Pew researchers show the final product of their analysis of the data:

Figure 1: Social media usage statistics (Pew Research Center)

Pew analyzed their data and organized the findings by percentages to show what they discovered. They had 743 teens who responded to these questions, so presenting their findings in percentages helps readers better “see” the data overall (rather than saying YouTube = 631 and Instagram = 535). However, results can be represented in different ways. When the Pew researchers were deciding how to present their data, they could have reported the frequency, or the number of people who said they used YouTube, Instagram, and Snapchat.

In the scenario of polling 20 of your closest friends, you, too, would need to decide how to present your data: Facebook = 17, YouTube = 18, Twitter = 12, Instagram = 20, Snapchat = 15, Reddit = 8, and Tumblr = 3. In your case, you might want to present the frequency (number) of responses rather than the percentages of responses like Pew did. You could choose a bar graph like Pew or maybe a simple table to show your data.
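
If you wanted to check the arithmetic, converting frequencies to percentages is a one-line calculation. Here is a minimal sketch using the hypothetical friend-poll counts from above.

    # Counts from polling 20 friends (hypothetical data from above).
    responses = {"Facebook": 17, "YouTube": 18, "Twitter": 12,
                 "Instagram": 20, "Snapchat": 15, "Reddit": 8, "Tumblr": 3}
    total = 20

    # Convert each frequency to a percentage, as Pew did for its 743 teens.
    for platform, count in responses.items():
        print(f"{platform}: {count} ({count / total:.0%})")  # e.g., Facebook: 17 (85%)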

Looking again at the Pew data, researchers could use this data to generate further insights or questions about user preferences. For example, one could highlight the fact that 85% of respondents reported using YouTube the most, while only 7% reported using Reddit. Why is that? What conclusions might you be able to make based on these data? Does the data make you wonder if any additional questions might be explored? If you want to learn more about your respondents’ opinions or preference, you might need to ask open-ended questions.

Data from Open-Ended Methods

Whereas closed-ended questions limit how respondents might answer, open-ended questions do not limit respondents’ answers and allow them to answer more freely. An example of an open-ended question, to build off the question above, could be “Why do you use social media? Explain.” This type of question gives respondents more space to fully explain their responses. Open-ended questions can make the data varied because each respondent may answer differently. These questions, which can provide fruitful responses, can also mean unexpected responses or responses that don’t help to answer the overall research question, which can sometimes make data analysis challenging.

In that same Pew Research Center data, respondents were likely limited in how they were able to answer by selecting social media platforms from a list. Pew also shares selected data (Appendix A), and based on these data, it can be assumed they also asked open-ended questions, likely something about the positive or negative effects of social media platforms. Because their research method included both closed-ended questions about which platforms teens use as well as open-ended questions that invited their thoughts about social media, Pew researchers were able to learn more about these participants’ thoughts and perceptions. To give us, the readers, a clearer idea of how they justified their presentation of the data, Pew offers 15 sample excerpts from those open-ended questions. They explain that these excerpts are what the researchers believe are representative of the larger data set. We explain below how we might analyze those excerpts.

Analyzing Open-Ended Data

As Driscoll reminds us, ethical considerations impact all stages of the research process, and researchers should act ethically throughout the entire research process. You already know a little something about research ethics. For example, you know that ethical writers cite sources used in research papers by giving credit to the person who created that information. When creating primary sources, you have a few different ethical considerations for analyzing data, which will be discussed below.

To demonstrate how to analyze data from open-ended methods, we explain how we (Melody and Lindsay) analyzed the 15 excerpts from the Pew data using open coding. Open coding means analyzing the data without any predetermined categories or themes; researchers are just seeing what emerges or seems significant (Charmaz). Creswell suggests four specific steps when coding qualitative data, though he also stresses that these steps are iterative, meaning that researchers may need to revisit a step anywhere throughout the process. We use these four steps to explain our analysis process, including how we ethically coded the data, interpreted what the coding process revealed, and worked together to identify and explain categories we saw in the data.

Step 1: Organizing and Preparing the Data

The first part of the analysis stage is organizing the data before examining it. When organizing data, researchers must be careful to work with primary data ethically because that data often represents actual people’s information and opinions. Therefore, researchers need to carefully organize the data in such a way as to not identify their participants or reveal who they are. This is a key component of The Belmont Report, guidelines published in 1979 meant to guide researchers and help protect participants. Using pseudonyms or assigning numbers or codes (in place of names) to the data is a recommended ethical step to maintain participants’ confidentiality in a study. Anonymizing data, or removing names, has the additional effect of reducing researcher bias, which can occur when researchers are so familiar with their own data and participants that the researchers may begin to think they already know the answers or see connections prior to analysis (Driscoll). By assigning pseudonyms, researchers can also ensure that they take an objective look at each participant’s answers without being persuaded by participant identity.

The first part of coding is to make notations while reading through the data (Merriam and Tisdale). At this point, researchers are open to many possibilities regarding their data. This is also where researchers begin to construct categories. Offering a simple example to illustrate this decision-making process, Merriam and Tisdale ask us to imagine sorting and categorizing two hundred grocery store items (204). Some items could be sorted into more than one category; for example, ice cream could be categorized as “frozen” or as “dessert.” How you decide to sort that item depends on your research question and what you want to learn.

For this step, we, Melody and Lindsay, each created a separate document that included the 15 excerpts. Melody created a table for the quotes, leaving a column for her coding notes, and Lindsay added spaces between the excerpts for her notes. For our practice analysis, we analyzed the data independently, and then shared what we did to compare, verify, and refine our analysis. This brings a second, objective view to the analysis, reduces the effect of researcher bias, and ensures that your analysis can be verified and supported by the data. To support your analysis, you need to demonstrate how you developed the opinions and conclusions you have about your data. After all, when researchers share their analyses, readers often won’t see all of the raw data, so they need to be able to trust the analysis process.

Step 2: Reading through All the Data

Creswell suggests getting a general sense of the data to understand its overall meaning. As you start reading through your data, you might begin to recognize trends, patterns, or recurring features that give you ideas about how to both analyze and later present the data. When we read through the interview excerpts of these 15 participants’ opinions of social media, we both realized that there were two major types of comments: positive and negative. This might be similar to categorizing the items in the grocery store (mentioned above) into fresh/frozen foods and non-perishable items.

To better organize the data for further analysis, Melody marked each positive comment with a plus sign and each negative comment with a minus sign. Lindsay color-coded the comments (red for negative, indicated by boldface type below; green for positive, indicated by grey type below) and then organized them on the page by type. This approach is in line with Merriam and Tisdale’s explanation of coding: “assigning some sort of shorthand designation to various aspects of your data so that you can easily retrieve specific pieces of the data. The designations can be single words, letters, numbers, phrases, colors, or combinations of these” (199). While we took different approaches, as shown in the two sections below, both allowed us to visually recognize the major sections of the data:

Lindsay’s Coding Round 1, which shows her color coding indicated by boldface type

“[Social media] allows us to communicate freely and see what everyone else is doing. [It] gives us a voice that can reach many people.” (Boy, age 15) “It makes it harder for people to socialize in real life, because they become accustomed to not interacting with people in person.” (Girl, age 15) “[Teens] would rather go scrolling on their phones instead of doing their homework, and it’s so easy to do so. It’s just a huge distraction.” (Boy, age 17) “It enables people to connect with friends easily and be able to make new friends as well.” (Boy, age 15) “I think social media have a positive effect because it lets you talk to family members far away.” (Girl, age 14) “Because teens are killing people all because of the things they see on social media or because of the things that happened on social media.” (Girl, age 14) “We can connect easier with people from different places and we are more likely to ask for help through social media which can save people.” (Girl, age 15)

Melody’s Coding Round 1, showing her use of plus and minus signs to classify the comments as positive or negative, respectively

+ “[Social media] allows us to communicate freely and see what everyone else is doing. [It] gives us a voice that can reach many people.” (Boy, age 15) – “It makes it harder for people to socialize in real life, because they become accustomed to not interacting with people in person.” (Girl, age 15) – “[Teens] would rather go scrolling on their phones instead of doing their homework, and it’s so easy to do so. It’s just a huge distraction.” (Boy, age 17) + “It enables people to connect with friends easily and be able to make new friends as well.” (Boy, age 15) + “I think social media have a positive effect because it lets you talk to family members far away.” (Girl, age 14) – “Because teens are killing people all because of the things they see on social media or because of the things that happened on social media.” (Girl, age 14) + “We can connect easier with people from different places and we are more likely to ask for help through social media which can save people.” (Girl, age 15)
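
If you prefer to code digitally, a rough equivalent of this first-cycle pass is to store each excerpt with its assigned sign and partition the data. This is only a minimal sketch; the (truncated) excerpts mirror the hand-coding above.

    # A few excerpts paired with their first-cycle signs ("+" or "-").
    coded = [
        ("+", "[Social media] allows us to communicate freely..."),
        ("-", "It makes it harder for people to socialize in real life..."),
        ("-", "[Teens] would rather go scrolling on their phones..."),
        ("+", "It enables people to connect with friends easily..."),
    ]

    # Partition excerpts by sign for the second coding cycle.
    positive = [quote for sign, quote in coded if sign == "+"]
    negative = [quote for sign, quote in coded if sign == "-"]
    print(f"{len(positive)} positive, {len(negative)} negative comments")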

Step 3: Doing Detailed Coding Analysis of the Data

It’s important to mention that Creswell dedicates pages of description on coding data because there are various ways of approaching detailed analysis. To code our data, we added a descriptive word or phrase that “symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute” to a portion of data (Saldaña 3). From the grocery store example above, that could mean looking at the category of frozen foods and dividing them into entrees, side dishes, desserts, appetizers, etc. We both coded for topics or what the teens were generally talking about in their responses. For example, one excerpt reads “Social media allows us to communicate freely and see what everyone else is doing. It gives us a voice that can reach many people.” To code that piece of data, researchers might assign words like communication, voice, or connection to explain what the data is describing.

In this way, we created the codes from what the data said, describing what we read in those excerpts. Notice in the section below that, even though we coded independently, we described these pieces of data in similar ways using bolded keywords:

Melody’s Coding Round 2, with key words added to summarize the meanings of the different quotes

– “Gives people a bigger audience to speak and teach hate and belittle each other.” (Boy, age 13) bullying – “It provides a fake image of someone’s life. It sometimes makes me feel that their life is perfect when it is not.” (Girl, age 15) fake + “Because a lot of things created or made can spread joy.” (Boy, age 17) reaching people + “I feel that social media can make people my age feel less lonely or alone. It creates a space where you can interact with people.” (Girl, age 15) connection + “[Social media] allows us to communicate freely and see what everyone else is doing. [It] gives us a voice that can reach many people.” (Boy, age 15) reaching people

Lindsay’s Coding Round 2, with key words added in capital letters to summarize the meanings of the quotations

“Gives people a bigger audience to speak and teach hate and belittle each other.” (Boy, age 13) OPPORTUNITIES TO COMMUNICATE NEGATIVELY/MORE EASILY “It provides a fake image of someone’s life. It sometimes makes me feel that their life is perfect when it is not.” (Girl, age 15) FAKE, NOT REALITY “Because a lot of things created or made can spread joy.” (Boy, age 17) SPREAD JOY “I feel that social media can make people my age feel less lonely or alone. It creates a space where you can interact with people.” (Girl, age 15) INTERACTION, LESS LONELY “[Social media] allows us to communicate freely and see what everyone else is doing. [It] gives us a voice that can reach many people.” (Boy, age 15) COMMUNICATE, VOICE

Though there are methods that allow for researchers to use predetermined codes (like from previous studies), “the traditional approach…is to allow the codes to emerge during the data analysis” (Creswell 187).

Step 4: Using the Codes to Create a Description Using Categories, Themes, Settings, or People

Our individual coding happened in phases, as we developed keywords and descriptions that could then be defined and relabeled into concise coding categories (Saldaña 11). We shared our work from Steps 1-3 to further define categories and determine which themes were most prominent in the data. A few times, we interpreted something differently and had to discuss and come to an agreement about which category was best.

In our process, one excerpt comment was interpreted as negative by one of us and positive by the other. Together we discussed and confirmed which comments were positive or negative and identified themes that seemed to appear more than once, such as positive feelings towards the interactional element of social media use and the negative impact of social media use on social skills. When two coders compare their results, this allows for qualitative validity, which means “the researcher checks for the accuracy of the findings” (Creswell 190). This could also be referred to as intercoder reliability (Lavrakas). For intercoder reliability, researchers sometimes calculate how often they agree in a percentage. Like many other aspects of primary research, there is no consensus on how best to establish or calculate intercoder reliability, but generally speaking, it’s a good idea to have someone else check your work and ensure you are ethically analyzing and reporting your data.
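
As a minimal sketch of simple percent agreement (one common approach, though not the only one), assume two coders labeled the same five excerpts; the labels below are hypothetical.

    # Hypothetical labels from two coders for the same five excerpts.
    coder_1 = ["+", "-", "-", "+", "+"]
    coder_2 = ["+", "-", "+", "+", "+"]

    # Percent agreement: the share of excerpts both coders labeled the same way.
    matches = sum(a == b for a, b in zip(coder_1, coder_2))
    print(f"Intercoder agreement: {matches / len(coder_1):.0%}")  # 80%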

Interpreting Coded Data

Once we agreed on the common categories and themes in this dataset, we worked together on the final analysis phase of interpreting the data, asking “what does it mean?” Data interpretation includes “trying to give sense to the data by creatively producing insights about it” (Gibson and Brown 6). Though we acknowledge that this sample of only 15 excerpts is small, and it might be difficult to make claims about teens and social media from just this data, we can share a few insights we had as part of this practice activity.

Overall, we could report the frequency counts and percentages that came from our analysis. For example, we counted 8 positive comments and 7 negative comments about social media. Presented differently, those 8 positive comments represent 53% of the responses, so slightly over half. If we focus on just the positive comments, we are able to identify two common themes among those 8 responses: Interaction and Expression. People who felt positively about social media use identified the ability to connect with people and voice their feelings and opinions as the main reasons. When analyzing only the 7 negative responses, we identified themes of Bullying and Social Skills as recurring reasons people are critical of social media use among teens. Identifying these topics and themes in the data allows us to begin thinking about what we can learn and share with others about this data.

How we represent what we have learned from our data can demonstrate our ethical approach to data analysis. In short, we only want to make claims we can support, and we want to make those claims ethically, being careful to not exaggerate or be misleading.

To better understand a few common ethical dilemmas regarding the presentation of data, think about this example: A few years ago, Lindsay taught a class that had only four students. On her course evaluations, those four students rated the class experience as “Excellent.” If she reports that 100% of her students answered “Excellent,” is she being truthful? Yes. Do you see any potential ethical considerations here? If she said that 4/4 gave that rating, does that change how her data might be perceived by others? While Lindsay could show the raw data to support her claims, important contextual information could be missing if she just says 100%. Perhaps others would assume this was a regular class of 20-30 students, which would make that claim seem more meaningful and impressive than it might be.

Another word for this is cherry picking. Cherry picking refers to making conclusions based on thin (or not enough) data or focusing on data that’s not necessarily representative of the larger dataset (Morse). For example, if Lindsay reported the comment that one of her students made about this being the “best class ever,” she would be telling the truth but really only focusing on the reported opinion of 25% of the class (1 out of 4). Ideally, researchers want to make claims about the data based on ideas that are prominent, trending, or repeated. Less prominent pieces of data, like the opinion of that one student, are known as outliers, or data that seem to “be atypical of the rest of the dataset” (Mackey and Gass 257). Focusing on those less-representative portions might misrepresent or overshadow the aspects of the data that are prominent or meaningful, which could create ethical problems for your study. With these ethical considerations in mind, the last step of conducting primary research would be to write about the analysis and interpretation to share your process with others.

This chapter has introduced you to ethically analyzing data within the primary research tradition by focusing on closed-ended and open-ended data. We’ve provided you with examples of how data might be analyzed, interpreted, and presented to help you understand the process of making sense of your data. This is just one way to approach data analysis, but no matter your research method, having a systematic approach is recommended. Data analysis is a key component in the overall primary research process, and we hope that you are now excited and curious to participate in a primary research project.

Works Cited

“About Pew Research Center.” Pew Research Center, 2020, www.pewresearch.org/about/. Accessed 28 Dec. 2020.

Anderson, Monica, and Jingjing Jiang. “Teens, Social Media & Technology 2018.” Pew Research Center, May 2018, www.pewresearch.org/internet/2018/05/31/teens-social-media-technology-2018/.

The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Office for Human Research Protections, 18 Apr. 1979, www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html.

Charmaz, Kathy. “Grounded Theory.” Approaches to Qualitative Research: A Reader on Theory and Practice, edited by Sharlene Nagy Hesse-Biber and Patricia Leavy, Oxford UP, 2004, pp. 496-521.

Corpus of Contemporary American English (COCA). www.english-corpora.org/coca/. Accessed 11 Apr. 2021.

Creswell, John W. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 3rd ed., Sage, 2009.

Data.gov. www.data.gov/. Accessed 11 Apr. 2021.

Driscoll, Dana Lynn. “Introduction to Primary Research: Observations, Surveys, and Interviews.” Writing Spaces: Readings on Writing, Volume 2, Parlor Press, 2011, pp. 153-174.

Explore Census Data. United States Census Bureau, data.census.gov/cedsci/. Accessed 11 Apr. 2021.

Gibson, William J., and Andrew Brown. Working with Qualitative Data. Sage, 2009.

Google Trends. trends.google.com/trends/explore. Accessed 11 Apr. 2021.

Guest, Greg, et al. Collecting Qualitative Data: A Field Manual for Applied Research. Sage, 2013.

HealthData.gov. healthdata.gov/. Accessed 11 Apr. 2021.

Lavrakas, Paul J. Encyclopedia of Survey Research Methods. Sage, 2008.

Mackey, Allison, and Sue M. Gass. Second Language Research: Methodology and Design. Lawrence Erlbaum Associates, 2005.

Merriam, Sharan B., and Elizabeth J. Tisdell. Qualitative Research: A Guide to Design and Implementation. John Wiley & Sons, 2015. ProQuest Ebook Central, ebookcentral.proquest.com/lib/unco/detail.action?docID=2089475.

Michigan Corpus of Academic Spoken English. quod.lib.umich.edu/cgi/c/corpus/corpus?c=micase;page=simple. Accessed 11 Apr. 2021.

Morse, Janice M. “‘Cherry Picking’: Writing from Thin Data.” Qualitative Health Research, vol. 20, no. 1, 2009, p. 3.

Pew Research Center. www.pewresearch.org/. Accessed 11 Apr. 2021.

Saldaña, Johnny. The Coding Manual for Qualitative Researchers. 2nd ed., Sage, 2013.

Scott, Greg, and Roberta Garner. Doing Qualitative Research: Designs, Methods, and Techniques. 1st ed., Pearson, 2012.

Sheard, Judithe. “Quantitative Data Analysis.” Research Methods: Information, Systems, and Contexts, edited by Kirsty Williamson and Graeme Johanson, Elsevier, 2018, pp. 429-452.

Teens and Social Media. Google Trends, trends.google.com/trends/explore?-date=all&q=teens%20and%20social%20media. Accessed 15 Jul. 2020.

“What is Primary Research and How Do I Get Started?” The Writing Lab and OWL at Purdue and Purdue U, 2020, owl.purdue.edu/owl. Accessed 21 Dec. 2020.

Zhao, Alice. “How Text Messages Change from Dating to Marriage.” Huffington Post, 21 Oct. 2014, www.huffpost.com.

Appendix A: Selected Excerpts from the Pew Research Center Data

“My mom had to get a ride to the library to get what I have in my hand all the time. She reminds me of that a lot.” (Girl, age 14)

“Gives people a bigger audience to speak and teach hate and belittle each other.” (Boy, age 13)

“It provides a fake image of someone’s life. It sometimes makes me feel that their life is perfect when it is not.” (Girl, age 15)

“Because a lot of things created or made can spread joy.” (Boy, age 17)

“I feel that social media can make people my age feel less lonely or alone. It creates a space where you can interact with people.” (Girl, age 15)

“[Social media] allows us to communicate freely and see what everyone else is doing. [It] gives us a voice that can reach many people.” (Boy, age 15)

“It makes it harder for people to socialize in real life, because they become accustomed to not interacting with people in person.” (Girl, age 15)

“[Teens] would rather go scrolling on their phones instead of doing their homework, and it’s so easy to do so. It’s just a huge distraction.” (Boy, age 17)

“It enables people to connect with friends easily and be able to make new friends as well.” (Boy, age 15)

“I think social media have a positive effect because it lets you talk to family members far away.” (Girl, age 14)

“Because teens are killing people all because of the things they see on social media or because of the things that happened on social media.” (Girl, age 14)

“We can connect easier with people from different places and we are more likely to ask for help through social media which can save people.” (Girl, age 15)

“It has given many kids my age an outlet to express their opinions and emotions, and connect with people who feel the same way.” (Girl, age 15)

“People can say whatever they want with anonymity and I think that has a negative impact.” (Boy, age 15)

“It has a negative impact on social (in-person) interactions.” (Boy, age 17)

Teacher Resources for How to Analyze Data in a Primary Research Study

Overview and Teaching Strategies

This chapter is intended as an overview of analyzing qualitative research data and was written as a follow-up piece to Dana Lynn Driscoll’s “Introduction to Primary Research: Observations, Surveys, and Interviews” in Volume 2 of this collection. This chapter could work well for leading students through their own data analysis of a primary research project or for introducing students to the idea of primary research by using outside data sources, those in the chapter and provided in the activities below, or data you have access to.

From our experiences, students usually have limited experience with primary research methods outside of conducting a small survey for other courses, like sociology. We have found that few of our students have been formally introduced to primary research and analysis. Therefore, this chapter strives to briefly introduce students to primary research while focusing on analysis. We’ve presented analysis by categorizing data as open-ended and closed-ended without getting into too many details about qualitative versus quantitative. Our students tend to produce data collection tools with a mix of these types of questions, so we feel it’s important to cover the analysis of both.

In this chapter, we bring students real examples of primary data and lead them through analysis by showing examples. Any of these exercises and the activities below may be easily supplemented with additional outside data. One way that teachers can bring in outside data is through the use of public datasets.

Public Data Sets

There are many public data sets that teachers can use to acquaint their students with analyzing data. Be aware that some of these datasets are for experienced researchers and provide the data in CSV files or include metadata, all of which is probably too advanced for most of our students. But if you are comfortable converting this data, it could be valuable for a data analysis activity.

  • In the chapter, we pulled from Pew Research, and their website contains many free and downloadable data sets (Pew Research Center).
  • The site Data.gov provides searchable datasets, but you can also explore their data by clicking on “data” and seeing what kinds of reports they offer.
  • The U.S. Census Bureau offers some datasets as well (Explore Census Data): Much of this data is presented in reports, but teachers could pull information from reports and have students analyze the data and compare their results to those in the report, much like we did with the Pew Research data in the chapter.
  • Similarly, HealthData.gov offers research-based reports packed with data for students to analyze.
  • In one of the activities below, we used Google Trends to look at searches over a period of time. There are some interesting data and visuals provided on the homepage to help students get started.
  • If you’re looking for something a bit more academic, the Michigan Corpus of Academic Spoken English is a great database of transcripts from academic interactions and situations.
  • Similarly, the Corpus of Contemporary American English allows users to search for words or word strings to see their frequency and in which genre and when these occur.

Before moving on to student activities, we’d like to offer one additional suggestion for teachers to consider.

Class Google Form

One thing that Melody does in almost all of her research-based writing courses is ask students to complete a Google Form at the beginning of the semester. Sometimes, these forms are about their experiences with research. Other times, they revolve around a class topic (recently, she’s been interested in Generation Z or iGeneration and has asked students questions related to that). Then, when it’s time to start thinking about primary research, she uses that Google Form to help students understand more about the primary research process. Here are some ways that teachers can employ the data gathered from a Google Form given to students.

  • Ask students to look at the questions asked on the survey and deduce the overall research question.
  • Ask students to look at the types of questions asked (open- and closed-ended) and consider why they were constructed that way.
  • Ask students to evaluate the wording of the questions asked.
  • Ask students to examine the results of a few (or more) of the questions on the survey. This can be done in groups with each group looking at 1-3 questions, depending on the size of your Google Form.
  • Ask students to think about how they might present that data in visual form. Yes, Google provides some visuals, but you can give them the raw data and see what they come up with.
  • Ask students to come up with 1-3 major takeaways based on all the data.

This exercise allows students to work with real data and data that’s directly related to them and their classmates. It’s also completely within ethical boundaries because it’s data collected in the classroom, for educational purposes, and it stays within the classroom.

Below we offer some guiding questions to help move students through the chapter and the activities as well as some additional activities.

Discussion Questions

  • In the opening of this chapter, we introduced you to primary research , or “any type of research you collect yourself” (“What is Primary Research”). Have you completed primary research before? How did you decide on your research method, based on your research question? If you have not worked on primary research before, brainstorm a potential research question for a topic you want to know more about. Discuss what research method you might use, including closed- or open-ended methods and why.
  • Looking at the chart from the Pew Research dataset, “Teens, Social Media, and Technology 2018,” would you agree that the distributions among online platforms remain similar, or have trends changed?
  • What do you make of the “none of the above” category on the Pew table? Do you think teens are using online platforms that aren’t listed, or do you think those respondents don’t use any online platforms?

Figure 2: Google Trends results for “social media”

  • When analyzing data from open-ended questions, which step seems most challenging to you? Explain.

Activity #1: TurnItIn and Infographics

Infographics can be a great way to help you see and understand data, while also giving you a way to think about presenting your own data. Multiple infographics that provide information about plagiarism are available on TurnItIn, downloadable for free.

Figure 3, titled “The Plagiarism Spectrum,” provides you with the “severity” and “frequency” based on survey findings of nearly 900 high school and college instructors from around the world. TurnItIn encourages educators to print this infographic and hang it in their classroom:

Figure 3: The Plagiarism Spectrum

This infographic provides some great data analysis examples: specific categories with definitions (and visual representation of their categories), frequency counts with bar graphs, and color gradient bars to show higher vs. lower numbers.

  • Write a summary of how this infographic presents data.
  • How do you think they analyzed the data based on this visual?

Activity #2: How Text Messages Change from Dating to Marriage

In her Huffington Post piece, Alice Zhao analyzes text messages that she collected during her relationship with her boyfriend, turned fiancé, turned husband to answer the question of how text messages (or communication) change over the course of a relationship. While Zhao offers some insight into her data, she also provides readers with some really cool graphics that you can use to practice your analysis skills.

These first graphics are word clouds. In figure 4, Zhao put her textual data into a program that creates these images based on the most frequently occurring words. Word clouds are another option for analyzing your data. If you have a lot of textual data and want to know what participants said the most, placing your data into a word cloud program is an easy way to “see” the data in a new way. This is usually one of the first steps of analysis, and additional analysis is almost always needed.

Figure 4: Zhao’s word cloud sampling

  • What do you notice about the texts from 2008 to 2014?
  • What do you notice between her texts (me) and his texts (him)?
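
Under the hood, a word cloud is just a word-frequency count with a visual layer. Here is a minimal sketch of that counting step; the sample messages are invented for illustration.

    from collections import Counter

    # A hypothetical batch of text messages, concatenated.
    messages = "hey hey ok love you hey ok ok dinner love ok"

    # Tally word frequencies -- the same counts a word cloud visualizes.
    print(Counter(messages.split()).most_common(3))
    # [('ok', 4), ('hey', 3), ('love', 2)]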

Zhao also provided this graphic (figure 5), a comparative look at what she saw as the most frequently occurring words from the word clouds. This could be another step in your data analysis procedure: zooming in on a few key aspects and digging a bit deeper.

Figure 5: Zhao’s bar graph

  • What do you make of this data? Why might the word “hey” occur more frequently in the dating time frame and the word “ok” occur more frequently in the married time frame?

As part of her research, Zhao also looked at the time of day text messages were sent, shown below in figure 6:

Figure 6: Zhao’s plot graph of time of day

Here, Zhao looked at messages sent a month after their first date, a month after their engagement, and a month after their wedding.

  • She offers her own interpretation of figure 6 in her piece, but what do you think of it?
  • Also make note of this type of graphic; it’s another useful way to look at the data. If your data may be time sensitive, this type of graphic may help you better analyze and understand your data.
  • This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0) and is subject to the Writing Spaces Terms of Use. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/, email [email protected], or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. To view the Writing Spaces Terms of Use, visit http://writingspaces.org/terms-of-use.

How to Analyze Data in a Primary Research Study Copyright © 2021 by Melody Denny and Lindsay Clark is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License , except where otherwise noted.

Share This Book

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Data Descriptor
  • Open access
  • Published: 03 May 2024

A dataset for measuring the impact of research data and their curation

  • Libby Hemphill   ORCID: orcid.org/0000-0002-3793-7281 1 , 2 ,
  • Andrea Thomer 3 ,
  • Sara Lafia 1 ,
  • Lizhou Fan 2 ,
  • David Bleckley   ORCID: orcid.org/0000-0001-7715-4348 1 &
  • Elizabeth Moss 1  

Scientific Data volume 11, Article number: 442 (2024)


Subjects: Research data, Social sciences

Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset developed to measure the impact of archival and data curation decisions on data reuse. The dataset describes 10,605 social science research datasets, their curation histories, and reuse contexts in 94,755 publications that cover 59 years from 1963 to 2022. The dataset was constructed from study-level metadata, citing publications, and curation records available through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The dataset includes information about study-level attributes (e.g., PIs, funders, subject terms); usage statistics (e.g., downloads, citations); archiving decisions (e.g., curation activities, data transformations); and bibliometric attributes (e.g., journals, authors) for citing publications. This dataset provides information on factors that contribute to long-term data reuse, which can inform the design of effective evidence-based recommendations to support high-impact research data curation decisions.


Background & Summary

Recent policy changes in funding agencies and academic journals have increased data sharing among researchers and between researchers and the public. Data sharing advances science and provides the transparency necessary for evaluating, replicating, and verifying results. However, many data-sharing policies do not explain what constitutes an appropriate dataset for archiving or how to determine the value of datasets to secondary users 1 , 2 , 3 . Questions about how to allocate data-sharing resources efficiently and responsibly have gone unanswered 4 , 5 , 6 . For instance, data-sharing policies recognize that not all data should be curated and preserved, but they do not articulate metrics or guidelines for determining what data are most worthy of investment.

Despite the potential for innovation and advancement that data sharing holds, the best strategies to prioritize datasets for preparation and archiving are often unclear. Some datasets are likely to have more downstream potential than others, and data curation policies and workflows should prioritize high-value data instead of being one-size-fits-all. Though prior research in library and information science has shown that the “analytic potential” of a dataset is key to its reuse value 7 , work is needed to implement conceptual data reuse frameworks 8 , 9 , 10 , 11 , 12 , 13 , 14 . In addition, publishers and data archives need guidance to develop metrics and evaluation strategies to assess the impact of datasets.

Several existing resources have been compiled to study the relationship between scholarly products, such as datasets, and their reuse (Table  1 ); however, none of these resources include explicit information on how curation processes are applied to data to increase their value, maximize their accessibility, and ensure their long-term preservation. The CCex (Curation Costs Exchange) provides models of curation services along with cost-related datasets shared by contributors but does not make explicit connections between them or include reuse information 15 . Analyses on platforms such as DataCite 16 have focused on metadata completeness and record usage, but have not included related curation-level information. Analyses of GenBank 17 and FigShare 18 , 19 citation networks do not include curation information. Related studies of GitHub repository reuse 20 and Softcite software citation 21 reveal significant factors that impact the reuse of secondary research products but do not focus on research data. RD-Switchboard 22 and DSKG 23 are scholarly knowledge graphs linking research data to articles, patents, and grants, but largely omit social science research data and do not include curation-level factors. To our knowledge, other studies of curation work in organizations similar to ICPSR – such as GESIS 24 , Dataverse 25 , and DANS 26 – have not made their underlying data available for analysis.

This paper describes a dataset 27 compiled for the MICA project (Measuring the Impact of Curation Actions) led by investigators at ICPSR, a large social science data archive at the University of Michigan. The dataset was originally developed to study the impacts of data curation and archiving on data reuse. The MICA dataset has supported several previous publications investigating the intensity of data curation actions 28 , the relationship between data curation actions and data reuse 29 , and the structures of research communities in a data citation network 30 . Collectively, these studies help explain the return on various types of curatorial investments. The dataset that we introduce in this paper, which we refer to as the MICA dataset, has the potential to address research questions in the areas of science (e.g., knowledge production), library and information science (e.g., scholarly communication), and data archiving (e.g., reproducible workflows).

Methods

We constructed the MICA dataset 27 using records available at ICPSR, a large social science data archive at the University of Michigan. Dataset creation involved: collecting and enriching metadata for articles indexed in the ICPSR Bibliography of Data-related Literature against the Dimensions AI bibliometric database; gathering usage statistics for studies from ICPSR's administrative database; processing data curation work logs from ICPSR's project tracking platform, Jira; and linking data in social science studies and series to citing analysis papers (Fig.  1 ).

Figure 1: Steps to prepare the MICA dataset for analysis. External sources are red, primary internal sources are blue, and internal linked sources are green.

Enrich paper metadata

The ICPSR Bibliography of Data-related Literature is a growing database of literature in which data from ICPSR studies have been used. Its creation was funded by the National Science Foundation (Award 9977984), and for the past 20 years it has been supported by ICPSR membership and multiple US federally funded and foundation-funded topical archives at ICPSR. The Bibliography was originally launched in the year 2000 to aid in data discovery by providing a searchable database linking publications to the study data used in them. The Bibliography collects the universe of output based on the data shared in each study, and it is made available through each ICPSR study's webpage. The Bibliography contains both peer-reviewed and grey literature, which provides evidence for measuring the impact of research data. For an item to be included in the ICPSR Bibliography, it must contain an analysis of data archived by ICPSR or contain a discussion or critique of the data collection process, study design, or methodology 31 . The Bibliography is manually curated by a team of librarians and information specialists at ICPSR who enter and validate entries. Some publications are supplied to the Bibliography by data depositors, and some citations are submitted by authors who abide by ICPSR's terms of use requiring them to submit citations to works in which they analyzed data retrieved from ICPSR. Most of the Bibliography is populated by Bibliography team members, who run custom queries for ICPSR studies across numerous sources, including Google Scholar, ProQuest, SSRN, and others. Each record in the Bibliography is one publication that has used one or more ICPSR studies. The version we used was captured on 2021-11-16 and included 94,755 publications.

To expand the coverage of the ICPSR Bibliography, we searched exhaustively for all ICPSR study names, unique numbers assigned to ICPSR studies, and DOIs 32 using a full-text index available through the Dimensions AI database 33 . We accessed Dimensions through a license agreement with the University of Michigan. ICPSR Bibliography librarians and information specialists manually reviewed and validated new entries that matched one or more search criteria. We then used Dimensions to gather enriched metadata and full-text links for items in the Bibliography with DOIs. We matched 43% of the items in the Bibliography to enriched Dimensions metadata including abstracts, field of research codes, concepts, and authors’ institutional information; we also obtained links to full text for 16% of Bibliography items. Based on licensing agreements, we included Dimensions identifiers and links to full text so that users with valid publisher and database access can construct an enriched publication dataset.
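As a rough illustration of this kind of DOI-based matching (not the project's actual pipeline; the data frames, column names, and values below are invented), the linkage can be sketched in Python with pandas:

```python
# Hypothetical sketch of DOI-based record linkage between a bibliography
# and an external bibliometric source. DOIs are case-insensitive, so we
# normalize them before joining.
import pandas as pd

bibliography = pd.DataFrame({
    "paper_id": [101, 102],
    "doi": ["10.1000/XYZ123", None],   # not every item has a DOI
})
external = pd.DataFrame({
    "doi": ["10.1000/xyz123"],
    "abstract": ["An example abstract."],
})

def normalize_doi(doi):
    return doi.strip().lower() if isinstance(doi, str) else None

bibliography["doi_norm"] = bibliography["doi"].map(normalize_doi)
external["doi_norm"] = external["doi"].map(normalize_doi)

# A left join keeps unmatched bibliography items, with missing metadata.
enriched = bibliography.merge(external.drop(columns="doi"),
                              on="doi_norm", how="left")
print(enriched)
```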

Gather study usage data

ICPSR maintains a relational administrative database, DBInfo, that organizes study-level metadata and information on data reuse across separate tables. Studies at ICPSR consist of one or more files collected at a single time or for a single purpose; studies in which the same variables are observed over time are grouped into series. Each study at ICPSR is assigned a DOI, and its metadata are stored in DBInfo. Study metadata follows the Data Documentation Initiative (DDI) Codebook 2.5 standard. DDI elements included in our dataset are title, ICPSR study identification number, DOI, authoring entities, description (abstract), funding agencies, subject terms assigned to the study during curation, and geographic coverage. We also created variables based on DDI elements: total variable count, the presence of survey question text in the metadata, the number of author entities, and whether an author entity was an institution. We gathered metadata for ICPSR’s 10,605 unrestricted public-use studies available as of 2021-11-16 ( https://www.icpsr.umich.edu/web/pages/membership/or/metadata/oai.html ).

To link study usage data with study-level metadata records, we joined study metadata from DBInfo with study usage information, which included total study downloads (data and documentation), individual data file downloads, and cumulative citations from the ICPSR Bibliography. We also gathered descriptive metadata for each study and its variables, which allowed us to summarize and append recoded fields onto the study-level metadata such as curation level, number and type of principal investigators, total variable count, and binary variables indicating whether the study data were made available for online analysis, whether survey question text was made searchable online, and whether the study variables were indexed for search. These characteristics describe aspects of the discoverability of the data to compare with other characteristics of the study. We used the study and series numbers included in the ICPSR Bibliography as unique identifiers to link papers to metadata and analyze the community structure of dataset co-citations in the ICPSR Bibliography 32 .

Process curation work logs

Researchers deposit data at ICPSR for curation and long-term preservation. Between 2016 and 2020, more than 3,000 research studies were deposited with ICPSR. Since 2017, ICPSR has organized curation work into a central unit that provides several levels of curation, which differ in the intensity and complexity of the data enhancement they involve. While the levels of curation are standardized by effort (level one = least effort, level three = most effort), the specific curatorial actions undertaken for each dataset vary. The specific curation actions are captured in Jira, a work tracking program, which data curators at ICPSR use to collaborate and communicate their progress through tickets. We obtained access to a corpus of 669 completed Jira tickets corresponding to the curation of 566 unique studies between February 2017 and December 2019 28 .

To process the tickets, we focused only on their work log portions, which contained free text descriptions of work that data curators had performed on a deposited study, along with the curators’ identifiers, and timestamps. To protect the confidentiality of the data curators and the processing steps they performed, we collaborated with ICPSR’s curation unit to propose a classification scheme, which we used to train a Naive Bayes classifier and label curation actions in each work log sentence. The eight curation action labels we proposed 28 were: (1) initial review and planning, (2) data transformation, (3) metadata, (4) documentation, (5) quality checks, (6) communication, (7) other, and (8) non-curation work. We note that these categories of curation work are very specific to the curatorial processes and types of data stored at ICPSR, and may not match the curation activities at other repositories. After applying the classifier to the work log sentences, we obtained summary-level curation actions for a subset of all ICPSR studies (5%), along with the total number of hours spent on data curation for each study, and the proportion of time associated with each action during curation.
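The classifier itself is not distributed with the dataset, but the general technique is standard. The sketch below shows a bag-of-words Naive Bayes text classifier of the kind described above, using scikit-learn; the training sentences and their labels are invented stand-ins, not actual ICPSR work logs.

```python
# A minimal bag-of-words Naive Bayes sentence classifier, in the spirit of
# the curation-action labeling described above. Sentences are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "Reviewed the deposit and planned the curation work",
    "Recoded missing values and converted files to SPSS format",
    "Added subject terms and updated the study description",
    "Emailed the PI about undocumented variables",
]
labels = [
    "initial review and planning",
    "data transformation",
    "metadata",
    "communication",
]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(sentences, labels)

print(model.predict(["Converted Stata files and recoded variables"]))
```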

Data Records

The MICA dataset 27 connects records for each of ICPSR’s archived research studies to the research publications that use them and related curation activities available for a subset of studies (Fig.  2 ). Each of the three tables published in the dataset is available as a study archived at ICPSR. The data tables are distributed as statistical files available for use in SAS, SPSS, Stata, and R as well as delimited and ASCII text files. The dataset is organized around studies and papers as primary entities. The studies table lists ICPSR studies, their metadata attributes, and usage information; the papers table was constructed using the ICPSR Bibliography and Dimensions database; and the curation logs table summarizes the data curation steps performed on a subset of ICPSR studies.

Studies (“ICPSR_STUDIES”): 10,605 social science research datasets available through ICPSR up to 2021-11-16 with variables for ICPSR study number, digital object identifier, study name, series number, series title, authoring entities, full-text description, release date, funding agency, geographic coverage, subject terms, topical archive, curation level, single principal investigator (PI), institutional PI, the total number of PIs, total variables in data files, question text availability, study variable indexing, level of restriction, total unique users downloading study data files and codebooks, total unique users downloading data only, and total unique papers citing data through November 2021. Studies map to the papers and curation logs tables through ICPSR study numbers as “STUDY”. However, not every study in this table will have records in the papers and curation logs tables.

Papers (“ICPSR_PAPERS”): 94,755 publications collected from 2000-08-11 to 2021-11-16 in the ICPSR Bibliography and enriched with metadata from the Dimensions database with variables for paper number, identifier, title, authors, publication venue, item type, publication date, input date, ICPSR series numbers used in the paper, ICPSR study numbers used in the paper, the Dimensions identifier, and the Dimensions link to the publication’s full text. Papers map to the studies table through ICPSR study numbers in the “STUDY_NUMS” field. Each record represents a single publication, and because a researcher can use multiple datasets when creating a publication, each record may list multiple studies or series.

Curation logs (“ICPSR_CURATION_LOGS”): 649 curation logs for 563 ICPSR studies (although most studies in the subset had one curation log, some studies were associated with multiple logs, with a maximum of 10) curated between February 2017 and December 2019 with variables for study number, action labels assigned to work description sentences using a classifier trained on ICPSR curation logs, hours of work associated with a single log entry, and total hours of work logged for the curation ticket. Curation logs map to the study and paper tables through ICPSR study numbers as “STUDY”. Each record represents a single logged action, and future users may wish to aggregate actions to the study level before joining tables.

Figure 2: Entity-relationship diagram.

Technical Validation

We report on the reliability of the dataset’s metadata in the following subsections. To support future reuse of the dataset, curation services provided through ICPSR improved data quality by checking for missing values, adding variable labels, and creating a codebook.

All 10,605 studies available through ICPSR have a DOI and a full-text description summarizing what the study is about, the purpose of the study, the main topics covered, and the questions the PIs attempted to answer when they conducted the study. Personal names (i.e., principal investigators) and organizational names (i.e., funding agencies) are standardized against an authority list maintained by ICPSR; geographic names and subject terms are also standardized and hierarchically indexed in the ICPSR Thesaurus 34 . Many of ICPSR’s studies (63%) are in a series and are distributed through the ICPSR General Archive (56%), a non-topical archive that accepts any social or behavioral science data. While study data have been available through ICPSR since 1962, the earliest digital release date recorded for a study was 1984-03-18, when ICPSR’s database was first employed, and the most recent date is 2021-10-28 when the dataset was collected.

Curation level information was recorded starting in 2017 and is available for 1,125 studies (11%); approximately 80% of studies with assigned curation levels received curation services, equally distributed between Levels 1 (least intensive), 2 (moderately intensive), and 3 (most intensive) (Fig.  3 ). Detailed descriptions of ICPSR's curation levels are available online 35 . Additional metadata are available for a subset of 421 studies (4%), including information about whether the study has a single PI, whether it has an institutional PI, the total number of PIs involved, the total number of variables recorded, and whether the study is available for online analysis, has searchable question text, has variables that are indexed for search, contains one or more restricted files, or is completely restricted. We provided additional metadata for this subset of ICPSR studies because they were released within the past five years and detailed curation and usage information were available for them. Usage statistics including total downloads and data file downloads are available for this subset of studies as well; citation statistics are available for 8,030 studies (76%). Most ICPSR studies have fewer than 500 users, as indicated by total downloads, or citations (Fig.  4 ).

Figure 3: ICPSR study curation levels.

Figure 4: ICPSR study usage.

A subset of 43,102 publications (45%) available in the ICPSR Bibliography had a DOI. Author metadata were entered as free text, meaning that variations may exist and require additional normalization and pre-processing prior to analysis. Although author information is recorded for each publication, individual names may appear in different formats and sort orders (e.g., “Earls, Felton J.” and “Stephen W. Raudenbush”). Most of the items in the ICPSR Bibliography as of 2021-11-16 were journal articles (59%), reports (14%), conference presentations (9%), or theses (8%) (Fig.  5 ). The number of publications collected in the Bibliography has increased each decade since the inception of ICPSR in 1962 (Fig.  6 ). Most ICPSR studies (76%) have one or more citations in a publication.

Figure 5: ICPSR Bibliography citation types.

Figure 6: ICPSR citations by decade.

Usage Notes

The dataset consists of three tables that can be joined using the “STUDY” key as shown in Fig.  2 . The “ICPSR_PAPERS” table contains one row per paper with one or more cited studies in the “STUDY_NUMS” column. We manipulated and analyzed the tables as CSV files with the Pandas library 36 in Python and the Tidyverse packages 37 in R.
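As a minimal sketch of such a join in pandas (with invented values, and with illustrative column names other than the documented STUDY and STUDY_NUMS keys), one might explode the papers' study lists, aggregate the log entries to the study level as suggested above, and then merge:

```python
# Joining simplified stand-ins for the three MICA tables on STUDY.
import pandas as pd

studies = pd.DataFrame({"STUDY": [1, 2], "TITLE": ["Study A", "Study B"]})
papers = pd.DataFrame({"PAPER_ID": [10], "STUDY_NUMS": ["1, 2"]})
logs = pd.DataFrame({"STUDY": [1, 1, 2], "HOURS": [2.0, 3.5, 1.0]})

# One row per paper-study pair: split and explode the STUDY_NUMS field.
papers["STUDY"] = papers["STUDY_NUMS"].str.split(",")
paper_study = papers.explode("STUDY")
paper_study["STUDY"] = paper_study["STUDY"].str.strip().astype(int)

# Aggregate logged hours to one row per study before joining.
hours_per_study = logs.groupby("STUDY", as_index=False)["HOURS"].sum()

merged = (studies
          .merge(hours_per_study, on="STUDY", how="left")
          .merge(paper_study[["STUDY", "PAPER_ID"]], on="STUDY", how="left"))
print(merged)
```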

The present MICA dataset can be used independently to study the relationship between curation decisions and data reuse. Evidence of reuse for specific studies is available in several forms: usage information, including downloads and citation counts; and citation contexts within papers that cite data. Analysis may also be performed on the citation network formed between datasets and papers that use them. Finally, curation actions can be associated with properties of studies and usage histories.

This dataset has several limitations of which users should be aware. First, Jira tickets can only be used to represent the intensiveness of curation for activities undertaken since 2017, when ICPSR started using both curation levels and Jira. Studies published before 2017 were all curated, but documentation of the extent of that curation was not standardized and therefore could not be included in these analyses. Second, the measure of publications relies upon the authors' clarity of data citation and the ICPSR Bibliography staff's ability to discover citations with varying formality and clarity. Thus, there is always a chance that some secondary-data-citing publications have been left out of the Bibliography. Finally, there may be some cases in which a paper in the ICPSR Bibliography did not actually obtain data from ICPSR. For example, PIs have often written about or even distributed their data prior to their archival at ICPSR. Those publications would not have cited ICPSR, but they are still collected in the Bibliography as being directly related to the data that were eventually deposited at ICPSR.

In summary, the MICA dataset contains relationships between two main types of entities – papers and studies – which can be mined. The tables in the MICA dataset have supported network analysis (community structure and clique detection) 30 ; natural language processing (NER for dataset reference detection) 32 ; visualizing citation networks (to search for datasets) 38 ; and regression analysis (on curation decisions and data downloads) 29 . The data are currently being used to develop research metrics and recommendation systems for research data. Given that DOIs are provided for ICPSR studies and articles in the ICPSR Bibliography, the MICA dataset can also be used with other bibliometric databases, including DataCite, Crossref, OpenAlex, and related indexes. Subscription-based services, such as Dimensions AI, are also compatible with the MICA dataset. In some cases, these services provide abstracts or full text for papers from which data citation contexts can be extracted for semantic content analysis.

Code availability

The code 27 used to produce the MICA project dataset is available on GitHub at https://github.com/ICPSR/mica-data-descriptor and through Zenodo with the identifier https://doi.org/10.5281/zenodo.8432666 . Data manipulation and pre-processing were performed in Python. Data curation for distribution was performed in SPSS.

He, L. & Han, Z. Do usage counts of scientific data make sense? An investigation of the Dryad repository. Library Hi Tech 35 , 332–342 (2017).


Brickley, D., Burgess, M. & Noy, N. Google dataset search: Building a search engine for datasets in an open web ecosystem. In The World Wide Web Conference - WWW ‘19 , 1365–1375 (ACM Press, San Francisco, CA, USA, 2019).

Buneman, P., Dosso, D., Lissandrini, M. & Silvello, G. Data citation and the citation graph. Quantitative Science Studies 2 , 1399–1422 (2022).

Chao, T. C. Disciplinary reach: Investigating the impact of dataset reuse in the earth sciences. Proceedings of the American Society for Information Science and Technology 48 , 1–8 (2011).


Parr, C. et al . A discussion of value metrics for data repositories in earth and environmental sciences. Data Science Journal 18 , 58 (2019).

Eschenfelder, K. R., Shankar, K. & Downey, G. The financial maintenance of social science data archives: Four case studies of long–term infrastructure work. J. Assoc. Inf. Sci. Technol. 73 , 1723–1740 (2022).

Palmer, C. L., Weber, N. M. & Cragin, M. H. The analytic potential of scientific data: Understanding re-use value. Proceedings of the American Society for Information Science and Technology 48 , 1–10 (2011).

Zimmerman, A. S. New knowledge from old data: The role of standards in the sharing and reuse of ecological data. Sci. Technol. Human Values 33 , 631–652 (2008).

Cragin, M. H., Palmer, C. L., Carlson, J. R. & Witt, M. Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368 , 4023–4038 (2010).


Fear, K. M. Measuring and Anticipating the Impact of Data Reuse . Ph.D. thesis, University of Michigan (2013).

Borgman, C. L., Van de Sompel, H., Scharnhorst, A., van den Berg, H. & Treloar, A. Who uses the digital data archive? An exploratory study of DANS. Proceedings of the Association for Information Science and Technology 52 , 1–4 (2015).

Pasquetto, I. V., Borgman, C. L. & Wofford, M. F. Uses and reuses of scientific data: The data creators’ advantage. Harvard Data Science Review 1 (2019).

Gregory, K., Groth, P., Scharnhorst, A. & Wyatt, S. Lost or found? Discovering data needed for research. Harvard Data Science Review (2020).

York, J. Seeking equilibrium in data reuse: A study of knowledge satisficing . Ph.D. thesis, University of Michigan (2022).

Kilbride, W. & Norris, S. Collaborating to clarify the cost of curation. New Review of Information Networking 19 , 44–48 (2014).

Robinson-Garcia, N., Mongeon, P., Jeng, W. & Costas, R. DataCite as a novel bibliometric source: Coverage, strengths and limitations. Journal of Informetrics 11 , 841–854 (2017).

Qin, J., Hemsley, J. & Bratt, S. E. The structural shift and collaboration capacity in GenBank networks: A longitudinal study. Quantitative Science Studies 3 , 174–193 (2022).


Acuna, D. E., Yi, Z., Liang, L. & Zhuang, H. Predicting the usage of scientific datasets based on article, author, institution, and journal bibliometrics. In Smits, M. (ed.) Information for a Better World: Shaping the Global Future. iConference 2022 ., 42–52 (Springer International Publishing, Cham, 2022).

Zeng, T., Wu, L., Bratt, S. & Acuna, D. E. Assigning credit to scientific datasets using article citation networks. Journal of Informetrics 14 , 101013 (2020).

Koesten, L., Vougiouklis, P., Simperl, E. & Groth, P. Dataset reuse: Toward translating principles to practice. Patterns 1 , 100136 (2020).

Du, C., Cohoon, J., Lopez, P. & Howison, J. Softcite dataset: A dataset of software mentions in biomedical and economic research publications. J. Assoc. Inf. Sci. Technol. 72 , 870–884 (2021).

Aryani, A. et al . A research graph dataset for connecting research data repositories using RD-Switchboard. Sci Data 5 , 180099 (2018).

Färber, M. & Lamprecht, D. The data set knowledge graph: Creating a linked open data source for data sets. Quantitative Science Studies 2 , 1324–1355 (2021).

Perry, A. & Netscher, S. Measuring the time spent on data curation. Journal of Documentation 78 , 282–304 (2022).

Trisovic, A. et al . Advancing computational reproducibility in the Dataverse data repository platform. In Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems , P-RECS ‘20, 15–20, https://doi.org/10.1145/3391800.3398173 (Association for Computing Machinery, New York, NY, USA, 2020).

Borgman, C. L., Scharnhorst, A. & Golshan, M. S. Digital data archives as knowledge infrastructures: Mediating data sharing and reuse. Journal of the Association for Information Science and Technology 70 , 888–904, https://doi.org/10.1002/asi.24172 (2019).

Lafia, S. et al . MICA Data Descriptor. Zenodo https://doi.org/10.5281/zenodo.8432666 (2023).

Lafia, S., Thomer, A., Bleckley, D., Akmon, D. & Hemphill, L. Leveraging machine learning to detect data curation activities. In 2021 IEEE 17th International Conference on eScience (eScience) , 149–158, https://doi.org/10.1109/eScience51609.2021.00025 (2021).

Hemphill, L., Pienta, A., Lafia, S., Akmon, D. & Bleckley, D. How do properties of data, their curation, and their funding relate to reuse? J. Assoc. Inf. Sci. Technol. 73 , 1432–44, https://doi.org/10.1002/asi.24646 (2021).

Lafia, S., Fan, L., Thomer, A. & Hemphill, L. Subdivisions and crossroads: Identifying hidden community structures in a data archive’s citation network. Quantitative Science Studies 3 , 694–714, https://doi.org/10.1162/qss_a_00209 (2022).

ICPSR. ICPSR Bibliography of Data-related Literature: Collection Criteria. https://www.icpsr.umich.edu/web/pages/ICPSR/citations/collection-criteria.html (2023).

Lafia, S., Fan, L. & Hemphill, L. A natural language processing pipeline for detecting informal data references in academic literature. Proc. Assoc. Inf. Sci. Technol. 59 , 169–178, https://doi.org/10.1002/pra2.614 (2022).

Hook, D. W., Porter, S. J. & Herzog, C. Dimensions: Building context for search and evaluation. Frontiers in Research Metrics and Analytics 3 , 23, https://doi.org/10.3389/frma.2018.00023 (2018).

ICPSR. ICPSR Thesaurus. https://www.icpsr.umich.edu/web/ICPSR/thesaurus (2002).

ICPSR. ICPSR Curation Levels. https://www.icpsr.umich.edu/files/datamanagement/icpsr-curation-levels.pdf (2020).

McKinney, W. Data Structures for Statistical Computing in Python. In van der Walt, S. & Millman, J. (eds.) Proceedings of the 9th Python in Science Conference , 56–61 (2010).

Wickham, H. et al . Welcome to the Tidyverse. Journal of Open Source Software 4 , 1686 (2019).

Fan, L., Lafia, S., Li, L., Yang, F. & Hemphill, L. DataChat: Prototyping a conversational agent for dataset search and visualization. Proc. Assoc. Inf. Sci. Technol. 60 , 586–591 (2023).


Acknowledgements

We thank the ICPSR Bibliography staff, the ICPSR Data Curation Unit, and the ICPSR Data Stewardship Committee for their support of this research. This material is based upon work supported by the National Science Foundation under grant 1930645. This project was made possible in part by the Institute of Museum and Library Services grant LG-37-19-0134-19.

Author information

Authors and Affiliations

Inter-university Consortium for Political and Social Research, University of Michigan, Ann Arbor, MI, 48104, USA

Libby Hemphill, Sara Lafia, David Bleckley & Elizabeth Moss

School of Information, University of Michigan, Ann Arbor, MI, 48104, USA

Libby Hemphill & Lizhou Fan

School of Information, University of Arizona, Tucson, AZ, 85721, USA

Andrea Thomer


Contributions

L.H. and A.T. conceptualized the study design, D.B., E.M., and S.L. prepared the data, S.L., L.F., and L.H. analyzed the data, and D.B. validated the data. All authors reviewed and edited the manuscript.

Corresponding author

Correspondence to Libby Hemphill .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Hemphill, L., Thomer, A., Lafia, S. et al. A dataset for measuring the impact of research data and their curation. Sci Data 11 , 442 (2024). https://doi.org/10.1038/s41597-024-03303-2


Received: 16 November 2023

Accepted: 24 April 2024

Published: 03 May 2024

DOI: https://doi.org/10.1038/s41597-024-03303-2


Understanding data analysis: A beginner's guide

Before data can be used to tell a story, it must go through a process that makes it usable. Explore the role of data analysis in decision-making.

What is data analysis?

Data analysis is the process of gathering, cleaning, and modeling data to reveal meaningful insights. This data is then crafted into reports that support the strategic decision-making process.

Types of data analysis

There are many different types of data analysis. Each type can be used to answer a different question.


Descriptive analytics

Descriptive analytics refers to the process of analyzing historical data to understand trends and patterns, such as success or failure in achieving key performance indicators like return on investment.

An example of descriptive analytics is generating reports to provide an overview of an organization's sales and financial data, offering valuable insights into past activities and outcomes.
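For instance, a few lines of Python are enough to produce descriptive summaries of a sales table (the figures below are invented):

```python
# Descriptive analytics in miniature: summarize historical sales data.
import pandas as pd

sales = pd.DataFrame({
    "quarter": ["Q1", "Q2", "Q3", "Q4"],
    "revenue": [120_000, 135_000, 128_000, 150_000],
})

print(sales["revenue"].describe())        # mean, spread, min/max
print("Total revenue:", sales["revenue"].sum())
```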


Predictive analytics

Predictive analytics uses historical data to help predict what might happen in the future, such as identifying past trends in data to determine if they’re likely to recur.

Methods include a range of statistical and machine learning techniques, including neural networks, decision trees, and regression analysis.
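As a toy example of the regression approach (with invented numbers), you can fit a linear trend to past demand and project the next period:

```python
# A minimal predictive-analytics sketch: linear regression forecast.
import numpy as np
from sklearn.linear_model import LinearRegression

periods = np.arange(1, 9).reshape(-1, 1)   # 8 historical periods
demand = np.array([100, 104, 110, 113, 120, 123, 131, 135])

model = LinearRegression().fit(periods, demand)
print("Forecast for period 9:", model.predict([[9]])[0])
```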


Diagnostic analytics

Diagnostic analytics helps answer questions about what caused certain events by looking at performance indicators. Diagnostic analytics techniques supplement basic descriptive analysis.

Generally, diagnostic analytics involves spotting anomalies in data (like an unexpected shift in a metric), gathering data related to these anomalies, and using statistical techniques to identify potential explanations.
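A simple version of the anomaly-spotting step can be sketched with z-scores (the daily figures are invented):

```python
# Flag metric values more than two standard deviations from the mean.
import numpy as np

daily_orders = np.array([98, 102, 97, 101, 99, 160, 100])
z = (daily_orders - daily_orders.mean()) / daily_orders.std()
print("Anomalous days:", np.where(np.abs(z) > 2)[0])   # flags day index 5
```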


Cognitive analytics

Cognitive analytics is a sophisticated form of data analysis that goes beyond traditional methods. This method uses machine learning and natural language processing to understand, reason, and learn from data in a way that resembles human thought processes.

The goal of cognitive analytics is to simulate human-like thinking to provide deeper insights, recognize patterns, and make predictions.


Prescriptive analytics

Prescriptive analytics helps answer questions about what needs to happen next to achieve a certain goal or target. By using insights from prescriptive analytics, organizations can make data-driven decisions in the face of uncertainty.

Data analysts performing prescriptive analysis often rely on machine learning to find patterns in large semantic models and estimate the likelihood of various outcomes.


Text analytics

Text analytics is a way to teach computers to understand human language. It involves using algorithms and other techniques to extract information from large amounts of text data, such as social media posts or customer reviews.

Text analytics helps data analysts make sense of what people are saying, find patterns, and gain insights that can be used to make better decisions in fields like business, marketing, and research.
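Even the Python standard library is enough for a first pass at text analytics, such as counting the most frequent terms in a set of posts (invented here):

```python
# A tiny term-frequency example using only the standard library.
from collections import Counter
import re

posts = [
    "Love the new update, great battery life",
    "Battery drains fast after the update",
]

words = re.findall(r"[a-z']+", " ".join(posts).lower())
print(Counter(words).most_common(5))
```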

The data analysis process

Compiling and interpreting data so it can be used in decision making is a detailed process and requires a systematic approach. Here are the steps that data analysts follow:

1. Define your objectives.

Clearly define the purpose of your analysis. What specific question are you trying to answer? What problem do you want to solve? Identify your core objectives. This will guide the entire process.

2. Collect and consolidate your data.

Gather your data from all relevant sources using data analysis software. Ensure that the data is representative and actually covers the variables you want to analyze.

3. Select your analytical methods.

Investigate the various data analysis methods and select the technique that best aligns with your objectives. Many free data analysis software solutions offer built-in algorithms and methods to facilitate this selection process.

4. Clean your data.

Scrutinize your data for errors, missing values, or inconsistencies using the cleansing features already built into your data analysis software. Cleaning the data ensures accuracy and reliability in your analysis and is an important part of data analytics.
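A minimal pandas sketch of this cleaning step might look like the following (the table and column names are illustrative):

```python
# Drop duplicate rows, coerce types, and fill missing values.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "age": ["34", "34", None, "29"],
    "spend": [250.0, 250.0, 310.5, None],
})

df = df.drop_duplicates()
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["spend"] = df["spend"].fillna(df["spend"].median())
print(df)
```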

5. Uncover valuable insights.

Delve into your data to uncover patterns, trends, and relationships. Use statistical methods, machine learning algorithms, or other analytical techniques that are aligned with your goals. This step transforms raw data into valuable insights.

6. Interpret and visualize the results.

Examine the results of your analyses to understand their implications. Connect these findings with your initial objectives. Then, leverage the visualization tools within free data analysis software to present your insights in a more digestible format.
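Outside of dedicated analysis software, even a few lines of matplotlib can handle the visualization step (the values below are invented):

```python
# Plot a simple monthly trend line.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
signups = [120, 150, 170, 160]

plt.plot(months, signups, marker="o")
plt.title("Monthly signups")
plt.xlabel("Month")
plt.ylabel("Signups")
plt.show()
```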

7. Make an informed decision.

Use the insights gained from your analysis to inform your next steps. Think about how these findings can be utilized to enhance processes, optimize strategies, or improve overall performance.

By following these steps, analysts can systematically approach large sets of data, breaking down the complexities and ensuring the results are actionable for decision makers.

The importance of data analysis

Data analysis is critical because it helps business decision makers make sense of the information they collect in our increasingly data-driven world. Imagine you have a massive pile of puzzle pieces (data), and you want to see the bigger picture (insights). Data analysis is like putting those puzzle pieces together—turning that data into knowledge—to reveal what’s important.

Whether you’re a business decision maker trying to make sense of customer preferences or a scientist studying trends, data analysis is an important tool that helps us understand the world and make informed choices.

Primary data analysis methods


Quantitative analysis

Quantitative analysis deals with numbers and measurements (for example, looking at survey results captured through ratings). When performing quantitative analysis, you’ll use mathematical and statistical methods exclusively and answer questions like ‘how much’ or ‘how many.’ 


Qualitative analysis

Qualitative analysis is about understanding the subjective meaning behind non-numerical data. For example, analyzing interview responses or looking at pictures to understand emotions. Qualitative analysis looks for patterns, themes, or insights, and is mainly concerned with depth and detail.


  • Open access
  • Published: 08 May 2024

Measurement and analysis of change in research scholars’ knowledge and attitudes toward statistics after PhD coursework

  • Mariyamma Philip 1  

BMC Medical Education volume 24, Article number: 512 (2024)


Knowledge of statistics is highly important for research scholars, as they are expected to submit a thesis based on original research as part of a PhD program. As statistics play a major role in the analysis and interpretation of scientific data, intensive training at the beginning of a PhD programme is essential. PhD coursework is mandatory in universities and higher education institutes in India. This study aimed to compare the scores of knowledge in statistics and attitudes towards statistics among the research scholars of an institute of medical higher education in South India at different time points of their PhD (i.e., before, soon after and 2–3 years after the coursework) to determine whether intensive training programs such as PhD coursework can change their knowledge or attitudes toward statistics.

One hundred and thirty research scholars who had completed PhD coursework in the last three years were invited by e-mail to be part of the study. Knowledge and attitudes toward statistics before and soon after the coursework were already assessed as part of the coursework module. Knowledge and attitudes towards statistics 2–3 years after the coursework were assessed using Google forms. Participation was voluntary, and informed consent was also sought.

Knowledge and attitude scores improved significantly soon after the coursework (by 77% and 43%, respectively). However, there was a significant reduction in knowledge and attitude scores 2–3 years after the coursework compared with the scores soon after it; knowledge and attitude scores decreased by 10% and 37%, respectively.

The study concluded that the coursework program was beneficial for improving research scholars' knowledge and attitudes toward statistics. A refresher program 2–3 years after the coursework would greatly benefit the research scholars. Statistics educators must be empathetic to scholars' anxiety and attitudes toward statistics and understand their influence on learning outcomes.


A PhD degree is a research degree, and research scholars submit a thesis based on original research in their chosen field. Doctor of Philosophy (PhD) degrees are awarded in a wide range of academic disciplines, and PhD students are usually referred to as research scholars. A comprehensive understanding of statistics allows research scholars to add rigour to their research: it helps them evaluate current practices, draw informed conclusions from earlier studies, generate their own hypotheses, and design, analyse and interpret their own studies. Therefore, intensive training in research methodology and statistics at the beginning of the PhD journey is essential, as it helps scholars design and plan their studies efficiently.

The University Grants Commission of India has taken various initiatives to introduce academic reforms to higher education institutions in India and mandated in 2009 that coursework be treated as a prerequisite for PhD preparation and that a minimum of four credits be assigned to one or more courses on research methodology, which could cover areas such as quantitative methods, computer applications, and research ethics. UGC also clearly states that all candidates admitted to PhD programmes shall be required to complete the prescribed coursework during the initial two semesters [ 1 ]. National Institute of Mental Health and Neurosciences (NIMHANS) at Bangalore, a tertiary care hospital and medical higher education institute in South India, that trains students in higher education in clinical fields, also introduced coursework in the PhD program for research scholars from various backgrounds, such as basic, behavioral and neurosciences, as per the UGC mandate. Research scholars undertake coursework programs soon after admission, which consist of several modules that include research methodology and statistical software training, among others.

Most scholars approach a course in statistics with the prejudice that statistics is uninteresting, demanding, complex or involves much mathematics and, most importantly, that it is not relevant to their career goals. They approach statistics with considerable apprehension and negative attitudes, probably because of their inability to grasp the relevance of the application of the methods in their fields of study. This could be resolved by providing sufficient and relevant examples of the application of statistical techniques from various fields of medical research and by providing hands-on experience of how these techniques are applied and interpreted on real data. Hence, research methodology, statistical methods and the application of statistical methods using software have been given much importance and are taught as two modules, named Research Methodology and Statistics and Statistical Software Training, at this institute of medical higher education that trains research scholars in fields as diverse as basic, behavioural and neurosciences. Approximately 50% of the coursework curriculum focused on these two modules. Research scholars were thus given an opportunity to understand the theoretical aspects of research methodology and statistical methods. They were also given hands-on training on statistical software to analyse data using these methods and to interpret the findings. The coursework program was designed in this specific manner because such intensive training would enable the research scholars to design their research studies more effectively and analyse their data in a better manner.

It is important to study attitudes toward statistics because attitudes are known to impact the learning process. Most importantly, these scholars are expected to use their skills in statistics and research methods to design research projects or guide postgraduate students and research scholars in the near future. Several authors have assessed attitudes toward statistics among various students and examined how attitudes affect academic achievement, how attitudes are correlated with knowledge in statistics and how attitudes change after a training program. There are studies on attitudes toward statistics among graduate [ 2 , 3 , 4 ] and postgraduate [ 5 ] medical students and among politics and sociology [ 6 , 7 ], psychology [ 8 , 9 , 10 ], social work [ 11 ], and management students [ 12 ]. However, there is a dearth of related literature on research scholars, and there are only two studies on the attitudes of research scholars. In their study of doctoral students in education-related fields, Cook & Catanzaro (2022) investigated the factors that contribute to statistics anxiety and attitudes toward statistics and how anxiety, attitudes and plans for future research use are connected among doctoral students [ 13 ]. Another study, by Sohrabi et al. (2018), assessed the change in basic science PhD students' knowledge of and attitudes toward teaching and educational design at a medical university after a two-day workshop on empowerment and familiarity with teaching and learning principles [ 14 ]. No studies have assessed changes in the attitudes or knowledge of research scholars across the PhD training period or after intensive training programmes such as PhD coursework. Even though PhD coursework has been established in institutes of higher education in India for more than a decade, there is no published research on the effectiveness of coursework from Indian universities or institutes of higher education.

This study aimed to determine the effectiveness of PhD coursework and whether intensive training programs such as PhD coursework can influence the knowledge and attitudes toward statistics of research scholars. Additionally, it would be interesting to know if the acquired knowledge could be retained longer, especially 2–3 years after the coursework, the crucial time of PhD data analysis. Hence, this study compares the scores of knowledge in statistics and attitude toward statistics of the research scholars at different time points of their PhD training, i.e., before, soon after and 2–3 years after the coursework.

Participants

This is an observational study of a single group with repeated assessments. The institute offers a three-month coursework program consisting of seven modules; the first module is ethics, the fifth is research methodology and statistics, and the last is neurosciences. The study was conducted in January 2020. All research scholars of the institute who had completed PhD coursework in the last three years were considered for this study ( n  = 130). Knowledge and attitudes toward statistics before and soon after the coursework module were assessed as part of the coursework program; they were collected on the first and last day of the program, respectively. The author, who was also the coordinator of the research methodology and statistics module of the coursework, obtained the necessary permission to use the data for this study. The scholars were invited to be part of the study by e-mail. Knowledge and attitudes towards statistics 2–3 years after the coursework were assessed online using Google Forms. The scholars were also administered a semi-structured questionnaire to elicit details about the usefulness of the coursework. Participation was voluntary, and consent was also sought online. The confidentiality of the data was assured. Data were not collected from research scholars of biostatistics or from research scholars who had more than a decade of experience or who had been working in the institute as faculty, assuming that their scores could be higher and could bias the findings. This non-funded study was reviewed and approved by the Institute Ethics Committee.

Instruments

Knowledge in statistics was assessed by a questionnaire prepared by the author that was used as part of the coursework evaluation. The survey included 25 questions that assessed knowledge of statistics in areas such as descriptive statistics, sampling methods, study design, parametric and nonparametric tests and multivariate analyses. Right answers were assigned a score of 1, and wrong answers were assigned a score of 0. Total scores ranged from 0 to 25. Attitudes toward statistics were assessed by the Survey of Attitudes Toward Statistics (SATS) scale. The SATS is a 36-item scale that measures 6 domains of attitudes towards statistics. The possible range of scores for each item is between 1 and 7. The total score was calculated by dividing the summed score by the number of items. Higher scores indicate more positive attitudes. The SATS-36 is a copyrighted scale, and researchers are allowed to use it only with prior permission [ 15 , 16 ]. The author obtained permission for its use in the coursework evaluation and this study. A semi-structured questionnaire was also used to elicit details about the usefulness of the coursework.

Statistical analysis

Descriptive statistics such as the mean, standard deviation, numbers and percentages were used to describe the socio-demographic data. General Linear Model repeated-measures analysis of variance (ANOVA) was used to compare knowledge and attitude scores across assessments. Categorical data from the semi-structured questionnaire are presented as percentages. All statistical tests were two-tailed, and a p value < 0.05 was set a priori as the threshold for statistical significance. IBM SPSS (version 28.0) was used to analyse the data.
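The analysis itself was run in IBM SPSS; purely as an illustration of the same repeated-measures design, an open-source equivalent can be sketched in Python with statsmodels, using invented scores for three scholars:

```python
# Repeated-measures ANOVA on long-format data (one row per scholar per
# time point). Scores are invented for illustration only.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "scholar": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time": ["before", "soon_after", "later"] * 3,
    "knowledge": [8, 15, 13, 10, 18, 16, 7, 14, 12],
})

result = AnovaRM(data, depvar="knowledge", subject="scholar",
                 within=["time"]).fit()
print(result)
```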

Results

One hundred and thirty research scholars who had completed coursework (CW) in the last 2–3 years were considered for the study. These scholars were sent Google Forms to assess their knowledge and attitudes 2–3 years after the coursework. Eighty-one scholars responded (62%), and 4 scholars did not consent to participate in the study. The data of the remaining 77 scholars were merged with the data obtained during the coursework program (before and soon after CW). Socio-demographic characteristics of the scholars are presented in Table  1 .

The age of the respondents ranged from 23 to 36 years, with an average of 28.7 (SD 3.01) years, and the majority of the respondents were females (65%). Years of experience (i.e., after their master's degree) before joining a PhD programme ranged from 0.5 to 9 years, and half of the scholars had less than three years of experience before joining the PhD programme (median 3). More than half of those who responded were research scholars from the behavioural sciences (55%), while 29% were from the basic sciences.

General Linear Model repeated-measures ANOVA was used to compare the knowledge and attitude scores of scholars before, soon after and 2–3 years after the coursework (hereafter referred to as "later the CW"), and the results are presented below (Table  2 ; Fig.  1 ).

Figure 1: Comparison of knowledge and attitude scores across the assessments. "Later the CW" = 2–3 years after the coursework.

The scores for knowledge and attitude differed significantly across time. Scores for knowledge and attitude increased soon after the coursework; the percentages of change were 77% and 43%, respectively. However, significant reductions in knowledge and attitude scores were observed 2–3 years after the coursework compared with the scores soon after it. The reduction was greater for attitude scores: knowledge and attitude scores decreased by 10% and 37%, respectively. The change in scores across assessments is evident from the graph, and the effect size is clearly larger for attitude than for knowledge.

Scores for knowledge and attitude before the coursework did not differ significantly by gender or age and were not correlated with years of experience. Hence, these variables were not considered as covariates in the above analysis.

A semi-structured questionnaire with open-ended questions was also administered to elicit in-depth information about the usefulness of the coursework programme; in it, scholars were also asked to self-rate their knowledge. The data were mostly categorical or narratives. Research scholars' self-rated knowledge scores (on a scale of 0–10) showed similar changes: knowledge improved significantly and was retained even after the training (Fig.  2 ).

Figure 2: Self-rated knowledge scores of research scholars over time. "Later the CW" = 2–3 years after the coursework.

The responses to the question "How has coursework changed your attitude toward statistics?" are presented in Fig.  3 . The response options were "Yes, positively", "Yes, negatively", "No change, still apprehensive", "No change, still appreciate statistics", and "No change, still hate statistics". The majority of the scholars (70%) reported a positive change in their attitude toward statistics, and none of the scholars reported a negative change. Approximately 9% of the scholars reported that they were still apprehensive about statistics or still hated statistics after the coursework.

Figure 3: How has coursework changed your attitude toward statistics?

Those scholars who reported that they were apprehensive about statistics or hated statistics noted the complexity of the subject, lack of clarity, improper instruction and fear of mathematics as the major reasons for their attitude. Some responses are listed below.

"The statistical concepts were not taught in an understandable manner from the UG level." "I am weak in mathematical concepts. The equations and formulae in statistics scare me." "Lack of knowledge about the importance of statistics and fear of mathematical equations." "The preconceived notion that Statistics is difficult to learn." "In most of the places, it is not taught properly and conceptual clarity is not focused on, and because of this an avoidance builds up, which might be a reason for the negative attitude."

The majority of the scholars (92%) felt that the coursework had helped them in their PhD, and they were happy to recommend it to other research scholars (97%). Responses to the question "How was coursework helpful in your PhD journey?" are listed below.

"Course work gave a fair idea on various things related to research as well as statistics." "Creating the best design while planning methodology, which is learnt form course work, will increase efficiency in completing the thesis, thereby making it faster." "Course work give better idea of how to proceed in many areas like literature search, referencing, choosing statistical methods, and learning about research procedures." "Course work gave a good idea of research methodology, biostatistics and ethics. This would help in writing a better protocol and a better thesis." "It helps us to plan our research well and to formulate, collect and plan for analysis." "It makes people to plan their statistical analysis well in advance."

This study evaluated the effectiveness of the existing coursework programme in an institution of higher medical education and investigated whether the programme benefits research scholars by improving their knowledge of statistics and attitudes towards statistics. The study found that the coursework programme was beneficial for improving scholars' knowledge about statistics and attitudes toward statistics.

Unlike other studies that have assessed attitudes toward statistics, the participants in this study were research scholars. Research scholars need extensive training in statistics, as they must apply statistical tests and statistical reasoning in their research theses, and later in their professions when designing research projects or supervising student dissertations. Notably, no studies have assessed research scholars' attitudes toward or knowledge of statistics either across the PhD training period or after intensive statistics training programmes. However, the findings of this study are consistent with those of a study that compared the knowledge and attitudes toward teaching and education design of PhD students after a two-day educational course and instructional design workshop [14].

Statistics educators must not only impart knowledge but also motivate learners to appreciate the role of statistics and to continue learning the quantitative skills needed in their professional lives. The role of learners' attitudes toward statistics therefore requires special attention. Since PhD coursework is possibly a major contributor to creating a statistically literate research community, scholars' attitudes toward statistics should be considered important and given special attention. Passionate and engaging statistics educators with adequate experience in illustrating relatable examples could help scholars feel less anxious and build competence and better attitudes toward statistics. Statistics educators should be aware of scholars' anxiety, fears and attitudes toward statistics, and of their influence on learning outcomes and further interest in the subject.

Strengths and limitations

Analysis of changes in knowledge and attitude scores across various time points of PhD training is the major strength of this study. Additionally, the study evaluates the effectiveness of an intensive statistics course for research scholars in terms of changes in knowledge and attitudes. The study also has limitations: the data were collected through online platforms, and the nonresponse rate was about 38%. Mathematical ability, prior learning experience in statistics, interest in the subject, statistics anxiety and performance in coursework were not assessed; hence, their influence could not be studied. The reliability and validity of the knowledge questionnaire had not been established at the time of this study. However, the author, who prepared the questionnaire, ensured that the questions covered the different areas of statistics taught during the coursework, and the questionnaire has also been used as part of the coursework evaluation. Despite these limitations, this study highlights the changes in attitudes and knowledge following an intensive training programme. Future research could investigate the roles of age, sex, mathematical ability, achievement or performance outcomes, and statistics anxiety.

The study concluded that a rigorous and intensive training programme such as PhD coursework was beneficial for improving knowledge about statistics and attitudes toward statistics. However, the significant reduction in attitude and knowledge scores 2–3 years after the coursework indicates that a refresher programme might be helpful for research scholars as they approach the analysis stage of their theses. Statistics educators must develop innovative methods to teach research scholars from nonstatistical backgrounds. They must also be empathetic toward scholars' anxiety, fears and attitudes toward statistics, and understand their influence on learning outcomes and further interest in the subject.

Data availability

The data that support the findings of this study are available from the corresponding author upon request.

University Grants Commission (UGC). UGC Regulations on Minimum Standards and Procedure for the Award of M.Phil./Ph.D Degree, 2009. Ugc.ac.in. [cited 2023 Oct 26]. https://www.ugc.ac.in/oldpdf/regulations/mphilphdclarification.pdf.

Althubaiti A. Attitudes of medical students toward statistics in medical research: Evidence from Saudi Arabia. J Stat Data Sci Educ [Internet]. 2021;29(1):115–21. https://doi.org/10.1080/10691898.2020.1850220 .

Hannigan A, Hegarty AC, McGrath D. Attitudes towards statistics of graduate entry medical students: the role of prior learning experiences. BMC Med Educ [Internet]. 2014;14(1):70. https://doi.org/10.1186/1472-6920-14-70 .

Hasabo EA, Ahmed GEM, Alkhalifa RM, Mahmoud MD, Emad S, Albashir RB et al. Statistics for undergraduate medical students in Sudan: associated factors for using statistical analysis software and attitude toward statistics among undergraduate medical students in Sudan. BMC Med Educ [Internet]. 2022;22(1):889. https://doi.org/10.1186/s12909-022-03960-0 .

Zhang Y, Shang L, Wang R, Zhao Q, Li C, Xu Y et al. Attitudes toward statistics in medical postgraduates: measuring, evaluating and monitoring. BMC Med Educ [Internet]. 2012;12(1):117. https://doi.org/10.1186/1472-6920-12-117 .

Bechrakis T, Gialamas V, Barkatsas A. Survey of attitudes towards statistics (SATS): an investigation of its construct validity and its factor structure invariance by gender. Int J Theoretical Educational Pract. 2011;1(1):1–15.


Khavenson T, Orel E, Tryakshina M. Adaptation of survey of attitudes towards statistics (SATS 36) for Russian sample. Procedia Soc Behav Sci [Internet]. 2012; 46:2126–9. https://doi.org/10.1016/j.sbspro.2012.05.440 .

Coetzee S, Van Der Merwe P. Industrial psychology students’ attitudes towards statistics. J Industrial Psychol. 2010;36(1):843–51.

Chiesi F, Primi C. Assessing statistics attitudes among college students: psychometric properties of the Italian version of the Survey of Attitudes toward Statistics (SATS). Learn Individ Differ. 2009;19(2):309–13.

Counsell A, Cribbie RA. Students’ attitudes toward learning statistics with R. Psychol Teach Rev [Internet]. 2020;26(2):36–56. https://doi.org/10.53841/bpsptr.2020.26.2.36 .

Yoon E, Lee J. Attitudes toward learning statistics among social work students: Predictors for future professional use of statistics. J Teach Soc Work [Internet]. 2022;42(1):65–81. https://doi.org/10.1080/08841233.2021.2014018 .

Melad AF. Students’ attitude and academic achievement in statistics: a Correlational Study. J Posit School Psychol. 2022;6(2):4640–6.

Cook KD, Catanzaro BA. Constantly Working on My Attitude Towards Statistics! Education Doctoral Students’ Experiences with and Motivations for Learning Statistics. Innov High Educ. 2023; 48:257–84. https://doi.org/10.1007/s10755-022-09621-w .

Sohrabi Z, Koohestani HR, Nahardani SZ, Keshavarzi MH. Data on the knowledge, attitude, and performance of Ph.D. students attending an educational course (Tehran, Iran). Data Brief [Internet]. 2018; 21:1325–8. https://doi.org/10.1016/j.dib.2018.08.081 .

Schau C, Stevens J, Dauphinee TL, Del Vecchio A. The development and validation of the Survey of Attitudes toward Statistics. Educ Psychol Meas. 1995;55(5):868–75.

Student attitude surveys and online educational consulting [Internet]. Evaluationandstatistics.com. [cited 2023 Oct 26]. https://www.evaluationandstatistics.com/.


Acknowledgements

The author would like to thank the participants of the study and peers and experts who examined the content of the questionnaire for their time and effort.

This research did not receive any grants from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations

Department of Biostatistics, Dr. M.V. Govindaswamy Centre, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, 560 029, India

Mariyamma Philip


Contributions

Mariyamma Philip: Conceptualization, Methodology, Validation, Investigation, Writing- Original draft, Reviewing and Editing.

Corresponding author

Correspondence to Mariyamma Philip.

Ethics declarations

Ethics approval and consent to participate

This study used data that had already been collected (before and soon after coursework). The data on knowledge of and attitudes toward statistics 2–3 years after coursework were collected from research scholars through the online survey platform Google Forms. Participants were invited to take part in the survey by e-mail. The study was explained in detail, and participation was completely voluntary. Informed consent was obtained online in the form of a statement of consent. Confidentiality of the data was assured, and no identifiable personal information was collected. This non-funded study was reviewed and approved by the NIMHANS Institute Ethics Committee (No. NIMHANS/21st IEC (BS&NS Div.)).

Consent for publication

Not applicable because there is no personal information or images that could lead to the identification of a study participant.

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Philip, M. Measurement and analysis of change in research scholars' knowledge and attitudes toward statistics after PhD coursework. BMC Med Educ 24, 512 (2024). https://doi.org/10.1186/s12909-024-05487-y


Received: 27 October 2023

Accepted: 29 April 2024

Published: 08 May 2024

DOI: https://doi.org/10.1186/s12909-024-05487-y


Keywords: Knowledge of statistics; Attitude towards statistics; PhD coursework; Research scholars



A love of marine biology and data analysis

Thursday, May 9, 2024 • Katherine Egan Bennett


Kelsey Beavers’ love of the ocean started at a young age. Coming from a family of avid scuba divers, she became a certified junior diver at age 11.

“It was a different world,” Beavers said. “I loved everything about the ocean.”

After graduating from high school, the Austin native moved to Fort Worth to study environmental science at Texas Christian University. One of her professors at TCU knew University of Texas at Arlington biology Professor Laura Mydlarz and encouraged Beavers to continue her studies in Arlington.

"Kelsey came to UTA to pursue a Ph.D. and study coral disease, and she quickly got involved in a large project studying stony coral tissue loss disease (SCTLD), a rapidly spreading disease that has been killing coral all along Florida's coast and in 22 Caribbean countries," Mydlarz said. "She has been a real asset to our team, including being the lead author on a paper we published in Nature Communications last year on the disease."

UT Arlington biology researchers Laura Mydlarz and Kelsey Beavers

As part of her doctoral program, Beavers completed original research studying the gene expression of coral reefs affected by SCTLD. Her research involved scuba diving off the coast of the U.S. Virgin Islands to collect coral tissue samples before returning to the lab for data analysis.

“What we found was that the symbiotic algae living within coral are also affected by SCTLD,” Beavers said. “Our current hypothesis is that when algae move from reef to reef, they may be spreading the disease that has been devastating coral reefs since it first appeared in 2014.”

A large part of Beavers’ dissertation project involved crunching large sets of gene expression data extracted from the coral samples and analyzing it in the context of disease susceptibility and severity.

"The analysis part of the project was so much larger than just using a regular Mac, so I worked with the Texas Advanced Computing Center (TACC) in Austin, which is part of the UT System, using their supercomputers," Beavers said.

Beavers enjoyed the data analysis part of her project so much that when she saw an opening at TACC for a full-time position, she jumped at the chance. She’s now working there part-time until graduation, when she plans to move to Austin for her new role.

“I’m really looking forward to my new position, as I’ll be able to work on research projects other than my own,” she said. “It will be interesting to be a specialist in data analysis and help other scientists use the TACC supercomputers to solve complex questions.”

As part of the job, she’ll travel to other UT System campuses to educate researchers on how they can use the tools available at TACC.

“I’m really proud of the work Kelsey did in our lab these past few years, and I’m excited to see her thrive after graduation,” Mydlarz said. “Seeing my students succeed is one of the best parts of this job.”


Teens and Video Games Today

Methodology

The analysis in this report is based on a self-administered web survey conducted from Sept. 26 to Oct. 23, 2023, among a sample of 1,453 dyads, with each dyad (or pair) comprising one U.S. teen ages 13 to 17 and one parent per teen. The margin of sampling error for the full sample of 1,453 teens is plus or minus 3.2 percentage points; the margin for the full sample of 1,453 parents is also plus or minus 3.2 percentage points. The survey was conducted by Ipsos Public Affairs in English and Spanish using KnowledgePanel, its nationally representative online research panel.
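
As a rough illustration of where a figure like plus or minus 3.2 points comes from, the snippet below computes a design-effect-adjusted margin of error; the design effect value is an assumption chosen to roughly match the reported figure, not a number published by Pew.

```python
import math

n = 1_453      # completed teen (or parent) interviews
deff = 1.55    # assumed design effect from weighting (illustrative)

# Conservative MOE at 95% confidence, p = 0.5, inflated by sqrt(deff).
moe = 1.96 * math.sqrt(0.25 / n) * math.sqrt(deff)
print(f"MOE ≈ ±{100 * moe:.1f} percentage points")  # ≈ ±3.2
```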

The research plan for this project was submitted to an external institutional review board (IRB), Advarra, which is an independent committee of experts that specializes in helping to protect the rights of research participants. The IRB thoroughly vetted this research before data collection began. Due to the risks associated with surveying minors, this research underwent a full board review and received approval (Approval ID Pro00073203).

KnowledgePanel members are recruited through probability sampling methods and include both those with internet access and those who did not have internet access at the time of their recruitment. KnowledgePanel provides internet access for those who do not have it and, if needed, a device to access the internet when they join the panel. KnowledgePanel's recruitment process was originally based exclusively on a national random-digit dialing (RDD) sampling methodology. In 2009, Ipsos migrated to an address-based sampling (ABS) recruitment methodology via the U.S. Postal Service's Delivery Sequence File (DSF). The Delivery Sequence File has been estimated to cover as much as 98% of the population, although some studies suggest that the coverage could be in the low 90% range. [4]

Panelists were eligible for participation in this survey if they indicated on an earlier profile survey that they were the parent of a teen ages 13 to 17. A random sample of 3,981 eligible panel members were invited to participate in the study. Responding parents were screened and considered qualified for the study if they reconfirmed that they were the parent of at least one child ages 13 to 17 and granted permission for their teen who was chosen to participate in the study. In households with more than one eligible teen, parents were asked to think about one randomly selected teen, and that teen was instructed to complete the teen portion of the survey. A survey was considered complete if both the parent and selected teen completed their portions of the questionnaire, or if the parent did not qualify during the initial screening.

Of the sampled panelists, 1,763 (excluding break-offs) responded to the invitation and 1,453 qualified, completed the parent portion of the survey, and had their selected teen complete the teen portion of the survey, yielding a final stage completion rate of 44% and a qualification rate of 82%. The cumulative response rate accounting for nonresponse to the recruitment surveys and attrition is 2.2%. The break-off rate among those who logged on to the survey (regardless of whether they completed any items or qualified for the study) is 26.9%.
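
The reported rates can be reproduced from the dispositions given above; the sketch below does the arithmetic (the exact AAPOR disposition rules behind the published figures may differ slightly).

```python
invited = 3_981     # randomly sampled eligible panel members
responded = 1_763   # responded to the invitation, excluding break-offs
qualified = 1_453   # qualified and completed both parent and teen portions

print(f"Completion rate:    {responded / invited:.0%}")    # ~44%
print(f"Qualification rate: {qualified / responded:.0%}")  # ~82%
```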

Upon completion, qualified respondents received a cash-equivalent incentive worth $10 for completing the survey. To encourage response from non-Hispanic Black panelists, the incentive was increased from $10 to $20 on Oct. 5, 2023. The incentive was increased again on Oct. 10, from $20 to $40; then to $50 on Oct. 17; and to $75 on Oct. 20. Reminders and notifications of the change in incentive were sent for each increase.

All panelists received email invitations and any nonresponders received reminders, shown in the table. The field period was closed on Oct. 23, 2023.

A table showing Invitation and reminder dates

The analysis in this report was performed using separate weights for parents and teens. The parent weight was created in a multistep process that begins with a base design weight for the parent, which is computed to reflect their probability of selection for recruitment into the KnowledgePanel. These selection probabilities were then adjusted to account for the probability of selection for this survey, which included oversamples of non-Hispanic Black and Hispanic parents. Next, an iterative technique was used to align the parent design weights to population benchmarks for parents of teens ages 13 to 17 on the dimensions identified in the accompanying table, to account for any differential nonresponse that may have occurred.

To create the teen weight, an adjustment factor was applied to the final parent weight to reflect the selection of one teen per household. Finally, the teen weights were further raked to match the demographic distribution for teens ages 13 to 17 who live with parents. The teen weights were adjusted on the same teen dimensions as parent dimensions with the exception of teen education, which was not used in the teen weighting.
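
For readers unfamiliar with raking, the toy example below shows the iterative proportional fitting loop at the heart of this kind of weighting. The categories, margins, and starting weights are invented, and production weighting systems add trimming and convergence checks that are omitted here.

```python
import numpy as np
import pandas as pd

# Toy sample with design weights of 1.0; the margins below are invented targets.
df = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "F", "M"],
    "region": ["NE", "S", "S", "W", "W", "NE"],
    "w": np.ones(6),
})
targets = {"sex": {"F": 0.51, "M": 0.49},
           "region": {"NE": 0.20, "S": 0.50, "W": 0.30}}

for _ in range(50):  # iterate the margin adjustments until they stabilize
    for var, margin in targets.items():
        share = df.groupby(var)["w"].sum() / df["w"].sum()
        df["w"] *= df[var].map({cat: margin[cat] / share[cat] for cat in margin})

# Weighted sex shares now match the 51/49 target.
print(df.groupby("sex")["w"].sum() / df["w"].sum())
```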

Sampling errors and tests of statistical significance take into account the effect of weighting. Interviews were conducted in both English and Spanish.

In addition to sampling error, one should bear in mind that question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of opinion polls.

The following table shows the unweighted sample sizes and the error attributable to sampling that would be expected at the 95% level of confidence for different groups in the survey:

A table showing the unweighted sample sizes and the error attributable to sampling

Sample sizes and sampling errors for subgroups are available upon request.

Dispositions and response rates

The tables below display dispositions used in the calculation of completion, qualification and cumulative response rates. [5]

A table showing Dispositions and response rates

© Pew Research Center, 2023

  • [4] AAPOR Task Force on Address-based Sampling. 2016. "AAPOR Report: Address-based Sampling."
  • [5] For more information on this method of calculating response rates, refer to: Callegaro, Mario, and Charles DiSogra. 2008. "Computing response metrics for online panels." Public Opinion Quarterly.


  • Open access
  • Published: 11 May 2024

Serum urate levels and neurodegenerative outcomes: a prospective cohort study and mendelian randomization analysis of the UK Biobank

  • Tingjing Zhang 1, 2,
  • Yu An 3,
  • Zhenfei Shen 4,
  • Honghao Yang 5, 6,
  • Jinguo Jiang 5, 6,
  • Liangkai Chen 7,
  • Yanhui Lu 8 &
  • Yang Xia 5, 6

Alzheimer's Research & Therapy volume  16 , Article number:  106 ( 2024 ) Cite this article


Previous studies on the associations between serum urate levels and neurodegenerative outcomes have yielded inconclusive results, and the causality remains unclear. This study aimed to investigate whether urate levels are associated with the risks of Alzheimer’s disease and related dementias (ADRD), Parkinson’s disease (PD), and neurodegenerative deaths.

This prospective study included 382,182 participants (45.7% men) from the UK Biobank cohort. Cox proportional hazards models were used to assess the associations between urate levels and risk of neurodegenerative outcomes. In the Mendelian randomization (MR) analysis, urate-related single-nucleotide polymorphisms were identified through a genome-wide association study. Both linear and non-linear MR approaches were utilized to investigate the potential causal associations.

During a median follow-up period of 12 years, we documented 5,400 ADRD cases, 2,553 PD cases, and 1,531 neurodegenerative deaths. Observational data revealed that a higher urate level was associated with a decreased risk of ADRD (hazard ratio [HR]: 0.93, 95% confidence interval [CI]: 0.90, 0.96), PD (HR: 0.87, 95% CI: 0.82, 0.91), and neurodegenerative death (HR: 0.88, 95% CI: 0.83, 0.94). Negative linear associations between urate levels and neurodegenerative events were observed (all P-values for overall < 0.001 and all P-values for non-linearity > 0.05). However, MR analyses yielded no evidence of either linear or non-linear associations between genetically predicted urate levels and the risk of the aforementioned neurodegenerative events.

Although the prospective cohort study demonstrated that elevated urate levels were associated with a reduced risk of neurodegenerative outcomes, MR analyses found no evidence of causality.

Neurological disorders rank foremost in causing disability and stand as the second most common cause of death worldwide, accounting for 11.6% of global disability-adjusted life-years and 16.5% of all deaths [ 1 ]. Globally, Alzheimer’s disease and related dementias (ADRD) and Parkinson’s disease (PD) are the most prevalent neurodegenerative diseases [ 1 , 2 ]. Currently, there are more than 55 million individuals with ADRD, as well as more than 8.5 million individuals with PD worldwide [ 3 , 4 ]. The economic burden of ADRD on the global economy amounts to 1.3 trillion US dollars, with nearly 10 million new cases reported each year [ 3 ]. PD has resulted in 5.8 million disability-adjusted life years, reflecting an 81% increase since 2000 [ 4 ]. At present, neither ADRD nor PD has a cure, emphasizing the importance of identifying and focusing on modifiable risk factors associated with these conditions.

Urate, the final product of human purine metabolism, serves as a potent antioxidant [ 5 , 6 ]. It plays a significant role in human physiology by contributing to approximately 60% of the scavenging activity against free radicals [ 7 ]. Urate plays a crucial role in neutralizing and eliminating reactive oxygen species, thereby protecting cells and tissues from oxidative damage [ 8 ]. The antioxidant properties of urate are crucial for maintaining cell function and preventing conditions associated with oxidative stress [ 9 , 10 ]. Additionally, these antioxidant properties have led to suggestions that urate may be a neuroprotective agent [ 7 , 11 ]. However, while the associations of urate levels with neurodegenerative diseases have been explored, the findings are inconsistent and conflicting [ 12 , 13 , 14 , 15 ]. This inconsistency may be attributed to potential confounding factors and possible reverse causation influencing the observed associations. Furthermore, it remains unclear whether the association between urate levels and risk of neurodegenerative outcomes is causal.

Mendelian randomization (MR) is an epidemiological approach that uses genetic variants associated with an exposure as instrumental variables to establish causal effects on outcomes [16]. The MR design eliminates the impact of confounding factors, as alleles are randomly allocated during gamete formation and conception [17]. Consequently, MR results avoid the biases of reverse causation and confounding [18].

Therefore, we aimed to determine the associations between urate levels and risk of neurodegenerative diseases, especially ADRD, PD, and neurodegenerative death, based on a large prospective population-based observational analysis and the MR approach, and to provide a stronger scientific basis to enhance the efficacy of health management strategies.

Materials and methods

Study populations.

UK Biobank is a prospective study that enrolled more than 500,000 individuals aged 40 to 79 years from 22 assessment centers across the United Kingdom between April 2006 and December 2010. During recruitment, all participants were assessed for demographic information, lifestyle factors, bodily measurements, and other health-related parameters by trained health professionals. Additionally, blood specimens were collected for genotyping. The UK Biobank study protocol is publicly available at https://www.ukbiobank.ac.uk/.

In this large population-based study of 502,461 participants, several exclusion criteria were applied to ensure data quality: (1) individuals with prevalent ADRD or PD at baseline; (2) those with missing data on urate levels, genetic information, and related covariates; (3) individuals with sex discordance; (4) outliers with genotype missingness or heterozygosity; (5) individuals with genetic kinship to other participants; and (6) individuals of non-European ancestry. As a result, a final sample of 382,182 participants was retained for the analysis. The flowchart is shown in Fig. S1 .

The UK Biobank study was approved by the Northwest Multi-Center Research Ethics Committee, and each participant provided written informed consent before participating in the study. The data resource used for this study was obtained under application number 63,454 from the UK Biobank.

Assessment of exposure, outcome, and covariates

Baseline serum urate levels were measured using the uricase pedigree analysis package of the Beckman Coulter AU5800 platform (Randox Biosciences, Crumlin, UK). Participants were categorized into quartiles based on the distribution of urate levels according to sex. “Quartile 1” refers to the lowest 25% of participants with the lowest urate level, while “quartile 4” represents the highest 25% of participants with the highest urate level.

Neurodegenerative outcomes were identified using data on admissions and diagnoses with a primary or secondary diagnosis based on the International Classification of Diseases (detailed information provided in Table S1) [19, 20]. The follow-up period ranged from March 16, 2006 to the endpoint of follow-up (September 30, 2021 for centers in England; February 28, 2018 for centers in Wales; and July 31, 2021 for centers in Scotland). Person-years were calculated for each participant from the date of baseline assessment to the occurrence of a neurodegenerative outcome, death, or the end of follow-up, whichever occurred first.

Covariates possibly affecting the associations between urate levels and neurodegenerative outcomes, as indicated by previous studies, were taken into account in our analysis. A baseline touch-screen questionnaire was used to assess the potential confounding variables, including sociodemographic and lifestyle factors (e.g., age, sex, educational levels, smoking status, alcohol consumption and dietary habits), as well as personal and family history of diseases. Based on the baseline food frequency questionnaire, a diet score was calculated using the following elements: vegetables, fruits, fish, processed meat, unprocessed red meat, whole grains, and refined grains, as conducted in previous studies [ 21 , 22 ]. Each diet factor received 1 point: consumption of at least 3 servings of vegetables per day, at least 3 servings of fruit per day, at least 2 servings of fish per week, no more than 1 serving of processed meat per week, no more than 1.5 servings of unprocessed red meat per week, at least 3 servings of whole grains per day, and no more than 1.5 servings of refined grains per week. The total diet score ranged from 0 to 7. Details of covariates were provided in Table S3 .
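
A minimal sketch of the 0–7 diet score described above follows, assuming simple numeric serving inputs; the function signature is a hypothetical stand-in for the UK Biobank food-frequency fields.

```python
def diet_score(veg_day, fruit_day, fish_week, processed_meat_week,
               red_meat_week, whole_grain_day, refined_grain_week):
    """Return the 0-7 healthy diet score described above (1 point per item)."""
    criteria = [
        veg_day >= 3,               # >= 3 servings of vegetables per day
        fruit_day >= 3,             # >= 3 servings of fruit per day
        fish_week >= 2,             # >= 2 servings of fish per week
        processed_meat_week <= 1,   # <= 1 serving of processed meat per week
        red_meat_week <= 1.5,       # <= 1.5 servings of unprocessed red meat/week
        whole_grain_day >= 3,       # >= 3 servings of whole grains per day
        refined_grain_week <= 1.5,  # <= 1.5 servings of refined grains per week
    ]
    return sum(criteria)

print(diet_score(4, 3, 2, 0.5, 1, 3, 1))  # -> 7 (meets every criterion)
```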

Genetic instrument for urate

The genotyping procedure and DNA array used in the UK Biobank study have been previously described [ 23 ]. In brief, each participant’s blood specimen was genotyped using the custom Affymetrix UK Biobank Axiom array. The genotyping data underwent phasing and imputation; SHAPEIT3 was used for phasing and IMPUTE3 was used for imputation, with a merged reference panel of UK10K and 1000 Genomes Phase 3 [ 24 ].

We used 20 independent single nucleotide polymorphisms (SNPs) (P < 5 × 10⁻⁸, r² < 0.1 within a 1000-kb window) identified in a genome-wide association analysis as genetic instruments in the MR (Table S2) [25]. These SNPs were used to construct a genetic risk score (GRS). Each SNP was coded as 0, 1, or 2 according to the number of risk alleles and weighted by its relative effect size (β coefficient); the GRS for each individual was then obtained by summing the weighted scores using the PLINK "--score" command and z-standardizing the result. The distribution of the urate-related GRS is shown in Fig. S2. In this study, the genetic instrument showed a strong association with urate levels, with an F statistic of 173 and a P-value < 0.0001.
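
The GRS construction described above reduces to a weighted sum of allele dosages followed by z-standardization. The sketch below shows that computation in Python; the dosage matrix and β weights are illustrative toy values, not the 20 SNPs used in the study.

```python
import numpy as np

# Dosage matrix: participants x SNPs, entries 0/1/2 risk alleles (toy values).
dosages = np.array([[0, 1, 2],
                    [1, 1, 0],
                    [2, 2, 1],
                    [0, 0, 1]])
# Per-allele effect sizes on urate from the GWAS (illustrative betas).
betas = np.array([0.22, 0.08, 0.05])

grs = dosages @ betas                   # weighted allele count per person
grs_z = (grs - grs.mean()) / grs.std()  # z-standardized, as in the text
print(grs_z)
```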

Statistical analysis

Baseline characteristics of the study population were outlined across quartiles of the urate levels, with continuous variables expressed as mean (standard deviation, SD) and categorical variables as percentages (%). Cox proportional hazards regression models were used to examine the associations of urate levels with neurodegenerative outcomes. Proportional hazards were tested using scaled Schoenfeld's residuals. Three models were established: (1) model 1 adjusted for age, sex, and body mass index (BMI); (2) model 2 additionally adjusted for education levels, Townsend deprivation index, smoking status, and drinking status based on model 1; and (3) model 3 additionally adjusted for family history of diseases (hypertension, cardiovascular disease, and diabetes), healthy diet score, and personal history of diseases (kidney disease, hypertension, cardiovascular disease, and diabetes) based on model 2. The P-value for trend was calculated using the median value of urate in each quartile as a continuous variable [26]. Restricted cubic splines based on the Cox proportional hazards regression model [27] were used to evaluate non-linear associations between urate levels and neurodegenerative outcomes in the multivariable model, with 3 knots at the 25th, 50th, and 75th percentiles of the urate levels (with the minimum value used as the reference). To strengthen the robustness of the results, we performed several sensitivity analyses as follows: (1) excluded participants who had incident neurodegenerative outcomes in the initial 5 follow-up years to avoid reverse causality; (2) repeated the analysis after stratifying by age, sex, and BMI; (3) conducted Fine–Gray competing risk analysis to assess the competitive risk of non-neurodegenerative death [28]; and (4) divided neurodegenerative deaths into deaths due to ADRD and deaths due to PD.
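
A hedged sketch of the core cohort model follows: a Cox proportional hazards fit with urate standardized to SD units, using the lifelines package on simulated data. The variable names and the simulated event process are assumptions for illustration only; the study's actual models also include the full covariate sets listed above.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({
    "urate": rng.normal(310, 80, n),   # simulated urate, umol/L
    "age": rng.uniform(40, 70, n),
    "sex": rng.integers(0, 2, n),
    "bmi": rng.normal(27, 4, n),
})
df["urate_sd"] = (df["urate"] - df["urate"].mean()) / df["urate"].std()

# Simulated follow-up: exponential event times censored at 12 years,
# giving roughly 2% events, loosely echoing the outcome rates in the text.
df["time_years"] = rng.exponential(600, n).clip(max=12.0)
df["event"] = (df["time_years"] < 12.0).astype(int)

cph = CoxPHFitter()
cph.fit(df.drop(columns="urate"),   # keep only the SD-scaled exposure
        duration_col="time_years", event_col="event")
cph.print_summary()                 # exp(coef) of urate_sd = HR per 1-SD increase
```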

We employed both linear and non-linear MR methods to assess potential causal associations between urate levels and neurodegenerative outcomes. For the linear MR analyses, we examined the associations between the urate-related GRS and neurodegenerative outcomes using a Cox regression model. The model was adjusted for various covariates, including age, sex, BMI, educational level, Townsend deprivation index, smoking status, alcohol consumption, family history of diseases (hypertension, cardiovascular disease, and diabetes), healthy diet score, personal history of diseases (kidney disease, hypertension, cardiovascular disease, and diabetes), the first 10 principal components of ancestry, and genotype measurement batch. In the sensitivity analyses, (1) we employed an unweighted GRS, calculated by summing the number of urate-increasing alleles; (2) the SNP rs2231142, identified as the strongest instrument in a previous GWAS, was used as the instrumental variable to mitigate potential horizontal pleiotropy [25]; and (3) the urate-related GRS was divided into quartiles to assess the linear MR results. In the non-linear MR analyses, we divided the sample into five strata based on residual urate levels, defined as the difference between the observed urate level and the genetically predicted urate level. Within each stratum, we evaluated the linear MR estimate, which contributed to the localized average causal effect (LACE) [29]. A meta-regression of the LACE estimates against the mean of the exposure in each stratum was performed using a flexible semiparametric framework that applied the derivative of fractional polynomial models. This assessment aimed to determine whether a non-linear model offered a better fit for the LACE estimates than a linear model [30]. Two tests for non-linearity were conducted: (1) a Cochran's Q statistic to assess heterogeneity across the LACE estimates, and (2) a trend test involving meta-regression of the LACE estimates against the mean urate value in each stratum.
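
To convey the intuition behind the stratified (non-linear) MR approach, the sketch below computes a simple Wald-ratio estimate within strata of residual exposure on simulated data under a null causal effect. The published analysis uses survival models and the semiparametric LACE framework cited above; this is only the core idea.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
grs = rng.normal(size=n)                # genetic instrument (z-scored GRS)
urate = 0.3 * grs + rng.normal(size=n)  # instrument shifts the exposure
outcome = rng.normal(size=n)            # null: urate has no causal effect

# Stratify on the exposure net of the instrument ("residual urate"),
# then estimate a localized Wald ratio within each stratum.
resid = urate - 0.3 * grs
df = pd.DataFrame({"grs": grs, "urate": urate, "y": outcome,
                   "stratum": pd.qcut(resid, 5, labels=False)})

for s, g in df.groupby("stratum"):
    lace = np.cov(g["y"], g["grs"])[0, 1] / np.cov(g["urate"], g["grs"])[0, 1]
    print(f"stratum {s}: localized estimate ≈ {lace:+.3f}")  # all near 0
```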

P-values were two-sided, with P < 0.05 defined as statistically significant. SAS 9.4 software for Windows (SAS Institute Inc., Cary, NC, USA) was used to conduct the cohort analyses, and MR analyses were performed using R version 4.2.3 with the "TwoSampleMR" and "NLMR" packages.

Baseline characteristics of the study population

In this study, a total of 382,182 participants (174,990 [45.7%] men and 207,192 [54.2%] women) were included. Over a median follow-up period of 12 years, 5,400 ADRD cases, 2,553 PD cases, and 1,531 neurodegenerative deaths were documented. Table 1 presents the baseline characteristics categorized by urate levels. Participants with elevated urate levels tended to be older and were more frequently drinkers. They also had higher BMI values and were more likely to have medical histories of hypertension, diabetes, kidney disease, and cardiovascular disease. Conversely, they had lower healthy diet scores and educational levels than participants with lower urate levels.

Observational findings

Table 2 shows the associations between urate levels and the risk of neurodegenerative outcomes. In the cohort analyses, urate levels exhibited inverse associations with the risk of ADRD, PD, and neurodegenerative death. With each one-SD increase in urate levels, the risk of ADRD, PD, and neurodegenerative death decreased by 7% (HR: 0.93, 95% CI: 0.90, 0.96), 13% (HR: 0.87, 95% CI: 0.82, 0.91), and 12% (HR: 0.88, 95% CI: 0.83, 0.94), respectively. The restricted cubic spline curves demonstrated no non-linear association between urate levels and ADRD (P-value for overall < 0.0001, P-value for non-linearity = 0.08), PD (P-value for overall < 0.0001, P-value for non-linearity = 0.31), or neurodegenerative death (P-value for overall = 0.0009, P-value for non-linearity = 0.44) (Fig. 1). In sensitivity analyses, we obtained consistent findings when: (1) excluding participants with incident neurodegenerative outcomes within the initial 5 follow-up years (Table S4); (2) conducting subgroup analyses stratified by age, sex, and BMI (Table S5); (3) using a competing risk regression model (Table S6); and (4) dividing neurodegenerative deaths into deaths due to ADRD and deaths due to PD (Table S7).

Figure 1. Shape of the association between urate and neurodegenerative outcomes, using restricted cubic splines based on observational data. Adjusted for age, sex, BMI, education levels, Townsend deprivation index, smoking status, alcohol consumption, family history of diseases (hypertension, cardiovascular disease, and diabetes), healthy diet score, and history of diseases (kidney disease, hypertension, cardiovascular disease, and diabetes).

Mendelian randomization results

As depicted in Fig. 2, there was no linear association between genetically predicted urate levels and the risk of ADRD (HR: 0.98, 95% CI: 0.96, 1.01), PD (HR: 1.03, 95% CI: 0.99, 1.06), or neurodegenerative death (HR: 1.01, 95% CI: 0.96, 1.05). Consistent results were observed in the sensitivity analyses when re-evaluating the associations between the unweighted urate-related GRS and neurodegenerative outcomes (Fig. S2), using rs2231142 as the instrumental variable (Fig. S3), or dividing the urate-related GRS into quartiles (Table S8). Moreover, there was no evidence of non-linear causal effects between genetically predicted urate levels and the risk of ADRD (P-quadratic = 0.77, P-Cochran's Q = 0.49), PD (P-quadratic = 0.24, P-Cochran's Q = 0.54), or neurodegenerative death (P-quadratic = 0.19, P-Cochran's Q = 0.18) (Fig. 3).

Figure 2. The causal associations between urate levels and neurodegenerative outcomes using linear MR analysis. Adjusted for age, sex, BMI, education levels, Townsend deprivation index, smoking status, alcohol consumption, family history of diseases (hypertension, cardiovascular disease, and diabetes), healthy diet score, history of diseases (kidney disease, hypertension, cardiovascular disease, and diabetes), first 10 principal components of ancestry, and genotype measurement batch.

Figure 3. Shape of the causal relationship between urate and neurodegenerative outcomes using the non-linear MR method. Adjusted for age, sex, BMI, education levels, Townsend deprivation index, smoking status, alcohol consumption, family history of diseases (hypertension, cardiovascular disease, and diabetes), healthy diet score, history of diseases (kidney disease, hypertension, cardiovascular disease, and diabetes), first 10 principal components of ancestry, and genotype measurement batch.

We investigated the associations between urate levels and neurodegenerative outcomes using a comprehensive approach involving a large population-based cohort and complementary MR analyses. Our findings suggest that, while elevated urate levels are associated with a reduced risk of incident neurodegenerative outcomes, both linear and non-linear MR analyses yielded no evidence that these associations are causal. These results have clinical significance given the limited research available on the intricate associations between urate levels and neurodegenerative outcomes.

Previous observational epidemiological studies have explored the associations between urate levels and the risk of neurodegenerative outcomes [12, 31, 32, 33], and they support part of our findings. Scheepers et al. found that long-term follow-up data from a Swedish prospective study spanning 44 years highlighted the protective role of urate in the development of dementia across subtypes [31]. A meta-analysis of 21 case-control studies and 3 cohort studies indicated a potential inverse association between serum uric acid levels and Alzheimer's disease (AD) risk [12]. Another systematic review involving 23 studies (5,575 participants) reported low serum uric acid levels as a potential risk factor for both AD and PD [32]. Additionally, a dose-response meta-analysis of 15 studies involving 449,816 participants and 14,687 cases revealed a 6% reduction in PD risk for every 1 mg/dL increase in the urate level [33]. However, a population-based cohort study with a 12-year follow-up reported inconsistent findings, suggesting that elevated serum uric acid levels were associated with an increased risk of dementia [13]. Based on a large prospective cohort study, we observed a negative association between urate levels and neurodegenerative outcomes. The underlying mechanism may lie in urate's antioxidant properties, which could offer protection against neurodegeneration by reducing oxidative stress and inflammation [7, 8]. Additionally, experimental models of neurodegenerative diseases have shown that urate has neuroprotective effects [34]. The inconsistency between the results of these studies may be attributed to several factors, including differences in study populations, methodologies, outcome definitions, and potential confounding variables.

To enhance the public health implications of our findings, we also employed MR methods. Although our observational analyses revealed significant negative associations between urate levels and the risk of neurodegenerative outcomes in the prospective cohort, the MR analyses did not support a causal association. By using SNPs as exposure proxies, which are randomly distributed among individuals, MR analysis offers an approach analogous to a randomized controlled trial [35]. Consistent with our results, a previous double-blind, placebo-controlled, phase III randomized trial involving 587 individuals did not establish an association between sustained urate-elevating treatment and PD risk [36]. The results of our MR study also suggest that increasing urate levels is unlikely to offer clinical benefits in reducing the risk of neurodegenerative outcomes, including ADRD, PD, and neurodegenerative death. This carries an important public health implication: elevating urate levels may not be effective for preventing neurodegenerative events.

This is the first large-scale investigation of the associations between urate levels and ADRD, PD, and neurodegenerative death using complementary analyses (cohort and MR), which increases the reliability of our conclusions. The use of a large population-based dataset enhanced the statistical power and the applicability of our findings. Furthermore, our MR analyses employed robust instrumental variables, thereby minimizing the potential for weak instrument bias. Additionally, we rigorously assessed key assumptions, ensuring that the primary instruments were not related to potential confounders.

Our study has several limitations. Firstly, the potential for selection bias and residual confounding exists, despite our adjustments for multiple confounders. The potential for confounding by unaccounted factors also exists. Secondly, the MR analysis was constrained by the limited number of SNPs used. Although we included a substantial number of genetic variants, a score encompassing a greater array of urate-related SNPs would enhance the robustness of causal investigation. Additionally, it should be acknowledged that certain SNPs utilized in our analysis may exhibit potential correlations with unidentified factors associated with neurodegenerative outcomes. Consequently, we cannot entirely dismiss the potential influence of pleiotropic effects on our findings. Thirdly, the diagnosis of neurodegenerative events was derived from registry-based data rather than comprehensive neuropsychological assessments. Although registry-based diagnoses generally exhibit good accuracy, the potential for misclassification among certain study participants cannot be entirely ruled out. Finally, it is important to note that the participants in this study predominantly belong to the White British ethnicity, which might limit the generalizability of our findings to other ethnicities or populations.

Our study revealed significant linear negative associations between urate levels and the risk of ADRD, PD, and neurodegenerative death in a comprehensive large-scale prospective cohort. However, neither the linear nor the non-linear MR analyses supported causality. This underscores a crucial public health message: elevated urate levels may not be effective for mitigating neurodegenerative outcomes. Nonetheless, additional research is warranted to validate these findings.

Data availability

Data are available in a public, open access repository. This research has been conducted using the UK Biobank Resource under Application Number 63454. The UK Biobank data are available on application to the UK Biobank ( https://www.ukbiobank.ac.uk/ ).

Abbreviations

ADRD: Alzheimer's disease and related dementias; PD: Parkinson's disease; MR: Mendelian randomization; HR: Hazard ratio; CI: Confidence interval

GBD 2016 Neurology Collaborators. Global, regional, and national burden of neurological disorders, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019;18(5):459–80.


World Health Organization (WHO). Neurological disorders fact sheet. https://www.who.int/news-room/fact-sheets/detail/neurological-disorders .

World Health Organization (WHO). Dementia fact sheet. https://www.who.int/news-room/fact-sheets/detail/dementia .

World Health Organization (WHO). Parkinson disease fact sheet. https://www.who.int/news-room/fact-sheets/detail/parkinson-disease .

Alvarez-Lario B, Macarron-Vicente J. Uric acid and evolution. Rheumatology (Oxford). 2010;49(11):2010–5.


Richette P, Bardin T. Gout. Lancet. 2010;375(9711):318–28.

Waring WS. Uric acid: an important antioxidant in acute ischaemic stroke. QJM. 2002;95(10):691–3.

Maxwell SR, et al. Antioxidant status in patients with uncomplicated insulin-dependent and non-insulin-dependent diabetes mellitus. Eur J Clin Invest. 1997;27(6):484–90.

Baillie JK, et al. Endogenous urate production augments plasma antioxidant capacity in healthy lowland subjects exposed to high altitude. Chest. 2007;131(5):1473–8.

Waring WS, et al. Uric acid reduces exercise-induced oxidative stress in healthy adults. Clin Sci (Lond). 2003;105(4):425–30.

Chen X, Wu G, Schwarzschild MA. Urate in Parkinson’s disease: more than a biomarker? Curr Neurol Neurosci Rep. 2012;12(4):367–75.

Du N, et al. Inverse association between serum uric acid levels and Alzheimer's disease risk. Mol Neurobiol. 2016;53(4):2594–9.

Latourte A, et al. Uric acid and incident dementia over 12 years of follow-up: a population-based cohort study. Ann Rheum Dis. 2018;77(3):328–35.

Gao X, et al. Prospective study of plasma urate and risk of Parkinson disease in men and women. Neurology. 2016;86(6):520–6.


Kia DA, et al. Mendelian randomization study shows no causal relationship between circulating urate levels and Parkinson’s disease. Ann Neurol. 2018;84(2):191–9.

Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1):R89–98.

Smith GD, Ebrahim S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22.


Verduijn M, et al. Mendelian randomization: use of genetics to enable causal inference in observational studies. Nephrol Dial Transpl. 2010;25(5):1394–8.

Shang X, et al. Association of a wide range of chronic diseases and apolipoprotein E4 genotype with subsequent risk of dementia in community-dwelling adults: a retrospective cohort study. EClinicalMedicine. 2022;45:101335.


Wilkinson T, et al. Identifying dementia outcomes in UK Biobank: a validation study of primary care, hospital admissions and mortality data. Eur J Epidemiol. 2019;34(6):557–65.

Pazoki R, et al. Genetic predisposition to high blood pressure and lifestyle factors: associations with midlife blood pressure levels and cardiovascular events. Circulation. 2018;137(7):653–61.

Wang L, et al. Air pollution and risk of chronic obstructed pulmonary disease: the modifying effect of genetic susceptibility and lifestyle. EBioMedicine. 2022;79:103994.

Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O'Connell J, Cortes A, Welsh S, McVean G, Leslie S, Donnelly P, Marchini J. Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv, July 20, 2017.

Marchini J. UK Biobank phasing and imputation documentation. UK Biobank.

Kottgen A, et al. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat Genet. 2013;45(2):145–54.

Sotos-Prieto M, et al. Association of changes in diet quality with total and cause-specific mortality. N Engl J Med. 2017;377(2):143–53.

Desquilbet L, Mariotti F. Dose-response analyses using restricted cubic spline functions in public health research. Stat Med. 2010;29(9):1037–57.

Noordzij M, et al. When do we need competing risks methods for survival analysis in nephrology? Nephrol Dial Transpl. 2013;28(11):2670–7.

Staley JR, Burgess S. Semiparametric methods for estimation of a nonlinear exposure-outcome relationship using instrumental variables with application to mendelian randomization. Genet Epidemiol. 2017;41(4):341–52.

Sun YQ, et al. Body mass index and all cause mortality in HUNT and UK Biobank studies: linear and non-linear mendelian randomisation analyses. BMJ. 2019;364:l1042.

Scheepers L, et al. Urate and risk of Alzheimer’s disease and vascular dementia: a population-based study. Alzheimers Dement. 2019;15(6):754–63.

Zhou Z, et al. Serum uric acid and the risk of dementia: a systematic review and meta-analysis. Front Aging Neurosci. 2021;13:625690.

Chang H, et al. Dose-response meta-analysis on urate, gout, and the risk for Parkinson’s disease. NPJ Parkinsons Dis. 2022;8(1):160.

Squadrito GL, et al. Reaction of uric acid with peroxynitrite and implications for the mechanism of neuroprotection by uric acid. Arch Biochem Biophys. 2000;376(2):333–7.

Zhou H, et al. Mendelian randomization study showed no causality between metformin use and lung cancer risk. Int J Epidemiol. 2020;49(4):1406–7.

Parkinson Study Group SURE-PD3 Investigators. Effect of urate-elevating inosine on early Parkinson disease progression: the SURE-PD3 randomized clinical trial. JAMA. 2021;326(10):926–39.


Acknowledgements

The authors gratefully acknowledge all the people who made this study possible.

This work was supported by the Young Elite Scientists Sponsorship Program by China Association for Science and Technology (grant number 2020QNRC001 to Yang Xia and Yu An), the 345 Talent Project of Shengjing Hospital of China Medical University (grant number M0294 to Yang Xia); the LiaoNing Revitalization Talents Program (grant number XLYC2203168 to Yang Xia); the National Natural Science Foundation of China (grant number 92357305 to Yanhui Lu), the Fundamental Research Funds for the Central Universities (to Yanhui Lu); the National Natural Science Foundation of China (grant number 82103811 to Yu An), the Beijing Hospitals Authority Youth Programme (grant number QML20230301 to Yu An); the Natural Science Major Project of the Anhui Provincial Department of Education (grant number 2022AH051233 to Tingjing Zhang), the Youth Key Talents Program of Wannan Medical College (grant number WK202211 to Tingjing Zhang), and Doctoral Research Grant Fund of Wannan Medical College (grant number WYRCQD2022008 to Tingjing Zhang). The funders had no role in the conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Author information

Tingjing Zhang and Yu An contributed equally to this work.

Authors and Affiliations

School of Public Health, Wannan Medical College, Wuhu, China

Tingjing Zhang

Institutes of Brain Science, Wannan Medical College, Wuhu, China

Department of Endocrinology, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China

Department of Clinical Nutrition, Yijishan Hospital of Wannan Medical College, Wuhu, China

Zhenfei Shen

Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, No. 36, San Hao Street, Shenyang, Liaoning, 110004, China

Honghao Yang, Jinguo Jiang & Yang Xia

Liaoning Key Laboratory of Precision Medical Research on Major Chronic Disease, Shenyang, China

Department of Nutrition and Food Hygiene, Hubei Key Laboratory of Food Nutrition and Safety, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Liangkai Chen

School of Nursing, Peking University, No. 38 Xueyuan Rd, Haidian District, Beijing, 100191, China


Contributions

Y. X. and Y. L. had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: T. Z. and Y. X.; acquisition, analysis, or interpretation of data: T. Z. and Y. X.; drafting of the manuscript: T. Z. and Y. X.; critical revision of the manuscript for important intellectual content: T. Z., Y. A., Z. S., H. Y., J. J., L. C., Y. L., and Y. X.; statistical analysis: T. Z. and Y. X.; obtained funding: Y. X. and Y. L.; administrative, technical, or material support: T. Z. and Y. A.; study supervision: Y. X. and Y. L.

Corresponding authors

Correspondence to Yanhui Lu or Yang Xia.

Ethics declarations

Ethics approval and consent to participate

The UK Biobank study was approved by the Northwest Multi-Center Research Ethics Committee, and each participant provided written informed consent before participating in the study. The data resource used for this study was obtained under application number 63454 from the UK Biobank.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Zhang, T., An, Y., Shen, Z. et al. Serum urate levels and neurodegenerative outcomes: a prospective cohort study and Mendelian randomization analysis of the UK Biobank. Alz Res Therapy 16, 106 (2024). https://doi.org/10.1186/s13195-024-01476-x


Received: 09 December 2023

Accepted: 06 May 2024

Published: 11 May 2024

DOI: https://doi.org/10.1186/s13195-024-01476-x


Keywords

  • Alzheimer’s disease
  • Neurodegenerative-related deaths
  • Prospective cohort study

