Analytical Research: What is it, Importance + Examples

Analytical research is a type of research that requires critical thinking skills and the examination of relevant facts and information.

The word “research” loosely translates to “finding knowledge.” It is a systematic, scientific way of investigating a particular subject, and analytical research is one form of that investigation.

Any kind of research is a way to learn new things. In this research, data and other pertinent information about a project are assembled; after the information is gathered and assessed, the sources are used to support a notion or prove a hypothesis.

An individual can successfully draw out minor facts to make more significant conclusions about the subject matter by using critical thinking abilities (a technique of thinking that entails identifying a claim or assumption and determining whether it is accurate or untrue).

What is analytical research?

This particular kind of research calls for using critical thinking abilities and assessing data and information pertinent to the project at hand.

Analytical research determines the causal connections between two or more variables. For example, an analytical study might aim to identify the causes and mechanisms underlying a trade deficit’s movement over a given period.

It is used by various professionals, including psychologists, doctors, and students, to identify the most pertinent material during investigations. One learns crucial information from analytical research that helps them contribute fresh concepts to the work they are producing.

Some researchers perform it to uncover information that supports ongoing research to strengthen the validity of their findings. Other scholars engage in analytical research to generate fresh perspectives on the subject.

Various approaches to performing analytical research include literary analysis, gap analysis, public surveys, clinical trials, and meta-analysis.

Importance of analytical research

The goal of analytical research is to develop new ideas that are more believable by combining numerous minute details.

Analytical investigation explains why a claim should be trusted. Finding out why something occurs is complex, and it requires the ability to evaluate information and think critically.

This kind of information aids in proving the validity of a theory or supporting a hypothesis. It assists in recognizing a claim and determining whether it is true.

Analytical research is valuable to many people, including students, psychologists, and marketers. It helps determine which advertising initiatives within a firm perform best, while in medicine it helps determine how well a particular treatment works.

Thus, analytical research can help people achieve their goals while saving lives and money.

Methods of Conducting Analytical Research

Analytical research is the process of gathering, analyzing, and interpreting information to make inferences and reach conclusions. Depending on the purpose of the research and the data you have access to, you can conduct analytical research using a variety of methods. Here are a few typical approaches:

Quantitative research

Numerical data are gathered and analyzed using this method. Statistical methods are then used to analyze the information, which is often collected using surveys, experiments, or pre-existing datasets. Results from quantitative research can be measured, compared, and generalized numerically.
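As a minimal illustration of summarizing numerical data statistically, here is a sketch using Python’s standard statistics module. The survey ratings are made up for the example:

```python
import statistics

# Hypothetical 1-5 satisfaction ratings collected from a survey.
ratings = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

mean = statistics.mean(ratings)      # central tendency
median = statistics.median(ratings)  # middle value
stdev = statistics.stdev(ratings)    # spread (sample standard deviation)

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}")
```

Summaries like these can then be compared across groups or time points, which is what makes quantitative results measurable and generalizable.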

Qualitative research

In contrast to quantitative research, qualitative research focuses on collecting non-numerical information. It gathers detailed information using techniques like interviews, focus groups, observations, or content research. Understanding social phenomena, exploring experiences, and revealing underlying meanings and motivations are all goals of qualitative research.

Mixed methods research

This strategy combines quantitative and qualitative methodologies to grasp a research problem thoroughly. Mixed methods research often entails gathering and evaluating both numerical and non-numerical data, integrating the results, and offering a more comprehensive viewpoint on the research issue.

Experimental research

Experimental research is frequently employed in scientific trials and investigations to establish causal links between variables. This approach entails modifying variables in a controlled environment to identify cause-and-effect connections. Researchers randomly divide volunteers into several groups, provide various interventions or treatments, and track the results.

Observational research

With this approach, behaviors or occurrences are observed and methodically recorded without any outside interference or manipulation of variables. Observational research can take place in both controlled and naturalistic settings. It offers useful insights into real-world behavior and enables researchers to explore events as they naturally occur.

Case study research

This approach entails thorough research of a single case or a small group of related cases. Case studies frequently draw on a variety of information sources, including observations, records, and interviews. They offer rich, in-depth insights and are particularly helpful for researching complex phenomena in practical settings.

Secondary data analysis

With this approach, researchers examine information that was previously gathered for a different purpose, such as data from earlier cohort studies, accessible databases, or corporate documents. Examining secondary information is time- and cost-efficient, enabling researchers to explore new research questions or confirm prior findings.

Content analysis

This approach systematically examines the content of texts, including media, speeches, and written documents; it is frequently employed in the social sciences and media studies. Researchers identify and categorize themes, patterns, or keywords to make inferences about the content.
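A simple keyword-based content analysis can be sketched in Python. The documents and the theme codebook below are hypothetical:

```python
from collections import Counter
import re

# Hypothetical corpus of short documents to analyze.
documents = [
    "The economy grew slowly while inflation stayed high.",
    "High inflation worries voters more than the economy.",
    "Voters care about jobs, inflation, and the economy.",
]

# Codebook: themes the researcher decided to track in advance.
themes = {"economy", "inflation", "voters", "jobs"}

counts = Counter()
for doc in documents:
    for word in re.findall(r"[a-z]+", doc.lower()):
        if word in themes:
            counts[word] += 1

print(counts.most_common())
```

Real content analysis typically also handles synonyms, phrases, and coder agreement, but the core step of counting categorized occurrences looks like this.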

Depending on your research objectives, the resources at your disposal, and the type of data you wish to analyze, selecting the most appropriate approach or combination of methodologies is crucial to conducting analytical research.

Examples of analytical research

Analytical research goes beyond taking a single measurement. For example, rather than simply reporting a trade imbalance, you would examine its causes and how it changed over a given period, using detailed statistics and statistical tests to ensure the results are significant.

It can, for instance, investigate why the value of the Japanese yen has decreased, because an analytical study asks “how” and “why” questions.

As another example, someone might conduct analytical research to identify a gap in an existing study. This presents a fresh perspective on the data and therefore helps support or refute existing notions.

Descriptive vs analytical research

Here are the key differences between descriptive research and analytical research:

  • Descriptive research describes characteristics such as patterns or trends; analytical research quantifies relationships between variables.
  • Descriptive research answers the questions of what, who, where, and when; analytical research answers why and how.
  • Descriptive research generates hypotheses; analytical research tests hypotheses and makes predictions.

Analytical research is used extensively in the study of cause and effect. It benefits numerous academic disciplines, including marketing, health, and psychology, because it offers more conclusive information for addressing research issues.

QuestionPro offers solutions for every issue and industry, making it more than just survey software. For handling data, we also have systems like our InsightsHub research library.

You may make crucial decisions quickly while using QuestionPro to understand your clients and other study subjects better. Make use of the possibilities of the enterprise-grade research suite right away!


Research Methods | Definitions, Types, Examples

Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral part of your research design . When planning your methods, there are two key decisions you will make.

First, decide how you will collect data . Your methods depend on what type of data you need to answer your research question :

  • Qualitative vs. quantitative : Will your data take the form of words or numbers?
  • Primary vs. secondary : Will you collect original data yourself, or will you use data that has already been collected by someone else?
  • Descriptive vs. experimental : Will you take measurements of something as it is, or will you perform an experiment?

Second, decide how you will analyze the data .

  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.

Methods for collecting data

Data is the information that you collect for the purposes of answering your research question . The type of data you need depends on the aims of your research.

Qualitative vs. quantitative data

Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.

For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data .

If you want to develop a more mechanistic understanding of a topic, or your research involves hypothesis testing , collect quantitative data .

You can also take a mixed methods approach , where you use both qualitative and quantitative research methods.

Primary vs. secondary research

Primary research is any original data that you collect yourself for the purposes of answering your research question (e.g. through surveys , observations and experiments ). Secondary research is data that has already been collected by other researchers (e.g. in a government census or previous scientific studies).

If you are exploring a novel research question, you’ll probably need to collect primary data . But if you want to synthesize existing knowledge, analyze historical trends, or identify patterns on a large scale, secondary data might be a better choice.

Descriptive vs. experimental data

In descriptive research , you collect data about your study subject without intervening. The validity of your research will depend on your sampling method .

In experimental research , you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design .

To conduct an experiment, you need to be able to vary your independent variable , precisely measure your dependent variable, and control for confounding variables . If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.

Methods for analyzing data

Your data analysis methods will depend on the type of data you collect and how you prepare it for analysis.

Data can often be analyzed both quantitatively and qualitatively. For example, survey responses could be analyzed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.
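The frequency-based (quantitative) reading of survey responses mentioned above can be sketched in a few lines; the responses here are invented:

```python
from collections import Counter

# Hypothetical categorical responses to a single survey question.
responses = ["agree", "agree", "neutral", "disagree", "agree", "neutral"]

freq = Counter(responses)
total = len(responses)

# Report each answer with its count and share of all responses.
for answer, n in freq.most_common():
    print(f"{answer}: {n} ({n / total:.0%})")
```

A qualitative analysis of the same data would instead examine the wording and meaning of open-ended responses rather than their frequencies.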

Qualitative analysis methods

Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that was collected:

  • From open-ended surveys and interviews , literature reviews , case studies , ethnographies , and other sources that use text rather than numbers.
  • Using non-probability sampling methods .

Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions and be careful to avoid research bias .

Quantitative analysis methods

Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments).

You can use quantitative analysis to interpret data that was collected either:

  • During an experiment .
  • Using probability sampling methods .

Because the data is collected and analyzed in a statistically valid way, the results of quantitative analysis can be easily standardized and shared among researchers.
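As one concrete instance of the correlational analysis mentioned above, here is a Pearson correlation computed from scratch. The paired measurements (hours studied vs. exam score) are invented for the example:

```python
import math

# Hypothetical paired measurements: hours studied vs. exam score.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 70]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Pearson r = covariance / (std_x * std_y), here via raw sums of deviations.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = math.sqrt(sum((a - mx) ** 2 for a in x))
sy = math.sqrt(sum((b - my) ** 2 for b in y))
r = cov / (sx * sy)

print(f"Pearson r = {r:.3f}")
```

In practice you would use a statistics library that also reports a p-value, but the formula itself is this simple.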

Frequently asked questions about research methods
Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
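The sampling step described above (drawing 100 students from a larger population) can be sketched with the standard library; the population of student IDs is hypothetical:

```python
import random

# Hypothetical population: 5,000 student IDs at a university.
population = list(range(1, 5001))

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, 100)  # simple random sample, no repeats

print(len(sample), len(set(sample)))
```

Because random.sample draws without replacement, no student can appear twice, which is what "simple random sampling" requires.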

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Methodology refers to the overarching strategy and rationale of your research project . It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys , and statistical tests ).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section .

In a longer or more complex research project, such as a thesis or dissertation , you will probably include a methodology section , where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.


What are Analytical Study Designs?


Analytical study designs can be experimental or observational and each type has its own features. In this article, you'll learn the main types of designs and how to figure out which one you'll need for your study.

Updated on September 19, 2022


A study design is critical to your research study because it determines exactly how you will collect and analyze your data. If your study aims to examine the relationship between two variables, then an analytical study design is the right choice.

But how do you know which type of analytical study design is best for your specific research question? It's necessary to have a clear plan before you begin data collection. Lots of researchers, sadly, speed through this or don't do it at all.

When are analytical study designs used?

A study design is a systematic plan, developed so you can carry out your research study effectively and efficiently. Having a design is important because it will determine the right methodologies for your study. Using the right study design makes your results more credible, valid, and coherent.

Descriptive vs. analytical studies

Study designs can be broadly divided into either descriptive or analytical.

Descriptive studies describe characteristics such as patterns or trends. They answer the questions of what, who, where, and when, and they generate hypotheses. They include case reports and qualitative studies.

Analytical study designs quantify a relationship between different variables. They answer the questions of why and how. They're used to test hypotheses and make predictions.

Experimental and observational

Analytical study designs can be either experimental or observational. In experimental studies, researchers manipulate something in a population of interest and examine its effects. These designs are used to establish a causal link between two variables.

In observational studies, in contrast, researchers observe the effects of a treatment or intervention without manipulating anything. Observational studies are most often used to study larger patterns over longer periods.

Experimental study designs

In experimental study designs, a researcher introduces a change in one group and not in another. Typically, these are used when researchers are interested in the effects of this change on some outcome. It's important to ensure that both groups are equivalent at baseline, so that any differences that arise can be attributed to the introduced change.

In one study, Reiner and colleagues studied the effects of a mindfulness intervention on pain perception . The researchers randomly assigned participants into an experimental group that received a mindfulness training program for two weeks. The rest of the participants were placed in a control group that did not receive the intervention.

Experimental studies help us establish causality. This is critical in science because we want to know whether one variable leads to a change, or causes another. Establishing causality leads to higher internal validity and makes results reproducible.

Experimental designs include randomized control trials (RCTs), nonrandomized control trials (non-RCTs), and crossover designs. Read on to learn the differences.

Randomized control trials

In an RCT, one group of individuals receives an intervention or a treatment, while another does not. It's then possible to investigate what happens to the participants in each group.

Another important feature of RCTs is that participants are randomly assigned to study groups. This helps limit certain biases and retain better control. Randomization also lets researchers attribute any differences in outcomes to the intervention received during the trial. RCTs are considered the gold standard in biomedical research, providing the strongest kind of evidence.
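Random assignment of this kind can be sketched in a few lines; the participant IDs and arm names below are hypothetical:

```python
import random

# Hypothetical participant IDs to allocate across two trial arms.
participants = list(range(1, 21))
arms = ["intervention", "control"]

random.seed(7)  # fixed seed so the allocation is reproducible
random.shuffle(participants)

# Alternate down the shuffled list so arm sizes stay balanced.
assignment = {pid: arms[i % len(arms)] for i, pid in enumerate(participants)}

counts = {arm: sum(1 for a in assignment.values() if a == arm) for arm in arms}
print(counts)
```

Shuffling first and then alternating gives a balanced allocation (10 per arm here) while still leaving which participant lands in which arm to chance.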

For example, one RCT looked at whether an exercise intervention impacts depression . Researchers randomly placed patients with depressive symptoms into intervention groups containing different types of exercise (i.e., light, moderate, or strong). Another group received usual medications or no exercise interventions.

Results showed that after the 12-week trial, patients in all exercise groups had decreased depression levels compared to the control group. This means that by using an RCT design, researchers can now safely assume that the exercise variable has a positive impact on depression.

However, RCTs are not without drawbacks. In the example above, we don't know if exercise still has a positive impact on depression in the long term. This is because it's not feasible to keep people under these controlled settings for a long time.

Advantages of RCTs

  • It is possible to infer causality
  • Everything is properly controlled, so very little is left to chance or bias
  • Can be certain that any difference is coming from the intervention

Disadvantages of RCTs

  • Expensive and can be time-consuming
  • Can take years for results to be available
  • Cannot be done for certain types of questions due to ethical reasons, such as asking participants to undergo harmful treatment
  • Limited in how many participants researchers can adequately manage in one study or trial
  • Not feasible for people to live under controlled conditions for a long time

Nonrandomized controlled trials

Nonrandomized controlled trials are a type of nonrandomized study (NRS) in which the allocation of participants to intervention groups is not done randomly. Here, researchers purposely assign some participants to one group and others to another based on certain features. Alternatively, participants can sometimes decide which group they want to be in.

For example, in one study, clinicians were interested in the impact of stroke recovery after being in an enriched versus non-enriched hospital environment . Patients were selected for the trial if they fulfilled certain requirements common to stroke recovery. Then, the intervention group was given access to an enriched environment (i.e. internet access, reading, going outside), and another group was not. Results showed that the enriched group performed better on cognitive tasks.

NRS are useful in medical research because they help study phenomena that would be difficult to measure with an RCT. However, one of their major drawbacks is that we cannot be sure if the intervention leads to the outcome. In the above example, we can't say for certain whether those patients improved after stroke because they were in the enriched environment or whether there were other variables at play.

Advantages of NRSs

  • Good option when randomized control trials are not feasible
  • More flexible than RCTs

Disadvantages of NRSs

  • Can't be sure if the groups have underlying differences
  • Introduces risk of bias and confounds

Crossover study

In a crossover design, each participant receives a sequence of different treatments. Crossover designs can be applied to RCTs, in which each participant is randomly assigned to different study groups.

For example, one study looked at the effects of replacing butter with margarine on lipoprotein levels in individuals with elevated cholesterol. Patients were randomly assigned to a 6-week butter diet, followed by a 6-week margarine diet. In between both diets, participants ate a normal diet for 5 weeks.

These designs are helpful because they reduce bias. In the example above, each participant completed both interventions, making them serve as their own control. However, we don't know if eating butter or margarine first leads to certain results in some subjects.

Advantages of crossover studies

  • Each participant serves as their own control, reducing confounding variables
  • Require fewer participants, so they have better statistical power

Disadvantages of crossover studies

  • Susceptible to order effects, meaning the order in which a treatment was given may have an effect
  • Carry-over effects between treatments
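The order-effect concern above is commonly addressed by counterbalancing, i.e., assigning treatment sequences so that each order occurs equally often across participants. A minimal sketch, with hypothetical participants and treatment labels:

```python
from itertools import permutations

# Hypothetical treatments in a two-period crossover (e.g., butter vs. margarine).
treatments = ["A", "B"]
orders = list(permutations(treatments))  # [('A', 'B'), ('B', 'A')]

participants = ["p1", "p2", "p3", "p4"]

# Counterbalance: cycle through the sequences so each order is used equally.
schedule = {pid: orders[i % len(orders)] for i, pid in enumerate(participants)}
print(schedule)
```

With an even number of participants, half receive A-then-B and half B-then-A, so any systematic order effect is balanced across the two arms rather than eliminated.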

Observational studies

In observational studies, researchers watch (observe) the effects of a treatment or intervention without trying to change anything in the population. Observational studies help us establish broad trends and patterns in large-scale datasets or populations. They are also a great alternative when an experimental study is not an option.

Unlike experimental research, observational studies do not help us establish causality. This is because researchers do not actively control any variables. Rather, they investigate statistical relationships between them. Often this is done using a correlational approach.

For example, suppose researchers want to examine the effects of daily fiber intake on bone density. They conduct a large-scale survey of thousands of individuals to examine correlations between fiber intake and different health measures.
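A minimal sketch of such a correlational analysis, using invented survey values (not data from any real study):

```python
from statistics import correlation  # Python 3.10+

# Invented survey data: daily fiber intake (g) and bone mineral density (g/cm^2)
fiber =   [18,   25,   30,   22,   35,   15]
density = [0.95, 1.02, 1.10, 1.00, 1.12, 0.90]

# A correlational analysis quantifies the statistical association between the
# variables, but it cannot establish that fiber intake causes higher density.
r = correlation(fiber, density)
print(f"Pearson r = {r:.2f}")
```

A strong correlation here would justify further (experimental) study, not a causal claim.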

The main observational studies are case-control, cohort, and cross-sectional. Let's take a closer look at each one below.

Case-control study

A case-control study is a type of observational design in which researchers identify individuals with an existing health condition (cases) and a similar group without it (controls). The cases and the controls are then compared on one or more measures.

Frequently, data collection in a case-control study is retrospective (i.e., looking backwards in time), because participants have already been exposed to the event in question. Researchers must therefore go through existing records and patient files to obtain the data for this study design.

For example, a group of researchers examined whether using sleeping pills puts people at risk of Alzheimer's disease. They compared 1,976 individuals who had received a dementia diagnosis (“cases”) with 7,184 individuals who had not (“controls”). Cases and controls were matched on specific measures such as sex and age. Patient records were consulted to determine how many sleeping pills each person had consumed over a given period.
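The matching step in a study like this can be sketched as follows; the records, IDs, and the two-year age window are all invented for illustration:

```python
# Hypothetical patient records; sexes, ages, and IDs are invented.
cases = [
    {"id": "case-1", "sex": "F", "age": 74},
    {"id": "case-2", "sex": "M", "age": 68},
]
controls = [
    {"id": "ctrl-1", "sex": "F", "age": 73},
    {"id": "ctrl-2", "sex": "M", "age": 80},
    {"id": "ctrl-3", "sex": "M", "age": 69},
]

def matched_controls(case, pool, age_window=2):
    """Controls with the same sex and an age within +/- age_window years."""
    return [c for c in pool
            if c["sex"] == case["sex"] and abs(c["age"] - case["age"]) <= age_window]

# Each case is paired only with controls that resemble it on the
# matching variables, so comparisons are not confounded by sex or age.
for case in cases:
    matches = [c["id"] for c in matched_controls(case, controls)]
    print(case["id"], "->", matches)
```

Matching on known confounders is what makes the case-control comparison meaningful, although unmeasured differences between the groups can still remain.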

Case-control designs are ideal for situations where cases are easy to pick out and compare, such as when studying rare diseases or outbreaks.

Advantages of case-control studies

  • Feasible for rare diseases
  • Cheaper and easier to do than an RCT

Disadvantages of case-control studies

  • Relies on patient records, which could be lost or damaged
  • Potential recall and selection bias

Cohort study (longitudinal)

A cohort is a group of people who are linked in some way. For instance, a birth cohort is everyone born in a specific year. In cohort studies, researchers compare outcomes for individuals in the cohort who have been exposed to some variable with outcomes for those who haven't. They're also called longitudinal studies.

The cohort is then repeatedly assessed on variables of interest over a period of time. There is no set amount of time required for cohort studies. They can range from a few weeks to many years.

Cohort studies can be prospective. In this case, individuals are followed for some time into the future. They can also be retrospective, where data is collected on a cohort from records.

One of the longest-running cohort studies today is The Harvard Study of Adult Development. This cohort study tracked various health outcomes of 268 Harvard graduates and 456 disadvantaged inner-city Boston residents from 1939 to 2014. Physical screenings, blood samples, brain scans, and surveys were collected on this cohort for over 70 years. The study has produced a wealth of knowledge on outcomes throughout life.

A cohort study design is a good option when you have a specific group of people to study over time. However, a major drawback is that cohort studies take a long time and offer little control over variables.

Advantages of cohort studies

  • Ethically safe
  • Allows you to study multiple outcome variables
  • Establish trends and patterns

Disadvantages of cohort studies

  • Time consuming and expensive
  • Can take many years for results to be revealed
  • Too many variables to manage
  • Depending on length of study, can have many changes in research personnel

Cross-sectional study

Cross-sectional studies, also known as prevalence studies, examine the relationship between specific variables in a population at one point in time. The researcher does not try to manipulate any of the variables but instead studies them using statistical analyses. Cross-sectional studies are often described as snapshots of a population at a given time.

For example, researchers wanted to determine the prevalence of inappropriate antibiotic use to study the growing concern about antibiotic resistance. Participants completed a self-administered questionnaire assessing their knowledge and attitude toward antibiotic use. Then, researchers performed statistical analyses on their responses to determine the relationship between the variables.
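The core prevalence calculation such a survey feeds into can be sketched with invented responses (the question wording and data are hypothetical):

```python
from collections import Counter

# Invented questionnaire responses: did the respondent self-medicate
# with antibiotics in the past year? ("yes" = inappropriate use)
responses = ["no", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]

# Prevalence is simply the proportion of the sample with the
# characteristic of interest at this single point in time.
counts = Counter(responses)
prevalence = counts["yes"] / len(responses)
print(f"Prevalence of inappropriate antibiotic use: {prevalence:.0%}")
```

With a large enough sample, this proportion estimates the population prevalence at the time of the survey, but it says nothing about how that prevalence changes over time.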

Cross-sectional study designs are ideal for gathering initial data on a research question, which can then be analyzed again later. By knowing the public's general attitudes toward antibiotics, this information can be relayed to physicians or public health authorities. However, it's often difficult to determine how long such results remain valid.

Advantages of cross-sectional studies

  • Fast and inexpensive
  • Provides a great deal of information for a given time point
  • Leaves room for secondary analysis

Disadvantages of cross-sectional studies

  • Requires a large sample to be accurate
  • Not clear how long results remain valid
  • Do not provide information on causality
  • Cannot be used to establish long-term trends because data is only for a given time

So, how about your next study?

Whether it's an RCT, a case-control, or even a qualitative study, AJE has services to help you at every step of the publication process. Get expert guidance and publish your work for the world to see.

The AJE Team

  • Open access
  • Published: 13 March 2022

The role of analytic direction in qualitative research

  • Joanna E. M. Sale 1,2,3

BMC Medical Research Methodology, volume 22, Article number: 66 (2022)

Background

The literature on qualitative data analysis mostly concerns analyses pertaining to an individual research question and the organization of data within that research question. Few authors have written about the entire qualitative dataset from which multiple and separate analyses could be conducted and reported. The concept of analytic direction is a strategy that can assist qualitative researchers in deciding which findings to highlight within a dataset. The objectives of this paper were to: 1) describe the importance of analytic direction in qualitative research, and 2) provide a working example of the concept of analytic direction.

Methods

A qualitative dataset from one of the author’s research programs was selected for review. Ten potential analytic directions were identified after the initial phenomenological analysis was conducted. Three analytic directions based on the same coding template but different content areas of the data were further developed using phenomenological analysis ( n  = 2) and qualitative description ( n  = 1) and are the focus of this paper. The development and selection of these three analytic directions were guided in part by methodological criteria that promote rigour, including a comprehensive examination of the data, the use of multiple analysts, direct quotations to support claims, negative case analysis, and reflexivity.

Results

The three analytic directions addressed topics within the scope of the overall research question. Each analytic direction had its own central point or story line, and each highlighted a different perspective or voice. The use of inductive and deductive approaches to analysis, and how theory was integrated, varied across the analytic directions.

Conclusions

The concept of analytic direction enables researchers to organize their qualitative datasets in order to tell different and unique “stories”. The concept relies upon, and promotes, the conduct of rigorous qualitative research.

Peer Review reports

Reports on data analysis in qualitative research are well documented. Procedural steps have been described [ 1 , 2 , 3 , 4 , 5 , 6 , 7 ] and authors have made distinctions between the concepts of coding, analysis, and interpretation [ 1 , 2 , 8 , 9 ]. Authors have written about different researchers accessing different representations of a topic or phenomenon [ 2 , 10 ] or multiple interpretations being applied to the same transcript [ 11 ]. The literature on data analysis mostly concerns analyses pertaining to an individual research question and the organization of data within that research question. Few authors have written about the entire qualitative dataset from which multiple and separate analyses could be conducted and reported.

The data collected by qualitative researchers can be voluminous and often surpass the data pertaining to objectives outlined in grant proposals. These data may be compelling but analyses of some data are often given lower priority if they do not align directly with the stated objectives.

There comes a point during data collection and analysis where qualitative researchers must choose “which story, of the many stories available to them in a data set, to tell” (p. 376) [ 12 ]. According to Arthur Frank, “[a] fter the methods, there has to be a story” (p. 431) [ 13 ]. “Stories” should have a central point or storyline [ 12 ]. The final report can be told from the perspective of different voices [ 12 ] and organized by time such as emphasizing key turning points and milestones in the sequence of events studied [ 12 , 14 ] or by using other forms of representation such as metaphors [ 2 , 12 ]. Theory can be central or more peripheral in the account [ 15 ]. The question remains, what “story”, or “stories”, do we tell?

The concept of analytic direction

The concept of analytic direction is a strategy that can assist qualitative researchers in deciding which “stories” to highlight within a dataset. Sandelowski reports that researchers account for their data and then determine the different “paths” [ 1 ] or “analytic paths” [ 16 ] they can pursue. Others have proposed that decision-making throughout analysis implies analytic ideas at every stage of the coding process [ 8 ] and that researchers define for themselves what analytic issues are to be explored and what ideas are important [ 8 ]. Charmaz [ 17 ] reports that grounded theory researchers pursue more than one analytic direction by focusing on certain ideas first and then returning to the data to address an unfinished analysis in another area later. While the concept of analytic direction has been referenced, or alluded to, by these and other authors [ 1 , 8 , 16 , 18 , 19 ], operationalization of this concept is not well articulated. In this paper, the term analytic direction refers to a message developed by the researchers about the data that may or may not require further substantiation. An analytic direction can be presented as a single message or theme, and can stand alone or be supported by multiple sub-messages or sub-themes. Analytic directions can be developed during the coding process, in later stages of analysis, or possibly during analyses of new datasets. Relying on strategies to promote rigour can assist with the development, substantiation, and selection of analytic directions. If substantiated, each analytic direction could be the focus of an individual publication. The objectives of this paper were to: 1) describe the importance of analytic direction in qualitative research; and 2) provide a working example of the concept of analytic direction.

Why analytic direction is important

The concept of analytic direction is important because it has implications for methodological rigour. We have an obligation to conduct methodologically rigorous studies [ 20 ], especially when studies require primary data collection that involves a burden to participants [ 21 ]. The author proposes that methodological rigour is embedded within, and contributes to, the concept of analytic direction. Several strategies to promote rigour that are universal to many qualitative approaches, including phenomenology, are discussed. These strategies include, but are not limited to, a comprehensive examination of the data, the use of multiple analysts, direct quotations to support claims, negative case analysis, and reflexivity. It is important to support the quality of analytic directions so that researchers can then determine which analytic directions may or may not require further substantiation. The quality of the analytic direction will also assist in determining which directions may be selected for reporting.

The relationship between analytic direction and methodological rigour

This paper focuses on the stage where data collection is considered to be complete and does not directly address how data collection, and methodological rigour related to data collection, contributes to the concept of analytic direction. The assumption is that data collection and analysis were conducted iteratively [ 22 , 23 ] and that the team decided when data collection was complete, perhaps relying upon one of the various conceptualizations of saturation discussed by Saunders and colleagues [ 24 ]. A decision about saturation would not necessarily apply to any, or all, analytic directions being developed.

The author proposes that several strategies for promoting rigour assist with the development and selection of analytic directions. One aspect of methodological rigour is that authors carry out a comprehensive examination of their data [ 5 , 25 ]. By thinking about, and engaging in, analytic direction, researchers are encouraged to attend to all of their data rather than attending only to data that interests them initially.

The use of multiple analysts promotes a comprehensive examination of the data [ 2 , 26 ] and thus, contributes to the concept of analytic direction. Different viewpoints lead to an enrichment of the analysis and can lead to a conceptual clarification of the interpretations [ 2 ]. Multiple viewpoints can be used at the level of coding but also at the level of the larger team as data collection and analysis proceeds. Discussions about the novelty, clinical significance, and relevance [ 27 ] of the analytic directions may occur at this time and continue through to the writing of the respective manuscripts. Analytic directions are relevant if they add knowledge, or increase the confidence with which existing knowledge is regarded [ 28 ]. According to Malterud [ 26 ], engaging multiple researchers in a qualitative study strengthens the design of the study, not for the purpose of consensus or identical readings of the data but to supplement and contest each others’ statements.

The use of direct quotations to support the claims made about the analytic directions (and/or themes within) is another strategy to promote rigour [ 29 ]. Not only do quotations illustrate and clarify the results but they also demonstrate whether there is substantive evidence to support the analytic directions being proposed. In contrast, data that do not support the analytic directions (and/or themes within) should be accounted for and their exclusion justified when promoting methodological rigour [ 30 ]. Authors may refer to this as attending to negative cases [ 28 ] or deviant case analysis [ 25 , 31 ]. This strategy promotes that “deviant cases” or “outliers” are not forced into categories or ignored but used instead to aid understanding or theory development [ 25 ]. For example, these cases may explain why the patterns developed from the data or the more normative behaviours are not always found in the researchers’ interpretations [ 25 , 31 ].

Reflexivity is an essential component of methodological rigour [ 26 ]. Reflexivity has been described as “an attitude of attending systematically to the context of knowledge construction, especially to the effect of the researcher, at every step of the research process” [ 26 ] (p. 484). Being reflexive means being aware of your own position in producing partial knowledge [ 32 ]. The qualitative researcher acknowledges his or her personal influence on what that partial knowledge is (for example, the data collected are dependent on the interviewer’s questions and prompts). According to Eakin and Gladstone [ 33 ], knowing one’s standpoint helps one to recognize the forces that might drive certain interpretations and stifle other conceptualizations of the data. Knowledge production is also partial because it is not possible to report all interpretations of the data and therefore, the research team has to decide what to report. Researchers engaging in the concept of analytic direction are more likely to be reflexive about what they are, and are not, reporting from their datasets.

Rationale for the chosen example

The dataset chosen for this example was from a study where the author and her team identified 10 potential analytic directions based on a compilation of the memos and team discussions pertaining to analysis and interpretation of the data. The publications developed from this dataset reflected the selection of three analytic directions that focused on different content areas [ 34 , 35 , 36 ]. The same coding template was the foundation for the three publications and the timing of the reporting was ordered based on the author’s interests. The author chose the dataset as an example primarily because it was not heavily theory-laden and therefore accessible to novice qualitative researchers. The resulting publications have practical implications for clinical and health services research and the process of developing these publications could inform graduate students who are embarking on a qualitative program of research for their thesis work.

Original research funded

The goal of the original research project was to reduce the burden of illness due to fracture through improved bone health investigation and treatment. Specifically, the aim was to examine what researchers could learn from members of a patient group. The study was approved by the Research Ethics Board at Unity Health Toronto (REB# 10–371). The study team consisted of scientists, clinicians, a policy maker, and a patient representative with expertise related to bone health. Informed by the Theory of Planned Behaviour [ 37 , 38 ], the team set out to examine members of a patient group to ask them about their intentions and actions toward bone health diagnosis and treatment and their experiences with diagnostic tests and treatment recommendations. All individuals ( n  = 28) were 50+ years old and had sustained a fragility fracture. The overall project relied on a phenomenological approach conceptualized by Giorgi and Wertz [ 30 , 39 , 40 , 41 ].

We developed a master coding template of 27 broad codes that were designed to organize the data with minimal reliance on theory. The coding template was revised four times as data collection and analysis proceeded. The codes were developed from a combination of inductive and deductive codes. More specifically, inductive codes were developed from topics discussed in the interviews. Other codes were pre-specified from the overall aim of the original funded study and from the domains of the Theory of Planned Behaviour.

Development of analytic directions from the dataset

Qualitative researchers can use several strategies to develop analytic directions. The author started the organization process early in order to think about how best to maximize the data collected. Coding began after the first couple of interviews had been conducted; this is conventional advice for analysis in qualitative research [ 1 , 2 , 23 , 42 ].

As soon as the coding process began, a document specific to analysis was created. Miles and colleagues have referred to this as “analytic memoing” [ 6 ]. This document is different from other documents in which the team discusses design features, decisions, and interview logistics related to the study. Analytic ideas were added to this document after coding and discussing each transcript. The author engaged two individuals in the coding/analysis process, as multiple analysts promote a comprehensive examination of the data [ 2 , 26 ]. The author met regularly with members of the team during the process of data collection and analysis to discuss the data, interpretations of them, and different lines of inquiry. These discussions were recorded in the analysis document.

Table 1 outlines the potential analytic directions considered for this paper. The 10 analytic directions were developed prior to publication of analytic direction #1. Some of these directions were posed as questions that required further analysis and substantiation. Tables were then created to help us to visualize patterns during analysis. As an example, for analytic direction #2, a table was created in which each participant was assigned a row and perceived messages from the various health care providers (for example, primary care providers and specialists) were placed in columns. Perceived messages were presented as quotations from participants. We examined the rows to compare perceived messages across provider groups for each participant and then examined the columns to compare the perceived messages within each provider group.
For analytic direction #3, a table was created with each participant assigned a row and the domains of the Theory of Planned Behaviour assigned to columns. The table was populated with data in the form of quotations from each participant that we believed corresponded to each of the domains. Strategies such as matrices [ 5 , 6 ] or thematic maps [ 42 ] can also be used to visualize developing patterns when presenting or organizing data.
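The participant-by-provider table described above can be sketched as a simple nested mapping; the participant IDs, provider groups, and quotations below are invented for illustration:

```python
# Hypothetical coded excerpts: (participant, provider group, perceived message).
coded = [
    ("P01", "primary care", "Your bones are fine, just take calcium."),
    ("P01", "specialist",   "You should start treatment for your bones."),
    ("P02", "specialist",   "Repeat the bone density test in two years."),
]

# Build the matrix: one row per participant, one column per provider
# group, with each cell holding the supporting quotations.
matrix: dict[str, dict[str, list[str]]] = {}
for pid, provider, quote in coded:
    matrix.setdefault(pid, {}).setdefault(provider, []).append(quote)

# Reading across a row shows whether one participant perceived consistent
# messages from different provider groups; reading down a column compares
# messages within a single provider group.
print(sorted(matrix["P01"]))
```

Spreadsheet software or qualitative analysis packages serve the same purpose; the point is that the table structure itself makes within-row and within-column comparisons easy to see.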

Selection of the three analytic directions

The number of analytic directions selected likely depends on circumstances including the quality of the data, the quality of the analysis, and available resources. The research team considered the multiple analytic directions, discussing their relevance [ 27 ], novelty, and clinical significance and also the interests of the team in order to incorporate the perspectives of the different stakeholders. It was important to the author that the content of each analytic direction was bounded in that it did not overlap with the content of the other analytic directions. For example, analytic directions #2 and #3 discuss the potential influence of others in participants’ lives. However, analytic direction #2 focused on health care providers while analytic direction #3 focused on family members, friends, and colleagues of participants and specifically excluded health care providers from the analysis based on the Theory of Planned Behaviour domain “subjective norm”. In narrowing down the list of analytic directions, the author ensured there were sufficient data (quotations) to support the claims. Cases that did not fit the general results were acknowledged in order to justify their exclusion or explain why they did not fit. For example, in analytic direction #3, we examined instances where the data did not appear to fit with the Theory of Planned Behaviour and explained what happened in these instances where the model did not appear to be predictive of intentions.

The master coding template was important as it assisted with the organization of evidence for each analytic direction. The master coding template also assisted the team with the creation of tables for each analytic direction discussed. Table  2 demonstrates the relationship between the master coding template and the three selected analytic directions.

The impetus for analytic direction #1 [ 34 ] was based on an assumption held by the author as she was working on the research proposal. Her expectation was that members of a patient group would be patient advocates who were experts in navigating for care. She was interested in what patients could learn from members of this patient group. The analytic direction for the paper came from surprise, and subsequent disappointment, that those assumptions were not supported by the data and that members of the patient group did not all appear to be advocates and experts in navigating for care. One commonality that defined the patient group was that members appeared to be in favour of taking prescribed medication.

Analytic direction #1 included elements of both inductive and deductive analysis in that codes were developed for the master coding template from the data (inductive) but the author’s expectations also influenced how those codes were combined and how the team interpreted the data (deductive). Drawing from the literature, the term “advocacy” was equated with the theoretical concept of “effective” or “activated” consumer [ 43 , 44 ]. The code “effective consumer” did not exist in the original master template, partly because we preferred to not apply theoretical labels prematurely to the data. Based on the coding template, we drew from six codes to create a table about “effective consumer” behaviours (see Table 2 ). Participants were then coded along a continuum between what was referred to as “few effective consumer behaviours” (patients who followed orders with minimal involvement in their care and demonstrating the least amount of advocacy) to “many effective consumer behaviours” (individuals demonstrating significant involvement in their care, those who demanded diagnostic testing and requested specific medications).

Analytic direction #2 [ 35 ] was developed concurrently with analytic direction #1. The role of theory was minimal in analytic direction #2 and perhaps implicit in the methodology of phenomenology which focuses on individuals’ experiences [ 23 , 39 ]. The impetus for analytic direction #2 was our proposal that messages from health care providers might determine individuals’ strategies or behaviours that were the focus of analytic direction #1. The analysis was more inductive than that of analytic direction #1 as the team had no pre-contemplated plan to examine how messages from health care providers might determine individuals’ behaviours. In conducting the analysis, the team wondered whether conflicts about what individuals did with the recommendations they received (their actions) appeared to be due to messages perceived across, and within, health care provider groups. Health care providers discussed in the interviews included clinic staff, primary care providers, specialists, nurses, physiotherapists, and chiropractors.

For analytic direction #2, we used seven of the codes in the master coding template (see Table 2 ). Five of these seven codes were also used in analytic direction #1 but for very different reasons and drawing from different data within these codes. We were interested in individuals’ understanding or interpretation of recommendations by health care providers, not how individuals interacted with health care providers or what they did with information received from health care providers. In other words, we were interested in the meaning of what health care providers reportedly said to participants and not what participants did with that information.

The publication for analytic direction #3 [ 36 ] was written 3 years after that for analytic direction #1. This was the author’s least preferred paper, despite the Theory of Planned Behaviour being the theoretical framework guiding the original funded research. Analytic direction #3 involved a primarily deductive analysis where the Theory of Planned Behaviour guided the coding and analysis. Because of the restrictions of forcing exploratory data from open-ended questions into pre-defined domains, the author selected a qualitative description approach for the research design.

Contrary to memos and reflexive notes documented by the author about the potential value of this analysis and whether the team had learned anything about the application of the Theory of Planned Behaviour in the context of our study, the pursuit of analytic direction #3 became an interesting methodological exercise for a number of reasons. We collected data on several behaviours including receiving diagnostic tests, taking supplements, exercising, attending falls prevention classes, and initiating medication. The author believed that one particular behaviour had to be selected for analysis which entailed examining the data for each of the behaviours in depth. The author chose to focus on medication initiation and/or medication use because of a longstanding interest in medication use. Also, there was sufficient data to substantiate the Theory of Planned Behaviour domains in relation to medication initiation and/or medication use. The Theory of Planned Behaviour did not appear to be particularly relevant to intentions to attend a bone mineral density test and there did not appear to be sufficient data to support any one of the non-pharmacological treatment strategies mentioned. The team also had to make decisions about what counted as “perceived behavioural control”, “subjective norms”, and “attitudes” which were the three domains of the Theory of Planned Behaviour [ 37 , 38 ]. In particular, participants’ discussions about medication side effects were problematic to conceptualize in reference to these domains. The team decided to code “ experiences with side effects” as “perceived behavioural control” but “ anticipated side effects” as an “attitude”.

For analytic direction #3, the team drew from five codes, three of which were pre-specified prior to analyzing the interviews and meant to capture the domains of the Theory of Planned Behaviour. The code “attitude to BMD testing” and “attitude to bone health treatment” were existing codes based on the Theory of Planned Behaviour. The code “subjective norm” was not part of the coding template because the team believed it was too specific. We instead examined the code “social influence” which captured a broader array of information about peers such as family members and friends. Similarly, “perceived behavioural control” was not part of the coding template because we found it too specific. Information for this domain was taken from another code labelled “bone health treatment” which captured data pertaining to participants’ medications, including past behaviour with medication and how difficult it was, or not, to take the medication. The code “intentions” was an existing code.

The three selected analytic directions varied in how the team used an inductive and deductive approach to analysis [ 15 , 45 ] and how the role of theory was integrated (“central” vs. more “peripheral”) [ 15 ]. Each publication was within the scope of the overall research goal or question. As proposed by Agee [ 46 ], this overall question offered the potential for more specific questions during analysis. Finally, each publication had its own central point [ 12 ] and highlighted a different perspective or voice [ 12 ].

The following is a summary of the three analytic directions labelled with the first few words of the titles of each publication (see Table  3 ).

Analytic direction #1 (Strategies used by a patient group; inductive and deductive-driven)

In this publication, we examined the strategies described by three groups of individuals: individuals demonstrating few effective consumer behaviours, individuals demonstrating many effective consumer behaviours, and individuals demonstrating both types of behaviours. We discussed how the continuum was contrary to our expectations of what behaviours members of a patient group would exhibit. Having acknowledged this finding, we reported that more than half of the participants described effective consumer behaviours including making requests of health care providers for referral to specialists, bone mineral density tests, and prescription medications. Our overall message was that members of a patient group described a range of effective consumer behaviours that could be incorporated as skill sets in post-fracture interventions.

Analytic direction #2 (Perceived messages about bone health; inductive-driven)

In this publication, we described the perceived messages across the different provider groups and then the perceived messages within each provider group. We reported that participants perceived that specialists were more interested in their bone health than general practitioners and that very few messages about bone health were perceived from other health care providers. We also reported that perceived messages about one’s bone health and recommendations for management across provider groups were inconsistent (for example, with regard to medication initiation). The message for analytic direction #2 was that patients perceived inconsistent messages within, and across, various healthcare providers, suggesting a need to raise awareness of bone health management guidelines to providers.

Analytic direction #3 (Theory of Planned Behaviour explains intentions to use medication; deductive-driven)

In this publication, we described the data in each domain of the Theory of Planned Behaviour and the apparent relationship between these domains and participants’ intentions with regard to medication use. Our message was that the Theory of Planned Behaviour appeared to be predictive of intentions to take prescribed medication in approximately three-quarters of participants; when it was not predictive, a positive attitude to medication was the most important domain in determining participants’ intentions.

This working example of analytic direction resulted in three publications highlighting distinct “stories”. The publications differed in a number of ways. Each publication had its own central point or story line [ 12 ]. The role of theory [ 15 ] was minimal in analytic direction #2 but was more central in analytic directions #1 and #3, with the concept of the “effective” or “activated” consumer and the Theory of Planned Behaviour dominating the analyses, respectively. Acknowledging that the authentic voices of participants may always be manufactured by the authorial account [ 32 , 47 ], all papers were written from the perspective of “I” or “we”. However, we focused on participants at the forefront for analytic direction #1 and on participants’ perceptions of their providers’ voices for analytic direction #2. For analytic direction #3, the voice of the research team dominated as we struggled with methodological decisions. It is proposed that the voice of the model (Theory of Planned Behaviour) also dominated in analytic direction #3.

One implication related to analytic direction is that the research team may need to modify elements of the original research design to better suit the analytic direction selected. If such a modification is made, the team should ensure theoretical consistency in how the methods and methodologies are integrated [ 48 , 49 ]. For example, Crotty [ 49 ] proposes that theoretical consistency is needed between methods, methodology, theoretical perspective, and epistemology because these four elements inform one another. Similarly, Carter and Little [ 48 ] argue that consistency between methods, methodology, and epistemology contributes to the rigour of a qualitative study. Authors should demonstrate that elements of their theoretical perspectives and research design are compatible if they are applying another methodological approach to the data. Carter and Little [ 48 ] suggest that methodologies can be combined or altered if the researcher retains a coherent epistemological position and justifies the choices made. In the funded grant, a phenomenological program of research was proposed and the data were collected through in-depth interviews conducted from a phenomenological perspective. Analytic direction #3 was not purely consistent with a phenomenological approach because it forced exploratory data into the domains of a theoretical framework, so we pursued this analytic direction with a different approach (qualitative description). As pointed out by Sandelowski [ 50 ], using phenomenology and qualitative description in this way is not to be confused with misuses of methods or techniques. Unlike quantitative research, qualitative research is not produced from any “pure” use of a method, but from the use of methods that are variously textured, toned, and hued [ 50 ]. According to Sandelowski [ 50 ], qualitative description can be used in conjunction with phenomenological research in a number of ways.
For example, phenomenological analyses can be applied to qualitative descriptive studies [ 50 ]. However, the pursuit of other approaches to analysis, such as grounded theory or a participatory action approach, might lead to epistemological tensions if the original study design and data collection was guided by a phenomenological approach. Future discussion about the concept of analytic direction when considering theoretical and methodological positions that differ epistemologically from the original design and conduct of the study is needed.

There are a number of other implications related to the concept of analytic direction. Practically, it is advised that researchers start to think about analytic directions early so that they are aware of the potential analytic directions being developed as soon as data collection and analysis begin. By thinking about the “larger picture” at this early stage in the research, the team is better equipped to make the most of the data collected. Having said this, one will likely never use the entire dataset. As researchers, we rarely have sufficient funds or personnel to pursue all analytic directions. Data are often set aside because researchers are eager to analyze data collected for new projects or pressured to seek future funding opportunities. Analytic directions that are not pursued can be transferred to student projects. Alternatively, it is possible to draw on a sub-set of the transcripts/observations to carry out a secondary analysis. The author has developed subsequent analytic directions that span across studies and draw from a subset of transcripts for several secondary analyses [ 51 , 52 , 53 ]. Analytic directions can also contribute to ideas for new grant proposals that enable the researcher to generate more data on analytic directions that need further substantiation and further exploration.

This paper offers guidance on how to bound each analytic direction. Bounding the analytic direction is necessary so that one does not re-use the data or produce multiple, yet quite similar, papers on the same topic. Researchers are encouraged to be open and transparent and to acknowledge related publications so that reviewers and other audiences reading the work can determine for themselves that the analyses are different.

There are ethical considerations in developing an analytic direction or framing the analytic direction in a way that might be different or supplementary to the original design. It is not always feasible to obtain subsequent consent from participants for use of the data if this use differs from that of the original goal of the study. As a result, analytic directions pursued should be within the scope of the approved research ethics application. One strategy is to keep the study goal or aim broad in the research ethics submission so that it encompasses many topics that might be discussed during data collection. Another consideration is to not prematurely close a research ethics application because researchers may be able to use the data for a secondary analysis at a later date.

This paper makes novel contributions to qualitative research methodology by demonstrating how the process of analytic direction works, by operationalizing the concept and providing an example, and by describing the connection between analytic direction and rigour. This paper further contributes to the advancement of rigour by demonstrating how the development and selection of analytic directions relies on several strategies to promote rigour, such as a comprehensive examination of the data, the use of multiple analysts, providing quotations to support claims made, checking for negative cases, and reflexivity.

In conclusion, the concept of analytic direction enables researchers to organize their qualitative datasets in order to tell different and unique “stories”. The concept relies upon, and promotes, the conduct of rigorous qualitative research. As with all elements of qualitative analysis, researchers are encouraged to think about the role of analytic direction as soon as data collection commences.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to participants not consenting to having their data deposited in a public dataset but are available from the corresponding author on reasonable request.

Sandelowski M. Qualitative analysis: what it is and how to begin. Res Nurs Health. 1995;18:371–5.

Kvale S. Interviews: an introduction to qualitative research interviewing. Thousand Oaks: Sage Publications; 1996.

Saldana J. The coding manual for qualitative researchers. Los Angeles: Sage Publications; 2009.

Miller WL, Crabtree BF. The dance of interpretation. Doing qualitative research. Newbury Park: Sage Publications; 1999.

Spencer L, Ritchie J, O'Connor W. Analysis: practices, principles and processes. Qualitative research practice: a guide for social science students and researchers. Los Angeles: Sage Publications; 2003. p. 199–217.

Miles MB, Huberman AM, Saldana J. Qualitative data analysis. 3rd ed. Los Angeles: Sage Publications; 2014.

Crabtree BF, Miller WL. A template approach to text analysis: developing and using codebooks. Doing qualitative research, vol. 3. Newbury Park: Sage Publications; 1992. p. 93–109.

Coffey A, Atkinson P. Making sense of qualitative data: complementary research strategies. Thousand Oaks: Sage Publications; 1996.

Kelly M. The role of theory in qualitative health research. Fam Pract. 2010;27:285–90.

Malterud K. Shared understanding of the qualitative research process. Guidelines for the medical researcher. Fam Pract. 1993;10(2):201–6.

Slaughter S, Dean Y, Knight H, Krieg B, Mor P, Nour V, et al. The inevitable pull of the river's current: interpretations derived from a single text using multiple research traditions. Qual Health Res. 2007;17(4):548–61.

Sandelowski M. Writing a good read: strategies for re-presenting qualitative data. Res Nurs Health. 1998;21:375–82.

Frank AW. After methods, the story: from incongruity to truth in qualitative research. Qual Health Res. 2004;14(3):430–40.

Sandelowski M. Time and qualitative research. Res Nurs Health. 1999;22(1):79–87.

Sandelowski M. Theory unmasked: the uses and guises of theory in qualitative research. Res Nurs Health. 1993;16:213–8.

Sandelowski M. “To be of use”: enhancing the utility of qualitative research. Nurs Outlook. 1997;45:125–32.

Charmaz K. Constructing grounded theory: a practical guide through qualitative analysis. London: Sage Publications; 2006.

Thorne S. Metasynthetic madness: what kind of monster have we created? Qual Health Res. 2017;27(1):3–12.

Sharp EA, GD DC. What does rejection have to do with it? Toward an innovative, kinesthetic analysis of qualitative data. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research. 2013;14(2):1–12.

Streiner DL, Norman GR. PDQ epidemiology. 2nd ed. St. Louis, Missouri: Mosby; 1996.

Ulrich CM, Wallen GR, Feister A, Grady C. Respondent burden in clinical research: when are we asking too much of subjects? IRB. 2005;27(4):17–20.

Polkinghorne DE. Language and meaning: data collection in qualitative research. J Couns Psychol. 2005;52(2):137–45.

Schwandt TA. Dictionary of qualitative inquiry. 2nd ed. Thousand Oaks: Sage Publications, Inc.; 2001.

Saunders B, Sim J, Kingstone T, Baker S, Waterfield J, Bartlam B, et al. Saturation in qualitative research: exploring its conceptualization and operationalization. Qual Quant. 2018;52:1893–907.

Lewis J, Ritchie J. Generalising from qualitative research. In: Ritchie J, Lewis J, editors. Qualitative research practice: a guide for social science students and researchers. London: Sage Publications; 2003. p. 263–86.

Malterud K. Qualitative research: standards, challenges, and guidelines. Lancet. 2001;358(9280):483–8.

Giacomini MK, Cook DJ, for the Evidence-Based Medicine Working Group. Users’ guides to the medical literature XXIII. Qualitative research in health care B. what are the results and how do they help me care for my patients? JAMA. 2000;284(4):478–82.

Mays N, Pope C. Quality in qualitative health research. In: Pope C, Mays N, editors. Qualitative research in health care. Malden: Blackwell Publishing; 2006. p. 82–101.

Dixon-Woods M, Shaw RL, Agarwal S, Smith JA. The problem of appraising qualitative research. Qual Saf Health Care. 2004;13:223–5.

Giorgi A. Concerning a serious misunderstanding of the essence of the phenomenological method in psychology. J Phenomenol Psychol. 2008;39:33–58.

Silverman D. Qualitative research: issues of theory, method and practice. 3rd ed. London: Sage Publications Ltd; 2011.

Finlay L. Negotiating the swamp: the opportunity and challenge of reflexivity in research practice. Qual Res. 2002;2(3):209–30.

Eakin JM, Gladstone B. “Value-adding” analysis: doing more with qualitative data. Int J Qual Methods. 2020;19:1–13.

Sale JEM, Cameron C, Hawker G, Jaglal S, Funnell L, Jain R, et al. Strategies used by an osteoporosis patient group to navigate for bone health care after a fracture. Arch Orthop Trauma Surg. 2014;134:229–35.

Sale JEM, Hawker G, Cameron C, Bogoch E, Jain R, Beaton D, et al. Perceived messages about bone health after a fracture are not consistent across healthcare providers. Rheumatol Int. 2015;35:97–103.

Sale JEM, Cameron C, Thielke S, Meadows L, Senior K. The theory of planned behaviour explains intentions to use antiresorptive medication after a fragility fracture. Rheumatol Int. 2017;37:875–82.

Ajzen I, Fishbein M. Understanding attitudes and predicting social behavior. Englewood Cliffs: Prentice-Hall; 1980.

Fishbein M, Ajzen I. Belief, attitude, intention, and behavior: an introduction to theory and research. Reading: Addison-Wesley; 1975.

Giorgi A. The theory, practice, and evaluation of the phenomenological method as a qualitative research procedure. J Phenomenol Psychol. 1997;28:235–60.

Giorgi A. The descriptive phenomenological method in psychology: a modified Husserlian approach, vol. 2009. Pittsburgh: Duquesne University Press; 2009.

Wertz FJ. Phenomenological research methods for counseling psychology. J Couns Psychol. 2005;52(2):167–77.

Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3:77–101.

Kristjansson E, Tugwell PS, Wilson AJ, Brooks PM, Driedger SM, Gallois C, et al. Development of the effective musculoskeletal consumer scale. J Rheumatol. 2007;34(6):1392–400.

Hibbard JH, Stockard J, Mahoney ER, Tusler M. Development of the patient activation measure (PAM): conceptualizing and measuring activation in patients and consumers. Health Serv Res. 2004;39(4 Pt 1):1005–26.

Sale JEM, Thielke S. Qualitative research is a fundamental scientific process. J Clin Epidemiol. 2018;102:129–33.

Agee J. Developing qualitative research questions: a reflective process. Int J Qual Stud Educ. 2009;22(4):431–47.

Cooper N, Burnett S. Using discursive reflexivity to enhance the qualitative research process. Qual Soc Work. 2006;5(1):111–29.

Carter SM, Little M. Justifying knowledge, justifying method, taking action: epistemologies, methodologies, and methods in qualitative research. Qual Health Res. 2007;17(10):1316–28.

Crotty M. The foundations of social research. Los Angeles: Sage Publications; 1998.

Sandelowski M. Whatever happened to qualitative description? Res Nurs Health. 2000;23:334–40.

Gheorghita A, Webster F, Thielke S, Sale JEM. Long-term experiences of pain after a fragility fracture. Osteoporos Int. 2018;29:1093–104.

Sale JEM, Ashe MC, Beaton D, Bogoch E, Frankel L. Men’s health-seeking behaviours regarding bone health after a fragility fracture: a secondary analysis of qualitative data. Osteoporos Int. 2016;27(10):3113–9.

Sale JEM, Frankel L, Paiva J, Saini J, Hui S, McKinlay J, et al. Having caregiving responsibilities affects management of fragility fractures and bone health. Osteoporos Int. 2020;31:1565–72.

Acknowledgements

Not applicable.

Funding

Funding for the work described in this paper was provided by the Canadian Institutes of Health Research (Funding Reference Number: CBO-109629). The Canadian Institutes of Health Research had no involvement in the design of the study; the collection, analysis, and interpretation of the data; or the writing of the manuscript.

Author information

Authors and Affiliations

Musculoskeletal Health and Outcomes Research, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 30 Bond Street, Toronto, Ontario, M5B 1W8, Canada

Joanna E. M. Sale

Institute of Health Policy, Management & Evaluation, University of Toronto, Health Sciences Building, 155 College Street, Suite 425, Toronto, Ontario, M5T 3M6, Canada

Department of Surgery, Faculty of Medicine, University of Toronto, 149 College Street, 5th Floor, Toronto, Ontario, M5T 1P5, Canada

Contributions

Joanna Sale made substantial contributions to conception and design and analysis and interpretation of the data, drafted and revised the manuscript critically for important intellectual content, approved the final version of the manuscript submitted, and agreed to be accountable for all aspects of the work.

Author’s information

JEMS is a Scientist and Associate Professor who has been teaching qualitative research courses and lectures at the introductory and intermediate level at the University of Toronto since 2007.

Corresponding author

Correspondence to Joanna E. M. Sale .

Ethics declarations

Ethics approval and consent to participate

The study and protocol upon which this manuscript is based was approved by the Research Ethics Board at Unity Health Toronto (REB# 10–371). All methods were carried out in accordance with the Declaration of Helsinki and the relevant guidelines and regulations set by the Research Ethics Board at Unity Health Toronto. Informed consent was obtained from all participants.

Consent for publication

Competing interests

The author declares that she has no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sale, J.E.M. The role of analytic direction in qualitative research. BMC Med Res Methodol 22, 66 (2022). https://doi.org/10.1186/s12874-022-01546-4

Received: 28 September 2021

Accepted: 11 February 2022

Published: 13 March 2022

DOI: https://doi.org/10.1186/s12874-022-01546-4

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Analytic direction
  • Qualitative research
  • Data analysis
  • Methodological rigour
  • Critical appraisal

BMC Medical Research Methodology

ISSN: 1471-2288

Written by: Pitch N Hire

Mon Sep 18 2023

Everything You Need To Know About Analytical Research and Its Essential Uses

Research is vital in any field. It helps in finding out information about various subjects. It is a systematic process of collecting data, documenting critical information, analyzing it, and interpreting the results, and it employs different methodologies to perform these tasks. Research can be defined as the process of creating new knowledge or applying existing knowledge to develop new concepts.

Research methods are classified into different categories based on the method used, the nature of the study, the purpose, and the research design. Based on the nature of the study, research falls into two types: descriptive research and analytical research. This article covers analytical research. So what is analytical research? It is research in which secondary data are used to critically examine a question: researchers draw on already existing information, and different types of analytical research designs are used to critically evaluate the information extracted from existing research.

Effect of Analytical Studies on Education and Research

Students, research scholars, doctors, psychologists, and others use analytical research to extract important information for their research studies. It helps add new concepts and ideas to already existing material, and various analytical research designs are used to add value to the study material. It is conducted using methods such as literary research, public opinion research, meta-analysis, and scientific trials.

When you come across the question of what analytical research is, you can define it as a tool used to add reliability to a piece of work. It is generally conducted to support an idea or hypothesis. It employs critical thinking to extract small details, which helps in building larger conclusions about the subject matter, and it emphasizes understanding the cause-and-effect relationships between variables.

Analytical Research Designs

Analytical research includes critical assessment and critical thinking, and hence it is important. It creates new ideas about the data and proves or disproves a hypothesis. If the question is what analytical research is used for, it can be said that it is used to establish an association between an exposure and an outcome. This association is examined through two main analytical research designs: cohort studies and case-control studies. In cohort studies, groups of people with different levels of exposure are observed over time to analyze the occurrence of an outcome. It is a prospective, forward-looking kind of study, and it makes it easier to determine the outcome risk among exposed and unexposed groups.

In this respect it resembles an experimental design. In case-control studies, by contrast, researchers enlist two groups, cases and controls, and then work backwards to establish the exposure history of each group. It is a retrospective, backward-looking study; it consumes less time and is comparatively cheaper than a cohort study. It is the primary study design used to examine the relationship between a particular exposure and an outcome.
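
The arithmetic behind this contrast can be sketched with a standard 2×2 table. The following Python snippet (with invented counts; the variable names are for illustration only) computes the relative risk that a cohort study can report directly and the odds ratio that a case-control study reports instead:

```python
# Illustrative 2x2 table (hypothetical counts):
#                 outcome   no outcome
# exposed            a=30        b=70
# unexposed          c=10        d=90
a, b, c, d = 30, 70, 10, 90

# Cohort study: exposed and unexposed groups are followed forward in
# time, so the risk (incidence) in each group is estimated directly.
risk_exposed = a / (a + b)        # 0.30
risk_unexposed = c / (c + d)      # 0.10
relative_risk = risk_exposed / risk_unexposed

# Case-control study: groups are sampled on the outcome, so risks are
# not directly estimable; the odds ratio is used instead.
odds_ratio = (a * d) / (b * c)

print(f"relative risk = {relative_risk:.2f}")  # 3.00
print(f"odds ratio    = {odds_ratio:.2f}")     # 3.86
```

Note that when the outcome is rare, the odds ratio approximates the relative risk, which is one reason case-control studies remain informative even though they cannot measure risk directly.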

Methods of Conducting Analytical Research 

Analytical research saves time and money, and in clinical contexts can even save lives, while helping to achieve objectives effectively. It can be conducted using the following methods:

Literary Research 

Literary research is one method of conducting analytical research. It means finding new ideas and concepts in already existing literary work: it requires you to produce something new, a new way of interpreting the available information, and to discuss it. It is the backbone of many research studies. Its function is to locate literary information, preserve it with appropriate methodologies, and analyze it. It provides hypotheses grounded in existing research, helps in analyzing modern-day research, and is useful for examining unresolved or doubtful theories.

Meta-Analysis Research

Meta-analysis is a formal, quantitative, epidemiological research design that systematically assesses the results of previous studies to develop a conclusion about a body of research. It is a subset of the systematic review. It analyzes the strength of the evidence and helps in examining variability, or heterogeneity, across studies through a quantitative review of the body of literature. Meta-analyses are typically reported following the PRISMA guidelines, and their aim is to identify whether an effect exists and whether it is positive or negative. Their results can improve the accuracy of estimates of effects.
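
To illustrate the quantitative core of a meta-analysis, the following Python sketch pools a set of hypothetical study results using the common fixed-effect inverse-variance method; the effect sizes and standard errors are invented for the example:

```python
import math

# Hypothetical per-study effect estimates (e.g. mean differences)
# and their standard errors.
effects = [0.42, 0.30, 0.55, 0.38]
std_errors = [0.10, 0.15, 0.20, 0.12]

# Fixed-effect inverse-variance pooling: each study is weighted by
# the inverse of its variance, so more precise studies count more.
weights = [1 / se**2 for se in std_errors]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval for the pooled effect.
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

A random-effects model would additionally account for between-study heterogeneity; inverse-variance weighting as shown here is simply the most basic pooling scheme, and the pooled standard error is always smaller than that of any single study.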

Scientific Trials

Scientific trials research is conducted on people. It is of two types: observational studies and clinical trials. It finds new possibilities for clinical care and aims to identify medical strategies. It also helps in determining whether medical treatments and devices are safe or not, and it searches for better ways of screening for, diagnosing, and treating disease. A clinical trial is a scientific study that typically involves four phases, conducted to find out whether a new treatment method is safe, effective, and efficient in people.

Scientific trials aim to examine or analyze surgical, medical, and behavioral interventions. There are different types, such as treatment trials, screening trials, prevention trials, and pilot trials, along with observational designs such as cohort studies, case-control studies, and cross-sectional studies.

Analytical research is the kind of research that utilizes already available data to extract information. Its main aim is to divide a topic or concept into smaller pieces in order to understand it better, and then to reassemble those parts in a way that is understandable. You can conduct analytical research using the methods discussed in this article. It often involves analyzing phenomena after the fact, using data that already exist.

It takes different forms, such as historical research, philosophical research, research synthesis, and reviews. It also aims to comprehend the causal relationships between phenomena. It works within a limited set of variables and involves in-depth analysis of the available data. It is therefore crucial because it adds relevance to data and makes it authentic; it supports and validates a hypothesis; and it helps companies make quick and effective decisions about the products and services they provide.

Definition of analytic

  • well-founded
  • well-grounded

Word History

analytic borrowed from Late Latin analyticus, borrowed from Greek analytikós, from analýein "to loosen, dissolve, resolve into constituent elements" + -t-, verbal adjective formative + -ikos -ic entry 1 ; analytical from Late Latin analyticus + -al entry 1 — more at analysis

First Known Use: 1601, in the meaning defined at sense 1

Phrases Containing analytic

  • analytic geometry
  • analytic philosophy
  • analytic psychology
  • self-analytic

Cite this Entry

“Analytic.” Merriam-Webster.com Dictionary , Merriam-Webster, https://www.merriam-webster.com/dictionary/analytic. Accessed 14 Apr. 2024.

Springer Nature - PMC COVID-19 Collection

Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective

Iqbal H. Sarker

1 Swinburne University of Technology, Melbourne, VIC 3122 Australia

2 Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chittagong, 4349 Bangladesh

The digital world has a wealth of data, such as internet of things (IoT) data, business data, health data, mobile data, urban data, security data, and many more, in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting knowledge or useful insights from these data can be used for smart decision-making in various application domains. In the area of data science, advanced analytics methods including machine learning modeling can provide actionable insights or deeper knowledge about data, which makes the computing process automatic and smart. In this paper, we present a comprehensive view on “Data Science” including various types of advanced analytics methods that can be applied to enhance the intelligence and capabilities of an application through smart decision-making in different scenarios. We also discuss and summarize ten potential real-world application domains including business, healthcare, cybersecurity, urban and rural data science, and so on by taking into account data-driven smart computing and decision making. Based on this, we finally highlight the challenges and potential research directions within the scope of our study. Overall, this paper aims to serve as a reference point on data science and advanced analytics for researchers, decision-makers, and application developers, particularly from the data-driven solution point of view for real-world problems.

Introduction

We are living in the age of “data science and advanced analytics”, where almost everything in our daily lives is digitally recorded as data [ 17 ]. Thus the current electronic world is a wealth of various kinds of data, such as business data, financial data, healthcare data, multimedia data, internet of things (IoT) data, cybersecurity data, social media data, etc. [ 112 ]. The data can be structured, semi-structured, or unstructured, and its volume increases day by day [ 105 ]. Data science is typically a “concept to unify statistics, data analysis, and their related methods” to understand and analyze the actual phenomena with data. According to Cao et al. [ 17 ], “data science is the science of data” or “data science is the study of data”, where a data product is a data deliverable, or data-enabled or guided, which can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, or system. The popularity of “Data science” is increasing day by day, as shown in Fig. 1 according to Google Trends data over the last 5 years [ 36 ]. In addition to data science, we have also shown the popularity trends of the relevant areas such as “Data analytics”, “Data mining”, “Big data”, and “Machine learning” in the figure. According to Fig. 1, the popularity indication values for these data-driven domains, particularly “Data science” and “Machine learning”, are increasing day by day. This statistical information and the applicability of data-driven smart decision-making in various real-world application areas motivate us to study “Data science” and machine-learning-based “Advanced analytics” in this paper.

Fig. 1: The worldwide popularity score of data science compared with relevant areas, in a range of 0 (min) to 100 (max) over time, where the x-axis represents the timestamp and the y-axis represents the corresponding score

Usually, data science is the field of applying advanced analytics methods and scientific concepts to derive useful business information from data. The emphasis of advanced analytics is on anticipating how data can be used to detect patterns and determine what is likely to occur in the future. Basic analytics offer a description of data in general, while advanced analytics is a step forward in offering a deeper understanding of data and helping to analyze granular data, which is the focus of this study. In the field of data science, several types of analytics are popular, such as "Descriptive analytics" which answers the question of what happened; "Diagnostic analytics" which answers the question of why it happened; "Predictive analytics" which predicts what will happen in the future; and "Prescriptive analytics" which prescribes what action should be taken, discussed briefly in “ Advanced analytics methods and smart computing ”. Such advanced analytics and decision-making based on machine learning techniques [ 105 ], a major part of artificial intelligence (AI) [ 102 ], can also play a significant role in the Fourth Industrial Revolution (Industry 4.0) due to its learning capability for smart computing as well as automation [ 121 ].

Although the area of “data science” is huge, we mainly focus on deriving useful insights through advanced analytics, where the results are used to make smart decisions in various real-world application areas. For this, various advanced analytics methods such as machine learning modeling, natural language processing, sentiment analysis, neural network, or deep learning analysis can provide deeper knowledge about data, and thus can be used to develop data-driven intelligent applications. More specifically, regression analysis, classification, clustering analysis, association rules, time-series analysis, sentiment analysis, behavioral patterns, anomaly detection, factor analysis, log analysis, and deep learning, which originated from the artificial neural network, are taken into account in our study. These machine learning-based advanced analytics methods are discussed briefly in “ Advanced analytics methods and smart computing ”. Thus, it is important to understand the principles of the various advanced analytics methods mentioned above and their applicability in various real-world application areas. For instance, in our earlier paper, Sarker et al. [ 114 ], we discussed how data science and machine learning modeling can play a significant role in the domain of cybersecurity for making smart decisions and providing data-driven intelligent security services. In this paper, we broadly take into account the data science application areas and real-world problems in ten potential domains, including business data science, health data science, IoT data science, behavioral data science, urban data science, and so on, discussed briefly in “ Real-world application domains ”.

Based on the importance of machine learning modeling in extracting useful insights from data, and of data-driven smart decision-making, in this paper we present a comprehensive view on “Data Science” including various types of advanced analytics methods that can be applied to enhance the intelligence and the capabilities of an application. The key contribution of this study is thus understanding data science modeling, explaining different analytics methods from a solution perspective, and discussing their applicability in the various real-world data-driven application areas mentioned earlier. Overall, the purpose of this paper is, therefore, to provide a basic guide or reference for those in academia and industry who want to study, research, and develop automated and intelligent applications or systems based on smart computing and decision making within the area of data science.

The main contributions of this paper are summarized as follows:

  • To define the scope of our study towards data-driven smart computing and decision-making in real-world life. We also briefly discuss the concept of data science modeling from business problems to data products and automation, to understand its applicability and provide intelligent services in real-world scenarios.
  • To provide a comprehensive view on data science including advanced analytics methods that can be applied to enhance the intelligence and the capabilities of an application.
  • To discuss the applicability and significance of machine learning-based analytics methods in various real-world application areas. We also summarize ten potential real-world application areas, from business to personalized applications in our daily life, where advanced analytics with machine learning modeling can be used to achieve the expected outcome.
  • To highlight and summarize the challenges and potential research directions within the scope of our study.

The rest of the paper is organized as follows. The next section provides the background and related work and defines the scope of our study. The following section presents the concepts of data science modeling for building a data-driven application. After that, we briefly discuss and explain different advanced analytics methods and smart computing. Various real-world application areas are discussed and summarized in the next section. We then highlight and summarize several research issues and potential future directions, and finally, the last section concludes this paper.

Background and Related Work

In this section, we first discuss various data terms and works related to data science and highlight the scope of our study.

Data Terms and Definitions

There is a range of key terms in the field, such as data analysis, data mining, data analytics, big data, data science, advanced analytics, machine learning, and deep learning, which are highly related and easily confusing. In the following, we define these terms and differentiate them with the term “Data Science” according to our goal.

The term “Data analysis” refers to the processing of data by conventional (e.g., classic statistical, empirical, or logical) theories, technologies, and tools for extracting useful information and for practical purposes [ 17 ]. The term “Data analytics”, on the other hand, refers to the theories, technologies, instruments, and processes that allow for an in-depth understanding and exploration of actionable data insight [ 17 ]. Statistical and mathematical analysis of the data is the major concern in this process. “Data mining” is another popular term over the last decade, which has a similar meaning to several other terms such as knowledge mining from data, knowledge extraction, knowledge discovery from data (KDD), data/pattern analysis, data archaeology, and data dredging. According to Han et al. [ 38 ], it should have been more appropriately named “knowledge mining from data”. Overall, data mining is defined as the process of discovering interesting patterns and knowledge from large amounts of data [ 38 ]. Data sources may include databases, data centers, the Internet or Web, other repositories of data, or data dynamically streamed through the system. “Big data” is another popular term nowadays, which may change the statistical and data analysis approaches as it has the unique features of “massive, high dimensional, heterogeneous, complex, unstructured, incomplete, noisy, and erroneous” [ 74 ]. Big data can be generated by mobile devices, social networks, the Internet of Things, multimedia, and many other new applications [ 129 ]. Several unique features including volume, velocity, variety, veracity, value (5Vs), and complexity are used to understand and describe big data [ 69 ].

In terms of analytics, basic analytics provides a summary of data, whereas the term “Advanced Analytics” takes a step forward in offering a deeper understanding of data and helps to analyze granular data. Advanced analytics is characterized or defined as autonomous or semi-autonomous data or content analysis using advanced techniques and methods to discover deeper insights, predict outcomes, or generate recommendations, typically beyond traditional business intelligence or analytics. “Machine learning”, a branch of artificial intelligence (AI), is one of the major techniques used in advanced analytics, as it can automate analytical model building [ 112 ]. It is based on the premise that systems can learn from data, recognize trends, and make decisions with minimal human involvement [ 38 , 115 ]. “Deep Learning” is a subfield of machine learning that concerns algorithms inspired by the structure and function of the human brain, called artificial neural networks [ 38 , 139 ].

Unlike the above data-related terms, “Data science” is an umbrella term that encompasses advanced data analytics, data mining, machine and deep learning modeling, and several other related disciplines like statistics, to extract insights or useful knowledge from the datasets and transform them into actionable business strategies. In [ 17 ], Cao et al. defined data science from the disciplinary perspective as “data science is a new interdisciplinary field that synthesizes and builds on statistics, informatics, computing, communication, management, and sociology to study data and its environments (including domains and other contextual aspects, such as organizational and social aspects) to transform data to insights and decisions by following a data-to-knowledge-to-wisdom thinking and methodology”. In “ Understanding data science modeling ”, we briefly discuss data science modeling from a practical perspective, starting from business problems to data products, which can assist data scientists to think and work in a particular real-world problem domain within the area of data science and analytics.

Related Work

Several papers have reviewed data science and its significance. For example, the authors in [ 19 ] identify the evolving field of data science and its importance in the broader knowledge environment, along with some issues that differentiate data science and informatics from conventional approaches in the information sciences. Donoho et al. [ 27 ] present 50 years of data science, including recent commentary on data science in the mass media and on how/whether data science differs from statistics. The authors formally conceptualize the theory-guided data science (TGDS) model in [ 53 ] and present a taxonomy of research themes in TGDS. Cao et al. include a detailed survey and tutorial on the fundamental aspects of data science in [ 17 ], which considers the transition from data analysis to data science, the principles of data science, as well as the discipline and competence of data education.

Besides, the authors include a data science analysis in [ 20 ], which aims to provide a realistic overview of the use of statistical features and related data science methods in bioimage informatics. The authors in [ 61 ] study the key streams of data science algorithm use at central banks and show how their popularity has risen over time. This research contributes to the creation of a research vector on the role of data science in central banking. In [ 62 ], the authors provide an overview and tutorial on the data-driven design of intelligent wireless networks. The authors in [ 87 ] provide a thorough understanding of computational optimal transport with application to data science. In [ 97 ], the authors present data science as theoretical contributions in information systems via text analytics.

Unlike the above recent studies, in this paper, we concentrate on the knowledge of data science including advanced analytics methods, machine learning modeling, real-world application domains, and potential research directions within the scope of our study. The advanced analytics methods based on machine learning techniques discussed in this paper can be applied to enhance the capabilities of an application in terms of data-driven intelligent decision making and automation in the final data product or systems.

Understanding Data Science Modeling

In this section, we briefly discuss how data science can play a significant role in the real-world business process. For this, we first categorize various types of data and then discuss the major steps of data science modeling starting from business problems to data product and automation.

Types of Real-World Data

Typically, to build a data-driven real-world system in a particular domain, the availability of data is the key [ 17 , 112 , 114 ]. The data can be in different types such as (i) Structured—that has a well-defined data structure and follows a standard order, examples are names, dates, addresses, credit card numbers, stock information, geolocation, etc.; (ii) Unstructured—has no pre-defined format or organization, examples are sensor data, emails, blog entries, wikis, and word processing documents, PDF files, audio files, videos, images, presentations, web pages, etc.; (iii) Semi-structured—has elements of both the structured and unstructured data containing certain organizational properties, examples are HTML, XML, JSON documents, NoSQL databases, etc.; and (iv) Metadata—that represents data about the data, examples are author, file type, file size, creation date and time, last modification date and time, etc. [ 38 , 105 ].
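The four categories above can be illustrated with a short Python sketch. All records, field names, and values here are hypothetical, invented purely for illustration:

```python
import json

# (i) Structured: well-defined fields following a standard order,
# like a database row (name, date, credit information, etc.).
structured_row = {"name": "Alice", "date": "2021-06-01", "credit_limit": 5000}

# (ii) Unstructured: free text with no pre-defined format or organization,
# as in an email body or a word-processing document.
unstructured = "Customer emailed to say the delivery arrived two days late."

# (iii) Semi-structured: JSON carries organizational properties
# (keys, nesting) without a rigid schema -- records may differ in shape.
semi_structured = json.loads(
    '{"user": "alice", "events": [{"type": "click"}, {"type": "view", "ms": 120}]}'
)

# (iv) Metadata: data about the data itself (author, file type, size, etc.).
metadata = {
    "author": "alice",
    "file_type": "txt",
    "size_bytes": len(unstructured.encode()),
}

# Note how the two event records differ in shape -- the hallmark
# of semi-structured data.
print([sorted(e) for e in semi_structured["events"]])
```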

In the area of data science, researchers use various widely-used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 127 ], UNSW-NB15 [ 79 ], Bot-IoT [ 59 ], ISCX’12 [ 15 ], CIC-DDoS2019 [ 22 ], etc., smartphone datasets such as phone call logs [ 88 , 110 ], mobile application usages logs [ 124 , 149 ], SMS Log [ 28 ], mobile phone notification logs [ 77 ] etc., IoT data [ 56 , 11 , 64 ], health data such as heart disease [ 99 ], diabetes mellitus [ 86 , 147 ], COVID-19 [ 41 , 78 ], etc., agriculture and e-commerce data [ 128 , 150 ], and many more in various application domains. In “ Real-world application domains ”, we discuss ten potential real-world application domains of data science and analytics by taking into account data-driven smart computing and decision making, which can help the data scientists and application developers to explore more in various real-world issues.

Overall, the data used in data-driven applications can be any of the types mentioned above, and they can differ from one application to another in the real world. Data science modeling, which is briefly discussed below, can be used to analyze such data in a specific problem domain and derive insights or useful information from the data to build a data-driven model or data product.

Steps of Data Science Modeling

Data science is typically an umbrella term that encompasses advanced data analytics, data mining, machine and deep learning modeling, and several other related disciplines like statistics, to extract insights or useful knowledge from the datasets and transform them into actionable business strategies, as mentioned earlier in “ Background and related work ”. Figure 2 shows an example of data science modeling starting from real-world data to data-driven product and automation. In the following, we briefly discuss each module of the data science process.

  • Understanding business problems: This involves getting a clear understanding of the problem that needs to be solved, how it impacts the relevant organization or individuals, the ultimate goals for addressing it, and the relevant project plan. Thus, to understand and identify the business problems, the data scientists formulate relevant questions while working with the end-users and other stakeholders. For instance, how much/many, which category/group, is the behavior unrealistic/abnormal, which option should be taken, what action, etc. could be relevant questions depending on the nature of the problems. This helps to get a better idea of what the business needs and what should be extracted from the data. Such business knowledge enables organizations to enhance their decision-making process, which is known as “Business Intelligence” [ 65 ]. Identifying the relevant data sources that can help to answer the formulated questions, and what kinds of actions should be taken from the trends that the data shows, is another important task associated with this stage. Once the business problem has been clearly stated, the data scientist can define the analytic approach to solve the problem.
  • Understanding data: Data science is largely driven by the availability of data [ 114 ]. Thus, a sound understanding of the data is needed to build a data-driven model or system. The reason is that real-world datasets are often noisy, contain missing values and inconsistencies, or have other data issues, which need to be handled effectively [ 101 ]. To gain actionable insights, the appropriate data must be sourced and cleansed, which is fundamental to any data science engagement. For this, a data assessment that evaluates what data is available and how it aligns with the business problem could be the first step in data understanding. Several aspects such as data type/format, the quantity of data and whether it is sufficient to extract useful knowledge, data relevance, authorized access to data, feature or attribute importance, combining multiple data sources, important metrics to report the data, etc. need to be taken into account to clearly understand the data for a particular business problem. Overall, the data understanding module involves figuring out what data would be best needed and the best ways to acquire it.
  • Data pre-processing and exploration: Exploratory data analysis is defined in data science as an approach to analyzing datasets to summarize their key characteristics, often with visual methods [ 135 ]. This examines a broad data collection to discover initial trends, attributes, points of interest, etc. in an unstructured manner to construct meaningful summaries of the data. Thus, data exploration is typically used to figure out the gist of the data and to develop a first-step assessment of its quality, quantity, and characteristics. A statistical model may or may not be used, but primarily it offers tools for creating hypotheses by visualizing and interpreting the data through graphical representations such as charts, plots, histograms, etc. [ 72 , 91 ]. Before the data is ready for modeling, it is necessary to use data summarization and visualization to audit the quality of the data and provide the information needed to process it. To ensure the quality of the data, the data pre-processing technique, which is typically the process of cleaning and transforming raw data [ 107 ] before processing and analysis, is important. It also involves reformatting information, making data corrections, and merging data sets to enrich data. Thus, several aspects such as expected data, data cleaning, formatting or transforming data, dealing with missing values, handling data imbalance and bias issues, data distribution, searching for outliers or anomalies in the data and dealing with them, ensuring data quality, etc. could be the key considerations in this step.
  • Machine learning modeling and evaluation: Once the data is prepared for building the model, data scientists design a model, algorithm, or set of models to address the business problem. Model building depends on what type of analytics, e.g., predictive analytics, is needed to solve the particular problem, which is discussed briefly in “ Advanced analytics methods and smart computing ”. To best fit the data according to the type of analytics, different types of data-driven or machine learning models, which have been summarized in our earlier paper Sarker et al. [ 105 ], can be built to achieve the goal. Data scientists typically separate the given dataset into training and test subsets, usually in an 80:20 ratio, or split the data using the popular k-folds method [ 38 ]. This is to observe whether the model performs well on unseen data and to maximize the model performance. Various model validation and assessment metrics, such as error rate, accuracy, true positive, false positive, true negative, false negative, precision, recall, f-score, ROC (receiver operating characteristic curve) analysis, applicability analysis, etc. [ 38 , 115 ] are used to measure the model performance, which can guide the data scientists to choose or design the learning method or model. Besides, machine learning experts or data scientists can take into account several advanced techniques such as feature engineering, feature selection or extraction methods, algorithm tuning, ensemble methods, modifying existing algorithms, or designing new algorithms to improve the ultimate data-driven model to solve a particular business problem through smart decision making.
  • Data product and automation: A data product is typically the output of any data science activity [ 17 ]. A data product, in general terms, is a data deliverable, or data-enabled or guided, which can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, application, or system that processes data and generates results. Businesses can use the results of such data analysis to obtain useful information like churn (a measure of how many customers stop using a product) prediction and customer segmentation, and use these results to make smarter business decisions and automation. Thus, to make better decisions in various business problems, various machine learning pipelines and data products can be developed. To highlight this, we summarize several potential real-world data science application areas in “ Real-world application domains ”, where various data products can play a significant role in relevant business problems to make them smart and automated.
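The modeling steps above — cleaning the data, an 80:20 train/test split, fitting a model, and computing validation metrics — can be sketched end-to-end in plain Python. Everything here is a toy assumption: the dataset is synthetic, and the "model" is a deliberately simple threshold rule rather than a real learning algorithm:

```python
import random
import statistics

random.seed(42)

# Synthetic two-class dataset: one numeric feature, binary label (hypothetical).
data = [(random.gauss(0, 1), 0) for _ in range(400)] + \
       [(random.gauss(2, 1), 1) for _ in range(400)]
random.shuffle(data)
data[0] = (None, data[0][1])  # inject one missing value to pre-process

# Pre-processing sketch: mean-impute the missing feature value.
mean_x = statistics.fmean(x for x, _ in data if x is not None)
data = [(x if x is not None else mean_x, y) for x, y in data]

# 80:20 train/test split, as described in the text.
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]

# Deliberately simple "model": predict class 1 when the feature exceeds the
# midpoint of the two class means observed in the training subset.
mean0 = statistics.fmean(x for x, y in train if y == 0)
mean1 = statistics.fmean(x for x, y in train if y == 1)
threshold = (mean0 + mean1) / 2

def predict(x):
    return 1 if x > threshold else 0

# Evaluation: confusion counts on the held-out test subset, then the
# metrics named in the text (accuracy, precision, recall, f-score).
tp = sum(1 for x, y in test if predict(x) == 1 and y == 1)
fp = sum(1 for x, y in test if predict(x) == 1 and y == 0)
tn = sum(1 for x, y in test if predict(x) == 0 and y == 0)
fn = sum(1 for x, y in test if predict(x) == 0 and y == 1)

accuracy = (tp + tn) / len(test)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f-score={f_score:.2f}")
```

In practice a data scientist would substitute a real learning algorithm and a library such as those surveyed in [ 105 ] for the threshold rule, but the pipeline shape — clean, split, fit, evaluate — stays the same.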

Overall, we can conclude that data science modeling can be used to help drive changes and improvements in business practices. The interesting part of the data science process is having a deeper understanding of the business problem to solve. Without that, it would be much harder to gather the right data and extract the most useful information from it for making decisions to solve the problem. In terms of role, “Data Scientists” typically interpret and manage data to uncover the answers to major questions that help organizations to make objective decisions and solve complex problems. In summary, a data scientist proactively gathers and analyzes information from multiple sources to better understand how the business performs, and designs machine learning or data-driven tools, methods, or algorithms, focused on advanced analytics, which can make today’s computing process smarter and more intelligent, as discussed briefly in the following section.

Fig. 2: An example of data science modeling from real-world data to a data-driven system and decision making

Advanced Analytics Methods and Smart Computing

As mentioned earlier in “ Background and related work ”, basic analytics provides a summary of data, whereas advanced analytics takes a step forward in offering a deeper understanding of data and helps in granular data analysis. For instance, the predictive capabilities of advanced analytics can be used to forecast trends, events, and behaviors. Thus, “advanced analytics” can be defined as the autonomous or semi-autonomous analysis of data or content using advanced techniques and methods to discover deeper insights, make predictions, or produce recommendations, where machine learning-based analytical modeling is considered the key technology in the area. In the following, we first summarize the various types of analytics and the outcomes needed to solve the associated business problems, and then briefly discuss machine learning-based analytical modeling.

Types of Analytics and Outcome

In the real-world business process, several key questions such as “What happened?”, “Why did it happen?”, “What will happen in the future?”, “What action should be taken?” are common and important. Based on these questions, in this paper, we categorize and highlight the analytics into four types such as descriptive, diagnostic, predictive, and prescriptive, which are discussed below.

  • Descriptive analytics: It is the interpretation of historical data to better understand the changes that have occurred in a business. Thus, descriptive analytics answers the question “what happened in the past?” by summarizing past data such as statistics on sales and operations or marketing strategies, use of social media, and engagement with Twitter, LinkedIn, or Facebook, etc. For instance, by analyzing trends, patterns, and anomalies, descriptive analytics can summarize customers’ historical shopping data to characterize their purchasing behavior. Thus, descriptive analytics can play a significant role in providing an accurate picture of what has occurred in a business and how it relates to previous periods, utilizing a broad range of relevant business data. As a result, managers and decision-makers can pinpoint areas of strength and weakness in their business, and eventually take more effective management strategies and business decisions.
  • Diagnostic analytics: It is a form of advanced analytics that examines data or content to answer the question, “why did it happen?” The goal of diagnostic analytics is to help find the root cause of a problem. For example, the human resource management department of a business organization may use diagnostic analytics to find the best applicant for a position, select them, and compare them to other similar positions to see how well they perform. In a healthcare example, it might help to figure out whether the patients’ symptoms such as high fever, dry cough, headache, fatigue, etc. are all caused by the same infectious agent. Overall, diagnostic analytics enables one to extract value from the data by posing the right questions and conducting in-depth investigations into the answers. It is characterized by techniques such as drill-down, data discovery, data mining, and correlations.
  • Predictive analytics: Predictive analytics is an important analytical technique used by many organizations for various purposes such as to assess business risks, anticipate potential market patterns, and decide when maintenance is needed, to enhance their business. It is a form of advanced analytics that examines data or content to answer the question, “what will happen in the future?” Thus, the primary goal of predictive analytics is to identify and typically answer this question with a high degree of probability. Data scientists can use historical data as a source to extract insights for building predictive models using various regression analyses and machine learning techniques, which can be used in various application domains for a better outcome. Companies, for example, can use predictive analytics to minimize costs by better anticipating future demand and changing output and inventory, banks and other financial institutions to reduce fraud and risks by predicting suspicious activity, medical specialists to make effective decisions through predicting patients who are at risk of diseases, retailers to increase sales and customer satisfaction through understanding and predicting customer preferences, manufacturers to optimize production capacity through predicting maintenance requirements, and many more. Thus predictive analytics can be considered as the core analytical method within the area of data science.
  • Prescriptive analytics: Prescriptive analytics focuses on recommending the best way forward with actionable information to maximize overall returns and profitability; it typically answers the question, “what action should be taken?” In business analytics, prescriptive analytics is considered the final step. For its models, prescriptive analytics collects data from several descriptive and predictive sources and applies it to the decision-making process. It is therefore related to both descriptive and predictive analytics, but it emphasizes actionable insights instead of data monitoring, in contrast to descriptive analytics, which examines decisions and outcomes after the fact. By integrating big data, machine learning, and business rules, prescriptive analytics helps organizations make more informed decisions that produce the most successful business outcomes.

In summary, descriptive analytics and diagnostic analytics both look at the past, to clarify what happened and why it happened. Predictive analytics and prescriptive analytics use historical data to forecast what will happen in the future and what steps should be taken to influence those outcomes. In Table 1, we have summarized these analytics methods with examples. Forward-thinking organizations in the real world can use these analytical methods jointly to make smart decisions that drive changes in business processes and improvements. In the following, we discuss how machine learning techniques can play a big role in these analytical methods through their ability to learn from data.

Table 1: Various types of analytical methods with examples

Machine Learning Based Analytical Modeling

In this section, we briefly discuss various advanced analytics methods based on machine learning modeling, which can make the computing process smart through intelligent decision-making in a business process. Figure 3 shows the general structure of machine learning-based predictive modeling, considering both the training and testing phases. In the following, we discuss a wide range of methods, such as regression and classification analysis, association rule analysis, time-series analysis, behavioral analysis, and log analysis, within the scope of our study.

Fig. 3: A general structure of a machine learning based predictive model considering both the training and testing phase

Regression Analysis

In data science, one of the most common statistical approaches used for predictive modeling and data mining tasks is regression [ 38 ]. Regression analysis is a form of supervised machine learning that examines the relationship between a dependent variable (target) and independent variables (predictors) to predict a continuous-valued output [ 105 , 117 ]. Equations 1, 2, and 3 [ 85 , 105 ] represent the simple, multiple (multivariate), and polynomial regressions respectively, where x represents an independent variable, y is the predicted/target output mentioned above, a and the b_i are the coefficients to be estimated, and e is the error term:

y = a + bx + e    (1)
y = a + b1 x1 + b2 x2 + ... + bn xn + e    (2)
y = a + b1 x + b2 x^2 + ... + bn x^n + e    (3)

Regression analysis is typically conducted for one of two purposes: to predict the value of the dependent variable for individuals for whom some knowledge of the explanatory variables is available, or to estimate the effect of an explanatory variable on the dependent variable, i.e., to find a causal relationship between the variables. Linear regression cannot fit non-linear data and may cause an underfitting problem; in that case, polynomial regression performs better, although it increases model complexity. Regularization techniques such as Ridge, Lasso, and Elastic-Net [ 85 , 105 ] can be used to optimize the linear regression model. Besides, support vector regression, decision tree regression, and random forest regression [ 85 , 105 ] can be used for building effective regression models depending on the problem type, e.g., non-linear tasks. Financial forecasting, cost estimation, trend analysis, marketing, time-series estimation, and drug response modeling are some examples where regression models can be used to solve real-world problems in the domain of data science and analytics.
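As a minimal, illustrative sketch of the ideas above (toy data and the scikit-learn API are assumed here; this is not code from the cited works), a simple linear regression and a polynomial regression via feature expansion can be fit as follows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Simple linear regression: fitting y = a + b*x on noiseless data
# recovers the slope and intercept exactly.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 2.0 * x.ravel()
lin = LinearRegression().fit(x, y)
print(round(lin.coef_[0], 3), round(lin.intercept_, 3))  # slope b, intercept a

# Polynomial regression: expand x into [x, x^2] and fit a linear model,
# which captures the non-linear relationship y = 1 + x^2.
y2 = 1.0 + x.ravel() ** 2
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly = LinearRegression().fit(X_poly, y2)
print(round(poly.score(X_poly, y2), 3))  # R^2 close to 1.0 for an exact fit
```

The polynomial case illustrates the point made above: the model stays linear in its coefficients, but the expanded feature space lets it fit a non-linear relationship at the cost of extra model complexity.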

Classification Analysis

Classification is one of the most widely used and best-known data science processes. It is a form of supervised machine learning and refers to a predictive modeling problem in which a class label is predicted for a given example [ 38 ]. Spam identification, such as ‘spam’ and ‘not spam’ in email service providers, is an example of a classification problem. Several forms of classification analysis exist in the area: binary classification, which refers to the prediction of one of two classes; multi-class classification, which involves the prediction of one of more than two classes; and multi-label classification, in which multiple, non-exclusive class labels may be assigned to a single example [ 105 ].

Several popular classification techniques exist to solve classification problems, such as k-nearest neighbors [ 5 ], support vector machines [ 55 ], naïve Bayes [ 49 ], adaptive boosting [ 32 ], extreme gradient boosting [ 85 ], logistic regression [ 66 ], the decision trees ID3 [ 92 ] and C4.5 [ 93 ], and random forests [ 13 ]. Tree-based classification techniques, e.g., a random forest built from multiple decision trees, often perform better than others on real-world problems due to their capability of producing logic rules [ 103 , 115 ]. Figure 4 shows an example of a random forest structure considering multiple decision trees. In addition, BehavDT, recently proposed by Sarker et al. [ 109 ], and IntruDTree [ 106 ] can be used for building effective classification or prediction models in the relevant tasks within the domain of data science and analytics.

Fig. 4: An example of a random forest structure considering multiple decision trees
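The random forest idea discussed above can be sketched as follows (a toy, hypothetical dataset with the scikit-learn API; this is not the BehavDT or tree-based models proposed in the cited papers):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy binary classification: label 1 if a point lies above the line
# x1 + x2 = 1, else label 0.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# A random forest aggregates many decision trees via majority vote.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[0.9, 0.9], [0.1, 0.1]]))  # expected: [1 0]
```

Points far from the decision boundary, like the two queried above, are classified reliably because most trees in the ensemble agree on them.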

Cluster Analysis

Clustering is a form of unsupervised machine learning and is well known in many data science application areas for statistical data analysis [ 38 ]. Clustering techniques search for structure inside a dataset and, when no classification is known in advance, identify homogeneous groups of cases: data points within a cluster are similar to each other and different from the data points in other clusters. Overall, the purpose of cluster analysis is to sort data points into groups (clusters) that are internally homogeneous and externally heterogeneous [ 105 ]. Clustering is often used to gain insight into how data is distributed in a given dataset, or as a preprocessing phase for other algorithms. For example, data clustering assists retail businesses with understanding customer shopping behavior, running sales campaigns, retaining consumers, and detecting anomalies.

Many clustering algorithms with the ability to group data have been proposed in the machine learning and data science literature [ 98 , 138 , 141 ]. In our earlier paper, Sarker et al. [ 105 ], we summarized these from several perspectives, such as partitioning methods, density-based methods, hierarchical methods, and model-based methods. In the literature, the popular K-means [ 75 ], K-medoids [ 84 ], and CLARA [ 54 ] are known as partitioning methods; DBSCAN [ 30 ] and OPTICS [ 8 ] are density-based methods; and single linkage [ 122 ] and complete linkage [ 123 ] are hierarchical methods. In addition, grid-based clustering methods such as STING [ 134 ] and CLIQUE [ 2 ]; model-based clustering such as neural network learning [ 141 ], GMM [ 94 ], and SOM [ 18 , 104 ]; and constraint-based methods such as COP K-means [ 131 ] and CMWK-Means [ 25 ] are used in the area. Recently, Sarker et al. [ 111 ] proposed BOTS, a hierarchical clustering method based on a bottom-up agglomerative technique for capturing users’ similar behavioral characteristics over time. The key benefit of agglomerative hierarchical clustering is that the tree-structured hierarchy it creates is more informative than an unstructured set of flat clusters, which can assist better decision-making in relevant data science application areas.
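As a small sketch of partitioning-based clustering (K-means on toy, well-separated data with the scikit-learn API; an illustration, not any of the cited methods):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of points; K-means should recover the grouping.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.2, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.2, size=(50, 2))
X = np.vstack([blob_a, blob_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
# Each blob receives a single, distinct cluster label.
print(len(set(labels[:50])), len(set(labels[50:])), labels[0] != labels[50])
```

With such clearly separated groups, the recovered clusters match the internal-homogeneity/external-heterogeneity criterion described above; on harder data, the choice of k and the initialization matter.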

Association Rule Analysis

Association rule learning is a rule-based, typically unsupervised machine learning method used to establish relationships among variables. It is a descriptive technique often used to analyze large datasets for discovering interesting relationships or patterns. The association rule learning technique’s main strength is its comprehensiveness, as it produces all associations that satisfy user-specified constraints, including minimum support and confidence values [ 138 ].

Association rules allow a data scientist to identify trends, associations, and co-occurrences inside large data collections. In a supermarket, for example, associations infer knowledge about the buying behavior of consumers for different items, which helps to adjust marketing and sales plans. In healthcare, physicians may use association rules to better diagnose patients: by comparing symptom associations in the data from previous cases, doctors can assess the conditional likelihood of a given illness using association rules and machine learning-based data analysis. Similarly, association rules are useful for consumer behavior analysis and prediction, customer market analysis, bioinformatics, weblog mining, recommendation systems, etc.

Several types of association rules have been proposed in the area, such as frequent pattern based [ 4 , 47 , 73 ], logic-based [ 31 ], tree-based [ 39 ], fuzzy-rules [ 126 ], belief rule [ 148 ] etc. The rule learning techniques such as AIS [ 3 ], Apriori [ 4 ], Apriori-TID and Apriori-Hybrid [ 4 ], FP-Tree [ 39 ], Eclat [ 144 ], RARM [ 24 ] exist to solve the relevant business problems. Apriori [ 4 ] is the most commonly used algorithm for discovering association rules from a given dataset among the association rule learning techniques [ 145 ]. The recent association rule-learning technique ABC-RuleMiner proposed in our earlier paper by Sarker et al. [ 113 ] could give significant results in terms of generating non-redundant rules that can be used for smart decision making according to human preferences, within the area of data science applications.
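The support and confidence constraints mentioned above can be sketched in a few lines of plain Python (toy market-basket data; the `support` and `confidence` helpers are illustrative, not part of any cited algorithm):

```python
# Market-basket transactions (toy data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(lhs, rhs, txns):
    """Conditional frequency of rhs given lhs: supp(lhs ∪ rhs) / supp(lhs)."""
    return support(lhs | rhs, txns) / support(lhs, txns)

# Rule {bread, milk} -> {butter}: support 3/5 = 0.6, confidence 0.4/0.6 ≈ 0.67.
print(round(support({"bread", "milk"}, transactions), 2))
print(round(confidence({"bread", "milk"}, {"butter"}, transactions), 2))
```

Algorithms such as Apriori make this tractable on large data by pruning: any itemset whose support falls below the minimum threshold cannot have a frequent superset, so it is never extended.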

Time-Series Analysis and Forecasting

A time series is typically a series of data points indexed in time order, particularly by date or timestamp [ 111 ]. Depending on the frequency, a time series can be annual (e.g., annual budget), quarterly (e.g., expenditure), monthly (e.g., air traffic), weekly (e.g., sales quantity), daily (e.g., weather), hourly (e.g., stock price), minute-wise (e.g., inbound calls in a call center), or even second-wise (e.g., web traffic), in the relevant domains.

A mathematical method for dealing with such time-series data, or the procedure of fitting a time series to a proper model, is termed time-series analysis. Many different time-series forecasting algorithms and analysis methods can be applied to extract the relevant information. For instance, to forecast future patterns, the autoregressive (AR) model [ 130 ] learns the behavioral trends or patterns of past data. The moving average (MA) [ 40 ] is another simple and common form of smoothing used in time-series analysis and forecasting, which uses past forecast errors in a regression-like model to elaborate an averaged trend across the data. The autoregressive moving average (ARMA) [ 12 , 120 ] combines these two approaches, where the autoregressive part extracts the momentum and pattern of the trend and the moving average part captures the noise effects. The most popular and frequently used time-series model is the autoregressive integrated moving average (ARIMA) model [ 12 , 120 ]. The ARIMA model, a generalization of the ARMA model, is more flexible than other statistical models such as exponential smoothing or simple linear regression. In terms of data, the ARMA model can only be used for stationary time-series data, while the ARIMA model also covers the non-stationary case. Similarly, the seasonal autoregressive integrated moving average (SARIMA), the autoregressive fractionally integrated moving average (ARFIMA), and the autoregressive moving average model with exogenous inputs (ARMAX) are also used as time-series models [ 120 ].

In addition to stochastic methods for time-series modeling and forecasting, machine learning and deep learning-based approaches can be used for effective time-series analysis and forecasting. For instance, in our earlier paper, Sarker et al. [ 111 ] present a bottom-up clustering-based time-series analysis to capture the mobile usage behavioral patterns of users. Figure 5 shows an example of producing aggregate time segments Seg_i from initial time slices TS_i based on similar behavioral characteristics, where D represents the dominant behavior BH_i of the users mentioned above [ 111 ]. The authors in [ 118 ] used a long short-term memory (LSTM) model, a kind of recurrent neural network (RNN) deep learning model, for time-series forecasting that outperforms traditional approaches such as the ARIMA model. Time-series analysis is commonly used these days in various fields such as finance, manufacturing, business, social media, event data (e.g., clickstreams and system events), IoT and smartphone data, and generally in any applied science or engineering domain involving temporal measurements. Thus, it covers a wide range of application areas in data science.

Fig. 5: An example of producing aggregate time segments from initial time slices based on similar behavioral characteristics
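The moving-average smoothing described above can be sketched in plain Python (toy sales data; the `moving_average` helper is an illustration, not the MA model from the cited references):

```python
def moving_average(series, window):
    """Smooth a time series by averaging each consecutive window of points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Monthly sales with small fluctuations around an upward trend.
sales = [10, 12, 11, 13, 15, 14, 16, 18]
smoothed = moving_average(sales, window=3)
print(smoothed)  # [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]

# A naive one-step forecast: carry the last smoothed value forward.
print(smoothed[-1])
```

Smoothing removes the short-term noise and exposes the underlying trend; AR, ARMA, and ARIMA models build on this idea by additionally regressing on past values and past forecast errors.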

Opinion Mining and Sentiment Analysis

Sentiment analysis, or opinion mining, is the computational study of the opinions, thoughts, emotions, assessments, and attitudes of people towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [ 71 ]. There are three basic kinds of sentiment: positive, negative, and neutral, along with more extreme feelings such as angry, happy, and sad, or degrees of interest such as interested or not interested. More refined sentiments for evaluating the feelings of individuals in various situations can also be defined according to the problem domain.

Although the task of opinion mining and sentiment analysis is very challenging from a technical point of view, it is very useful in real-world practice. For instance, a business always aims to obtain the opinions of the public or its customers about its products and services, in order to refine business policy and make better business decisions. Sentiment analysis can thus help a business understand the social opinion of its brand, product, or service. Likewise, potential customers want to know what existing consumers think about a service or product before they use or purchase it. Document level, sentence level, aspect level, and concept level are the possible levels of opinion mining in the area [ 45 ].

Several popular techniques such as lexicon-based including dictionary-based and corpus-based methods, machine learning including supervised and unsupervised learning, deep learning, and hybrid methods are used in sentiment analysis-related tasks [ 70 ]. To systematically define, extract, measure, and analyze affective states and subjective knowledge, it incorporates the use of statistics, natural language processing (NLP), machine learning as well as deep learning methods. Sentiment analysis is widely used in many applications, such as reviews and survey data, web and social media, and healthcare content, ranging from marketing and customer support to clinical practice. Thus sentiment analysis has a big influence in many data science applications, where public sentiment is involved in various real-world issues.
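A minimal lexicon-based sketch of the dictionary-based approach mentioned above (the tiny `LEXICON` and the `sentiment` function are illustrative toys; real systems use large curated lexicons or trained models):

```python
# A tiny, illustrative sentiment lexicon mapping words to polarity scores.
LEXICON = {"good": 1, "great": 2, "happy": 1,
           "bad": -1, "terrible": -2, "sad": -1}

def sentiment(text):
    """Sum word polarities and map the total score to a sentiment label."""
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The service was great and I am happy"))   # positive
print(sentiment("terrible product and very bad support"))  # negative
```

This naive scorer ignores negation, sarcasm, and context ("not good" scores positive), which is exactly why machine learning and deep learning methods dominate in practice.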

Behavioral Data and Cohort Analysis

Behavioral analytics is a recent trend that typically reveals new insights into e-commerce sites, online gaming, mobile and smartphone applications, IoT user behavior, and many other areas [ 112 ]. Behavioral analysis aims to understand how and why consumers or users behave, allowing accurate predictions of how they are likely to behave in the future. For instance, it allows advertisers to make the best offers to the right client segments at the right time. Behavioral analytics uses the large quantities of raw user event information gathered during sessions in which people use apps, games, or websites, including traffic data such as navigation paths, clicks, social media interactions, purchase decisions, and marketing responsiveness. In our earlier papers, Sarker et al. [ 101 , 111 , 113 ], we have discussed how to extract users’ phone usage behavioral patterns from real-life phone log data for various purposes.

In real-world scenarios, behavioral analytics is often used in e-commerce, social media, call centers, billing systems, IoT systems, political campaigns, and other applications to find opportunities for optimization toward particular outcomes. Cohort analysis is a branch of behavioral analytics that involves studying groups of people over time to see how their behavior changes. For instance, it takes data from a given dataset (e.g., an e-commerce website, web application, or online game) and separates it into related groups for analysis. Various machine learning techniques, such as behavioral data clustering [ 111 ], behavioral decision tree classification [ 109 ], and behavioral association rules [ 113 ], can be used in the area depending on the goal. Besides, the concept of RecencyMiner, proposed in our earlier paper Sarker et al. [ 108 ], which takes into account recent behavioral patterns, could be effective when analyzing behavioral data, since such behavior is not static in the real world and changes over time.
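The cohort idea described above, grouping users by a shared starting point and tracking their later activity, can be sketched in plain Python (the event records and field names are hypothetical toy data):

```python
from collections import defaultdict

# (user, signup_month, active_month) event records (toy data).
events = [
    ("u1", "2021-01", "2021-01"), ("u1", "2021-01", "2021-02"),
    ("u2", "2021-01", "2021-01"),
    ("u3", "2021-02", "2021-02"), ("u3", "2021-02", "2021-03"),
]

# A cohort is the set of users sharing a signup month; for each cohort,
# record which users were active in each later month.
cohorts = defaultdict(lambda: defaultdict(set))
for user, signup, active in events:
    cohorts[signup][active].add(user)

jan = cohorts["2021-01"]
print(len(jan["2021-01"]), len(jan["2021-02"]))  # 2 signed up, 1 retained
```

Dividing the retained count by the cohort size gives the familiar retention rate per month; here the January cohort retains 1 of 2 users into February.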

Anomaly Detection or Outlier Analysis

Anomaly detection, also known as outlier analysis, is a data mining step that detects data points, events, and/or observations that deviate from the regularities or normal behavior of a dataset. Anomalies are usually referred to as outliers, abnormalities, novelties, noise, inconsistencies, irregularities, or exceptions [ 63 , 114 ]. Anomaly detection techniques may flag new situations or cases as deviant based on historical data, by analyzing the data patterns. For instance, identifying fraudulent or irregular transactions in finance is an example of anomaly detection.

It is often used as a preprocessing task for the removal of anomalous or inconsistent records from real-world data collected from various sources, including user logs, devices, networks, and servers. Several machine learning techniques can be used for anomaly detection, such as k-nearest neighbors, isolation forests, and cluster analysis [ 105 ]. Excluding anomalous data from a dataset can also result in a statistically significant improvement in accuracy during supervised learning [ 101 ]. However, extracting appropriate features, identifying normal behaviors, managing imbalanced data distributions, handling variations in abnormal behavior, the sparse occurrence of abnormal events, environmental variations, etc., can be challenging in the process of anomaly detection. Anomaly detection is applicable in a variety of domains, such as cybersecurity analytics, intrusion detection, fraud detection, fault detection, health analytics, identifying irregularities, detecting ecosystem disturbances, and many more. It can thus be considered a significant task for building effective, highly accurate systems within the area of data science.
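As a minimal statistical sketch (a z-score rule on toy transaction amounts; the `zscore_outliers` helper and the 2.5 threshold are illustrative choices, and the threshold is domain-dependent in practice):

```python
import statistics

def zscore_outliers(data, threshold=2.5):
    """Flag points whose distance from the mean exceeds `threshold`
    sample standard deviations."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]

# Daily transaction amounts with one clearly irregular value.
amounts = [100, 102, 98, 101, 99, 103, 97, 100, 1000]
print(zscore_outliers(amounts))  # [1000]
```

Note a known weakness of this rule: the outlier itself inflates the mean and standard deviation, which caps how extreme a single z-score can be in a small sample; robust variants (median/MAD) and model-based detectors such as isolation forests handle this better.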

Factor Analysis

Factor analysis is a collection of techniques for describing the relationships or correlations between variables in terms of more fundamental entities known as factors [ 23 ]. It is usually used to organize variables into a small number of clusters based on their common variance, using mathematical or statistical procedures. The goals of factor analysis are to determine the number of fundamental influences underlying a set of variables, calculate the degree to which each variable is associated with the factors, and learn more about the nature of the factors by examining which factors contribute to output on which variables. The broad purpose of factor analysis is to summarize data so that relationships and patterns can be easily interpreted and understood [ 143 ].

Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are the two most popular factor analysis techniques. EFA seeks to discover complex trends by analyzing the dataset and testing predictions, while CFA tries to validate hypotheses and uses path analysis diagrams to represent variables and factors [ 143 ]. Factor analysis is an unsupervised machine learning algorithm used for dimensionality reduction. The most common methods for factor analysis are principal component analysis (PCA), principal axis factoring (PAF), and maximum likelihood (ML) [ 48 ]. Correlation analysis methods, such as Pearson correlation and canonical correlation, may also be useful in the field, as they quantify the statistical relationship or association between two continuous variables. Factor analysis is commonly used in finance, marketing, advertising, product management, psychology, and operations research, and can thus be considered another significant analytical method within the area of data science.
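As a small numerical sketch of the PCA route named above (toy data with one latent factor driving two observed variables; a hand-rolled eigendecomposition, not any of the cited procedures):

```python
import numpy as np

# Two observed variables driven by a single latent factor plus small noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
X = np.column_stack([factor + 0.05 * rng.normal(size=200),
                     2.0 * factor + 0.05 * rng.normal(size=200)])

# PCA via eigendecomposition of the covariance matrix of centered data.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Share of total variance captured by the leading principal component.
explained = float(eigvals[-1] / eigvals.sum())
print(round(explained, 3))  # close to 1.0: one factor explains the data
```

The leading component absorbing nearly all the variance is the PCA signature of a single underlying factor; with several factors, the eigenvalue spectrum (scree plot) guides how many components to retain.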

Log Analysis

Logs are commonly used in system management, as logs are often the only data available that record detailed system runtime activities or behaviors in production [ 44 ]. Log analysis can thus be considered the method of analyzing, interpreting, and understanding computer-generated records or messages, also known as logs. These can be device logs, server logs, system logs, network logs, event logs, audit trails, audit records, etc. The process of creating such records is called data logging.

Logs are generated by a wide variety of programmable technologies, including networking devices, operating systems, software, and more. Phone call logs [ 88 , 110 ], SMS logs [ 28 ], mobile app usage logs [ 124 , 149 ], notification logs [ 77 ], game logs [ 82 ], context logs [ 16 , 149 ], web logs [ 37 ], smartphone life logs [ 95 ], etc., are some examples of log data for smartphone devices. The main characteristic of such log data is that it captures users’ actual behavioral activities with their devices. Other similar log data include search logs [ 50 , 133 ], application logs [ 26 ], server logs [ 33 ], network logs [ 57 ], event logs [ 83 ], and network and security logs [ 142 ].

Several techniques, such as classification and tagging, correlation analysis, pattern recognition, anomaly detection, and machine learning modeling [ 105 ], can be used for effective log analysis. Log analysis can assist in compliance with security policies and industry regulations, as well as provide a better user experience by supporting the troubleshooting of technical problems and identifying areas where efficiency can be improved. For instance, web servers use log files to record data about website visitors, and Windows event log analysis can help an investigator draw a timeline based on the logging information and the discovered artifacts. Overall, advanced analytics methods that take machine learning modeling into account can play a significant role in extracting insightful patterns from such log data, which can be used for building automated and smart applications, and log analysis can thus be considered a key working area in data science.
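A minimal parsing-and-counting sketch of log analysis (the simplified access-log lines and the regular expression are illustrative toys, not a real server format specification):

```python
import re
from collections import Counter

# Simplified web-server access-log records (toy data).
logs = [
    '192.168.0.1 - [10/Oct/2021:13:55:36] "GET /index.html" 200',
    '192.168.0.2 - [10/Oct/2021:13:55:40] "GET /missing" 404',
    '192.168.0.1 - [10/Oct/2021:13:56:02] "POST /login" 200',
]

# Extract the HTTP method, path, and status code from each record.
pattern = re.compile(r'"(?P<method>\w+) (?P<path>\S+)" (?P<status>\d{3})')
statuses = Counter(m.group("status") for line in logs
                   if (m := pattern.search(line)))
print(statuses["200"], statuses["404"])  # 2 1
```

Structuring raw log lines into fields like this is the usual first step; the resulting event counts and sequences then feed the correlation, anomaly detection, and machine learning techniques listed above.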

Neural Networks and Deep Learning Analysis

Deep learning is a form of machine learning that uses artificial neural networks to create a computational architecture that learns from data by combining multiple processing layers, such as the input, hidden, and output layers [ 38 ]. The key benefit of deep learning over conventional machine learning methods is that it performs better in a variety of situations, particularly when learning from large datasets [ 114 , 140 ].

The most common deep learning algorithms are the multi-layer perceptron (MLP) [ 85 ], the convolutional neural network (CNN or ConvNet) [ 67 ], and the long short-term memory recurrent neural network (LSTM-RNN) [ 34 ]. Figure 6 shows the structure of an artificial neural network model with multiple processing layers. The backpropagation technique [ 38 ] is used to adjust the weight values internally while building the model. Convolutional neural networks (CNNs) [ 67 ] improve on the design of traditional artificial neural networks (ANNs) by including convolutional layers, pooling layers, and fully connected layers. CNNs are commonly used in a variety of fields, including natural language processing, speech recognition, and image processing, and on other autocorrelated data, since they take advantage of the two-dimensional (2D) structure of the input data. AlexNet [ 60 ], Xception [ 21 ], Inception [ 125 ], Visual Geometry Group (VGG) [ 42 ], ResNet [ 43 ], and other advanced deep learning models based on CNNs are also used in the field.

Fig. 6: A structure of an artificial neural network model with multiple processing layers

In addition to CNNs, the recurrent neural network (RNN) architecture is another popular method used in deep learning. Long short-term memory (LSTM) is a popular type of recurrent neural network architecture used broadly in the area of deep learning. Unlike traditional feed-forward neural networks, LSTM has feedback connections. Thus, LSTM networks are well-suited for analyzing and learning from sequential data, such as classifying, processing, and making predictions based on time-series data. Therefore, when the data is in a sequential format, such as time series or sentences, LSTM can be used, and it is widely applied in time-series analysis, natural language processing, speech recognition, and so on.

In addition to the most popular deep learning methods mentioned above, several other deep learning approaches [ 104 ] exist in the field for various purposes. The self-organizing map (SOM) [ 58 ], for example, uses unsupervised learning to represent high-dimensional data as a 2D grid map, thereby reducing dimensionality. The autoencoder (AE) [ 10 ] is another learning technique commonly used for dimensionality reduction and feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBMs) can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling [ 46 ]. A deep belief network (DBN) is usually made up of unsupervised networks, such as restricted Boltzmann machines (RBMs) or autoencoders, together with a backpropagation neural network (BPNN) [ 136 ]. A generative adversarial network (GAN) [ 35 ] is a deep learning network that can produce data with characteristics similar to the input data. Transfer learning, usually the re-use of a pre-trained model on a new problem, is now in common use because it can train deep neural networks with a small amount of data [ 137 ]. These deep learning methods can perform well, particularly when learning from large-scale datasets [ 105 , 140 ]. In our previous article, Sarker et al. [ 104 ], we have summarized a brief discussion of the various artificial neural network (ANN) and deep learning (DL) models mentioned above, which can be used in a variety of data science and analytics tasks.
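The layered structure described above can be sketched as a tiny forward pass in NumPy (random toy weights for a hypothetical 2-4-1 network; training via backpropagation is omitted):

```python
import numpy as np

def relu(z):
    """Rectified linear activation, applied element-wise."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass through dense layers: hidden layers use ReLU,
    the final output layer is linear."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]

# A 2-4-1 multi-layer perceptron with fixed random weights.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(2, 4)), rng.normal(size=(4, 1))]
bs = [np.zeros(4), np.zeros(1)]
out = mlp_forward(np.array([[1.0, 2.0]]), Ws, bs)
print(out.shape)  # (1, 1): one output value for one input example
```

Training would repeat this forward pass, compare `out` against a target with a loss function, and use backpropagation to adjust `Ws` and `bs`, which is the weight-update step referenced above.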

Real-World Application Domains

Almost every industry or organization is impacted by data, and thus “data science,” including advanced analytics with machine learning modeling, can be used in business, marketing, finance, IoT systems, cybersecurity, urban management, health care, government policy, and virtually every industry where data is generated. In the following, we discuss the ten most popular application areas based on data science and analytics.

  • Business or financial data science: In general, business data science can be considered the study of business or e-commerce data to obtain insights about a business that can typically lead to smart decision-making as well as high-quality actions [ 90 ]. Data scientists can develop algorithms or data-driven models that predict customer behavior and identify patterns and trends based on historical business data, which can help companies reduce costs, improve service delivery, and generate recommendations for better decision-making. Eventually, business automation, intelligence, and efficiency can be achieved through the data science process discussed earlier, where various advanced analytics methods and machine learning modeling based on the collected data are the keys. Many online retailers, such as Amazon [ 76 ], can improve inventory management, avoid out-of-stock situations, and optimize logistics and warehousing using predictive modeling based on machine learning techniques [ 105 ]. In finance, historical data help financial institutions make high-stakes business decisions, mostly for risk management, fraud prevention, credit allocation, customer analytics, personalized services, algorithmic trading, etc. Overall, data science methodologies can play a key role in the next-generation business and finance industries, particularly in terms of business automation, intelligence, smart decision-making, and smart systems.
  • Manufacturing or industrial data science: To compete in global production capability, quality, and cost, manufacturing industries have gone through many industrial revolutions [ 14 ]. The latest, the fourth industrial revolution, also known as Industry 4.0, is the emerging trend of automation and data exchange in manufacturing technology. Industrial data science, the study of industrial data to obtain insights that can typically lead to optimizing industrial applications, can thus play a vital role in such a revolution. Manufacturing industries generate a large amount of data from various sources such as sensors, devices, networks, systems, and applications [ 6 , 68 ]. The main categories of industrial data include large-scale data devices, life-cycle production data, enterprise operation data, manufacturing value chain sources, and collaboration data from external sources [ 132 ]. The data needs to be processed, analyzed, and secured to help improve the system’s efficiency, safety, and scalability. Data science modeling can thus be used to maximize production, reduce costs, and raise profits in manufacturing industries.
  • Medical or health data science: Healthcare is one of the most notable fields where data science is making major improvements. Health data science involves the extrapolation of actionable insights from sets of patient data, typically collected from electronic health records. To help organizations improve the quality of treatment, lower the cost of care, and improve the patient experience, data can be obtained from several sources, e.g., electronic health records, billing claims, cost estimates, and patient satisfaction surveys, and then analyzed. In reality, healthcare analytics using machine learning modeling can minimize medical costs, predict infectious outbreaks, prevent avoidable diseases, and generally improve the quality of life [ 81 , 119 ]. Across the global population, the average human lifespan is growing, presenting new challenges to today’s methods of care delivery. Health data science modeling can thus play a role in analyzing current and historical data to predict trends, improve services, and better monitor the spread of diseases. Eventually, it may lead to new approaches to improve patient care, clinical expertise, diagnosis, and management.
  • IoT data science: The Internet of things (IoT) [ 9 ] is a revolutionary technical field that turns every electronic system into a smarter one and is therefore considered to be the big frontier that can enhance almost all activities in our lives. Machine learning has become a key technology for IoT applications because it uses expertise to identify patterns and generate models that help predict future behavior and events [ 112 ]. One of the IoT’s main fields of application is the smart city, which uses technology to improve city services and citizens’ living experiences. For example, using the relevant data, data science methods can be applied to predict traffic in smart cities or to estimate citizens’ total energy usage over a particular period. Deep learning-based models in data science can be built on large-scale IoT datasets [ 7 , 104 ]. Overall, data science and analytics approaches can aid modeling in a variety of IoT and smart city services, including smart governance, smart homes, education, connectivity, transportation, business, agriculture, health care, industry, and many others.
  • Cybersecurity data science: Cybersecurity, or the practice of defending networks, systems, hardware, and data from digital attacks, is one of the most important fields of Industry 4.0 [ 114 , 121 ]. Data science techniques, particularly machine learning, have become a crucial cybersecurity technology that continually learns to identify trends by analyzing data, better detecting malware in encrypted traffic, finding insider threats, predicting where bad neighborhoods are online, keeping people safe while surfing, or protecting information in the cloud by uncovering suspicious user activity [ 114 ]. For instance, machine learning and deep learning-based security modeling can be used to effectively detect various types of cyberattacks or anomalies [ 103 , 106 ]. To generate security policy rules, association rule learning can play a significant role to build rule-based systems [ 102 ]. Deep learning-based security models can perform better when utilizing the large scale of security datasets [ 140 ]. Thus data science modeling can enable professionals in cybersecurity to be more proactive in preventing threats and reacting in real-time to active attacks, through extracting actionable insights from the security datasets.
  • Behavioral data science: Behavioral data is information produced as a result of activities, most commonly commercial behavior, performed on a variety of Internet-connected devices, such as PCs, tablets, or smartphones [ 112 ]. Websites, mobile applications, marketing automation systems, call centers, help desks, and billing systems are all common sources of behavioral data. Unlike static records, behavioral data accumulates and changes continuously over time [ 108 ]. Advanced analytics of such data, including machine learning modeling, can facilitate several areas: predicting future sales trends and product recommendations in e-commerce and retail; predicting usage trends, load, and user preferences for future releases in online gaming; determining how users use an application to predict future usage and preferences in application development; breaking users down into similar groups to gain a more focused understanding of their behavior in cohort analysis; and detecting compromised credentials and insider threats by locating anomalous behavior. Overall, behavioral data science modeling typically enables making the right offers to the right consumers at the right time on common platforms such as e-commerce sites, online games, web and mobile applications, and IoT. In a social context, analyzing human behavioral data with advanced analytics methods, and applying the insights extracted from such social data, can support data-driven intelligent social services, which can be considered as social data science.
  • Mobile data science: Today’s smart mobile phones are considered “next-generation, multi-functional cell phones that facilitate data processing, as well as enhanced wireless connectivity” [ 146 ]. In our earlier paper [ 112 ], we have shown that users’ interest in “Mobile Phones” has exceeded that in other platforms such as “Desktop Computer”, “Laptop Computer”, or “Tablet Computer” in recent years. People use smartphones for a variety of activities, including e-mailing, instant messaging, online shopping, Internet surfing, entertainment, social media such as Facebook, Linkedin, and Twitter, and various IoT services such as smart city, health, and transportation services. Intelligent apps are based on insights extracted from the relevant datasets, depending on app characteristics such as being action-oriented, adaptive in nature, suggestive and decision-oriented, data-driven, context-aware, and cross-platform [ 112 ]. As a result, mobile data science, which involves gathering a large amount of mobile data from various sources and analyzing it using machine learning techniques to discover useful insights or data-driven trends, can play an important role in the development of intelligent smartphone applications.
  • Multimedia data science: Over the last few years, a big data revolution in multimedia management systems has resulted from the rapid and widespread use of multimedia data, such as image, audio, video, and text, as well as the ease of access and availability of multimedia sources. Currently, multimedia sharing websites, such as Yahoo Flickr, iCloud, and YouTube, and social networks such as Facebook, Instagram, and Twitter, are considered valuable sources of multimedia big data [ 89 ]. People, particularly younger generations, spend a lot of time on the Internet and social networks to connect with others, exchange information, and create multimedia data, thanks to the advent of new technology and the advanced capabilities of smartphones and tablets. Multimedia analytics deals with the problem of effectively and efficiently manipulating, handling, mining, interpreting, and visualizing various forms of data to solve real-world problems. Text analysis, image or video processing, computer vision, audio or speech processing, and database management are among the solutions available for a range of applications including healthcare, education, entertainment, and mobile devices.
  • Smart cities or urban data science: Today, more than half of the world’s population lives in urban areas or cities [ 80 ], which are considered drivers or hubs of economic growth, wealth creation, well-being, and social activity [ 96 , 116 ]. In addition to cities, “urban area” can refer to surrounding areas such as towns, conurbations, or suburbs. A large amount of data documenting the daily events, perceptions, thoughts, and emotions of citizens is thus recorded, loosely categorized into personal data (e.g., household, education, employment, health, immigration, crime), proprietary data (e.g., banking, retail, online platform data), government data (e.g., citywide crime statistics or data from government institutions), open and public data (e.g., data.gov, ordnance survey), and organic and crowdsourced data (e.g., user-generated web data, social media, Wikipedia) [ 29 ]. The field of urban data science typically focuses on providing more effective solutions from a data-driven perspective by extracting knowledge and actionable insights from such urban data. Advanced analytics of these data using machine learning techniques [ 105 ] can facilitate the efficient management of urban areas, including real-time management (e.g., traffic flow management), evidence-based planning decisions that pertain to the longer-term strategic role of forecasting for urban planning (e.g., crime prevention, public safety, and security), and framing the future (e.g., political decision-making) [ 29 ]. Overall, it can contribute to government and public planning, as well as relevant sectors including retail, financial services, mobility, health, policing, and utilities, within a data-rich urban environment through data-driven smart decision-making and policies that lead to smart cities and improve the quality of human life.
  • Smart villages or rural data science: Rural areas, or the countryside, are the opposite of urban areas and include villages, hamlets, and agricultural areas. The field of rural data science typically focuses on making better decisions and providing more effective solutions, including protecting public safety, providing critical health services, supporting agriculture, and fostering economic development from a data-driven perspective, by extracting knowledge and actionable insights from the collected rural data. Advanced analytics of rural data, including machine learning modeling [ 105 ], can provide new opportunities for rural communities to build insights and capacity to meet current needs and prepare for their futures. For instance, machine learning modeling [ 105 ] can help farmers enhance their decisions to adopt sustainable agriculture by utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of things (IoT), and mobile technologies and devices [ 1 , 51 , 52 ]. Thus, rural data science can play a very important role in the economic and social development of rural areas, through agriculture, business, self-employment, construction, banking, healthcare, governance, and other services, leading to smarter villages.
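Several of the domains above (industrial sensors, cybersecurity, behavioral data) rely on detecting anomalous observations in collected data. As a minimal, self-contained illustration of that idea, the sketch below flags readings that deviate strongly from the mean; the sensor values and the 2-sigma threshold are illustrative assumptions, not taken from any cited work, and real deployments would use the machine or deep learning detectors discussed above.

```python
# Minimal anomaly-detection sketch: flag readings whose z-score exceeds a
# threshold. The sensor values below are hypothetical illustrative data.
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from
    the mean -- a simple stand-in for the ML-based detectors above."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all readings identical: nothing to flag
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.7, 20.2, 20.1]  # one spike
print(zscore_outliers(readings))  # prints [35.7]
```

A production system would replace the fixed threshold with a model learned from historical data, but the pipeline shape (collect, score, flag, act) is the same.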

Overall, we can conclude that data science modeling can be used to help drive changes and improvements in almost every sector of our real-world life, wherever the relevant data is available to analyze. Gathering the right data and extracting useful knowledge or actionable insights from it for smart decision-making is the key to data science modeling in any application domain. Based on our discussion of the above ten potential real-world application domains, taking into account data-driven smart computing and decision making, we can say that the prospects of data science and the role of data scientists are huge for the future world. Data scientists typically analyze information from multiple sources to better understand the data and business problems, and develop machine learning-based analytical models, algorithms, or data-driven tools and solutions, focused on advanced analytics, which can make today’s computing processes smarter, more automated, and more intelligent.

Challenges and Research Directions

Our study on data science and analytics, particularly data science modeling in “ Understanding data science modeling ”, advanced analytics methods and smart computing in “ Advanced analytics methods and smart computing ”, and real-world application areas in “ Real-world application domains ”, opens several research issues in the area of data-driven business solutions and eventual data products. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions for building data-driven products.

  • Understanding the real-world business problems and the associated data, including its nature, e.g., form, type, size, and labels, is the first challenge in data science modeling, discussed briefly in “ Understanding data science modeling ”. This means identifying, specifying, representing, and quantifying the domain-specific business problems and data according to the requirements. For a data-driven effective business solution, there must be a well-defined workflow before beginning the actual data analysis work. Furthermore, gathering business data is difficult because data sources can be numerous and dynamic. As a result, gathering different forms of real-world data, such as structured or unstructured data, related to a specific business issue with legal access, which varies from application to application, is challenging. Moreover, data annotation, typically the process of categorizing, tagging, or labeling raw data for the purpose of building data-driven models, is another challenging issue. Thus, the primary task is to conduct a more in-depth analysis of data collection and dynamic annotation methods. Therefore, understanding the business problem, as well as integrating and managing the raw data gathered for efficient data analysis, may be one of the most challenging aspects of working in the field of data science and analytics.
  • The next challenge is the extraction of relevant and accurate information from the collected data mentioned above. The main focus of data scientists is typically to disclose, describe, represent, and capture data-driven intelligence for actionable insights from data. However, real-world data may contain many ambiguous values, missing values, outliers, and meaningless data [ 101 ]. The quality and availability of the data highly impact the advanced analytics methods, including machine and deep learning modeling, discussed in “ Advanced analytics methods and smart computing ”. Thus, it is important to understand the real-world business scenario and the associated data, to determine whether, how, and why they are insufficient, missing, or problematic, and then to extend or redevelop the existing methods, such as large-scale hypothesis testing and learning under inconsistency and uncertainty, to address the complexities in the data and business problems. Therefore, developing new techniques to effectively pre-process the diverse data collected from multiple sources, according to their nature and characteristics, could be another challenging task.
  • Understanding and selecting the appropriate analytical methods to extract useful insights for smart decision-making for a particular business problem is the main issue in the area of data science. The emphasis of advanced analytics is more on anticipating the use of data to detect patterns and determine what is likely to occur in the future. Basic analytics offer a general description of the data, while advanced analytics is a step forward, offering a deeper understanding of data and enabling granular data analysis. Thus, understanding the advanced analytics methods, especially machine and deep learning-based modeling, is the key. The traditional learning techniques mentioned in “ Advanced analytics methods and smart computing ” may not be directly applicable for the expected outcome in many cases. For instance, in a rule-based system, the traditional association rule learning technique [ 4 ] may produce redundant rules from the data that make the decision-making process complex and ineffective [ 113 ]. Thus, a scientific understanding of the learning algorithms, their mathematical properties, and how robust or fragile the techniques are to input data is needed. Therefore, a deeper understanding of the strengths and drawbacks of the existing machine and deep learning methods [ 38 , 105 ] for solving a particular business problem is needed; consequently, improving or optimizing the learning algorithms according to the data characteristics, or proposing new algorithms and techniques with higher accuracy, becomes a significant challenge for the future generation of data scientists.
  • Traditional data-driven models or systems typically use a large amount of business data to generate data-driven decisions. In several application fields, however, recent trends are more likely to be interesting and useful for modeling and predicting the future than older ones, for example, in smartphone user behavior modeling, IoT services, stock market forecasting, health or transport services, job market analysis, and other areas where time series and actual human interests or preferences evolve over time. Thus, rather than relying on traditional data analysis, the concept of RecencyMiner, i.e., insight or knowledge extracted from recent patterns, proposed in our earlier paper Sarker et al. [ 108 ], might be effective. Therefore, proposing new techniques that take recent data patterns into account, and consequently building a recency-based data-driven model for solving real-world problems, is another significant challenging issue in the area.
  • The most crucial task for a data-driven smart system is to create a framework that supports data science modeling discussed in “ Understanding data science modeling ”. As a result, advanced analytical methods based on machine learning or deep learning techniques can be considered in such a system to make the framework capable of resolving the issues. Besides, incorporating contextual information such as temporal context, spatial context, social context, environmental context, etc. [ 100 ] can be used for building an adaptive, context-aware, and dynamic model or framework, depending on the problem domain. As a result, a well-designed data-driven framework, as well as experimental evaluation, is a very important direction to effectively solve a business problem in a particular domain, as well as a big challenge for the data scientists.
  • In several important application areas, such as autonomous cars, criminal justice, health care, recruitment, housing, human resource management, and public safety, decisions made by models or AI agents have a direct effect on human lives. As a result, there is growing concern about whether these decisions can be trusted to be right, reasonable, ethical, personalized, accurate, robust, and secure, particularly in the context of adversarial attacks [ 104 ]. If we can explain a result in a meaningful way, then the model can be better trusted by the end-user. For machine-learned models, new trust properties yield new trade-offs, such as privacy versus accuracy, robustness versus efficiency, and fairness versus robustness. Therefore, incorporating trustworthy AI, particularly in data-driven or machine learning modeling, could be another challenging issue in the area.
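The second challenge above concerns pre-processing data that contains missing values and outliers. A minimal sketch along those lines is shown below; the records, the valid range, and the choice of median imputation are illustrative assumptions for the sketch, not a method from the cited literature.

```python
# Minimal pre-processing sketch: drop out-of-range readings, then impute
# missing values (None) with the median of the remaining observations.
from statistics import median

def clean(values, low, high):
    """Keep None and in-range values, then fill None with the median."""
    in_range = [v for v in values if v is None or low <= v <= high]
    observed = [v for v in in_range if v is not None]
    med = median(observed)
    return [med if v is None else v for v in in_range]

data = [4.0, 5.0, None, 4.5, 99.0, 5.5]   # one missing value, one outlier
print(clean(data, low=0.0, high=10.0))    # prints [4.0, 5.0, 4.75, 4.5, 5.5]
```

Real pipelines would choose the imputation and outlier strategy per feature, based on the data characteristics discussed above.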
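The recency idea in the fourth point above can be conveyed with a simple exponential decay over an event history: recent events outweigh older ones when scoring a user's current preference. This is only a toy illustration of the intuition, not the RecencyMiner algorithm of Sarker et al. [ 108 ]; the event labels and decay factor are illustrative assumptions.

```python
# Toy recency-weighted preference scoring: event i (oldest first) gets
# weight decay**(n - 1 - i), so the most recent events dominate the score.
from collections import defaultdict

def recency_scores(events, decay=0.5):
    """events: list of labels ordered oldest -> newest."""
    scores = defaultdict(float)
    n = len(events)
    for i, label in enumerate(events):
        scores[label] += decay ** (n - 1 - i)
    return dict(scores)

# The user mostly chose "news" historically but "sports" recently.
history = ["news", "news", "news", "sports", "sports"]
scores = recency_scores(history)
print(max(scores, key=scores.get))  # prints "sports": recent interest wins
```

A plain frequency count over the same history would have ranked "news" first; weighting by recency flips the decision, which is exactly the behavior the fourth point argues for in time-evolving domains.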

In the above, we have summarized and discussed several challenges and the potential research opportunities and directions, within the scope of our study in the area of data science and advanced analytics. The data scientists in academia/industry and the researchers in the relevant area have the opportunity to contribute to each issue identified above and build effective data-driven models or systems, to make smart decisions in the corresponding business domains.

In this paper, we have presented a comprehensive view of data science, including various types of advanced analytical methods that can be applied to enhance the intelligence and capabilities of an application. We have also visualized the current popularity of data science and machine learning-based advanced analytical modeling, and differentiated these from the related terms used in the area, to establish the position of this paper. We have then provided a thorough study of data science modeling, with the various processing modules that are needed to extract actionable insights from the data for a particular business problem and the eventual data product. Thus, according to our goal, we have briefly discussed how different data modules can play a significant role in a data-driven business solution through the data science process. For this, we have also summarized the various types of advanced analytical methods and outcomes, as well as the machine learning modeling, that are needed to solve the associated business problems. This study’s key contribution has thus been identified as the explanation of different advanced analytical methods and their applicability in various real-world data-driven application areas, including business, healthcare, cybersecurity, urban and rural data science, and so on, by taking into account data-driven smart computing and decision making.

Finally, within the scope of our study, we have outlined and discussed the challenges we faced, as well as possible research opportunities and future directions. The challenges identified provide promising research opportunities in the field that can be explored with effective solutions to improve data-driven models and systems. Overall, we conclude that our study of advanced analytical solutions based on data science and machine learning methods leads in a positive direction and can be used as a reference guide for future research and applications in the field of data science and its real-world applications by both academia and industry professionals.

Declarations

The author declares no conflict of interest.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
