
An Overview of Descriptive Analysis

  • Ayush Singh Rawat
  • Mar 31, 2021


Nowadays, Big Data and Data Science have become high-volume keywords. They are researched extensively, and the data they generate needs to be processed and studied with scrutiny. One of the techniques used to analyse this data is descriptive analysis.

This data needs to be analysed to yield insights and influential trends, so that the next batch of content can be made in accordance with the general population’s likes and dislikes.

Introduction

Descriptive analysis is the conversion of raw data into a form that is easy to understand and interpret, i.e., rearranging, ordering, and manipulating the data to provide insightful information about it.

Descriptive analysis is the type of data analysis that helps describe, show, or summarize data points in a constructive way, so that patterns can emerge from the data.

It is one of the most important steps in conducting statistical data analysis. It gives you a summary of the distribution of your data, helps you detect typos and outliers, and enables you to identify similarities among variables, thus preparing you for further statistical analyses.

Techniques for Descriptive Analysis

Data aggregation and data mining are two techniques used in descriptive analysis to summarize historical data. In data aggregation, data is first collected and then sorted in order to make the datasets more manageable.

Descriptive techniques often include constructing tables of means and quantiles, measures of dispersion such as variance or standard deviation, and cross-tabulations or “crosstabs” that can be used to examine many disparate hypotheses. These hypotheses often highlight differences among subgroups.

Measures like segregation, discrimination, and inequality are studied using specialised descriptive techniques. Discrimination is measured with the help of audit studies or decomposition methods. Segregation by type, or inequality of outcomes, need not be wholly good or bad in itself, but it is often considered a marker of unjust social processes; accurate measurement across space and time is a prerequisite to understanding these processes.

A table of means by subgroup is used to show important differences across subgroups, and it often invites inferences and conclusions. When we notice a gap in earnings, for example, we naturally tend to extrapolate reasons for those patterns.

But this enters the province of measuring impacts, which requires different techniques. Often, random variation causes differences in means, and statistical inference is required to determine whether observed differences could arise merely by chance.

A crosstab, or two-way tabulation, shows the proportion of observations in each combination of values for two variables, known as cell proportions. For example, we might tabulate the proportion of the population that has a high school degree and also receives food or cash assistance; that is, a crosstab of education versus receipt of assistance.

We might then examine row proportions, or the fraction in each education group who receive food or cash assistance, perhaps finding that assistance levels drop sharply at higher education levels.

Column proportions can also be examined: the fraction of recipients (or non-recipients) at each education level. But this runs in the opposite direction from any causal effect. We might come across a surprisingly high proportion of recipients with a college education, but this might simply reflect the fact that there are more college graduates in the population than people with less than a high school degree.
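To make cell, row, and column proportions concrete, here is a minimal pandas sketch; the education and assistance values below are invented purely for illustration.

```python
# Cell, row, and column proportions of a crosstab; data is invented.
import pandas as pd

df = pd.DataFrame({
    "education": ["high school", "college", "high school", "less than HS",
                  "college", "high school", "less than HS", "college"],
    "assistance": ["yes", "no", "no", "yes", "no", "yes", "yes", "no"],
})

# Cell proportions: each cell as a share of the whole sample.
print(pd.crosstab(df["education"], df["assistance"], normalize="all"))

# Row proportions: the fraction of each education group receiving assistance.
print(pd.crosstab(df["education"], df["assistance"], normalize="index"))

# Column proportions: the education mix among recipients and non-recipients.
print(pd.crosstab(df["education"], df["assistance"], normalize="columns"))
```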


Types of Descriptive Analysis

Descriptive analysis can be categorized into four types: measures of frequency, central tendency, dispersion or variation, and position. These methods are best suited to analysing one variable at a time.

Different types of Descriptive Analysis

Measures of Frequency

In descriptive analysis, it’s essential to know how frequently a certain event or response occurs. This is the prime purpose of measures of frequency, such as a count or percent.

For example, consider a survey where 500 participants are asked about their favourite IPL team. A list of 500 responses would be difficult to digest, but the data can be made much more accessible by counting how many times each IPL team was selected.
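As a sketch of that counting step, pandas can tabulate responses directly; the team names and repetition below are stand-ins, not real survey data.

```python
# Turning a long list of raw responses into a frequency table; data invented.
import pandas as pd

responses = pd.Series(["CSK", "MI", "RCB", "MI", "CSK", "MI"] * 50)

print(responses.value_counts())                      # count per team
print(responses.value_counts(normalize=True) * 100)  # percentage per team
```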

Measures of Central Tendency

In descriptive analysis, it’s also important to find the central (or average) tendency of the responses. Central tendency is measured using three averages: mean, median, and mode. As an example, consider a survey in which the weight of 1,000 people is measured. In this case, the mean would be an excellent descriptive metric for the central value.

Measures of Dispersion

Sometimes, it is important to know how data is distributed across a range. To illustrate this, consider the average weight in a sample of two people. If both individuals weigh 60 kg, the average weight is 60 kg. However, if one individual weighs 50 kg and the other 70 kg, the average weight is still 60 kg. Measures of dispersion like range or standard deviation can be employed to capture this kind of spread.
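This point is easy to verify with Python’s statistics module; a minimal sketch of the two-person example above:

```python
# Two samples with the same mean but very different dispersion.
import statistics

sample_a = [60, 60]  # both individuals weigh 60 kg
sample_b = [50, 70]  # same mean, wider spread

print(statistics.mean(sample_a), statistics.mean(sample_b))          # 60, 60
print(max(sample_a) - min(sample_a), max(sample_b) - min(sample_b))  # ranges: 0, 20
print(statistics.pstdev(sample_a), statistics.pstdev(sample_b))      # 0.0, 10.0
```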

Measures of Position

Descriptive analysis also involves identifying the position of a single value or response in relation to others. Measures like percentiles and quartiles are very useful here.
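As a small illustration, NumPy computes percentiles and quartiles directly; the scores below are invented.

```python
# Measures of position: quartiles and percentiles of an invented score list.
import numpy as np

scores = np.array([55, 61, 64, 70, 72, 75, 81, 84, 90, 95])

print(np.percentile(scores, 25))  # first quartile (Q1)
print(np.percentile(scores, 50))  # median (Q2)
print(np.percentile(scores, 75))  # third quartile (Q3)
print(np.percentile(scores, 90))  # 90th percentile
```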

Beyond these, if you’ve collected data on multiple variables, you can use bivariate or multivariate descriptive statistics to study whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two different variables to see whether they vary together in a pattern. You can also compare the central tendency of the two variables before carrying out further types of statistical analysis.

Multivariate analysis is the same as bivariate analysis, but carried out for more than two variables. The following two methods are used for bivariate analysis.

Contingency table

In a contingency table, each cell represents the combination of two variables. Conventionally, an independent variable (e.g., gender) is listed along the vertical axis and a dependent variable (e.g., activities) along the horizontal axis. You read across the table to see how the independent and dependent variables relate to each other.

A table tallying the number of activities by gender

Scatter plots

A scatter plot is a chart that lets you see the relationship between two or three variables. It’s a visual rendition of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another along the y-axis. Each observation is denoted by a point on the chart.

The scatter plot shows the hours of sleep needed per day by age.
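A minimal matplotlib sketch of building such a plot follows; the age and sleep values are invented stand-ins, not the data behind the figure above.

```python
# A basic scatter plot of hours of sleep against age; data invented.
import matplotlib.pyplot as plt

age   = [1, 5, 10, 18, 30, 50, 70]
sleep = [14, 11, 10, 9, 8, 7, 7]  # hours of sleep needed per day

plt.scatter(age, sleep)
plt.xlabel("Age (years)")
plt.ylabel("Hours of sleep per day")
plt.title("Sleep needed by age")
plt.show()
```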


Advantages of Descriptive Analysis

A high degree of objectivity and neutrality on the part of the researchers is one of the main advantages of descriptive analysis. Researchers need to be extra vigilant, because descriptive analysis exposes the characteristics of the extracted data, and data that does not fit the observed trends may end up being discarded in bulk.

Descriptive analysis is broader in scope than many other quantitative methods and provides a wider picture of an event or phenomenon. It can use any number of variables, or even a single variable, to conduct descriptive research.

This type of analysis is considered a better method for collecting information that describes relationships as they naturally occur and presents the world as it exists. This keeps the analysis realistic and grounded, since all the trends are derived from the real-life behaviour of the data.

It is useful for identifying variables and new hypotheses that can be further analyzed through experimental and inferential studies. It is also considered dependable because the margin for error is small, as the trends are drawn directly from the properties of the data.

This type of study gives the researcher the flexibility to use both quantitative and qualitative data in order to discover the properties of the population.

For example, researchers can use both a case study (a qualitative analysis) and correlation analysis to describe a phenomenon in their own way. Using case studies to describe people, events, and institutions enables the researcher to understand the behaviour and patterns of the group concerned to the fullest extent.

In the case of surveys, which constitute one of the main types of descriptive analysis, the researcher tends to gather data points from a relatively large number of respondents, unlike experimental studies, which generally need smaller samples.

This is a clear advantage of the survey method over other descriptive methods: it enables researchers to study larger groups of individuals with ease. If surveys are properly administered, they give a broader and neater description of the unit under research.



Descriptive Analysis: How-To, Types, Examples

Last Updated: Mar 29, 2024 by Thomas Bush Filed Under: Business

From diagnostic to predictive, there are many different types of data analysis. Perhaps the most straightforward of them is descriptive analysis, which seeks to describe or summarize past and present data, helping to create accessible data insights. In this short guide, we’ll review the basics of descriptive analysis, including what exactly it is, what benefits it has, how to do it, as well as some types and examples.

What Is Descriptive Analysis?

Descriptive analysis, also known as descriptive analytics or descriptive statistics, is the process of using statistical techniques to describe or summarize a set of data. As one of the major types of data analysis, descriptive analysis is popular for its ability to generate accessible insights from otherwise uninterpreted data.

Unlike other types of data analysis, descriptive analysis does not attempt to make predictions about the future. Instead, it draws insights solely from past data, by manipulating that data in ways that make it more meaningful.

Benefits of Descriptive Analysis

Descriptive analysis is all about trying to describe or summarize data. Although it doesn’t make predictions about the future, it can still be extremely valuable in business environments. This is chiefly because descriptive analysis makes data easier to consume, which in turn makes it easier for analysts to act on.

Another benefit of descriptive analysis is that it can help to filter out less meaningful data. This is because the statistical techniques used within this type of analysis usually focus on the patterns in data, and not the outliers.

Types of Descriptive Analysis

According to CampusLabs.com, descriptive analysis can be categorized as one of four types. They are measures of frequency, central tendency, dispersion or variation, and position.

Measures of Frequency

In descriptive analysis, it’s essential to know how frequently a certain event or response occurs. This is the purpose of measures of frequency, like a count or percent. For example, consider a survey where 1,000 participants are asked about their favorite ice cream flavor. A list of 1,000 responses would be difficult to consume, but the data can be made much more accessible by measuring how many times a certain flavor was selected.

Measures of Central Tendency

In descriptive analysis, it’s also worth knowing the central (or average) event or response. Common measures of central tendency include the three averages — mean, median, and mode. As an example, consider a survey in which the height of 1,000 people is measured. In this case, the mean average would be a very helpful descriptive metric.

Measures of Dispersion

Sometimes, it may be worth knowing how data is distributed across a range. To illustrate this, consider the average height in a sample of two people. If both individuals are six feet tall, the average height is six feet. However, if one individual is five feet tall and the other is seven feet tall, the average height is still six feet. In order to measure this kind of distribution, measures of dispersion like range or standard deviation can be employed.

Measures of Position

Last of all, descriptive analysis can involve identifying the position of one event or response in relation to others. This is where measures like percentiles and quartiles can be used.

How to Do Descriptive Analysis

Like many types of data analysis, descriptive analysis can be quite open-ended. In other words, it’s up to you what you want to look for in your analysis. With that said, the process of descriptive analysis usually consists of the same few steps.

  • Collect data

The first step in any type of data analysis is to collect the data. This can be done in a variety of ways, but surveys and good old-fashioned measurements are often used.

  • Clean data

Another important step in descriptive and other types of data analysis is to clean the data. This is because data may be formatted in inaccessible ways, which will make it difficult to manipulate with statistics. Cleaning data may involve changing its textual format, categorizing it, and/or removing outliers.

  • Apply methods

Finally, descriptive analysis involves applying the chosen statistical methods so as to draw the desired conclusions. What methods you choose will depend on the data you are dealing with and what you are looking to determine. If in doubt, review the four types of descriptive analysis methods explained above.
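To make the workflow concrete, here is a rough pandas sketch of the collect, clean, and apply steps; the file name survey.csv and the age column are assumptions for illustration, not a prescribed setup.

```python
# Collect -> clean -> apply, in miniature; file and column names are assumed.
import pandas as pd

df = pd.read_csv("survey.csv")      # collect: load the raw data

df = df.dropna(subset=["age"])      # clean: drop missing responses
df = df[df["age"].between(0, 120)]  # clean: remove implausible outliers

print(df["age"].describe())         # apply: count, mean, std, quartiles
```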

When to Do Descriptive Analysis

Descriptive analysis is often used when reviewing any past or present data. This is because raw data is difficult to consume and interpret, while the metrics offered by descriptive analysis are much more focused.

Descriptive analysis can also be conducted as the precursor to diagnostic or predictive analysis, providing insights into what has happened in the past before attempting to explain why it happened or to predict what will happen in the future.

Descriptive Analysis Example

As an example of descriptive analysis, consider an insurance company analyzing its customer base.

The insurance company may know certain traits about its customers, such as their gender, age, and nationality. To gain a better profile of their customers, the insurance company can apply descriptive analysis.

Measures of frequency can be used to identify how many customers are under a certain age; measures of central tendency can be used to identify who most of their customers are; measures of dispersion can be used to identify the variation in, for example, the age of their customers; finally, measures of position can be used to compare segments of customers based on specific traits.

Final Thoughts

Descriptive analysis is a popular type of data analysis. It’s often conducted before diagnostic or predictive analysis, as it simply aims to describe and summarize past data.

To do so, descriptive analysis uses a variety of statistical techniques, including measures of frequency, central tendency, dispersion, and position. How exactly you conduct descriptive analysis will depend on what you are looking to find out, but the steps usually involve collecting, cleaning, and finally analyzing data.

In any case, this business analysis process is invaluable when working with data.



Quant Analysis 101: Descriptive Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis, one of the first terms you’re likely to hear being thrown around is descriptive statistics. In this post, we’ll unpack the basics of descriptive statistics, using straightforward language and loads of examples. So grab a cup of coffee and let’s crunch some numbers!

Overview: Descriptive Statistics

  • What are descriptive statistics?
  • Descriptive vs inferential statistics
  • Why the descriptives matter
  • The “Big 7” descriptive statistics
  • Key takeaways

At the simplest level, descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset – for example, a set of survey responses. They provide a snapshot of the characteristics of your dataset and allow you to better understand, roughly, how the data are “shaped” (more on this later). For example, a descriptive statistic could include the proportion of males and females within a sample or the percentages of different age groups within a population.

Another common descriptive statistic is the humble average (which in statistics-talk is called the mean). For example, if you undertook a survey and asked people to rate their satisfaction with a particular product on a scale of 1 to 10, you could then calculate the average rating. This is a very basic statistic, but as you can see, it gives you some idea of how this data is shaped.

Descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset, including its “shape”

What about inferential statistics?

Now, you may have also heard the term inferential statistics being thrown around, and you’re probably wondering how that’s different from descriptive statistics. Simply put, descriptive statistics describe and summarise the sample itself, while inferential statistics use the data from a sample to make inferences or predictions about a population.

Put another way, descriptive statistics help you understand your dataset, while inferential statistics help you make broader statements about the population, based on what you observe within the sample. If you’re keen to learn more, we cover inferential stats in another post.

Why do descriptive statistics matter?

While descriptive statistics are relatively simple from a mathematical perspective, they play a very important role in any research project. All too often, students skim over the descriptives and run ahead to the seemingly more exciting inferential statistics, but this can be a costly mistake.

The reason for this is that descriptive statistics help you, as the researcher, comprehend the key characteristics of your sample without getting lost in vast amounts of raw data. In doing so, they provide a foundation for your quantitative analysis. Additionally, they enable you to quickly identify potential issues within your dataset – for example, suspicious outliers, missing responses and so on. Just as importantly, descriptive statistics inform the decision-making process when it comes to choosing which inferential statistics you’ll run, as each inferential test has specific requirements regarding the shape of the data.

Long story short, it’s essential that you take the time to dig into your descriptive statistics before looking at more “advanced” inferentials. It’s also worth noting that, depending on your research aims and questions, descriptive stats may be all that you need in any case. So, don’t discount the descriptives!


The “Big 7” descriptive statistics

With the what and why out of the way, let’s take a look at the most common descriptive statistics. Beyond the counts, proportions and percentages we mentioned earlier, we have what we call the “Big 7” descriptives. These can be divided into two categories – measures of central tendency and measures of dispersion.

Measures of central tendency

True to the name, measures of central tendency describe the centre or “middle section” of a dataset. In other words, they provide some indication of what a “typical” data point looks like within a given dataset. The three most common measures are:

The mean, which is the mathematical average of a set of numbers; in other words, the sum of all numbers divided by the count of numbers.
The median, which is the middlemost number in a set of numbers when those numbers are ordered from lowest to highest.
The mode, which is the most frequently occurring number in a set of numbers (in any order). Naturally, a dataset can have one mode, no mode (no number occurs more than once) or multiple modes.

To make this a little more tangible, let’s look at a sample dataset, along with the corresponding mean, median and mode. This dataset reflects the service ratings (on a scale of 1 – 10) from 15 customers.

Example set of descriptive stats

As you can see, the mean of 5.8 is the average rating across all 15 customers. Meanwhile, 6 is the median. In other words, if you were to list all the responses in order from low to high, Customer 8 would be in the middle (with their service rating being 6). Lastly, the number 5 is the most frequent rating (appearing 3 times), making it the mode.

Together, these three descriptive statistics give us a quick overview of how these customers feel about the service levels at this business. In other words, most customers feel rather lukewarm and there’s certainly room for improvement. From a more statistical perspective, this also means that the data tend to cluster around the 5-6 mark, since the mean and the median are fairly close to each other.
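These three statistics are easy to verify in code. The sketch below uses Python’s statistics module on a stand-in list of 15 ratings constructed to have the same mean, median, and mode; it is not the post’s actual dataset, which appears only as an image.

```python
# Mean, median, and mode of a stand-in list of 15 service ratings.
import statistics

ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

print(statistics.mean(ratings))    # 5.8
print(statistics.median(ratings))  # 6
print(statistics.mode(ratings))    # 5 (appears 3 times)
```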

To take this a step further, let’s look at the frequency distribution of the responses. In other words, let’s count how many times each rating was received, and then plot these counts onto a bar chart.

Example frequency distribution of descriptive stats

As you can see, the responses tend to cluster toward the centre of the chart, creating something of a bell-shaped curve. In statistical terms, this is called a normal distribution.

As you delve into quantitative data analysis, you’ll find that normal distributions are very common, but they’re certainly not the only type of distribution. In some cases, the data can lean toward the left or the right of the chart (i.e., toward the low end or high end). This lean is reflected by a measure called skewness, and it’s important to pay attention to this when you’re analysing your data, as this will have an impact on what types of inferential statistics you can use on your dataset.

Example of skewness
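For reference, skewness can be computed by hand as the average cubed deviation from the mean divided by the cubed standard deviation; the sketch below implements that definition (the same biased estimate scipy.stats.skew returns by default) on the stand-in ratings from above.

```python
# Sample skewness from first principles; negative means a longer left tail.
def skewness(xs):
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return sum((x - mu) ** 3 for x in xs) / (n * sigma ** 3)

print(skewness([2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]))
```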

Measures of dispersion

While the measures of central tendency provide insight into how “centred” the dataset is, it’s also important to understand how dispersed that dataset is; in other words, to what extent the data cluster toward the centre, specifically the mean. In some cases, the majority of the data points will sit very close to the centre, while in other cases, they’ll be scattered all over the place. Enter the measures of dispersion, of which there are three:

Range, which measures the difference between the largest and smallest number in the dataset. In other words, it indicates how spread out the dataset really is.

Variance, which measures how much each number in a dataset varies from the mean (average). More technically, it calculates the average of the squared differences between each number and the mean. A higher variance indicates that the data points are more spread out, while a lower variance suggests that the data points are closer to the mean.

Standard deviation, which is the square root of the variance. It serves the same purposes as the variance, but is a bit easier to interpret as it presents a figure that is in the same unit as the original data. You’ll typically present this statistic alongside the means when describing the data in your research.
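Before returning to the post’s dataset, here is how these three measures look in code on the stand-in ratings from earlier (statistics.variance and statistics.stdev use the sample, n - 1, formulas):

```python
# Range, variance, and standard deviation of the stand-in ratings.
import statistics

ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

print(max(ratings) - min(ratings))   # range: 8
print(statistics.variance(ratings))  # sample variance (n - 1 denominator)
print(statistics.stdev(ratings))     # sample standard deviation
```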

Again, let’s look at our sample dataset to make this all a little more tangible.

Descriptive statistics for the example dataset

As you can see, the range of 8 reflects the difference between the highest rating (10) and the lowest rating (2). The standard deviation of 2.18 tells us that, on average, results within the dataset are 2.18 away from the mean (of 5.8), reflecting a relatively dispersed set of data.

For the sake of comparison, let’s look at another much more tightly grouped (less dispersed) dataset.

Example of skewed data

As you can see, all the ratings lie between 5 and 8 in this dataset, resulting in a much smaller range, variance and standard deviation. You might also notice that the data are clustered toward the right side of the graph – in other words, the data are skewed. If we calculate the skewness for this dataset, we get a result of -0.12, confirming this right lean.

In summary, range, variance and standard deviation all provide an indication of how dispersed the data are. These measures are important because they help you interpret the measures of central tendency within context. In other words, if your measures of dispersion are all fairly high numbers, you need to interpret your measures of central tendency with some caution, as the results are not particularly centred. Conversely, if the data are all tightly grouped around the mean (i.e., low dispersion), the mean becomes a much more “meaningful” statistic.

Key Takeaways

We’ve covered quite a bit of ground in this post. Here are the key takeaways:

  • Descriptive statistics, although relatively simple, are a critically important part of any quantitative data analysis.
  • Measures of central tendency include the mean (average), median and mode.
  • Skewness indicates whether a dataset leans to one side or another.
  • Measures of dispersion include the range, variance and standard deviation.


Descriptive Research Design | Definition, Methods & Examples

Published on 5 May 2022 by Shona McCombes. Revised on 10 October 2022.

Descriptive research aims to accurately and systematically describe a population, situation, or phenomenon. It can answer what, where, when, and how questions, but not why questions.

A descriptive research design can use a wide variety of research methods to investigate one or more variables. Unlike in experimental research, the researcher does not control or manipulate any of the variables, but only observes and measures them.

Table of contents

  • When to use a descriptive research design
  • Descriptive research methods

When to use a descriptive research design

Descriptive research is an appropriate choice when the research aim is to identify characteristics, frequencies, trends, and categories.

It is useful when not much is known yet about the topic or problem. Before you can research why something happens, you need to understand how, when, and where it happens.

  • How has the London housing market changed over the past 20 years?
  • Do customers of company X prefer product Y or product Z?
  • What are the main genetic, behavioural, and morphological differences between European wildcats and domestic cats?
  • What are the most popular online news sources among under-18s?
  • How prevalent is disease A in population B?


Descriptive research methods

Descriptive research is usually defined as a type of quantitative research, though qualitative research can also be used for descriptive purposes. The research design should be carefully developed to ensure that the results are valid and reliable.

Surveys

Survey research allows you to gather large volumes of data that can be analysed for frequencies, averages, and patterns. Common uses of surveys include:

  • Describing the demographics of a country or region
  • Gauging public opinion on political and social topics
  • Evaluating satisfaction with a company’s products or an organisation’s services

Observations

Observations allow you to gather data on behaviours and phenomena without having to rely on the honesty and accuracy of respondents. This method is often used by psychological, social, and market researchers to understand how people act in real-life situations.

Observation of physical entities and phenomena is also an important part of research in the natural sciences. Before you can develop testable hypotheses , models, or theories, it’s necessary to observe and systematically describe the subject under investigation.

Case studies

A case study can be used to describe the characteristics of a specific subject (such as a person, group, event, or organisation). Instead of gathering a large volume of data to identify patterns across time or location, case studies gather detailed data to identify the characteristics of a narrowly defined subject.

Rather than aiming to describe generalisable facts, case studies often focus on unusual or interesting cases that challenge assumptions, add complexity, or reveal something new about a research problem .


14 Quantitative analysis: Descriptive statistics

Numeric data collected in a research project can be analysed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarise themselves with one of these programs for understanding the concepts described in this chapter.

Data preparation

In research projects, data may be collected from a variety of sources: postal surveys, interviews, pretest or posttest experimental data, observational data, and so forth. This data must be converted into a machine-readable, numeric format, such as in a spreadsheet or a text file, so that they can be analysed by computer programs like SPSS or SAS. Data preparation usually follows the following steps:

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale, and whether this scale is a five-point, seven-point scale, etc.), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from ‘strongly disagree’ to ‘strongly agree’, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, nominal data cannot be analysed statistically). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, if a survey measuring a construct such as ‘benefits of computers’ provided respondents with a checklist of benefits that they could select from, and respondents were encouraged to choose as many of those benefits as they wanted, then the total number of checked items could be used as an aggregate measure of benefits. Note that many other forms of data—such as interview transcripts—cannot be converted into a numeric format for statistical analysis. Codebooks are especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.
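A small sketch of what codebook-driven coding can look like in practice; the response strings and code values below are illustrative assumptions rather than a fragment of any particular study.

```python
# Mapping Likert and nominal responses to numeric codes per a codebook.
likert_codes = {
    "strongly disagree": 1, "disagree": 2, "somewhat disagree": 3,
    "neutral": 4, "somewhat agree": 5, "agree": 6, "strongly agree": 7,
}
industry_codes = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

raw_responses = ["agree", "neutral", "strongly disagree"]
coded = [likert_codes[r] for r in raw_responses]

print(coded)                        # [6, 4, 1]
print(industry_codes["retailing"])  # 2
```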

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format—e.g., SPSS stores data as .sav files—which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database where it can be reorganised as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller data sets with less than 65,000 observations and 256 items can be stored in a spreadsheet created using a program such as Microsoft Excel, while larger datasets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet, and each measurement item can be represented as one column. Data should be checked for accuracy during and after entry via occasional spot checks on a set of items or observations. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the ‘strongly agree’ response to all items irrespective of content, including reverse-coded items. If so, such data can be entered but should be excluded from subsequent analysis.


Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse coded items—where items convey the opposite meaning of that of their underlying construct—should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
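The reverse-coding rule described above (8 minus the observed value on a 1 to 7 scale) is a one-line transformation; a quick sketch:

```python
# Reverse-coding a 1-7 Likert item: 8 minus each observed value.
observed = [1, 3, 5, 7]
reversed_scores = [8 - v for v in observed]
print(reversed_scores)  # [7, 5, 3, 1]
```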

Univariate analysis

Univariate analysis—or analysis of a single variable—refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: frequency distribution, central tendency, and dispersion. The frequency distribution of a variable is a summary of the frequency—or percentages—of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services—as a gauge of their ‘religiosity’—using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for ‘did not answer’. If we count the number or percentage of observations within each category—except ‘did not answer’ which is really a missing value rather than a category—and display it in the form of a table, as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.

Frequency distribution of religiosity
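The chapter works in SPSS, but as an illustrative sketch the same tabulation can be produced with pandas; the responses below are invented.

```python
# Frequency distribution of a categorical "religiosity" item; data invented.
import pandas as pd

responses = pd.Series(["never", "once per year", "never", "about once a month",
                       "several times per year", "never", "several times per week"])

counts = responses.value_counts()
print(counts)                                  # frequency of each category
print((counts / counts.sum() * 100).round(1))  # percentage of each category
```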

With very large samples, where observations are independent and random, the frequency distribution tends to follow a plot that looks like a bell-shaped curve—a smoothed bar chart of the frequency distribution—similar to that shown in Figure 14.2. Here most observations are clustered toward the centre of the range of values, with fewer and fewer observations clustered toward the extreme ends of the range. Such a curve is called a normal distribution .

As an example, consider a set of eight test scores: 15, 20, 21, 20, 36, 15, 25, 15. The mean is the sum of the scores divided by their number: (15 + 20 + 21 + 20 + 36 + 15 + 25 + 15)/8 = 20.875. The median, or middlemost value, is 20 (the average of the two middle scores, 20 and 20, once the scores are ordered).

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value estimated from a sample, such as the mean, median, mode, or any of the later estimates, is called a statistic.

A simple measure of dispersion is the range, the difference between the highest and lowest values in the data. For the test scores above, the range is 36 − 15 = 21.

Bivariate analysis

Bivariate analysis examines how two variables are related to one another. The most common bivariate statistic is the bivariate correlation (often simply called ‘correlation’), which is a number between -1 and +1 denoting the strength of the relationship between two variables. Say that we wish to study how age is related to self-esteem in a sample of 20 respondents: as age increases, does self-esteem increase, decrease, or remain unchanged? If self-esteem increases, then we have a positive correlation between the two variables; if self-esteem decreases, we have a negative correlation; and if it remains the same, we have a zero correlation. To calculate the value of this correlation, consider the hypothetical dataset shown in Table 14.1.

Normal distribution
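As a sketch of the computation (the 20 observations of Table 14.1 are not reproduced in the text, so the pairs below are invented stand-ins), the bivariate correlation can be obtained with NumPy:

```python
# Pearson correlation between age and self-esteem; values are stand-ins.
import numpy as np

age         = np.array([21, 25, 30, 34, 40, 45, 52, 60])
self_esteem = np.array([3.1, 3.4, 3.9, 3.8, 4.2, 4.1, 4.6, 4.8])

r = np.corrcoef(age, self_esteem)[0, 1]  # a value between -1 and +1
print(round(r, 2))
```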

After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

\[H_0:\quad r = 0 \]



Descriptive Statistics for Summarising Data

Ray W. Cooksey

UNE Business School, University of New England, Armidale, NSW Australia


The first broad category of statistics we discuss concerns descriptive statistics . The purpose of the procedures and fundamental concepts in this category is quite straightforward: to facilitate the description and summarisation of data. By ‘describe’ we generally mean either the use of some pictorial or graphical representation of the data or the computation of an index or number designed to summarise a specific characteristic of a variable or measurement.

We seldom interpret individual data points or observations primarily because it is too difficult for the human brain to extract or identify the essential nature, patterns, or trends evident in the data, particularly if the sample is large. Rather we utilise procedures and measures which provide a general depiction of how the data are behaving. These statistical procedures are designed to identify or display specific patterns or trends in the data. What remains after their application is simply for us to interpret and tell the story.

Reflect on the QCI research scenario and the associated data set discussed in Chap. 4. Consider the following questions that Maree might wish to address with respect to decision accuracy and speed scores:

  • What was the typical level of accuracy and decision speed for inspectors in the sample? [see Procedure 5.4 – Assessing central tendency.]
  • What was the most common accuracy and speed score amongst the inspectors? [see Procedure 5.4 – Assessing central tendency.]
  • What was the range of accuracy and speed scores; the lowest and the highest scores? [see Procedure 5.5 – Assessing variability.]
  • How frequently were different levels of inspection accuracy and speed observed? What was the shape of the distribution of inspection accuracy and speed scores? [see Procedure 5.1 – Frequency tabulation, distributions & crosstabulation.]
  • What percentage of inspectors would have ‘failed’ to ‘make the cut’ assuming the industry standard for acceptable inspection accuracy and speed combined was set at 95%? [see Procedure 5.7 – Standard ( z ) scores.]
  • How variable were the inspectors in their accuracy and speed scores? Were all the accuracy and speed levels relatively close to each other in magnitude or were the scores widely spread out over the range of possible test outcomes? [see Procedure 5.5 – Assessing variability.]
  • What patterns might be visually detected when looking at various QCI variables singly and together as a set? [see Procedure 5.2 – Graphical methods for displaying data, Procedure 5.3 – Multivariate graphs & displays, and Procedure 5.6 – Exploratory data analysis.]

This chapter includes discussions and illustrations of a number of procedures available for answering questions about data like those posed above. In addition, you will find discussions of two fundamental concepts, namely probability and the normal distribution; concepts that provide building blocks for Chaps. 6 and 7.

Procedure 5.1: Frequency Tabulation, Distributions & Crosstabulation

Frequency Tabulation and Distributions

Frequency tabulation serves to provide a convenient counting summary for a set of data that facilitates interpretation of various aspects of those data. Basically, frequency tabulation occurs in two stages:

  • First, the scores in a set of data are rank ordered from the lowest value to the highest value.
  • Second, the number of times each specific score occurs in the sample is counted. This count records the frequency of occurrence for that specific data value.

Consider the overall job satisfaction variable, jobsat, from the QCI data scenario. Performing frequency tabulation across the 112 Quality Control Inspectors on this variable using the SPSS Frequencies procedure (Allen et al. 2019, ch. 3; George and Mallery 2019, ch. 6) produces the frequency tabulation shown in Table 5.1. Note that three of the inspectors in the sample did not provide a rating for jobsat, thereby producing three missing values (= 2.7% of the sample of 112) and leaving 109 inspectors with valid data for the analysis.

Table 5.1 (image): Frequency tabulation of overall job satisfaction scores

The display of frequency tabulation is often referred to as the frequency distribution for the sample of scores. For each value of a variable, the frequency of its occurrence in the sample of data is reported. It is possible to compute various percentages and percentile values from a frequency distribution.

Table 5.1 shows the ‘Percent’ or relative frequency of each score (the percentage of the 112 inspectors obtaining each score, including those inspectors who were missing scores, which SPSS labels as ‘System’ missing). Table 5.1 also shows the ‘Valid Percent’ which is computed only for those inspectors in the sample who gave a valid or non-missing response.

Finally, it is possible to add up the 'Valid Percent' values, starting at the low score end of the distribution, to form the cumulative distribution or 'Cumulative Percent'. A cumulative distribution is useful for finding percentiles, which reflect what percentage of the sample scored at a specific value or below.

We can see in Table 5.1 that 4 of the 109 valid inspectors (a 'Valid Percent' of 3.7%) indicated the lowest possible level of job satisfaction, a value of 1 (Very Low), whereas 18 of the 109 valid inspectors (a 'Valid Percent' of 16.5%) indicated the highest possible level of job satisfaction, a value of 7 (Very High). The 'Cumulative Percent' of 18.3 in the row for the job satisfaction score of 3 can be interpreted as "roughly 18% of the sample of inspectors reported a job satisfaction score of 3 or less"; that is, nearly a fifth of the sample expressed some degree of negative satisfaction with their job as a quality control inspector in their particular company.
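
For readers who want to reproduce this style of tabulation outside SPSS, here is a minimal sketch in Python using pandas (not one of the packages discussed in this book) on a small hypothetical set of 1 to 7 ratings; the column labels simply mimic the SPSS output:

import pandas as pd
import numpy as np

# Hypothetical 1-7 job satisfaction ratings; np.nan marks a missing ('System') response
jobsat = pd.Series([1, 3, 5, 7, 4, 6, 2, 7, np.nan, 5, 3, 6, np.nan, 4])

valid = jobsat.dropna()
freq = valid.value_counts().sort_index()

table = pd.DataFrame({
    'Frequency': freq,
    'Percent': (100 * freq / len(jobsat)).round(1),       # base: all cases, incl. missing
    'Valid Percent': (100 * freq / len(valid)).round(1),  # base: valid cases only
})
table['Cumulative Percent'] = table['Valid Percent'].cumsum().round(1)

print(table)
print('Missing (System):', jobsat.isna().sum())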

If you have a large data set having many different scores for a particular variable, it may be more useful to tabulate frequencies on the basis of intervals of scores.

For the accuracy scores in the QCI database, you could count scores occurring in intervals such as 'less than 75% accuracy', 'at least 75% but less than 85% accuracy', 'at least 85% but less than 95% accuracy', and '95% accuracy or greater', rather than counting the individual scores themselves. This would yield what is termed a 'grouped' frequency distribution, since the data have been grouped into intervals or score classes. Producing such an analysis using SPSS would involve extra steps to create the new category or 'grouping' system for scores prior to conducting the frequency tabulation.
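
As a sketch of how such a grouped tabulation might be constructed programmatically, the following Python/pandas fragment bins a hypothetical set of accuracy scores into the intervals described above:

import pandas as pd

# Hypothetical inspection accuracy percentages
accuracy = pd.Series([68, 74, 81, 88, 92, 95, 79, 85, 97, 90, 73, 86])

# Right-open intervals; 101 as the upper edge keeps a score of 100 inside the last class
bins = [0, 75, 85, 95, 101]
labels = ['< 75%', '75% to < 85%', '85% to < 95%', '95% or greater']
groups = pd.cut(accuracy, bins=bins, labels=labels, right=False)

print(groups.value_counts(sort=False))   # grouped frequency distribution in class order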

Crosstabulation

In a frequency crosstabulation , we count frequencies on the basis of two variables simultaneously rather than one; thus we have a bivariate situation.

For example, Maree might be interested in the number of male and female inspectors in the sample of 112 who obtained each jobsat score. Here there are two variables to consider: inspector's gender and inspector's jobsat score. Table 5.2 shows such a crosstabulation as compiled by the SPSS Crosstabs procedure (George and Mallery 2019, ch. 8). Note that inspectors who did not report a score for jobsat and/or gender have been omitted as missing values, leaving 106 valid inspectors for the analysis.

Table 5.2 (image): Frequency crosstabulation of jobsat scores by gender category for the QCI data

The crosstabulation shown in Table 5.2 gives a composite picture of the distribution of satisfaction levels for male inspectors and for female inspectors. If frequencies or ‘Counts’ are added across the gender categories, we obtain the numbers in the ‘Total’ column (the percentages or relative frequencies are also shown immediately below each count) for each discrete value of jobsat (note this column of statistics differs from that in Table 5.1 because the gender variable was missing for certain inspectors). By adding down each gender column, we obtain, in the bottom row labelled ‘Total’, the number of males and the number of females that comprised the sample of 106 valid inspectors.

The totals, either across the rows or down the columns of the crosstabulation, are termed the marginal distributions of the table. These marginal distributions are equivalent to frequency tabulations for each of the variables jobsat and gender . As with frequency tabulation, various percentage measures can be computed in a crosstabulation, including the percentage of the sample associated with a specific count within either a row (‘% within jobsat ’) or a column (‘% within gender ’). You can see in Table 5.2 that 18 inspectors indicated a job satisfaction level of 7 (Very High); of these 18 inspectors reported in the ‘Total’ column, 8 (44.4%) were male and 10 (55.6%) were female. The marginal distribution for gender in the ‘Total’ row shows that 57 inspectors (53.8% of the 106 valid inspectors) were male and 49 inspectors (46.2%) were female. Of the 57 male inspectors in the sample, 8 (14.0%) indicated a job satisfaction level of 7 (Very High). Furthermore, we could generate some additional interpretive information of value by adding the ‘% within gender’ values for job satisfaction levels of 5, 6 and 7 (i.e. differing degrees of positive job satisfaction). Here we would find that 68.4% (= 24.6% + 29.8% + 14.0%) of male inspectors indicated some degree of positive job satisfaction compared to 61.2% (= 10.2% + 30.6% + 20.4%) of female inspectors.

This helps to build a picture of the possible relationship between an inspector's gender and their level of job satisfaction (a relationship that, as we will see later, can be quantified and tested using procedures described in Chaps. 6 and 7).

It should be noted that a crosstabulation table such as that shown in Table 5.2 is often referred to as a contingency table, about which more will be said later (see the relevant procedures in Chap. 7).
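
The counts and 'percent within' breakdowns discussed above can be approximated outside SPSS as well; the following Python sketch uses pandas' crosstab function on hypothetical gender and jobsat values:

import pandas as pd

# Hypothetical stand-ins for the QCI gender and jobsat variables
df = pd.DataFrame({
    'gender': ['male', 'female', 'male', 'female', 'male', 'female', 'male', 'female'],
    'jobsat': [7, 7, 5, 6, 6, 7, 4, 7],
})

counts = pd.crosstab(df['jobsat'], df['gender'], margins=True)  # cell counts + marginal totals
pct_within_jobsat = pd.crosstab(df['jobsat'], df['gender'], normalize='index') * 100
pct_within_gender = pd.crosstab(df['jobsat'], df['gender'], normalize='columns') * 100

print(counts)
print(pct_within_jobsat.round(1))   # row percentages ('% within jobsat')
print(pct_within_gender.round(1))   # column percentages ('% within gender')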

Frequency tabulation is useful for providing convenient data summaries which can aid in interpreting trends in a sample, particularly where the number of discrete values for a variable is relatively small. A cumulative percent distribution provides additional interpretive information about the relative positioning of specific scores within the overall distribution for the sample.

Crosstabulation permits the simultaneous examination of the distributions of values for two variables obtained from the same sample of observations. This examination can yield some useful information about the possible relationship between the two variables. More complex crosstabulations can also be done where the values of three or more variables are tracked in a single systematic summary. The use of frequency tabulation or crosstabulation in conjunction with various other statistical measures, such as measures of central tendency (see Procedure 5.4) and measures of variability (see Procedure 5.5), can provide a relatively complete descriptive summary of any data set.

Disadvantages

Frequency tabulations can get messy if interval or ratio-level measures are tabulated simply because of the large number of possible data values. Grouped frequency distributions really should be used in such cases. However, certain choices, such as the size of the score interval (group size), must be made, often arbitrarily, and such choices can affect the nature of the final frequency distribution.

Additionally, percentage measures have certain problems associated with them, most notably, the potential for their misinterpretation in small samples. One should be sure to know the sample size on which percentage measures are based in order to obtain an interpretive reference point for the actual percentage values.

For example

In a sample of 10 individuals, 20% represents only two individuals whereas in a sample of 300 individuals, 20% represents 60 individuals. If all that is reported is the 20%, then the mental inference drawn by readers is likely to be that a sizeable number of individuals had a score or scores of a particular value—but what is ‘sizeable’ depends upon the total number of observations on which the percentage is based.

Where Is This Procedure Useful?

Frequency tabulation and crosstabulation are very commonly applied procedures used to summarise information from questionnaires, both in terms of tabulating various demographic characteristics (e.g. gender, age, education level, occupation) and in terms of actual responses to questions (e.g. numbers responding 'yes' or 'no' to a particular question). They can be particularly useful in helping to build up the data screening and demographic stories discussed in Chap. 4. Categorical data from observational studies can also be analysed with this technique (e.g. the number of times Suzy talks to Frank, to Billy, and to John in a study of children's social interactions).

Certain types of experimental research designs may also be amenable to analysis by crosstabulation with a view to drawing inferences about distribution differences across the sets of categories for the two variables being tracked.

You could employ crosstabulation in conjunction with the tests described in Chap. 7 to see if two different styles of advertising campaign differentially affect the product purchasing patterns of male and female consumers.

In the QCI database, Maree could employ crosstabulation to help her answer the question “do different types of electronic manufacturing firms ( company ) differ in terms of their tendency to employ male versus female quality control inspectors ( gender )?”

Software Procedures

Procedure 5.2: Graphical Methods for Displaying Data

Graphical methods for displaying data include bar and pie charts, histograms and frequency polygons, line graphs and scatterplots. It is important to note that what is presented here is a small but representative sampling of the types of simple graphs one can produce to summarise and display trends in data. Generally speaking, SPSS offers the easiest facility for producing and editing graphs, but with a rather limited range of styles and types. SYSTAT, STATGRAPHICS and NCSS offer a much wider range of graphs (including graphs unique to each package), but with the drawback that it takes somewhat more effort to get the graphs in exactly the form you want.

Bar and Pie Charts

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are categorical (nominal or ordinal level of measurement).

  • A bar chart uses vertical and horizontal axes to summarise the data. The vertical axis is used to represent frequency (number) of occurrence or the relative frequency (percentage) of occurrence; the horizontal axis is used to indicate the data categories of interest.
  • A pie chart gives a simpler visual representation of category frequencies by cutting a circular plot into wedges or slices whose sizes are proportional to the relative frequency (percentage) of occurrence of specific data categories. Some pie charts can have one or more slices emphasised by 'exploding' them out from the rest of the pie.

Consider the company variable from the QCI database. This variable depicts the types of manufacturing firms that the quality control inspectors worked for. Figure 5.1 illustrates a bar chart summarising the percentage of female inspectors in the sample coming from each type of firm. Figure 5.2 shows a pie chart representation of the same data, with an ‘exploded slice’ highlighting the percentage of female inspectors in the sample who worked for large business computer manufacturers – the lowest percentage of the five types of companies. Both graphs were produced using SPSS.

Fig. 5.1 (image): Bar chart of the percentage of female inspectors

Fig. 5.2 (image): Pie chart of the percentage of female inspectors

The pie chart was modified with an option to show the actual percentage along with the label for each category. The bar chart shows that computer manufacturing firms have relatively fewer female inspectors compared to the automotive and electrical appliance (large and small) firms. This trend is less clear from the pie chart which suggests that pie charts may be less visually interpretable when the data categories occur with rather similar frequencies. However, the ‘exploded slice’ option can help interpretation in some circumstances.
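
For readers without access to the packages named here, a comparable bar chart and exploded-slice pie chart can be sketched in Python with matplotlib; the company labels and percentages below are hypothetical, not the actual QCI figures:

import matplotlib.pyplot as plt

companies = ['PC', 'Large computer', 'Small computer', 'Large appliance', 'Automobile']
pct_female = [22, 12, 18, 25, 23]   # hypothetical percentages of female inspectors

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(companies, pct_female)
ax1.set_ylabel('% female inspectors')
ax1.tick_params(axis='x', labelrotation=45)

# 'Explode' the smallest slice to emphasise it, as in Fig. 5.2
explode = [0.1 if p == min(pct_female) else 0 for p in pct_female]
ax2.pie(pct_female, labels=companies, explode=explode, autopct='%1.1f%%')
plt.tight_layout()
plt.show()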

Certain software programs, such as SPSS, STATGRAPHICS, NCSS and Microsoft Excel, offer the option of generating 3-dimensional bar charts and pie charts and incorporating other ‘bells and whistles’ that can potentially add visual richness to the graphic representation of the data. However, you should generally be careful with these fancier options as they can produce distortions and create ambiguities in interpretation (e.g. see discussions in Jacoby 1997 ; Smithson 2000 ; Wilkinson 2009 ). Such distortions and ambiguities could ultimately end up providing misinformation to researchers as well as to those who read their research.

Histograms and Frequency Polygons

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are essentially continuous (interval or ratio level of measurement) in nature. Both histograms and frequency polygons use vertical and horizontal axes to summarise the data. The vertical axis is used to represent the frequency (number) of occurrence or the relative frequency (percentage) of occurrences; the horizontal axis is used for the data values or ranges of values of interest. The histogram uses bars of varying heights to depict frequency; the frequency polygon uses lines and points.

There is a visual difference between a histogram and a bar chart: the bar chart uses bars that do not physically touch, signifying the discrete and categorical nature of the data, whereas the bars in a histogram physically touch to signal the potentially continuous nature of the data.

Suppose Maree wanted to graphically summarise the distribution of speed scores for the 112 inspectors in the QCI database. Figure 5.3 (produced using NCSS) illustrates a histogram representation of this variable. Figure 5.3 also illustrates another representational device called the ‘density plot’ (the solid tracing line overlaying the histogram) which gives a smoothed impression of the overall shape of the distribution of speed scores. Figure 5.4 (produced using STATGRAPHICS) illustrates the frequency polygon representation for the same data.

Fig. 5.3 (image): Histogram of the speed variable (with density plot overlaid)

Fig. 5.4 (image): Frequency polygon plot of the speed variable

These graphs employ a grouped format where speed scores which fall within specific intervals are counted as being essentially the same score. The shape of the data distribution is reflected in these plots. Each graph tells us that the inspection speed scores are positively skewed with only a few inspectors taking very long times to make their inspection judgments and the majority of inspectors taking rather shorter amounts of time to make their decisions.

Both representations tell a similar story; the choice between them is largely a matter of personal preference. However, if the number of bars to be plotted in a histogram is potentially very large (and this is usually directly controllable in most statistical software packages), then a frequency polygon would be the preferred representation simply because the amount of visual clutter in the graph will be much reduced.

It is somewhat of an art to choose an appropriate definition for the width of the score grouping intervals (or ‘bins’ as they are often termed) to be used in the plot: choose too many and the plot may look too lumpy and the overall distributional trend may not be obvious; choose too few and the plot will be too coarse to give a useful depiction. Programs like SPSS, SYSTAT, STATGRAPHICS and NCSS are designed to choose an ‘appropriate’ number of bins to be used, but the analyst’s eye is often a better judge than any statistical rule that a software package would use.
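
The influence of bin choice is easy to experiment with in software where the number of bins can be set directly; the following matplotlib sketch draws the same hypothetical, positively skewed sample with three different bin counts:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
speed = rng.lognormal(mean=1.4, sigma=0.5, size=112)   # hypothetical skewed decision times

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n_bins in zip(axes, [5, 15, 60]):
    ax.hist(speed, bins=n_bins, edgecolor='black')     # too few, moderate, too many bins
    ax.set_title(f'{n_bins} bins')
    ax.set_xlabel('speed (s)')
plt.tight_layout()
plt.show()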

There are several interesting variations of the histogram which can highlight key data features or facilitate interpretation of certain trends in the data. One such variation is a graph called a dual histogram (available in SYSTAT; a variation called a 'comparative histogram' can be created in NCSS) – a graph that facilitates visual comparison of the frequency distributions for a specific variable for participants from two distinct groups.

Suppose Maree wanted to graphically compare the distributions of speed scores for inspectors in the two categories of education level ( educlev ) in the QCI database. Figure 5.5 shows a dual histogram (produced using SYSTAT) that accomplishes this goal. This graph still employs the grouped format where speed scores falling within particular intervals are counted as being essentially the same score. The shape of the data distribution within each group is also clearly reflected in this plot. However, the story conveyed by the dual histogram is that, while the inspection speed scores are positively skewed for inspectors in both categories of educlev, the comparison suggests that inspectors with a high school level of education (= 1) tend to take slightly longer to make their inspection decisions than do their colleagues who have a tertiary qualification (= 2).

Fig. 5.5 (image): Dual histogram of speed for the two categories of educlev

Line Graphs

The line graph is similar in style to the frequency polygon but is much more general in its potential for summarising data. In a line graph, we seldom deal with percentage or frequency data. Instead we can summarise other types of information about data such as averages or means (see Procedure 5.4 for a discussion of this measure), often for different groups of participants. Thus, one important use of the line graph is to break down scores on a specific variable according to membership in the categories of a second variable.

In the context of the QCI database, Maree might wish to summarise the average inspection accuracy scores for the inspectors from different types of manufacturing companies. Figure 5.6 was produced using SPSS and shows such a line graph.

Fig. 5.6 (image): Line graph comparison of companies in terms of average inspection accuracy

Note how the trend in performance across the different companies becomes clearer with such a visual representation. It appears that the inspectors from the Large Business Computer and PC manufacturing companies have better average inspection accuracy compared to the inspectors from the remaining three industries.

With many software packages, it is possible to further elaborate a line graph by including error or confidence interval bars (see the relevant procedure in Chap. 8). These give some indication of the precision with which the average level for each category in the population has been estimated (narrow bars signal a more precise estimate; wide bars signal a less precise estimate).

Figure 5.7 shows such an elaborated line graph, using 95% confidence interval bars, which can be used to help make more defensible judgments (compared to Fig. 5.6 ) about whether the companies are substantively different from each other in average inspection performance. Companies whose confidence interval bars do not overlap each other can be inferred to be substantively different in performance characteristics.

Fig. 5.7 (image): Line graph using confidence interval bars to compare accuracy across companies

The accuracy confidence interval bars for participants from the Large Business Computer manufacturing firms do not overlap those from the Large or Small Electrical Appliance manufacturers or the Automobile manufacturers.

We might conclude that quality control inspection accuracy is substantially better in the Large Business Computer manufacturing companies than in these other industries but is not substantially better than the PC manufacturing companies. We might also conclude that inspection accuracy in PC manufacturing companies is not substantially different from Small Electrical Appliance manufacturers.
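
A line graph of group means with 95% confidence interval bars, similar in spirit to Figs. 5.6 and 5.7, can be sketched as follows; the data are synthetic, and the normal-approximation interval (mean plus or minus 1.96 standard errors) is one common construction among several:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
companies = ['PC', 'Large computer', 'Small computer', 'Large appliance', 'Automobile']
samples = [rng.normal(loc=m, scale=8, size=22) for m in (86, 90, 81, 79, 78)]  # hypothetical accuracy

means = [s.mean() for s in samples]
halfw = [1.96 * s.std(ddof=1) / np.sqrt(len(s)) for s in samples]  # 95% CI half-widths

x = range(len(companies))
plt.errorbar(x, means, yerr=halfw, fmt='o-', capsize=4)
plt.xticks(x, companies, rotation=45)
plt.ylabel('mean inspection accuracy (%)')
plt.tight_layout()
plt.show()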

Scatterplots

Scatterplots are useful in displaying the relationship between two interval- or ratio-scaled variables or measures of interest obtained on the same individuals, particularly in correlational research (see the fundamental concept and the correlational procedures discussed in Chap. 6).

In a scatterplot, one variable is chosen to be represented on the horizontal axis; the second variable is represented on the vertical axis. In this type of plot, all data point pairs in the sample are graphed. The shape and tilt of the cloud of points in a scatterplot provide visual information about the strength and direction of the relationship between the two variables. A very compact elliptical cloud of points signals a strong relationship; a very loose or nearly circular cloud signals a weak or non-existent relationship. A cloud of points generally tilted upward toward the right side of the graph signals a positive relationship (higher scores on one variable associated with higher scores on the other and vice-versa). A cloud of points generally tilted downward toward the right side of the graph signals a negative relationship (higher scores on one variable associated with lower scores on the other and vice-versa).

Maree might be interested in displaying the relationship between inspection accuracy and inspection speed in the QCI database. Figure 5.8, produced using SPSS, shows what such a scatterplot might look like. Several characteristics of the data for these two variables can be noted in Fig. 5.8. The shape of the distribution of data points is evident. The plot has a fan-shaped characteristic to it which indicates that accuracy scores are highly variable (exhibit a very wide range of possible scores) at very fast inspection speeds but get much less variable and tend to be somewhat higher as inspection speed increases (where inspectors take longer to make their quality control decisions). Thus, there does appear to be some relationship between inspection accuracy and inspection speed (a weak positive relationship, since the cloud of points tends to be very loose but tilted generally upward toward the right side of the graph; slower speeds tend to be slightly associated with higher accuracy).

Fig. 5.8 (image): Scatterplot relating inspection accuracy to inspection speed

However, it is not the case that the inspection decisions which take longest to make are necessarily the most accurate (see the labelled points for inspectors 7 and 62 in Fig. 5.8 ). Thus, Fig. 5.8 does not show a simple relationship that can be unambiguously summarised by a statement like “the longer an inspector takes to make a quality control decision, the more accurate that decision is likely to be”. The story is more complicated.
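
A basic scatterplot like Fig. 5.8 takes only a few plotting calls; the fan-shaped data below are randomly generated to loosely mimic the pattern just described, not drawn from the QCI database:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
speed = rng.lognormal(1.4, 0.5, 112)              # hypothetical decision times (s)
noise = rng.normal(0, 14, 112) / np.sqrt(speed)   # more scatter at faster speeds
accuracy = np.clip(72 + 4 * np.log(speed) + noise, 50, 100)

plt.scatter(speed, accuracy, alpha=0.7)
plt.xlabel('inspection speed (s)')
plt.ylabel('inspection accuracy (%)')
plt.show()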

Some software packages, such as SPSS, STATGRAPHICS and SYSTAT, offer the option of using different plotting symbols or markers to represent the members of different groups so that the relationship between the two focal variables (the ones anchoring the X and Y axes) can be clarified with reference to a third categorical measure.

Maree might want to see if the relationship depicted in Fig. 5.8 changes depending upon whether the inspector was tertiary-qualified or not (this information is represented in the educlev variable of the QCI database).

Figure 5.9 shows what such a modified scatterplot might look like; the legend in the upper corner of the figure defines the marker symbols for each category of the educlev variable. Note that for both High School only-educated inspectors and Tertiary-qualified inspectors, the general fan-shaped relationship between accuracy and speed is the same. However, it appears that the distribution of points for the High School only-educated inspectors is shifted somewhat upward and toward the right of the plot suggesting that these inspectors tend to be somewhat more accurate as well as slower in their decision processes.

Fig. 5.9 (image): Scatterplot displaying accuracy vs speed conditional on educlev group
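
Marking group membership with different symbols, as in Fig. 5.9, amounts to one scatter call per category; the educlev labels here are randomly assigned to synthetic data:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
speed = rng.lognormal(1.4, 0.5, 112)
accuracy = np.clip(72 + 4 * np.log(speed) + rng.normal(0, 9, 112), 50, 100)
educlev = rng.choice(['High School Only', 'Tertiary Qualified'], size=112)

for level, marker in [('High School Only', 'o'), ('Tertiary Qualified', '^')]:
    mask = educlev == level
    plt.scatter(speed[mask], accuracy[mask], marker=marker, label=level, alpha=0.7)
plt.xlabel('inspection speed (s)')
plt.ylabel('inspection accuracy (%)')
plt.legend(title='educlev')
plt.show()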

There are many other styles of graphs available, often dependent upon the specific statistical package you are using. Interestingly, NCSS and, particularly, SYSTAT and STATGRAPHICS, appear to offer the most variety in terms of types of graphs available for visually representing data. A reading of the user’s manuals for these programs (see the Useful additional readings) would expose you to the great diversity of plotting techniques available to researchers. Many of these techniques go by rather interesting names such as: Chernoff’s faces, radar plots, sunflower plots, violin plots, star plots, Fourier blobs, and dot plots.

These graphical methods provide summary techniques for visually presenting certain characteristics of a set of data. Visual representations are generally easier to understand than a tabular representation and when these plots are combined with available numerical statistics, they can give a very complete picture of a sample of data. Newer methods have become available which permit more complex representations to be depicted, opening possibilities for creatively visually representing more aspects and features of the data (leading to a style of visual data storytelling called infographics ; see, for example, McCandless 2014 ; Toseland and Toseland 2012 ). Many of these newer methods can display data patterns from multiple variables in the same graph (several of these newer graphical methods are illustrated and discussed in Procedure 5.3 ).

Graphs tend to be cumbersome and space consuming if a great many variables need to be summarised. In such cases, using numerical summary statistics (such as means or correlations) in tabular form alone will provide a more economical and efficient summary. Also, it can be very easy to give a misleading picture of data trends using graphical methods by simply choosing the ‘correct’ scaling for maximum effect or choosing a display option (such as a 3-D effect) that ‘looks’ presentable but which actually obscures a clear interpretation (see Smithson 2000 ; Wilkinson 2009 ).

Thus, you must be careful in creating and interpreting visual representations so that the influence of aesthetic choices made for the sake of appearance does not become more important than obtaining a faithful and valid representation of the data, a very real danger with many of today's statistical packages where 'default' drawing options have been pre-programmed in. No single plot can completely summarise all possible characteristics of a sample of data. Thus, choosing a specific method of graphical display may, of necessity, force a behavioural researcher to represent certain data characteristics (such as frequency) at the expense of others (such as averages).

Virtually any research design which produces quantitative data and statistics (even to the extent of just counting the number of occurrences of several events) provides opportunities for graphical data display which may help to clarify or illustrate important data characteristics or relationships. Remember, graphical displays are communication tools just like numbers—which tool to choose depends upon the message to be conveyed. Visual representations of data are generally more useful in communicating to lay persons who are unfamiliar with statistics. Care must be taken though as these same lay people are precisely the people most likely to misinterpret a graph if it has been incorrectly drawn or scaled.

Procedure 5.3: Multivariate Graphs & Displays

Graphical methods for displaying multivariate data (i.e. many variables at once) include scatterplot matrices, radar (or spider) plots, multiplots, parallel coordinate displays, and icon plots. Multivariate graphs are useful for visualising broad trends and patterns across many variables (Cleveland 1995 ; Jacoby 1998 ). Such graphs typically sacrifice precision in representation in favour of a snapshot pictorial summary that can help you form general impressions of data patterns.

It is important to note that what is presented here is a small but reasonably representative sampling of the types of graphs one can produce to summarise and display trends in multivariate data. Generally speaking, SYSTAT offers the best facilities for producing multivariate graphs, followed by STATGRAPHICS, but with the drawback that it is somewhat tricky to get the graphs in exactly the form you want. SYSTAT also has excellent facilities for creating new forms and combinations of graphs – essentially allowing graphs to be tailor-made for a specific communication purpose. Both SPSS and NCSS offer a more limited range of multivariate graphs, generally restricted to scatterplot matrices and variations of multiplots. Microsoft Excel or STATGRAPHICS are the packages to use if radar or spider plots are desired.

Scatterplot Matrices

A scatterplot matrix is a useful multivariate graph designed to show relationships between pairs of many variables in the same display.

Figure 5.10 illustrates a scatterplot matrix, produced using SYSTAT, for the mentabil, accuracy, speed, jobsat and workcond variables in the QCI database. It is easy to see that all the scatterplot matrix does is stack all pairs of scatterplots into a format where it is easy to pick out the graph for any 'row' variable that intersects any 'column' variable.

Fig. 5.10 (image): Scatterplot matrix relating mentabil, accuracy, speed, jobsat & workcond

In those plots where a 'row' variable intersects itself in a column of the matrix (along the so-called 'diagonal'), SYSTAT permits a range of univariate displays to be shown. Figure 5.10 shows univariate histograms for each variable (recall Procedure 5.2). One obvious drawback of the scatterplot matrix is that, if many variables are to be displayed (say, ten or more), the graph gets very crowded and becomes very hard to visually appreciate.

Looking at the first column of graphs in Fig. 5.10 , we can see the scatterplot relationships between mentabil and each of the other variables. We can get a visual impression that mentabil seems to be slightly negatively related to accuracy (the cloud of scatter points tends to angle downward to the right, suggesting, very slightly, that higher mentabil scores are associated with lower levels of accuracy ).

Conversely, the visual impression of the relationship between mentabil and speed is that the relationship is slightly positive (higher mentabil scores tend to be associated with higher speed scores = longer inspection times). Similar types of visual impressions can be formed for other parts of Fig. 5.10 . Notice that the histogram plots along the diagonal give a clear impression of the shape of the distribution for each variable.
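
pandas offers a one-line scatterplot matrix with univariate histograms along the diagonal, much like Fig. 5.10; the five columns below are random stand-ins for the QCI variables:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(3)
df = pd.DataFrame({
    'mentabil': rng.normal(110, 10, 112),   # hypothetical values throughout
    'accuracy': rng.normal(82, 9, 112),
    'speed': rng.lognormal(1.4, 0.5, 112),
    'jobsat': rng.integers(1, 8, 112),
    'workcond': rng.integers(1, 8, 112),
})

scatter_matrix(df, diagonal='hist', figsize=(8, 8))
plt.show()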

Radar Plots

The radar plot (also known as a spider graph for obvious reasons) is a simple and effective device for displaying scores on many variables. Microsoft Excel offers a range of options and capabilities for producing radar plots, such as the plot shown in Fig. 5.11. Radar plots are generally easy to interpret and provide a good visual basis for comparing plots from different individuals or groups, even if a fairly large number of variables (say, up to about 25) are being displayed.

Like a clock face, variables are evenly spaced around the centre of the plot in clockwise order, starting at the 12 o'clock position. Visual interpretation of a radar plot primarily relies on shape comparisons, i.e. the rise and fall of peaks and valleys along the spokes around the plot. Valleys near the centre display low scores on specific variables; peaks near the outside of the plot display high scores on specific variables. [Note that, technically, radar plots employ polar coordinates.] SYSTAT can draw graphs using polar coordinates, but not as easily as Excel can, from the user's perspective. Radar plots work best if all the variables represented are measured on the same scale (e.g. a 1 to 7 Likert-type scale or a 0% to 100% scale). Individuals who are missing any scores on the variables being plotted are typically omitted.

Fig. 5.11 (image): Radar plot comparing attitude ratings for inspectors 66 and 104

The radar plot in Fig. 5.11 , produced using Excel, compares two specific inspectors, 66 and 104, on the nine attitude rating scales. Inspector 66 gave the highest rating (= 7) on the cultqual variable and inspector 104 gave the lowest rating (= 1). The plot shows that inspector 104 tended to provide very low ratings on all nine attitude variables, whereas inspector 66 tended to give very high ratings on all variables except acctrain and trainapp , where the scores were similar to those for inspector 104. Thus, in general, inspector 66 tended to show much more positive attitudes toward their workplace compared to inspector 104.
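
Radar plots are not a built-in chart type in matplotlib, but a polar axis produces one in a few lines; in this sketch the two inspectors' ratings are hypothetical, and four of the nine attitude variable names (those not mentioned in the text) are placeholders:

import numpy as np
import matplotlib.pyplot as plt

# First five names appear in the text; the last four are hypothetical placeholders
attitudes = ['cultqual', 'acctrain', 'trainapp', 'mgmtcomm', 'polsatis',
             'attvar6', 'attvar7', 'attvar8', 'attvar9']
insp66 = [7, 3, 2, 6, 6, 7, 6, 7, 6]     # hypothetical ratings
insp104 = [1, 2, 2, 1, 1, 2, 1, 1, 2]

angles = np.linspace(0, 2 * np.pi, len(attitudes), endpoint=False)
angles = np.append(angles, angles[0])     # close the polygon

ax = plt.subplot(polar=True)
for scores, label in [(insp66, 'Inspector 66'), (insp104, 'Inspector 104')]:
    vals = np.append(scores, scores[0])
    ax.plot(angles, vals, label=label)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(attitudes)
ax.set_theta_offset(np.pi / 2)   # first variable at the 12 o'clock position
ax.set_theta_direction(-1)       # proceed clockwise, as described above
ax.legend(loc='lower right')
plt.show()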

While Fig. 5.11 was generated to compare the scores for two individuals in the QCI database, it would be just as easy to produce a radar plot that compared the five types of companies in terms of their average ratings on the nine variables, as shown in Fig. 5.12 .

Fig. 5.12 (image): Radar plot comparing average attitude ratings for five types of company

Here we can form the visual impression that the five types of companies differ most in their average ratings of mgmtcomm and least in the average ratings of polsatis . Overall, the average ratings from inspectors from PC manufacturers (black diamonds with solid lines) seem to be generally the most positive as their scores lie on or near the outer ring of scores and those from Automobile manufacturers tend to be least positive on many variables (except the training-related variables).

Extrapolating from Fig. 5.12 , you may rightly conclude that including too many groups and/or too many variables in a radar plot comparison can lead to so much clutter that any visual comparison would be severely degraded. You may have to experiment with using colour-coded lines to represent different groups versus line and marker shape variations (as used in Fig. 5.12 ), because choice of coding method for groups can influence the interpretability of a radar plot.

Multiplots

A multiplot is simply a hybrid style of graph that can display group comparisons across a number of variables. There are a wide variety of possible multiplots one could potentially design (SYSTAT offers great capabilities with respect to multiplots). Figure 5.13 shows a multiplot comprising a side-by-side series of profile-based line graphs – one graph for each type of company in the QCI database.

Fig. 5.13 (image): Multiplot comparing profiles of average attitude ratings for five company types

The multiplot in Fig. 5.13 , produced using SYSTAT, graphs the profile of average attitude ratings for all inspectors within a specific type of company. This multiplot shows the same story as the radar plot in Fig. 5.12 , but in a different graphical format. It is still fairly clear that the average ratings from inspectors from PC manufacturers tend to be higher than for the other types of companies and the profile for inspectors from automobile manufacturers tends to be lower than for the other types of companies.

The profile for inspectors from large electrical appliance manufacturers is the flattest, meaning that their average attitude ratings were less variable than for other types of companies. Comparing the ease with which you can glean the visual impressions from Figs. 5.12 and 5.13 may lead you to prefer one style of graph over another. If you have such preferences, chances are others will also, which may mean you need to carefully consider your options when deciding how best to display data for effect.

Frequently, choice of graph is less a matter of which style is right or wrong, but more a matter of which style will suit specific purposes or convey a specific story, i.e. the choice is often strategic.

Parallel Coordinate Displays

A parallel coordinate display is useful for displaying individual scores on a range of variables, all measured using the same scale. Furthermore, such graphs can be combined side-by-side to facilitate very broad visual comparisons among groups, while retaining individual profile variability in scores. Each line in a parallel coordinate display represents one individual, e.g. an inspector.

The interpretation of a parallel coordinate display, such as the two shown in Fig. 5.14 , depends on visual impressions of the peaks and valleys (highs and lows) in the profiles as well as on the density of similar profile lines. The graph is called ‘parallel coordinate’ simply because it assumes that all variables are measured on the same scale and that scores for each variable can therefore be located along vertical axes that are parallel to each other (imagine vertical lines on Fig. 5.14 running from bottom to top for each variable on the X-axis). The main drawback of this method of data display is that only those individuals in the sample who provided legitimate scores on all of the variables being plotted (i.e. who have no missing scores) can be displayed.

Fig. 5.14 (image): Parallel coordinate displays comparing profiles of average attitude ratings for five company types

The parallel coordinate display in Fig. 5.14 , produced using SYSTAT, graphs the profile of average attitude ratings for all inspectors within two specific types of company: the left graph for inspectors from PC manufacturers and the right graph for automobile manufacturers.

There are fewer lines in each display than the number of inspectors from each type of company simply because several inspectors from each type of company were missing a rating on at least one of the nine attitude variables. The graphs show great variability in scores amongst inspectors within a company type, but there are some overall patterns evident.

For example, inspectors from automobile companies clearly and fairly uniformly rated mgmtcomm toward the low end of the scale, whereas the reverse was generally true for that variable for inspectors from PC manufacturers. Conversely, inspectors from automobile companies tend to rate acctrain and trainapp more toward the middle to high end of the scale, whereas the reverse is generally true for those variables for inspectors from PC manufacturers.
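
pandas ships a simple parallel-coordinates helper that reproduces this general style of display; the sketch below colours synthetic rating profiles by a hypothetical company column and keeps complete cases only, echoing the limitation noted above:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(5)
cols = ['jobsat', 'workcond', 'mgmtcomm', 'acctrain', 'trainapp']  # subset of rating variables
rows = []
for company, base in [('PC', 5.5), ('Automobile', 3.0)]:
    for _ in range(15):
        ratings = np.clip(rng.normal(base, 1.2, len(cols)).round(), 1, 7)
        rows.append([company] + ratings.tolist())

df = pd.DataFrame(rows, columns=['company'] + cols).dropna()  # complete cases only
parallel_coordinates(df, 'company', alpha=0.5)
plt.ylabel('rating (1-7)')
plt.show()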

Icon Plots

Perhaps the most creative types of multivariate displays are the so-called icon plots . SYSTAT and STATGRAPHICS offer an impressive array of different types of icon plots, including, amongst others, Chernoff’s faces, profile plots, histogram plots, star glyphs and sunray plots (Jacoby 1998 provides a detailed discussion of icon plots).

Icon plots generally use a specific visual construction to represent the scores obtained by each individual within a sample or group on a set of variables. All icon plots are thus methods for displaying the response patterns for individual members of a sample, as long as those individuals are not missing any scores on the variables to be displayed (note that this is the same limitation as for radar plots and parallel coordinate displays). To illustrate icon plots, without generating too many icons to focus on, Figs. 5.15, 5.16, 5.17 and 5.18 present four different icon plots for QCI inspectors classified, using a new variable called BEST_WORST, as either the worst performers (= 1, where their accuracy scores were less than 70%) or the best performers (= 2, where their accuracy scores were 90% or greater).

Fig. 5.15 (image): Chernoff's faces icon plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.16 (image): Profile plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.17 (image): Histogram plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.18 (image): Sunray plot comparing individual attitude ratings for best and worst performing inspectors

The Chernoff’s faces plot gets its name from the visual icon used to represent variable scores – a cartoon-type face. This icon tries to capitalise on our natural human ability to recognise and differentiate faces. Each feature of the face is controlled by the scores on a single variable. In SYSTAT, up to 20 facial features are controllable; the first five being curvature of mouth, angle of brow, width of nose, length of nose and length of mouth (SYSTAT Software Inc., 2009 , p. 259). The theory behind Chernoff’s faces is that similar patterns of variable scores will produce similar looking faces, thereby making similarities and differences between individuals more apparent.

The profile plot and histogram plot are actually two variants of the same type of icon plot. A profile plot represents individuals’ scores for a set of variables using simplified line graphs, one per individual. The profile is scaled so that the vertical height of the peaks and valleys correspond to actual values for variables where the variables anchor the X-axis in a fashion similar to the parallel coordinate display. So, as you examine a profile from left to right across the X-axis of each graph, you are looking across the set of variables. A histogram plot represents the same information in the same way as for the profile plot but using histogram bars instead.

Figure 5.15 , produced using SYSTAT, shows a Chernoff’s faces plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine general attitude statements.

Each face is labelled with the inspector number it represents. The gaps indicate where an inspector had missing data on at least one of the variables, meaning a face could not be generated for them. The worst performers are drawn using red lines; the best using blue lines. The first variable is jobsat and this variable controls mouth curvature; the second variable is workcond and this controls angle of brow, and so on. It seems clear that there are differences in the faces between the best and worst performers with, for example, best performers tending to be more satisfied (smiling) and with higher ratings for working conditions (brow angle).

Beyond a broad visual impression, there is little in terms of precise inferences you can draw from a Chernoff’s faces plot. It really provides a visual sketch, nothing more. The fact that there is no obvious link between facial features, variables and score levels means that the Chernoff’s faces icon plot is difficult to interpret at the level of individual variables – a holistic impression of similarity and difference is what this type of plot facilitates.

Figure 5.16, produced using SYSTAT, shows a profile plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine attitude variables.

Like the Chernoff’s faces plot (Fig. 5.15), as you read across the rows of the plot from left to right, each graph corresponds to an inspector in the sample who was either in the worst performer (red) or best performer (blue) category. The first attitude variable is jobsat and anchors the left end of each line graph; the last variable is polsatis and anchors the right end of the line graph. The remaining variables are represented in order from left to right across the X-axis of each graph. Figure 5.16 shows that these inspectors are rather different in their attitude profiles, with best performers tending to show taller profiles on the first two variables, for example.

Figure 5.17, produced using SYSTAT, shows a histogram plot for the best and worst performing inspectors based on their ratings of job satisfaction, working conditions and the nine attitude variables. This plot tells the same story as the profile plot, only using histogram bars. Some people would prefer the histogram icon plot to the profile plot because each histogram bar corresponds to one variable, making the visual linking of a specific bar to a specific variable much easier than visually linking a specific position along the profile line to a specific variable.

The sunray plot is actually a simplified adaptation of the radar plot (called a “star glyph”) used to represent scores on a set of variables for each individual within a sample or group. Remember that a radar plot basically arranges the variables around a central point like a clock face; the first variable is represented at the 12 o’clock position and the remaining variables follow around the plot in a clockwise direction.

While the spokes (the actual 'star' of the glyph's name) of a sunray plot are visible, unlike a radar plot no interpretive scale is evident. A variable's score is visually represented by its distance from the central point. Thus, the star glyphs in a sunray plot are designed, like Chernoff's faces, to provide a general visual impression, based on icon shape. A wide-diameter, well-rounded plot indicates an individual with high scores on all variables; a small-diameter, well-rounded plot indicates an individual with low scores on all variables. Jagged plots represent individuals with highly variable scores across the variables. 'Stars' of similar size, shape and orientation represent similar individuals.

Figure 5.18 , produced using STATGRAPHICS, shows a sunray plot for the best and worst performing inspectors. An interpretation glyph is also shown in the lower right corner of Fig. 5.18 , where variables are aligned with the spokes of a star (e.g. jobsat is at the 12 o’clock position). This sunray plot could lead you to form the visual impression that the worst performing inspectors (group 1) have rather less rounded rating profiles than do the best performing inspectors (group 2) and that the jobsat and workcond spokes are generally lower for the worst performing inspectors.

Comparatively speaking, the sunray plot makes identifying similar individuals a bit easier (perhaps even easier than Chernoff’s faces) and, when ordered as STATGRAPHICS showed in Fig. 5.18 , permits easier visual comparisons between groups of individuals, but at the expense of precise knowledge about variable scores. Remember, a holistic impression is the goal pursued using a sunray plot.

Multivariate graphical methods provide summary techniques for visually presenting certain characteristics of a complex array of data on variables. Such visual representations are generally better at helping us to form holistic impressions of multivariate data rather than any sort of tabular representation or numerical index. They also allow us to compress many numerical measures into a finite representation that is generally easy to understand. Multivariate graphical displays can add interest to an otherwise dry statistical reporting of numerical data. They are designed to appeal to our pattern recognition skills, focusing our attention on features of the data such as shape, level, variability and orientation. Some multivariate graphs (e.g. radar plots, sunray plots and multiplots) are useful not only for representing score patterns for individuals but also providing summaries of score patterns across groups of individuals.

Multivariate graphs tend to get very busy-looking and are hard to interpret if a great many variables or a large number of individuals need to be displayed (imagine any of the icon plots, for a sample of 200 questionnaire participants, displayed on an A4 page – each icon would be so small that its features could not be easily distinguished, thereby defeating the purpose of the display). In such cases, using numerical summary statistics (such as averages or correlations) in tabular form alone will provide a more economical and efficient summary. Also, some multivariate displays will work better for conveying certain types of information than others.

Information about variable relationships may be better displayed using a scatterplot matrix. Information about individual similarities and difference on a set of variables may be better conveyed using a histogram or sunray plot. Multiplots may be better suited to displaying information about group differences across a set of variables. Information about the overall similarity of individual entities in a sample might best be displayed using Chernoff’s faces.

Because people differ greatly in their visual capacities and preferences, certain types of multivariate displays will work for some people and not others. Sometimes, people will not see what you see in the plots. Some plots, such as Chernoff’s faces, may not strike a reader as a serious statistical procedure and this could adversely influence how convinced they will be by the story the plot conveys. None of the multivariate displays described here provide sufficiently precise information for solid inferences or interpretations; all are designed to simply facilitate the formation of holistic visual impressions. In fact, you may have noticed that some displays (scatterplot matrices and the icon plots, for example) provide no numerical scaling information that would help make precise interpretations. If precision in summary information is desired, the types of multivariate displays discussed here would not be the best strategic choices.

Virtually any research design which produces quantitative data/statistics for multiple variables provides opportunities for multivariate graphical data display which may help to clarify or illustrate important data characteristics or relationships. Thus, for survey research involving many identically-scaled attitudinal questions, a multivariate display may be just the device needed to communicate something about patterns in the data. Multivariate graphical displays are simply specialised communication tools designed to compress a lot of information into a meaningful and efficient format for interpretation—which tool to choose depends upon the message to be conveyed.

Generally speaking, visual representations of multivariate data could prove more useful in communicating to lay persons who are unfamiliar with statistics or who prefer visual as opposed to numerical information. However, these displays would probably require some interpretive discussion so that the reader clearly understands their intent.

Procedure 5.4: Assessing Central Tendency

The three most commonly reported measures of central tendency are the mean, median and mode. Each measure reflects a specific way of defining central tendency in a distribution of scores on a variable and each has its own advantages and disadvantages.

The mean is the most widely used measure of central tendency (also called the arithmetic average). Very simply, a mean is the sum of all the scores for a specific variable in a sample divided by the number of scores used in obtaining the sum. The resulting number reflects the average score for the sample of individuals on which the scores were obtained. If one were asked to predict the score that any single individual in the sample would obtain, the best prediction, in the absence of any other relevant information, would be the sample mean. Many parametric statistical methods (such as several of the parametric procedures described in Chap. 7) deal with sample means in one way or another. For any sample of data, there is one and only one possible value for the mean in a specific distribution. For most purposes, the mean is the preferred measure of central tendency because it utilises all the available information in a sample.

In the context of the QCI database, Maree could quite reasonably ask what inspectors scored on the average in terms of mental ability ( mentabil ), inspection accuracy ( accuracy ), inspection speed ( speed ), overall job satisfaction ( jobsat ), and perceived quality of their working conditions ( workcond ). Table 5.3 shows the mean scores for the sample of 112 quality control inspectors on each of these variables. The statistics shown in Table 5.3 were computed using the SPSS Frequencies ... procedure. Notice that the table indicates how many of the 112 inspectors had a valid score for each variable and how many were missing a score (e.g. 109 inspectors provided a valid rating for jobsat; 3 inspectors did not).

Table 5.3 (image): Measures of central tendency for specific QCI variables

Each mean needs to be interpreted in terms of the original units of measurement for each variable. Thus, the inspectors in the sample showed an average mental ability score of 109.84 (higher than the general population mean of 100 for the test), an average inspection accuracy of 82.14%, and an average speed for making quality control decisions of 4.48 s. Furthermore, in terms of their work context, inspectors reported an average overall job satisfaction of 4.96 on the 7-point scale (a level of satisfaction nearly one full scale point above the Neutral point of 4, indicating a generally positive but not strong level of job satisfaction) and an average perceived quality of work conditions of 4.21 on the 7-point scale (just about at the level of Stressful but Tolerable).

The mean is sensitive to the presence of extreme values, which can distort its value, giving a biased indication of central tendency. As we will see below, the median is an alternative statistic to use in such circumstances. However, it is also possible to compute what is called a trimmed mean where the mean is calculated after a certain percentage (say, 5% or 10%) of the lowest and highest scores in a distribution have been ignored (a process called ‘trimming’; see, for example, the discussion in Field 2018 , pp. 262–264). This yields a statistic less influenced by extreme scores. The drawbacks are that the decision as to what percentage to trim can be somewhat subjective and trimming necessarily sacrifices information (i.e. the extreme scores) in order to achieve a less biased measure. Some software packages, such as SPSS, SYSTAT or NCSS, can report a specific percentage trimmed mean, if that option is selected for descriptive statistics or exploratory data analysis (see Procedure 5.6 ) procedures. Comparing the original mean with a trimmed mean can provide an indication of the degree to which the original mean has been biased by extreme values.
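
As a sketch of the comparison just described, scipy provides a trimmed mean directly; here 5% is trimmed from each tail of a hypothetical, positively skewed sample:

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
speed = rng.lognormal(1.4, 0.5, 112)   # hypothetical skewed decision times

print('mean:      ', round(speed.mean(), 2))
print('5% trimmed:', round(stats.trim_mean(speed, proportiontocut=0.05), 2))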

Very simply, the median is the centre or middle score of a set of scores. By 'centre' or 'middle' is meant that 50% of the data values are smaller than or equal to the median and 50% of the data values are larger when the entire distribution of scores is rank ordered from the lowest to highest value. Thus, we can say that the median is that score in the sample which occurs at the 50th percentile. [Note that a percentile is the score at or below which a specific percentage of the sample falls. Thus, a score at the 25th percentile means that 25% of the sample achieved this score or a lower score.] Table 5.3 shows the 25th, 50th and 75th percentile scores for each variable; note how the 50th percentile score is exactly equal to the median in each case.

The median is reported somewhat less frequently than the mean but does have some advantages over the mean in certain circumstances. One such circumstance is when the sample of data has a few extreme values in one direction (either very large or very small relative to all other scores). In this case, the mean would be influenced (biased) to a much greater degree than would the median since all of the data are used to calculate the mean (including the extreme scores) whereas only the single centre score is needed for the median. For this reason, many nonparametric statistical procedures (such as several of the procedures described in Chap. 7) focus on the median as the comparison statistic rather than on the mean.

A discrepancy between the values for the mean and median of a variable provides some insight to the degree to which the mean is being influenced by the presence of extreme data values. In a distribution where there are no extreme values on either side of the distribution (or where extreme values balance each other out on either side of the distribution, as happens in a normal distribution – see Fundamental Concept II ), the mean and the median will coincide at the same value and the mean will not be biased.

For highly skewed distributions, however, the value of the mean will be pulled toward the long tail of the distribution because that is where the extreme values lie. However, in such skewed distributions, the median will be insensitive (statisticians call this property ‘robustness’) to extreme values in the long tail. For this reason, the direction of the discrepancy between the mean and median can give a very rough indication of the direction of skew in a distribution (‘mean larger than median’ signals possible positive skewness; ‘mean smaller than median’ signals possible negative skewness). Like the mean, there is one and only one possible value for the median in a specific distribution.
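This mean-versus-median diagnostic takes one line to sketch in R (reusing the hypothetical speed vector from the trimmed-mean sketch above):

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    mean(speed)    # 5.09
    median(speed)  # 4.00; mean larger than median signals possible positive skewness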

In Fig. 5.19 , the left graph shows the distribution of speed scores and the right-hand graph shows the distribution of accuracy scores. The speed distribution clearly shows the mean being pulled toward the right tail of the distribution whereas the accuracy distribution shows the mean being just slightly pulled toward the left tail. The effect on the mean is stronger in the speed distribution indicating a greater biasing effect due to some very long inspection decision times.

Fig. 5.19 Effects of skewness in a distribution on the values for the mean and median

If we refer to Table 5.3 , we can see that the median score for each of the five variables has also been computed. Like the mean, the median must be interpreted in the original units of measurement for the variable. We can see that for mentabil , accuracy , and workcond , the value of the median is very close to the value of the mean, suggesting that these distributions are not strongly influenced by extreme data values in either the high or low direction. However, note that the median speed was 3.89 s compared to the mean of 4.48 s, suggesting that the distribution of speed scores is positively skewed (the mean is larger than the median—refer to Fig. 5.19 ). Conversely, the median jobsat score was 5.00 whereas the mean score was 4.96 suggesting very little substantive skewness in the distribution (mean and median are nearly equal).

The mode is the simplest measure of central tendency. It is defined as the most frequently occurring score in a distribution. Put another way, it is the score that more individuals in the sample obtain than any other score. An interesting problem associated with the mode is that there may be more than one in a specific distribution. In the case where multiple modes exist, the issue becomes which value do you report? The answer is that you must report all of them. In a ‘normal’ bell-shaped distribution, there is only one mode and it is indeed at the centre of the distribution, coinciding with both the mean and the median.

Table 5.3 also shows the mode for each of the five variables. For example, inspectors achieved a mentabil score of 111 more often than any other score and reported a jobsat rating of 6 more often than any other rating. SPSS only ever reports one mode even if several are present, so one must be careful and look at a histogram plot for each variable to make a final determination of the mode(s) for that variable.
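Base R has no built-in statistical mode function either, but a small helper can return every mode rather than just the first one. This is our own illustrative function, not part of any package:

    # return all of the most frequently occurring values in x
    all_modes <- function(x) {
      tab <- table(x)
      as.numeric(names(tab)[tab == max(tab)])
    }
    all_modes(c(1, 2, 2, 3, 3, 4))  # returns 2 and 3: the set is bimodal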

All three measures of central tendency yield information about what is going on in the centre of a distribution of scores. The mean and median provide a single number which can summarise the central tendency in the entire distribution. The mode can yield one or multiple indices. With many measurements on individuals in a sample, it is advantageous to have single number indices which can describe the distributions in summary fashion. In a normal or near-normal distribution of sample data, the mean, the median, and the mode will all generally coincide at the one point. In this instance, all three statistics will provide approximately the same indication of central tendency. Note however that it is seldom the case that all three statistics would yield exactly the same number for any particular distribution. The mean is the most useful statistic, unless the data distribution is skewed by extreme scores, in which case the median should be reported.

While measures of central tendency are useful descriptors of distributions, summarising data using a single numerical index necessarily reduces the amount of information available about the sample. Not only do we need to know what is going on in the centre of a distribution, we also need to know what is going on around the centre of the distribution. For this reason, most social and behavioural researchers report not only measures of central tendency, but also measures of variability (see Procedure 5.5 ). The mode is the least informative of the three statistics because of its potential for producing multiple values.

Measures of central tendency are useful in almost any type of experimental design, survey or interview study, and in any observational studies where quantitative data are available and must be summarised. The decision as to whether the mean or median should be reported depends upon the nature of the data which should ideally be ascertained by visual inspection of the data distribution. Some researchers opt to report both measures routinely. Computation of means is a prelude to many parametric statistical methods, and comparison of medians is associated with many nonparametric statistical methods (see, for example, the respective procedures described in Chap. 7).

Procedure 5.5: Assessing Variability

There are a variety of measures of variability to choose from including the range, interquartile range, variance and standard deviation. Each measure reflects a specific way of defining variability in a distribution of scores on a variable and each has its own advantages and disadvantages. Most measures of variability are associated with a specific measure of central tendency so that researchers are now commonly expected to report both a measure of central tendency and its associated measure of variability whenever they display numerical descriptive statistics on continuous or rank-ordered variables.

This is the simplest measure of variability for a sample of data scores. The range is merely the largest score in the sample minus the smallest score in the sample. The range is the one measure of variability not explicitly associated with any measure of central tendency. It gives a very rough indication as to the extent of spread in the scores. However, since the range uses only two of the total available scores in the sample, the rest of the scores are ignored, which means that a lot of potentially useful information is being sacrificed. There are also problems if either the highest or lowest (or both) scores are atypical or too extreme in their value (as in highly skewed distributions). When this happens, the range gives a very inflated picture of the typical variability in the scores. Thus, the range tends not to be a frequently reported measure of variability.

Table 5.4 shows a set of descriptive statistics, produced by the SPSS Frequencies procedure, for the mentabil, accuracy, speed, jobsat and workcond measures in the QCI database. In the table, you will find three rows labelled ‘Range’, ‘Minimum’ and ‘Maximum’.

Table 5.4 Measures of central tendency and variability for specific QCI variables

Using the data from these three rows, we can draw the following descriptive picture. Mentabil scores spanned a range of 50 (from a minimum score of 85 to a maximum score of 135). Speed scores had a range of 16.05 s (from 1.05 s – the fastest quality decision – to 17.10 s – the slowest quality decision). Accuracy scores had a range of 43 (from 57% – the least accurate inspector – to 100% – the most accurate inspector). Both work context measures (jobsat and workcond) exhibited a range of 6 – the largest possible range given the 1 to 7 scale of measurement for these two variables.
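As a quick sketch of the underlying arithmetic in R (again using the hypothetical speed vector, since the QCI data themselves are not reproduced here):

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    range(speed)        # returns the minimum and maximum: 1.8 and 17.1
    diff(range(speed))  # the range proper, maximum minus minimum: 15.3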

Interquartile Range

The Interquartile Range ( IQR ) is a measure of variability that is specifically designed to be used in conjunction with the median. The IQR also takes care of the extreme data problem which typically plagues the range measure. The IQR is defined as the range that is covered by the middle 50% of scores in a distribution once the scores have been ranked in order from lowest value to highest value. It is found by locating the value in the distribution at or below which 25% of the sample scored and subtracting this number from the value in the distribution at or below which 75% of the sample scored. The IQR can also be thought of as the range one would compute after the bottom 25% of scores and the top 25% of scores in the distribution have been ‘chopped off’ (or ‘trimmed’ as statisticians call it).

The IQR gives a much more stable picture of the variability of scores and, like the median, is relatively insensitive to the biasing effects of extreme data values. Some behavioural researchers prefer to divide the IQR in half which gives a measure called the Semi-Interquartile Range ( S-IQR ) . The S-IQR can be interpreted as the distance one must travel away from the median, in either direction, to reach the value which separates the top (or bottom) 25% of scores in the distribution from the remaining 75%.

The IQR or S-IQR is typically not produced by descriptive statistics procedures by default in many computer software packages; however, it can usually be requested as an optional statistic to report or it can easily be computed by hand using percentile scores. Both the median and the IQR figure prominently in Exploratory Data Analysis, particularly in the production of boxplots (see Procedure 5.6 ).

Figure 5.20 illustrates the conceptual nature of the IQR and S-IQR compared to that of the range. Assume that 100% of data values are covered by the distribution curve in the figure. It is clear that these three measures would provide very different values for a measure of variability. Your choice would depend on your purpose. If you simply want to signal the overall span of scores between the minimum and maximum, the range is the measure of choice. But if you want to signal the variability around the median, the IQR or S-IQR would be the measure of choice.

Fig. 5.20 How the range, IQR and S-IQR measures of variability conceptually differ

Note: Some behavioural researchers refer to the IQR as the hinge-spread (or H-spread ) because of its use in the production of boxplots:

  • the 25th percentile data value is referred to as the ‘lower hinge’;
  • the 75th percentile data value is referred to as the ‘upper hinge’; and
  • their difference gives the H-spread.

Midspread is another term you may see used as a synonym for interquartile range.

Referring back to Table 5.4 , we can find statistics reported for the median and for the ‘quartiles’ (25th, 50th and 75th percentile scores) for each of the five variables of interest. The ‘quartile’ values are useful for finding the IQR or S-IQR because SPSS does not report these measures directly. The median clearly equals the 50th percentile data value in the table.

If we focus, for example, on the speed variable, we could find its IQR by subtracting the 25th percentile score of 2.19 s from the 75th percentile score of 5.71 s to give a value for the IQR of 3.52 s (the S-IQR would simply be 3.52 divided by 2 or 1.76 s). Thus, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores spanning a range of 3.52 s. Alternatively, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores which ranged 1.76 s either side of the median value.
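The same computations can be sketched in R using the hypothetical speed vector (note that R's default percentile algorithm differs slightly from SPSS's, so hand-checked quartiles may not match SPSS output to the second decimal):

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    quantile(speed, probs = c(0.25, 0.50, 0.75))  # the three quartiles
    IQR(speed)      # 75th percentile minus 25th percentile
    IQR(speed) / 2  # the semi-interquartile range (S-IQR)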

Note: We could compare the ‘Minimum’ or ‘Maximum’ scores to the 25th percentile score and 75th percentile score respectively to get a feeling for whether the minimum or maximum might be considered extreme or uncharacteristic data values.

The variance uses information from every individual in the sample to assess the variability of scores relative to the sample mean. Variance assesses the average squared deviation of each score from the mean of the sample. Deviation refers to the difference between an observed score value and the mean of the sample—they are squared simply because adding them up in their naturally occurring unsquared form (where some differences are positive and others are negative) always gives a total of zero, which is useless for an index purporting to measure something.

If many scores are quite different from the mean, we would expect the variance to be large. If all the scores lie fairly close to the sample mean, we would expect a small variance. If all scores exactly equal the mean (i.e. all the scores in the sample have the same value), then we would expect the variance to be zero.
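The definition translates directly into code. A minimal sketch in R, using a handful of hypothetical accuracy scores; note that var(), like most statistical packages, divides the sum of squared deviations by n - 1 rather than n:

    x <- c(82, 90, 75, 100, 64)          # hypothetical accuracy scores
    deviations <- x - mean(x)            # sum(deviations) is always exactly 0
    sum(deviations^2) / (length(x) - 1)  # variance computed from the definition
    var(x)                               # the built-in gives the identical result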

Figure 5.21 illustrates some possibilities regarding variance of a distribution of scores having a mean of 100. The very tall curve illustrates a distribution with small variance. The distribution of medium height illustrates a distribution with medium variance and the flattest distribution is a distribution with large variance.

Fig. 5.21 The concept of variance

If we had a distribution with no variance, the curve would simply be a vertical line at a score of 100 (meaning that all scores were equal to the mean). You can see that as variance increases, the tails of the distribution extend further outward and the concentration of scores around the mean decreases. You may have noticed that variance and range (as well as the IQR) will be related, since the range focuses on the difference between the ends of the two tails in the distribution and larger variances extend the tails. So, a larger variance will generally be associated with a larger range and IQR compared to a smaller variance.

It is generally difficult to descriptively interpret the variance measure in a meaningful fashion since it involves squared deviations around the sample mean. [Note: If you look back at Table 5.4, you will see the variance listed for each of the variables (e.g. the variance of accuracy scores is 84.118), but the numbers themselves make little sense and do not relate to the original measurement scale for the variables (which, for the accuracy variable, went from 0% to 100% accuracy).] Instead, we use the variance as a stepping stone for obtaining a measure of variability that we can clearly interpret, namely the standard deviation. However, you should know that variance is an important concept in its own right simply because it provides the statistical foundation for many of the correlational procedures and statistical inference procedures described in Chaps. 6, 7 and 8.

When considering either correlations or tests of statistical hypotheses, we frequently speak of one variable explaining or sharing variance with another (see the relevant procedures in Chaps. 6 and 7). In doing so, we are invoking the concept of variance as set out here—what we are saying is that variability in the behaviour of scores on one particular variable may be associated with or predictive of variability in scores on another variable of interest (e.g. it could explain why those scores have a non-zero variance).

Standard Deviation

The standard deviation (often abbreviated as SD, sd or Std. Dev.) is the most commonly reported measure of variability because it has a meaningful interpretation and is used in conjunction with reports of sample means. Variance and standard deviation are closely related measures in that the standard deviation is found by taking the square root of the variance. The standard deviation, very simply, is a summary number that reflects the ‘average distance of each score from the mean of the sample’. In many parametric statistical methods, both the sample mean and sample standard deviation are employed in some form. Thus, the standard deviation is a very important measure, not only for data description, but also for hypothesis testing and the establishment of relationships as well.

Referring again to Table 5.4, we’ll focus on the results for the speed variable for discussion purposes. Table 5.4 shows that the mean inspection speed for the QCI sample was 4.48 s. We can also see that the standard deviation (in the row labelled ‘Std Deviation’) for speed was 2.89 s.

This standard deviation has a straightforward interpretation: we would say that ‘on the average, an inspector’s quality inspection decision speed differed from the mean of the sample by about 2.89 s in either direction’. In a normal distribution of scores (see Fundamental Concept II ), we would expect to see about 68% of all inspectors having decision speeds between 1.59 s (the mean minus one amount of the standard deviation) and 7.37 s (the mean plus one amount of the standard deviation).
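A sketch of this interpretation in R, once more with the hypothetical speed vector:

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    sd(speed)                           # identical to sqrt(var(speed))
    mean(speed) + c(-1, 1) * sd(speed)  # the interval one SD either side of the mean
    # for roughly normal data, about 68% of scores fall inside this interval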

We noted earlier that the range of the speed scores was 16.05 s. However, the fact that the maximum speed score was 17.1 s compared to the 75th percentile score of just 5.71 s seems to suggest that this maximum speed might be rather atypically large compared to the bulk of speed scores. This means that the range is likely to be giving us a false impression of the overall variability of the inspectors’ decision speeds.

Furthermore, given that the mean speed score was higher than the median speed score, suggesting that speed scores were positively skewed (this was confirmed by the histogram for speed shown in Fig. 5.19 in Procedure 5.4 ), we might consider emphasising the median and its associated IQR or S-IQR rather than the mean and standard deviation. Of course, similar diagnostic and interpretive work could be done for each of the other four variables in Table 5.4 .

Measures of variability (particularly the standard deviation) provide a summary measure that gives an indication of how variable (spread out) a particular sample of scores is. When used in conjunction with a relevant measure of central tendency (particularly the mean), a reasonable yet economical description of a set of data emerges. When there are extreme data values or severe skewness is present in the data, the IQR (or S-IQR) becomes the preferred measure of variability to be reported in conjunction with the sample median (or 50th percentile value). These latter measures are much more resistant (‘robust’) to influence by data anomalies than are the mean and standard deviation.

As mentioned above, the range is a very cursory index of variability, thus, it is not as useful as variance or standard deviation. Variance has little meaningful interpretation as a descriptive index; hence, standard deviation is most often reported. However, the standard deviation (or IQR) has little meaning if the sample mean (or median) is not reported along with it.

Knowing that the standard deviation for accuracy is 9.17 tells you little unless you know the mean accuracy (82.14) that it is the standard deviation from.

Like the sample mean, the standard deviation can be strongly biased by the presence of extreme data values or severe skewness in a distribution in which case the median and IQR (or S-IQR) become the preferred measures. The biasing effect will be most noticeable in samples which are small in size (say, less than 30 individuals) and far less noticeable in large samples (say, in excess of 200 or 300 individuals). [Note that, in a manner similar to a trimmed mean, it is possible to compute a trimmed standard deviation to reduce the biasing effect of extreme data values, see Field 2018 , p. 263.]

It is important to realise that the resistance of the median and IQR (or S-IQR) to extreme values is only gained by deliberately sacrificing a good deal of the information available in the sample (nothing is obtained without a cost in statistics). What is sacrificed is information from all other members of the sample other than those members who scored at the median and 25th and 75th percentile points on a variable of interest; information from all members of the sample would automatically be incorporated in mean and standard deviation for that variable.

Any investigation where you might report on or read about measures of central tendency on certain variables should also report measures of variability. This is particularly true for data from experiments, quasi-experiments, observational studies and questionnaires. It is important to consider measures of central tendency and measures of variability to be inextricably linked—one should never report one without the other if an adequate descriptive summary of a variable is to be communicated.

Other descriptive measures, such as those for skewness and kurtosis, may also be of interest if a more complete description of any variable is desired. Most good statistical packages can be instructed to report these additional descriptive measures as well.

Of all the statistics you are likely to encounter in the business, behavioural and social science research literature, means and standard deviations will dominate as measures for describing data. Additionally, these statistics will usually be reported when any parametric tests of statistical hypotheses are presented as the mean and standard deviation provide an appropriate basis for summarising and evaluating group differences.

Fundamental Concept I: Basic Concepts in Probability

The Concept of Simple Probability

In Procedures 5.1 and 5.2 , you encountered the idea of the frequency of occurrence of specific events such as particular scores within a sample distribution. Furthermore, it is a simple operation to convert the frequency of occurrence of a specific event into a number representing the relative frequency of that event. The relative frequency of an observed event is merely the number of times the event is observed divided by the total number of times one makes an observation. The resulting number ranges between 0 and 1 but we typically re-express this number as a percentage by multiplying it by 100%.

In the QCI database, Maree Lakota observed data from 112 quality control inspectors of which 58 were male and 51 were female (gender indications were missing for three inspectors). The statistics 58 and 51 are thus the frequencies of occurrence for two specific types of research participant, a male inspector or a female inspector.

If she divided each frequency by the total number of observations (i.e. 112), she would obtain .52 for males and .46 for females (leaving .02 of observations with unknown gender). These statistics are relative frequencies which indicate the proportion of times that Maree obtained data from a male or female inspector. Multiplying each relative frequency by 100% would yield 52% and 46% which she could interpret as indicating that 52% of her sample was male and 46% was female (leaving 2% of the sample with unknown gender).
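In R, frequencies and relative frequencies fall straight out of table() and prop.table(). One caution when mirroring Maree's arithmetic: table() drops missing values by default, so prop.table() gives proportions among valid cases only, whereas the text divides by all 112 inspectors, missing included:

    gender <- factor(c(rep("male", 58), rep("female", 51), rep(NA, 3)))
    table(gender)                   # frequencies: 51 female, 58 male
    prop.table(table(gender))       # proportions among the 109 valid cases
    table(gender) / length(gender)  # .46 and .52: proportions of all 112, as in the text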

It does not take much of a leap in logic to move from the concept of ‘relative frequency’ to the concept of ‘probability’. In our discussion above, we focused on relative frequency as indicating the proportion or percentage of times a specific category of participant was obtained in a sample. The emphasis here is on data from a sample.

Imagine now that Maree had infinite resources and research time and was able to obtain ever larger samples of quality control inspectors for her study. She could still compute the relative frequencies for obtaining data from males and females in her sample but as her sample size grew larger and larger, she would notice these relative frequencies converging toward some fixed values.

If, by some miracle, Maree could observe all of the quality control inspectors on the planet today, she would have measured the entire population and her computations of relative frequency for males and females would yield two precise numbers, each indicating the proportion of the population of inspectors that was male and the proportion that was female.

If Maree were then to list all of these inspectors and randomly choose one from the list, the chances that she would choose a male inspector would be equal to the proportion of the population of inspectors that was male and this logic extends to choosing a female inspector. The number used to quantify this notion of ‘chances’ is called a probability. Maree would therefore have established the probability of randomly observing a male or a female inspector in the population on any specific occasion.

Probability is expressed on a 0.0 (the observation or event will certainly not be seen) to 1.0 (the observation or event will certainly be seen) scale where values close to 0.0 indicate observations that are less certain to be seen and values close to 1.0 indicate observations that are more certain to be seen (a value of .5 indicates an even chance that an observation or event will or will not be seen – a state of maximum uncertainty). Statisticians often interpret a probability as the likelihood of observing an event or type of individual in the population.

In the QCI database, we noted that the relative frequency of observing males was .52 and for females was .46. If we take these relative frequencies as estimates of the proportions of each gender in the population of inspectors, then .52 and .46 represent the probability of observing a male or female inspector, respectively.

Statisticians would state this as “the probability of observing a male quality control inspector is .52” or in a more commonly used shorthand code, the likelihood of observing a male quality control inspector is p = .52 (p for probability). For some, probabilities make more sense if they are converted to percentages (by multiplying by 100%). Thus, p = .52 can also be understood as a 52% chance of observing a male quality control inspector.

We have seen that relative frequency is a sample statistic that can be used to estimate the population probability. Our estimate will get more precise as we use larger and larger samples (technically, as the size of our samples more closely approximates the size of our population). In most behavioural research, we never have access to entire populations so we must always estimate our probabilities.

In some very special populations, having a known number of fixed possible outcomes, such as results of coin tosses or rolls of a die, we can analytically establish event probabilities without doing an infinite number of observations; all we must do is assume that we have a fair coin or die. Thus, with a fair coin, the probability of observing a H or a T on any single coin toss is ½ or .5 or 50%; the probability of observing a 6 on any single throw of a die is 1/6 or .16667 or 16.667%. With behavioural data, though, we can never measure all possible behavioural outcomes, which thereby forces researchers to depend on samples of observations in order to make estimates of population values.

The concept of probability is central to much of what is done in the statistical analysis of behavioural data. Whenever a behavioural scientist wishes to establish whether a particular relationship exists between variables or whether two groups, treated differently, actually show different behaviours, he/she is playing a probability game. Given a sample of observations, the behavioural scientist must decide whether what he/she has observed is providing sufficient information to conclude something about the population from which the sample was drawn.

This decision always has a non-zero probability of being in error simply because in samples that are much smaller than the population, there is always the chance or probability that we are observing something rare and atypical instead of something which is indicative of a consistent population trend. Thus, the concept of probability forms the cornerstone for statistical inference about which we will have more to say later (see the relevant Fundamental Concept in Chap. 7). Probability also plays an important role in helping us to understand theoretical statistical distributions (e.g. the normal distribution) and what they can tell us about our observations. We will explore this idea further in Fundamental Concept II.

The Concept of Conditional Probability

It is important to understand that the concept of probability as described above focuses upon the likelihood or chances of observing a specific event or type of observation for a specific variable relative to a population or sample of observations. However, many important behavioural research issues may focus on the question of the probability of observing a specific event given that the researcher has knowledge that some other event has occurred or been observed (this latter event is usually measured by a second variable). Here, the focus is on the potential relationship or link between two variables or two events.

With respect to the QCI database, Maree could ask the quite reasonable question “what is the probability (estimated in the QCI sample by a relative frequency) of observing an inspector being female given that she knows that an inspector works for a Large Business Computer manufacturer?”

To address this question, all she needs to know is:

  • how many inspectors from Large Business Computer manufacturers are in the sample ( 22 ); and
  • how many of those inspectors were female ( 7 ) (inspectors who were missing a score for either company or gender have been ignored here).

If she divides 7 by 22, she would obtain the probability that an inspector is female given that they work for a Large Business Computer manufacturer – that is, p = .32 .

This type of question points to the important concept of conditional probability (‘conditional’ because we are asking “what is the probability of observing one event conditional upon our knowledge of some other event”).

Continuing with the previous example, Maree would say that the conditional probability of observing a female inspector working for a Large Business Computer manufacturer is .32 or, equivalently, a 32% chance. Compare this conditional probability of p  = .32 to the overall probability of observing a female inspector in the entire sample ( p  = .46 as shown above).

This means that there is evidence for a connection or relationship between gender and the type of company an inspector works for. That is, the chances are lower for observing a female inspector from a Large Business Computer manufacturer than they are for simply observing a female inspector at all.

Maree therefore has evidence suggesting that females may be relatively under-represented in Large Business Computer manufacturing companies compared to the overall population. Knowing something about the company an inspector works for therefore can help us make a better prediction about their likely gender.

Suppose, however, that Maree’s conditional probability had been exactly equal to p = .46. This would mean that there was exactly the same chance of observing a female inspector working for a Large Business Computer manufacturer as there was of observing a female inspector in the general population. Here, knowing something about the company an inspector works for doesn’t help Maree make any better prediction about their likely gender. This would mean that the two variables are statistically independent of each other.

A classic case of events that are statistically independent is two successive throws of a fair die: rolling a six on the first throw gives us no information for predicting how likely it will be that we would roll a six on the second throw. The conditional probability of observing a six on the second throw given that I have observed a six on the first throw is .16667 (= 1 divided by 6) which is the same as the simple probability of observing a six on any specific throw. This statistical independence also means that if we wanted to know the probability of throwing two sixes on two successive throws of a fair die, we would just multiply the probabilities for each independent event (i.e., throw) together; that is, .16667 × .16667 = .02778 (this is known as the multiplication rule of probability, see, for example, Smithson 2000, p. 114).

Finally, you should know that conditional probabilities are often asymmetric. This means that for many types of behavioural variables, reversing the conditional arrangement will change the story about the relationship. Bayesian statistics (see the relevant Fundamental Concept in Chap. 7) relies heavily upon this asymmetric relationship between conditional probabilities.

Maree has already learned that the conditional probability that an inspector is female given that they worked for a Large Business Computer manufacturer is p = .32. She could easily turn the conditional relationship around and ask what is the conditional probability that an inspector works for a Large Business Computer manufacturer given that the inspector is female?

From the QCI database, she can find that 51 inspectors in her total sample were female and of those 51, 7 worked for a Large Business Computer manufacturer. If she divided 7 by 51, she would get p = .14 (did you notice that all that changed was the number she divided by?). Thus, there is only a 14% chance of observing an inspector working for a Large Business Computer manufacturer given that the inspector is female – a rather different probability from p = .32, which tells a different story.
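Both directions of conditioning correspond to row and column proportions of a crosstab. A sketch in R that reconstructs counts consistent with this example (lumping all remaining companies into a single 'Other' category is our simplification):

    company <- c(rep("LargeBusinessComputer", 22), rep("Other", 87))
    gender  <- c(rep("female", 7), rep("male", 15),   # the 22 LBC inspectors
                 rep("female", 44), rep("male", 43))  # the 87 other inspectors
    tab <- table(company, gender)
    prop.table(tab, margin = 1)  # row proportions, P(gender | company): .32 for female given LBC
    prop.table(tab, margin = 2)  # column proportions, P(company | gender): .14 for LBC given female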

As you will see in the relevant procedures in Chaps. 6 and 7, conditional relationships between categorical variables are precisely what crosstabulation contingency tables are designed to reveal.

Procedure 5.6: Exploratory Data Analysis

There are a variety of visual display methods for EDA, including stem & leaf displays, boxplots and violin plots. Each method reflects a specific way of displaying features of a distribution of scores or measurements and, of course, each has its own advantages and disadvantages. In addition, EDA displays are surprisingly flexible and can combine features in various ways to enhance the story conveyed by the plot.

Stem & Leaf Displays

The stem & leaf display is a simple data summary technique which not only rank orders the data points in a sample but presents them visually so that the shape of the data distribution is reflected. Stem & leaf displays are formed from data scores by splitting each score into two parts: the first part of each score serving as the ‘stem’, the second part as the ‘leaf’ (e.g. for 2-digit data values, the ‘stem’ is the number in the tens position; the ‘leaf’ is the number in the ones position). Each stem is then listed vertically, in ascending order, followed horizontally by all the leaves in ascending order associated with it. The resulting display thus shows all of the scores in the sample, but reorganised so that a rough idea of the shape of the distribution emerges. As well, extreme scores can be easily identified in a stem & leaf display.
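Base R can produce such a display directly; a minimal sketch with the hypothetical speed vector (the scale argument controls how finely the stems are split):

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    stem(speed)             # default stem & leaf display printed to the console
    stem(speed, scale = 2)  # finer stems, akin to the 'half-stem' splitting described below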

Consider the accuracy and speed scores for the 112 quality control inspectors in the QCI sample. Figure 5.22 (produced by the R Commander Stem-and-leaf display … procedure) shows the stem & leaf displays for inspection accuracy (left display) and speed (right display) data.

Fig. 5.22 Stem & leaf displays produced by R Commander

[The first six lines reflect information from R Commander about each display: lines 1 and 2 show the actual R command used to produce the plot (the variable name has been highlighted in bold); line 3 gives a warning indicating that inspectors with missing values (= NA in R ) on the variable have been omitted from the display; line 4 shows how the stems and leaves have been defined; line 5 indicates what a leaf unit represents in value; and line 6 indicates the total number (n) of inspectors included in the display.] In Fig. 5.22, for the accuracy display on the left-hand side, the ‘stems’ have been split into ‘half-stems’—one (which is starred) associated with the ‘leaves’ 0 through 4 and the other associated with the ‘leaves’ 5 through 9—a strategy that gives the display better balance and visual appeal.

Notice how the left stem & leaf display conveys a fairly clear (yet sideways) picture of the shape of the distribution of accuracy scores. It has a rather symmetrical bell-shape to it with only a slight suggestion of negative skewness (toward the extreme score at the top). The right stem & leaf display clearly depicts the highly positively skewed nature of the distribution of speed scores. Importantly, we could reconstruct the entire sample of scores for each variable using its display, which means that unlike most other graphical procedures, we didn’t have to sacrifice any information to produce the visual summary.

Some programs, such as SYSTAT, embellish their stem & leaf displays by indicating in which stem or half-stem the ‘median’ (50th percentile), the ‘upper hinge score’ (75th percentile), and ‘lower hinge score’ (25th percentile) occur in the distribution (recall the discussion of interquartile range in Procedure 5.5 ). This is shown in Fig. 5.23 , produced by SYSTAT, where M and H indicate the stem locations for the median and hinge points, respectively. This stem & leaf display labels a single extreme accuracy score as an ‘outside value’ and clearly shows that this actual score was 57.

Fig. 5.23 Stem & leaf display, produced by SYSTAT, of the accuracy QCI variable

Boxplots

Another important EDA technique is the boxplot or, as it is sometimes known, the box-and-whisker plot. This plot provides a symbolic representation that preserves less of the original nature of the data (compared to a stem & leaf display) but typically gives a better picture of the distributional characteristics. The basic boxplot, shown in Fig. 5.24, utilises information about the median (50th percentile score) and the upper (75th percentile score) and lower (25th percentile score) hinge points in the construction of the ‘box’ portion of the graph (the ‘median’ defines the centre line in the box; the ‘upper’ and ‘lower hinge values’ define the end boundaries of the box—thus the box encompasses the middle 50% of data values).

Fig. 5.24 Boxplots for the accuracy and speed QCI variables

Additionally, the boxplot utilises the IQR (recall Procedure 5.5 ) as a way of defining what are called ‘fences’ which are used to indicate score boundaries beyond which we would consider a score in a distribution to be an ‘outlier’ (or an extreme or unusual value). In SPSS, the inner fence is typically defined as 1.5 times the IQR in each direction and a ‘far’ outlier or extreme case is typically defined as 3 times the IQR in either direction (Field 2018 , p. 193). The ‘whiskers’ in a boxplot extend out to the data values which are closest to the upper and lower inner fences (in most cases, the vast majority of data values will be contained within the fences). Outliers beyond these ‘whiskers’ are then individually listed. ‘Near’ outliers are those lying just beyond the inner fences and ‘far’ outliers lie well beyond the inner fences.
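A sketch of these conventions in R (the range argument of boxplot() is the inner-fence multiplier, 1.5 by default):

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    boxplot(speed, range = 1.5)  # box, whiskers out to the fences, outliers plotted singly
    boxplot.stats(speed)$out     # the flagged outlying value(s): here, 17.1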

Figure 5.24 shows two simple boxplots (produced using SPSS), one for the accuracy QCI variable and one for the speed QCI variable. The accuracy plot shows a median value of about 83, roughly 50% of the data fall between about 77 and 89 and there is one outlier, inspector 83, in the lower ‘tail’ of the distribution. The accuracy boxplot illustrates data that are relatively symmetrically distributed without substantial skewness. Such data will tend to have their median in the middle of the box, whiskers of roughly equal length extending out from the box and few or no outliers.

The speed plot shows a median value of about 4 s, roughly 50% of the data fall between 2 s and 6 s and there are four outliers, inspectors 7, 62, 65 and 75 (although inspectors 65 and 75 fall at the same place and are rather difficult to read), all falling in the slow speed ‘tail’ of the distribution. Inspectors 65, 75 and 7 are shown as ‘near’ outliers (open circles) whereas inspector 62 is shown as a ‘far’ outlier (asterisk). The speed boxplot illustrates data which are asymmetrically distributed because of skewness in one direction. Such data may have their median offset from the middle of the box and/or whiskers of unequal length extending out from the box and outliers in the direction of the longer whisker. In the speed boxplot, the data are clearly positively skewed (the longer whisker and extreme values are in the slow speed ‘tail’).

Boxplots are very versatile representations in that side-by-side displays for sub-groups of data within a sample can permit easy visual comparisons of groups with respect to central tendency and variability. Boxplots can also be modified to incorporate information about error bands associated with the median producing what is called a ‘notched boxplot’. This helps in the visual detection of meaningful subgroup differences, where boxplot ‘notches’ don’t overlap.

Figure 5.25 (produced using NCSS) compares the distributions of accuracy and speed scores for QCI inspectors from the five types of companies, plotted side-by-side.

Fig. 5.25 Comparisons of the accuracy (regular boxplots) and speed (notched boxplots) QCI variables for different types of companies

Focus first on the left graph in Fig. 5.25 which plots the distribution of accuracy scores broken down by company using regular boxplots. This plot clearly shows the differing degree of skewness in each type of company (indicated by one or more outliers in one ‘tail’, whiskers which are not the same length and/or the median line being offset from the centre of a box), the differing variability of scores within each type of company (indicated by the overall length of each plot—box and whiskers), and the differing central tendency in each type of company (the median lines do not all fall at the same level of accuracy score). From the left graph in Fig. 5.25 , we could conclude that: inspection accuracy scores are most variable in PC and Large Electrical Appliance manufacturing companies and least variable in the Large Business Computer manufacturing companies; Large Business Computer and PC manufacturing companies have the highest median level of inspection accuracy; and inspection accuracy scores tend to be negatively skewed (many inspectors toward higher levels, relatively fewer who are poorer in inspection performance) in the Automotive manufacturing companies. One inspector, working for an Automotive manufacturing company, shows extremely poor inspection accuracy performance.

The right display compares types of companies in terms of their inspection speed scores, using ‘notched’ boxplots. The notches define upper and lower error limits around each median. Aside from the very obvious positive skewness for speed scores (with a number of slow speed outliers) in every type of company (least so for Large Electrical Appliance manufacturing companies), the story conveyed by this comparison is that inspectors from Large Electrical Appliance and Automotive manufacturing companies have substantially faster median decision speeds compared to inspectors from Large Business Computer and PC manufacturing companies (i.e. their ‘notches’ do not overlap, in terms of speed scores, on the display).

Boxplots can also add interpretive value to other graphical display methods through the creation of hybrid displays. Such displays might combine a standard histogram with a boxplot along the X-axis to provide an enhanced picture of the data distribution as illustrated for the mentabil variable in Fig. 5.26 (produced using NCSS). This hybrid plot also employs a data ‘smoothing’ method called a density trace to outline an approximate overall shape for the data distribution. Any one graphical method would tell some of the story, but combined in the hybrid display, the story of a relatively symmetrical set of mentabil scores becomes quite visually compelling.

Fig. 5.26 A hybrid histogram-density-boxplot of the mentabil QCI variable

Violin Plots

Violin plots are a more recent and interesting EDA innovation, implemented in the NCSS software package (Hintze 2012 ). The violin plot gets its name from the rough shape that the plots tend to take on. Violin plots are another type of hybrid plot, this time combining density traces (mirror-imaged right and left so that the plots have a sense of symmetry and visual balance) with boxplot-type information (median, IQR and upper and lower inner ‘fences’, but not outliers). The goal of the violin plot is to provide a quick visual impression of the shape, central tendency and variability of a distribution (the length of the violin conveys a sense of the overall variability whereas the width of the violin conveys a sense of the frequency of scores occurring in a specific region).
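The plots discussed here were produced with NCSS; as a rough equivalent in R, a sketch using the ggplot2 package (an assumption on our part, with a small simulated data frame standing in for the QCI data):

    library(ggplot2)
    set.seed(1)
    df <- data.frame(company = rep(c("A", "B", "C"), each = 40),
                     speed   = rexp(120, rate = 1 / 4))   # skewed, speed-like scores
    ggplot(df, aes(x = company, y = speed)) +
      geom_violin() +                              # mirror-imaged density traces
      stat_summary(fun = median, geom = "point")   # mark each group's median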

Figure 5.27 (produced using NCSS) compares the distributions of speed scores for QCI inspectors across the five types of companies, plotted side-by-side. The violin plot conveys a similar story to the boxplot comparison for speed in the right graph of Fig. 5.25. However, notice that with the violin plot, unlike with a boxplot, you also get a sense of distributions that have ‘clumps’ of scores in specific areas. Some violin plots, like that for Automotive manufacturing companies in Fig. 5.27, have a shape suggesting a multi-modal distribution (recall Procedure 5.4 and the discussion of the fact that a distribution may have multiple modes). The violin plot in Fig. 5.27 has also been produced to show where the median (solid line) and mean (dashed line) would fall within each violin. This facilitates two interpretations: (1) a relative comparison of central tendency across the five companies and (2) relative degree of skewness in the distribution for each company (indicated by the separation of the two lines within a violin; skewness is particularly bad for the Large Business Computer manufacturing companies).

Fig. 5.27 Violin plot comparisons of the speed QCI variable for different types of companies

EDA methods (of which we have illustrated only a small subset; we have not reviewed dot density diagrams, for example) provide summary techniques for visually displaying certain characteristics of a set of data. The advantage of the EDA methods over more traditional graphing techniques such as those described in Procedure 5.2 is that as much of the original integrity of the data is maintained as possible while maximising the amount of summary information available about distributional characteristics.

Stem & leaf displays maintain the data in as close to their original form as possible whereas boxplots and violin plots provide more symbolic and flexible representations. EDA methods are best thought of as communication devices designed to facilitate quick visual impressions and they can add interest to any statistical story being conveyed about a sample of data. NCSS, SYSTAT, STATGRAPHICS and R Commander generally offer more options and flexibility in the generation of EDA displays than SPSS.

EDA methods tend to get cumbersome if a great many variables or groups need to be summarised. In such cases, using numerical summary statistics (such as means and standard deviations) will provide a more economical and efficient summary. Boxplots or violin plots are generally more space efficient summary techniques than stem & leaf displays.

Often, EDA techniques are used as data screening devices, which are typically not reported in actual write-ups of research (we will discuss data screening in more detail in Procedure 10.1007/978-981-15-2537-7_8#Sec11). This is a perfectly legitimate use for the methods although there is an argument for researchers to put these techniques to greater use in published literature.

Software packages may use different rules for constructing EDA plots which means that you might get rather different looking plots and different information from different programs (you saw some evidence of this in Figs. 5.22 and 5.23 ). It is important to understand what the programs are using as decision rules for locating fences and outliers so that you are clear on how best to interpret the resulting plot—such information is generally contained in the user’s guides or manuals for NCSS (Hintze 2012 ), SYSTAT (SYSTAT Inc. 2009a , b ), STATGRAPHICS (StatPoint Technologies Inc. 2010 ) and SPSS (Norušis 2012 ).

Virtually any research design which produces numerical measures (even to the extent of just counting the number of occurrences of several events) provides opportunities for employing EDA displays which may help to clarify data characteristics or relationships. One extremely important use of EDA methods is as data screening devices for detecting outliers and other data anomalies, such as non-normality and skewness, before proceeding to parametric statistical analyses. In some cases, EDA methods can help the researcher to decide whether parametric or nonparametric statistical tests would be best to apply to his or her data because critical data characteristics such as distributional shape and spread are directly reflected.

Procedure 5.7: Standard ( z ) Scores

In certain practical situations in behavioural research, it may be desirable to know where a specific individual’s score lies relative to all other scores in a distribution. A convenient measure is to observe how many standard deviations (see Procedure 5.5 ) above or below the sample mean a specific score lies. This measure is called a standard score or z -score . Very simply, any raw score can be converted to a z -score by subtracting the sample mean from the raw score and dividing that result by the sample’s standard deviation. z -scores can be positive or negative and their sign simply indicates whether the score lies above (+) or below (−) the mean in value. A z -score has a very simple interpretation: it measures the number of standard deviations above or below the sample mean a specific raw score lies.

In the QCI database, we have a sample mean for speed scores of 4.48 s and a standard deviation for speed scores of 2.89 s (recall Table 5.4 in Procedure 5.5 ). If we are interested in the z -score for Inspector 65’s raw speed score of 11.94 s, we would obtain a z -score of +2.58 using the method described above (subtract 4.48 from 11.94 and divide the result by 2.89). The interpretation of this number is that a raw decision speed score of 11.94 s lies about 2.6 standard deviations above the mean decision speed for the sample.

z -scores have some interesting properties. First, if one converts (statisticians would say ‘transforms’) every available raw score in a sample to z -scores, the mean of these z -scores will always be zero and the standard deviation of these z -scores will always be 1.0. These two facts about z -scores (mean = 0; standard deviation = 1) will be true no matter what sample you are dealing with and no matter what the original units of measurement are (e.g. seconds, percentages, number of widgets assembled, amount of preference for a product, attitude rating, amount of money spent). This is because transforming raw scores to z -scores automatically changes the measurement units from whatever they originally were to a new system of measurements expressed in standard deviation units.
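A sketch of both the transformation and its two fixed properties in R:

    speed <- c(1.8, 2.2, 2.9, 3.4, 3.9, 4.1, 4.6, 5.2, 5.7, 17.1)
    z_speed <- (speed - mean(speed)) / sd(speed)  # z-scores from the definition
    scale(speed)       # the same transformation, returned as a one-column matrix
    mean(z_speed)      # 0, up to floating-point rounding
    sd(z_speed)        # exactly 1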

Suppose Maree was interested in the performance statistics for the top 25% most accurate quality control inspectors in the sample. Given a sample size of 112, this would mean finding the top 28 inspectors in terms of their accuracy scores. Since Maree is interested in performance statistics, speed scores would also be of interest. Table 5.5 (generated using the SPSS Descriptives … procedure, listed using the Case Summaries … procedure and formatted for presentation using Excel) shows accuracy and speed scores for the top 28 inspectors in descending order of accuracy scores. The z -score transformation for each of these scores is also shown (last two columns) as are the type of company, education level and gender for each inspector.

Table 5.5 Listing of the 28 (top 25%) most accurate QCI inspectors’ accuracy and speed scores as well as standard ( z ) score transformations for each score

There are three inspectors (8, 9 and 14) who scored maximum accuracy of 100%. Such accuracy converts to a z -score of +1.95. Thus 100% accuracy is 1.95 standard deviations above the sample’s mean accuracy level. Interestingly, all three inspectors worked for PC manufacturers and all three had only high school-level education. The least accurate inspector in the top 25% had a z -score for accuracy that was .75 standard deviations above the sample mean.

Interestingly, the top three inspectors in terms of accuracy had decision speeds that fell below the sample’s mean speed; inspector 8 was the fastest inspector of the three with a speed just over 1 standard deviation ( z  = −1.03) below the sample mean. The slowest inspector in the top 25% was inspector 75 (case #28 in the list) with a speed z -score of +2.62; i.e., he was over two and a half standard deviations slower in making inspection decisions relative to the sample’s mean speed.

The fact that z -scores always have a common measurement scale having a mean of 0 and a standard deviation of 1.0 leads to an interesting application of standard scores. Suppose we focus on inspector number 65 (case #8 in the list) in Table 5.5. It might be of interest to compare this inspector’s quality control performance in terms of both his decision accuracy and decision speed. Such a comparison is impossible using raw scores since the inspector’s accuracy score and speed scores are different measures which have differing means and standard deviations expressed in fundamentally different units of measurement (percentages and seconds). However, if we are willing to assume that the score distributions for both variables are approximately the same shape and that both accuracy and speed are measured with about the same level of reliability or consistency (see the relevant procedure in Chap. 8), we can compare the inspector’s two scores by first converting them to z -scores within their own respective distributions as shown in Table 5.5.

Inspector 65 looks rather anomalous in that he demonstrated a relatively high level of accuracy (raw score = 94%; z  = +1.29) but took a very long time to make those accurate decisions (raw score = 11.94 s; z  = +2.58). Contrast this with inspector 106 (case #17 in the list) who demonstrated a similar level of accuracy (raw score = 92%; z  = +1.08) but took a much shorter time to make those accurate decisions (raw score = 1.70 s; z  = −.96). In terms of evaluating performance, from a company perspective, we might conclude that inspector 106 is performing at an overall higher level than inspector 65 because he can achieve a very high level of accuracy but much more quickly; accurate and fast is more cost effective and efficient than accurate and slow.

Note: We should be cautious here. We know from our previous explorations of the speed variable in Procedure 5.6 that accuracy scores look fairly symmetrical while speed scores are positively skewed, so the assumption that the two variables share the same distribution shape, which is what licenses z-score comparisons, is problematic.

You might have noticed that as you scanned down the two columns of z-scores in Table 5.5, there was a suggestion of a pattern between the signs attached to the respective z-scores for each person. There seems to be a very slight preponderance of pairs of z-scores where the signs are reversed (12 out of 22 pairs). This observation provides some very preliminary evidence of a relationship between inspection accuracy and decision speed, namely that a more accurate decision tends to be associated with a faster decision speed. Of course, this pattern would be better verified using the entire sample rather than the top 25% of inspectors. However, you may find it interesting to learn that it is precisely this sort of suggestive evidence (about agreement or disagreement between z-score signs for pairs of variable scores throughout a sample) that is captured and summarised by a single statistical indicator called a ‘correlation coefficient’ (see Chap. 6; a small computational sketch follows).
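As a hedged illustration of that link (the paired scores here are invented, not the QCI data): Pearson’s correlation can be computed as the average cross-product of paired z-scores, so pairs with reversed signs pull the coefficient toward the negative.

```python
import numpy as np

# Hypothetical paired scores: accuracy (%) and decision speed (seconds)
accuracy = np.array([100.0, 94.0, 92.0, 90.0, 85.0, 80.0, 78.0, 75.0])
speed    = np.array([1.5, 11.9, 1.7, 3.2, 4.0, 5.5, 2.8, 6.1])

def zscores(x):
    return (x - x.mean()) / x.std(ddof=1)

za, zs = zscores(accuracy), zscores(speed)

# Pearson's r as the average cross-product of paired z-scores:
# pairs whose z-scores have opposite signs contribute negatively
r = (za * zs).sum() / (len(accuracy) - 1)
print(round(r, 3))
print(round(np.corrcoef(accuracy, speed)[0, 1], 3))  # identical value
```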

z-scores are not the only type of standard score that is commonly used. Three other types of standard scores are: stanines (standard nines), IQ scores and T-scores (not to be confused with the t-test described in Chap. 7). These other types of scores have the advantage of producing only positive integer scores rather than positive and negative decimal scores. This makes interpretation somewhat easier for certain applications. However, you should know that almost all other types of standard scores come from a specific transformation of z-scores. This is because once you have converted raw scores into z-scores, they can then be quite readily transformed into any other system of measurement by simply multiplying a person’s z-score by the new desired standard deviation for the measure and adding to that product the new desired mean for the measure.

T-scores are simply z-scores transformed to have a mean of 50.0 and a standard deviation of 10.0; IQ scores are simply z-scores transformed to have a mean of 100 and a standard deviation of 15 (or 16 in some systems). For more information, see Fundamental Concept II .
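In code, the rescaling rule just described is one line per score system (a sketch reusing the same hypothetical raw scores as above):

```python
import numpy as np

raw = np.array([82.0, 94.0, 77.0, 100.0, 88.0, 91.0])  # hypothetical scores
z = (raw - raw.mean()) / raw.std(ddof=1)

# new score = z * desired standard deviation + desired mean
t_scores  = z * 10 + 50    # T-scores: mean 50, SD 10
iq_scores = z * 15 + 100   # IQ-style scores: mean 100, SD 15

print(t_scores.round(1))
print(iq_scores.round(1))
```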

Standard scores are useful for representing the position of each raw score within a sample distribution relative to the mean of that distribution. The unit of measurement becomes the number of standard deviations a specific score is away from the sample mean. As such, z-scores can permit cautious comparisons across samples or across different variables having vastly differing means and standard deviations, within the constraints of the comparison samples having similarly shaped distributions and roughly equivalent levels of measurement reliability. z-scores also form the basis for establishing the degree of correlation between two variables. Transforming raw scores into z-scores does not change the shape of a distribution or the rank ordering of individuals within that distribution. For this reason, a z-score is referred to as a linear transformation of a raw score. Interestingly, z-scores provide an important foundational element for more complex analytical procedures such as factor analysis, cluster analysis and multiple regression analysis (see Chaps. 6 and 7).

While standard scores are useful indices, they are subject to restrictions if used to compare scores across samples or across different variables. The samples must have similar distribution shapes for the comparisons to be meaningful and the measures must have similar levels of reliability in each sample. The groups used to generate the z -scores should also be similar in composition (with respect to age, gender distribution, and so on). Because z -scores are not an intuitively meaningful way of presenting scores to lay-persons, many other types of standard score schemes have been devised to improve interpretability. However, most of these schemes produce scores that run a greater risk of facilitating lay-person misinterpretations simply because their connection with z -scores is hidden or because the resulting numbers ‘look’ like a more familiar type of score which people do intuitively understand.

It is extremely rare for a T-score to exceed 100 or go below 0 because this would mean that the raw score was in excess of 5 standard deviations away from the sample mean. This unfortunately means that T-scores are often misinterpreted as percentages because they typically range between 0 and 100 and therefore ‘look’ like percentages. However, T-scores are definitely not percentages.

Finally, a common misunderstanding of z -scores is that transforming raw scores into z -scores makes them follow a normal distribution (see Fundamental Concept II ). This is not the case. The distribution of z -scores will have exactly the same shape as that for the raw scores; if the raw scores are positively skewed, then the corresponding z -scores will also be positively skewed.

z-scores are particularly useful in evaluative studies where relative performance indices are of interest. Whenever you compute a correlation coefficient (see Chap. 6), you are implicitly transforming the two variables involved into z-scores (which equates the variables in terms of mean and standard deviation), so that only the patterning in the relationship between the variables is represented. z-scores are also useful as a preliminary step to more advanced parametric statistical methods when variables differing in scale, range and/or measurement units must be equated for means and standard deviations prior to analysis.

Fundamental Concept II: The Normal Distribution

Arguably the most fundamental distribution used in the statistical analysis of quantitative data in the behavioural and social sciences is the normal distribution (also known as the Gaussian or bell-shaped distribution ). Many behavioural phenomena, if measured on a large enough sample of people, tend to produce ‘normally distributed’ variable scores. This includes most measures of ability, performance and productivity, personality characteristics and attitudes. The normal distribution is important because it is the one form of distribution that you must assume describes the scores of a variable in the population when parametric tests of statistical inference are undertaken. The standard normal distribution is defined as having a population mean of 0.0 and a population standard deviation of 1.0. The normal distribution is also important as a means of interpreting various types of scoring systems.

Figure 5.28 displays the standard normal distribution (mean = 0; standard deviation = 1.0) and shows that there is a clear link between z -scores and the normal distribution. Statisticians have analytically calculated the probability (also expressed as percentages or percentiles) that observations will fall above or below any specific z -score in the theoretical standard normal distribution. Thus, a z -score of +1.0 in the standard normal distribution will have 84.13% (equals a probability of .8413) of observations in the population falling at or below one standard deviation above the mean and 15.87% falling above that point. A z -score of −2.0 will have 2.28% of observations falling at that point or below and 97.72% of observations falling above that point. It is clear then that, in a standard normal distribution, z -scores have a direct relationship with percentiles .
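These tabled relationships are easy to reproduce with any statistics library; for example, using SciPy’s standard normal functions (a sketch, not part of the original text):

```python
from scipy.stats import norm

# Proportion of the standard normal at or below z = +1.0, and above it
print(round(norm.cdf(1.0), 4))   # 0.8413 -> 84.13% at or below
print(round(norm.sf(1.0), 4))    # 0.1587 -> 15.87% above

# Proportion at or below z = -2.0, and above it
print(round(norm.cdf(-2.0), 4))  # 0.0228 -> 2.28% at or below
print(round(norm.sf(-2.0), 4))   # 0.9772 -> 97.72% above
```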

Fig. 5.28: The normal (bell-shaped or Gaussian) distribution

Figure 5.28 also shows how T-scores relate to the standard normal distribution and to z -scores. The mean T-score falls at 50 and each increment or decrement of 10 T-score units means a movement of another standard deviation away from this mean of 50. Thus, a T-score of 80 corresponds to a z -score of +3.0—a score 3 standard deviations higher than the mean of 50.

Of special interest to behavioural researchers are the values for z -scores in a standard normal distribution that encompass 90% of observations ( z  = ±1.645—isolating 5% of the distribution in each tail), 95% of observations ( z  = ±1.96—isolating 2.5% of the distribution in each tail), and 99% of observations ( z  = ±2.58—isolating 0.5% of the distribution in each tail).

Depending upon the degree of certainty required by the researcher, these bands describe regions outside of which one might define an observation as being atypical or as perhaps not belonging to a distribution centred at a mean of 0.0. Most often, what is taken as atypical or rare in the standard normal distribution is a score at least two standard deviations away from the mean, in either direction. Why choose two standard deviations? Since, in the standard normal distribution, only about 5% of observations fall outside a band defined by z-scores of ±1.96 (rounded to 2 for simplicity), this equates to data values more than 2 standard deviations away from their mean. This gives us a defensible way to identify outliers or extreme values in a distribution.
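The cut-offs quoted above, and the two-standard-deviation outlier rule, can be recovered and applied in a few lines (the sample data here are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# Critical z-values for central 90%, 95% and 99% bands
for coverage in (0.90, 0.95, 0.99):
    tail = (1 - coverage) / 2
    print(coverage, round(norm.ppf(1 - tail), 3))  # 1.645, 1.96, 2.576

# Flag observations more than 2 SDs from their mean as potential outliers
scores = np.array([4.2, 5.1, 3.9, 4.8, 11.5, 4.4, 5.0, 4.6])
z = (scores - scores.mean()) / scores.std(ddof=1)
print(scores[np.abs(z) > 2])  # only 11.5 is flagged here
```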

Thinking ahead to what you will encounter in Chap. 7, this ‘banding’ logic can be extended into the world of statistics (like means and percentages) as opposed to just the world of observations. You will frequently hear researchers speak of some statistic estimating a specific value (a parameter) in a population, plus or minus some other value.

A survey organisation might report political polling results in terms of a percentage and an error band, e.g. 59% of Australians indicated that they would vote Labor at the next federal election, plus or minus 2%.

Most commonly, this error band (±2%) is defined by possible values for the population parameter that are about two standard deviations (or two standard errors, a concept discussed further in Chap. 7) away from the reported or estimated statistical value. In effect, the researcher is saying that on 95% of the occasions the study could theoretically be conducted, the population value estimated by the reported statistic would fall between the limits imposed by the endpoints of the error band (the official name for this error band is a confidence interval; see Chap. 8). The well-understood mathematical properties of the standard normal distribution are what make such precise statements about levels of error in statistical estimates possible.
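A rough sketch of how such an error band arises for a poll percentage, under the usual normal approximation (the sample size here is invented for illustration):

```python
import math

p = 0.59   # 59% of respondents favour Labor
n = 2000   # hypothetical sample size

# Standard error of a proportion, then a ~95% band of about two SEs
se = math.sqrt(p * (1 - p) / n)
half_width = 1.96 * se
print(f"{p:.0%} ± {half_width:.1%}")  # roughly 59% ± 2.2%
```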

Checking for Normality

It is important to understand that transforming the raw scores for a variable to z-scores (recall Procedure 5.7) does not produce z-scores which follow a normal distribution; rather, they will have the same distributional shape as the original scores. However, if you are willing to assume that the normal distribution is the correct reference distribution in the population, then you are justified in interpreting z-scores in light of the known characteristics of the normal distribution.

In order to justify this assumption, not only to enhance the interpretability of z -scores but more generally to enhance the integrity of parametric statistical analyses, it is helpful to actually look at the sample frequency distributions for variables (using a histogram (illustrated in Procedure 5.2 ) or a boxplot (illustrated in Procedure 5.6 ), for example), since non-normality can often be visually detected. It is important to note that in the social and behavioural sciences as well as in economics and finance, certain variables tend to be non-normal by their very nature. This includes variables that measure time taken to complete a task, achieve a goal or make decisions and variables that measure, for example, income, occurrence of rare or extreme events or organisational size. Such variables tend to be positively skewed in the population, a pattern that can often be confirmed by graphing the distribution.

If you cannot justify an assumption of ‘normality’, you may be able to force the data to be normally distributed by using what is called a ‘normalising transformation’. Such transformations will usually involve a nonlinear mathematical conversion (such as computing the logarithm, square root or reciprocal) of the raw scores. Such transformations will force the data to take on a more normal appearance so that the assumption of ‘normality’ can be reasonably justified, but at the cost of creating a new variable whose units of measurement and interpretation are more complicated. [For some non-normal variables, such as the occurrence of rare, extreme or catastrophic events (e.g. a 100-year flood or forest fire, coronavirus pandemic, the Global Financial Crisis or other type of financial crisis, man-made or natural disaster), the distributions cannot be ‘normalised’. In such cases, the researcher needs to model the distribution as it stands. For such events, extreme value theory (e.g. see Diebold et al. 2000 ) has proven very useful in recent years. This theory uses a variation of the Pareto or Weibull distribution as a reference, rather than the normal distribution, when making predictions.]
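A minimal sketch of such a normalising transformation on simulated positively skewed data (simulated, because the QCI data are not reproduced here):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
speed = rng.lognormal(mean=1.0, sigma=0.6, size=200)  # simulated skewed scores

log_speed = np.log10(speed)   # logarithmic transformation
sqrt_speed = np.sqrt(speed)   # square-root transformation

print(round(skew(speed), 2))       # strongly positive skew
print(round(skew(log_speed), 2))   # near zero after the log
print(round(skew(sqrt_speed), 2))  # reduced, but less so than the log
```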

Figure 5.29 displays before and after pictures of the effects of a logarithmic transformation on the positively skewed speed variable from the QCI database. Each graph, produced using NCSS, is of the hybrid histogram-density trace-boxplot type first illustrated in Procedure 5.6 . The left graph clearly shows the strong positive skew in the speed scores and the right graph shows the result of taking the log 10 of each raw score.

Fig. 5.29: Combined histogram-density trace-boxplot graphs displaying the before and after effects of a ‘normalising’ log10 transformation of the speed variable

Notice how the long tail toward slow speed scores is pulled in toward the mean and the very short tail toward fast speed scores is extended away from the mean. The result is a more ‘normal’ appearing distribution. The assumption would then be that we could assume normality of speed scores, but only in a log 10 format (i.e. it is the log of speed scores that we assume is normally distributed in the population). In general, taking the logarithm of raw scores provides a satisfactory remedy for positively skewed distributions (but not for negatively skewed ones). Furthermore, anything we do with the transformed speed scores now has to be interpreted in units of log 10 (seconds) which is a more complex interpretation to make.

Another visual method for detecting non-normality is to graph what is called a normal Q-Q plot (the Q-Q stands for Quantile-Quantile). This plots the percentiles for the observed data against the percentiles for the standard normal distribution (see Cleveland 1995 for more detailed discussion; also see Lane 2007, http://onlinestatbook.com/2/advanced_graphs/q-q_plots.html). If the pattern for the observed data follows a normal distribution, then all the points on the graph will fall approximately along a diagonal line.
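Outside SPSS, the same diagnostic is available in a few lines, for example with SciPy and matplotlib (simulated data again):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
speed = rng.lognormal(mean=1.0, sigma=0.6, size=112)  # simulated skewed sample

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
stats.probplot(speed, dist="norm", plot=axes[0])            # bows away from the line
stats.probplot(np.log10(speed), dist="norm", plot=axes[1])  # hugs the diagonal
axes[0].set_title("speed")
axes[1].set_title("log10(speed)")
plt.tight_layout()
plt.show()
```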

Figure 5.30 shows the normal Q-Q plots for the original speed variable and the transformed log-speed variable, produced using the SPSS Explore... procedure. The diagnostic diagonal line is shown on each graph. In the left-hand plot, for speed , the plot points clearly deviate from the diagonal in a way that signals positive skewness. The right-hand plot, for log_speed, shows the plot points generally falling along the diagonal line thereby conforming much more closely to what is expected in a normal distribution.

Fig. 5.30: Normal Q-Q plots for the original speed variable and the new log_speed variable

In addition to visual ways of detecting non-normality, there are also numerical ways. As highlighted in Chap. 1, there are two additional characteristics of any distribution, namely skewness (asymmetric distribution tails) and kurtosis (peakedness of the distribution). Both have an associated statistic that provides a measure of that characteristic, similar to the mean and standard deviation statistics. In a normal distribution, the values for the skewness and kurtosis statistics are both zero (skewness = 0 means a symmetric distribution; kurtosis = 0 means a mesokurtic distribution). The further away each statistic is from zero, the more the distribution deviates from a normal shape. Both the skewness statistic and the kurtosis statistic have standard errors (see Chap. 7) associated with them (which work very much like the standard deviation, only for a statistic rather than for observations); these can be routinely computed by almost any statistical package when you request a descriptive analysis. Without going into the logic right now (this will come in Chap. 7), a rough rule of thumb you can use to check for normality using the skewness and kurtosis statistics is to do the following:

  • Prepare: Take the standard error for the statistic and multiply it by 2 (or 3 if you want to be more conservative).
  • Interval: Add the result from the Prepare step to the value of the statistic, and subtract it from the value of the statistic. You will end up with two numbers, one low and one high, that define the ends of an interval (what you have just created approximates what is called a ‘confidence interval’; see Chap. 8).
  • Check: If zero falls inside this interval (i.e. between the low and high endpoints from the Interval step), then there is likely to be no significant issue with that characteristic of the distribution. If zero falls outside the interval (i.e. lower than the low endpoint or higher than the high endpoint), then you likely have an issue with non-normality with respect to that characteristic.

Visually, we saw in the left graph in Fig. 5.29 that the speed variable was highly positively skewed. What if Maree wanted to check some numbers to support this judgment? She could ask SPSS to produce the skewness and kurtosis statistics for both the original speed variable and the new log_speed variable using the Frequencies... or the Explore... procedure. Table 5.6 shows what SPSS would produce if the Frequencies ... procedure were used.

Table 5.6: Skewness and kurtosis statistics and their standard errors for both the original speed variable and the new log_speed variable

Using the 3-step check rule described above, Maree could roughly evaluate the normality of the two variables as follows (the code sketch after this list reproduces the same four checks):

  • speed skewness: [Prepare] 2 × .229 = .458 ➔ [Interval] 1.487 − .458 = 1.029 and 1.487 + .458 = 1.945 ➔ [Check] zero does not fall inside the interval bounded by 1.029 and 1.945, so there appears to be a significant problem with skewness. Since the value of the skewness statistic (1.487) is positive, the problem is positive skewness, confirming what the left graph in Fig. 5.29 showed.
  • speed kurtosis: [Prepare] 2 × .455 = .91 ➔ [Interval] 3.071 − .91 = 2.161 and 3.071 + .91 = 3.981 ➔ [Check] zero does not fall inside the interval bounded by 2.161 and 3.981, so there appears to be a significant problem with kurtosis. Since the value of the kurtosis statistic (3.071) is positive, the problem is leptokurtosis: the distribution is too peaked relative to what is expected in a normal distribution.
  • log_speed skewness: [Prepare] 2 × .229 = .458 ➔ [Interval] −.050 − .458 = −.508 and −.050 + .458 = .408 ➔ [Check] zero falls within the interval bounded by −.508 and .408, so there appears to be no problem with skewness. The log transform appears to have corrected the problem, confirming what the right graph in Fig. 5.29 showed.
  • log_speed kurtosis: [Prepare] 2 × .455 = .91 ➔ [Interval] −.672 − .91 = −1.582 and −.672 + .91 = .238 ➔ [Check] zero falls within the interval bounded by −1.582 and .238, so there appears to be no problem with kurtosis. The log transform appears to have corrected this problem as well, rendering the distribution approximately mesokurtic (i.e. normal) in shape.
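A small sketch implementing the Prepare/Interval/Check rule, run against the statistics reported in Table 5.6:

```python
def normality_check(statistic, std_error, label, multiplier=2):
    """Rule-of-thumb check for a skewness or kurtosis statistic."""
    margin = multiplier * std_error                     # Prepare
    low, high = statistic - margin, statistic + margin  # Interval
    ok = low <= 0 <= high                               # Check: zero inside?
    verdict = "no apparent problem" if ok else "possible non-normality"
    print(f"{label}: [{low:.3f}, {high:.3f}] -> {verdict}")

# Values reported in Table 5.6 for the QCI variables
normality_check(1.487, .229, "speed skewness")      # positive skew flagged
normality_check(3.071, .455, "speed kurtosis")      # leptokurtosis flagged
normality_check(-.050, .229, "log_speed skewness")  # no problem
normality_check(-.672, .455, "log_speed kurtosis")  # no problem
```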

There are also more formal tests of significance (see Chap. 7) that one can use to numerically evaluate normality, such as the Kolmogorov-Smirnov test and the Shapiro-Wilk test. Each of these tests, for example, can be produced by SPSS on request, via the Explore... procedure.
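The same tests are available outside SPSS; SciPy, for instance, provides both (simulated data again; note that feeding the K-S test parameters estimated from the sample itself is, strictly speaking, an approximation that a corrected test such as Lilliefors addresses):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
speed = rng.lognormal(mean=1.0, sigma=0.6, size=112)  # simulated skewed sample

# Shapiro-Wilk: a small p-value argues against normality
print(stats.shapiro(speed))            # normality rejected for the raw scores
print(stats.shapiro(np.log10(speed)))  # normality plausible after the log

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD
print(stats.kstest(speed, "norm", args=(speed.mean(), speed.std(ddof=1))))
```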

1 For more information, see Chap. 1, The language of statistics.

References for Procedure 5.3

  • Cleveland WR. Visualizing data. Summit, NJ: Hobart Press; 1995.
  • Jacoby WJ. Statistical graphics for visualizing multivariate data. Thousand Oaks, CA: Sage; 1998.

References for Fundamental Concept II

  • Diebold FX, Schuermann T, Stroughair D. Pitfalls and opportunities in the use of extreme value theory in risk management. The Journal of Risk Finance. 2000;1(2):30–35. doi:10.1108/eb043443.
  • Lane D. Online statistics education: A multimedia course of study. Houston, TX: Rice University; 2007.


Descriptive Research: Definition, Characteristics, Methods + Examples


Suppose an apparel brand wants to understand the fashion purchasing trends among New York’s buyers, then it must conduct a demographic survey of the specific region, gather population data, and then conduct descriptive research on this demographic segment.

The study will then uncover details on “what is the purchasing pattern of New York buyers,” but it will not investigate “why” those patterns exist. For an apparel brand trying to break into this market, understanding the nature of its market is the study’s main goal. Let’s talk about it.

What is descriptive research?

Descriptive research is a research method describing the characteristics of the population or phenomenon studied. This descriptive methodology focuses more on the “what” of the research subject than the “why” of the research subject.

The method primarily focuses on describing the nature of a demographic segment without focusing on “why” a particular phenomenon occurs. In other words, it “describes” the research subject without covering “why” it happens.

Characteristics of descriptive research

The term descriptive research then refers to research questions, the design of the study, and data analysis conducted on that topic. We call it an observational research method because none of the research study variables are influenced in any capacity.

Some distinctive characteristics of descriptive research are:

  • Quantitative research: It is a quantitative research method that attempts to collect quantifiable information for statistical analysis of the population sample. It is a popular market research tool that allows us to collect and describe the demographic segment’s nature.
  • Uncontrolled variables: In descriptive research, none of the variables are influenced in any way; the research relies on observational methods. Hence, the nature of the variables or their behavior is not in the hands of the researcher.
  • Cross-sectional studies: It is generally a cross-sectional study where different sections belonging to the same group are studied.
  • The basis for further research: Researchers use the data collected and analyzed through descriptive research as the basis for further study with different research techniques. The data can also help point towards the types of research methods to use in subsequent research.

Applications of descriptive research with examples

A descriptive research method can be used in multiple ways and for various reasons. Before getting into any survey, though, clear survey goals and a sound survey design are crucial; even with these in place, there is no guarantee the research outcome will be met. How is descriptive research used? Below are some ways organizations currently use it:

  • Define respondent characteristics: The aim of using closed-ended questions is to draw concrete conclusions about the respondents. This could be the need to derive patterns, traits, and behaviors of the respondents. It could also be to understand a respondent’s attitude or opinion about the phenomenon; for example, understanding millennials and the hours per week they spend browsing the internet. All this information helps the researching organization make informed business decisions.
  • Measure data trends: Researchers measure data trends over time with a descriptive research design’s statistical capabilities. Consider an apparel company that researches different demographics, like age groups 24-35 and 36-45, for the launch of a new range of autumn wear. If one of those groups doesn’t take well to the new launch, it provides insight into what clothes are liked and what are not. The brand then drops the clothes and apparel that customers don’t like.
  • Conduct comparisons: Organizations also use a descriptive research design to understand how different groups respond to a specific product or service. For example, an apparel brand creates a survey asking general questions that measure the brand’s image. The same study also asks demographic questions like age, income, gender, geographical location, geographic segmentation , etc. This consumer research helps the organization understand what aspects of the brand appeal to the population and what aspects do not. It also helps make product or marketing fixes or even create a new product line to cater to high-growth potential groups.
  • Validate existing conditions: Researchers widely use descriptive research to help ascertain the research object’s prevailing conditions and underlying patterns. Due to the non-invasive research method and the use of quantitative observation and some aspects of qualitative observation, researchers observe each variable and conduct an in-depth analysis. Researchers also use it to validate any existing conditions that may be prevalent in a population.
  • Conduct research at different times: The analysis can be conducted at different periods to ascertain any similarities or differences. This also allows any number of variables to be evaluated. For verification, studies on prevailing conditions can also be repeated to draw trends.

Advantages of descriptive research

Some of the significant advantages of descriptive research are:

Advantages of descriptive research

  • Data collection: A researcher can conduct descriptive research using specific methods like the observational method, case study method, and survey method. Among them, these three cover all primary data collection methods, which provides a lot of information. This can be used for future research or even for developing a hypothesis for your research object.
  • Varied: Since the data collected is qualitative and quantitative, it gives a holistic understanding of a research topic. The information is varied, diverse, and thorough.
  • Natural environment: Descriptive research allows for the research to be conducted in the respondent’s natural environment, which ensures that high-quality and honest data is collected.
  • Quick to perform and cheap: As the sample size is generally large in descriptive research, the data collection is quick to conduct and is inexpensive.

Descriptive research methods

There are three distinctive methods to conduct descriptive research. They are:

Observational method

The observational method is the most effective method to conduct this research, and researchers make use of both quantitative and qualitative observations.

A quantitative observation is the objective collection of data primarily focused on numbers and values; it suggests “associated with, of or depicted in terms of a quantity.” Results of quantitative observation are derived using statistical and numerical analysis methods. It implies observation of any entity associated with a numeric value, such as age, shape, weight, volume, scale, etc. For example, the researcher can track whether current customers will refer the brand using a simple Net Promoter Score question.

Qualitative observation doesn’t involve measurements or numbers but instead just monitoring characteristics. In this case, the researcher observes the respondents from a distance. Since the respondents are in a comfortable environment, the characteristics observed are natural and effective. In a descriptive research design, the researcher can choose to be either a complete observer, an observer as a participant, a participant as an observer, or a full participant. For example, in a supermarket, a researcher can from afar monitor and track the customers’ selection and purchasing trends. This offers a more in-depth insight into the purchasing experience of the customer.

Case study method

Case studies involve in-depth research and study of individuals or groups. Case studies lead to a hypothesis and widen the scope for further study of a phenomenon. However, case studies should not be used to determine cause and effect, as they cannot make accurate predictions and there may be bias on the researcher’s part. Another reason case studies are not a reliable way of conducting descriptive research is that an atypical respondent may end up in the survey; describing such respondents leads to weak generalizations and a loss of external validity.

Survey research

In survey research, respondents answer through surveys, questionnaires, or polls. These are a popular market research tool for collecting feedback from respondents. A study designed to gather useful data should ask the right survey questions: a balanced mix of open-ended and closed-ended questions. The survey method can be conducted online or offline, making it the go-to option for descriptive research where the sample size is enormous.

Examples of descriptive research

Some examples of descriptive research are:

  • A specialty food group launching a new range of barbecue rubs would like to understand what flavors of rubs are favored by different people. To understand the preferred flavor palette, they conduct this type of research study using methods like observation in supermarkets. Surveying while also collecting in-depth demographic information offers insights into the preferences of different markets and can help tailor the rubs to the meats preferred in each demographic. Conducting this type of research helps the organization tweak its business model and amplify marketing in core markets.
  • Another example of where this research can be used is if a school district wishes to evaluate teachers’ attitudes about using technology in the classroom. By conducting surveys and observing teachers’ comfort with technology, the researcher can gauge whether a full-fledged implementation would face issues. This also helps in understanding whether the students are affected in any way by the change.

Some other research problems and research questions that can lead to descriptive research are:

  • Market researchers want to observe the habits of consumers.
  • A company wants to evaluate the morale of its staff.
  • A school district wants to understand if students will access online lessons rather than textbooks.
  • A company wants to understand if its wellness questionnaire programs enhance the overall health of its employees.


What Is Descriptive Analytics? 5 Examples


  • 09 Nov 2021

Data analytics is a valuable tool for businesses aiming to increase revenue, improve products, and retain customers. According to research by global management consulting firm McKinsey & Company, companies that use data analytics are 23 times more likely to outperform competitors in terms of new customer acquisition than non-data-driven companies. They were also nine times more likely to surpass them in measures of customer loyalty and 19 times more likely to achieve above-average profitability.

Data analytics can be broken into four key types:

  • Descriptive, which answers the question, “What happened?”
  • Diagnostic, which answers the question, “Why did this happen?”
  • Predictive, which answers the question, “What might happen in the future?”
  • Prescriptive, which answers the question, “What should we do next?”

Each type of data analysis can help you reach specific goals and be used in tandem to create a full picture of data that informs your organization’s strategy formulation and decision-making.

Descriptive analytics can be leveraged on its own or act as a foundation for the other three analytics types. If you’re new to the field of business analytics, descriptive analytics is an accessible and rewarding place to start.


What Is Descriptive Analytics?

Descriptive analytics is the process of using current and historical data to identify trends and relationships. It’s sometimes called the simplest form of data analysis because it describes trends and relationships but doesn’t dig deeper.

Descriptive analytics is relatively accessible and likely something your organization uses daily. Basic statistical software, such as Microsoft Excel, or data visualization tools, such as Google Charts and Tableau, can help parse data, identify trends and relationships between variables, and visually display information.

Descriptive analytics is especially useful for communicating change over time and uses trends as a springboard for further analysis to drive decision-making .

Here are five examples of descriptive analytics in action to apply at your organization.


5 Examples of Descriptive Analytics

1. Traffic and Engagement Reports

One example of descriptive analytics is reporting. If your organization tracks engagement in the form of social media analytics or web traffic, you’re already using descriptive analytics.

These reports are created by taking raw data—generated when users interact with your website, advertisements, or social media content—and using it to compare current metrics to historical metrics and visualize trends.

For example, you may be responsible for reporting on which media channels drive the most traffic to the product page of your company’s website. Using descriptive analytics, you can analyze the page’s traffic data to determine the number of users from each source. You may decide to take it one step further and compare traffic source data to historical data from the same sources. This can enable you to update your team on movement; for instance, highlighting that traffic from paid advertisements increased 20 percent year over year.
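As a toy illustration of that kind of report (all numbers invented), comparing per-source traffic to the prior year takes only a few lines:

```python
# Hypothetical monthly sessions by traffic source, this year vs. last year
current  = {"organic": 52_000, "paid_ads": 24_000, "social": 13_500}
previous = {"organic": 49_000, "paid_ads": 20_000, "social": 15_000}

for source in current:
    change = (current[source] - previous[source]) / previous[source]
    print(f"{source}: {change:+.0%} year over year")
# paid_ads comes out at +20%, the kind of movement highlighted above
```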

The three other analytics types can then be used to determine why traffic from each source increased or decreased over time, if trends are predicted to continue, and what your team’s best course of action is moving forward.

2. Financial Statement Analysis

Another example of descriptive analytics that may be familiar to you is financial statement analysis. Financial statements are periodic reports that detail financial information about a business and, together, give a holistic view of a company’s financial health.

There are several types of financial statements, including the balance sheet , income statement , cash flow statement , and statement of shareholders’ equity. Each caters to a specific audience and conveys different information about a company’s finances.

Financial statement analysis can be done in three primary ways: vertical, horizontal, and ratio.

Vertical analysis involves reading a statement from top to bottom and comparing each item to those above and below it. This helps determine relationships between variables. For instance, if each line item is a percentage of the total, comparing them can provide insight into which are taking up larger and smaller percentages of the whole.

Horizontal analysis involves reading a statement from left to right and comparing each item to itself from a previous period. This type of analysis determines change over time.

Finally, ratio analysis involves comparing one section of a report to another based on their relationships to the whole. Ratios can be compared across periods, and your company’s ratios can be compared with the industry’s to gauge whether yours is over- or underperforming.

Each of these financial statement analysis methods is an example of descriptive analytics, as each provides information about trends and relationships between variables based on current and historical data. The sketch below illustrates all three on toy figures.
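A minimal sketch of the three methods, using an invented two-period income statement (illustrative figures only):

```python
# Toy income statement figures for two periods
income_stmt = {
    "revenue":       {"2022": 500_000, "2023": 560_000},
    "cost_of_sales": {"2022": 300_000, "2023": 325_000},
    "net_income":    {"2022":  50_000, "2023":  64_000},
}

# Vertical analysis: each line item as a percentage of revenue
for item, vals in income_stmt.items():
    pct = vals["2023"] / income_stmt["revenue"]["2023"]
    print(f"{item}: {pct:.1%} of 2023 revenue")

# Horizontal analysis: change in each line item across periods
for item, vals in income_stmt.items():
    change = (vals["2023"] - vals["2022"]) / vals["2022"]
    print(f"{item}: {change:+.1%} vs. 2022")

# Ratio analysis: net profit margin in each period
for year in ("2022", "2023"):
    margin = income_stmt["net_income"][year] / income_stmt["revenue"][year]
    print(f"{year} net margin: {margin:.1%}")
```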


3. Demand Trends

Descriptive analytics can also be used to identify trends in customer preference and behavior and make assumptions about the demand for specific products or services.

Streaming provider Netflix’s trend identification provides an excellent use case for descriptive analytics. Netflix’s team—which has a track record of being heavily data-driven—gathers data on users’ in-platform behavior. They analyze this data to determine which TV series and movies are trending at any given time and list trending titles in a section of the platform’s home screen.

Not only does this data allow Netflix users to see what’s popular—and thus, what they might enjoy watching—but it allows the Netflix team to know which types of media, themes, and actors are especially favored at a certain time. This can drive decision-making about future original content creation, contracts with existing production companies, marketing, and retargeting campaigns.

4. Aggregated Survey Results

Descriptive analytics is also useful in market research. When it comes time to glean insights from survey and focus group data, descriptive analytics can help identify relationships between variables and trends.

For instance, you may conduct a survey and identify that as respondents’ age increases, so does their likelihood to purchase your product. If you’ve conducted this survey multiple times over several years, descriptive analytics can tell you if this age-purchase correlation has always existed or if it was something that only occurred this year.

Insights like this can pave the way for diagnostic analytics to explain why certain factors are correlated. You can then leverage predictive and prescriptive analytics to plan future product improvements or marketing campaigns based on those trends.


5. Progress to Goals

Finally, descriptive analytics can be applied to track progress to goals. Reporting on progress toward key performance indicators (KPIs) can help your team understand if efforts are on track or if adjustments need to be made.

For example, if your organization aims to reach 500,000 monthly unique page views, you can use traffic data to communicate how you’re tracking toward it. Perhaps halfway through the month, you’re at 200,000 unique page views. This would be underperforming because you’d like to be halfway to your goal at that point—at 250,000 unique page views. This descriptive analysis of your team’s progress can allow further analysis to examine what can be done differently to improve traffic numbers and get back on track to hit your KPI.
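In code, that pacing check is a short calculation (numbers taken from the example above):

```python
goal = 500_000           # monthly unique-page-view target
actual = 200_000         # unique page views at mid-month
fraction_of_month = 0.5  # halfway through the month

expected = goal * fraction_of_month  # 250,000 if exactly on pace
pace = actual / expected
print(f"expected by now: {expected:,.0f}")
print(f"pacing at {pace:.0%} of target")  # 80%: behind plan
```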


Using Data to Identify Relationships and Trends

“Never before has so much data about so many different things been collected and stored every second of every day,” says Harvard Business School Professor Jan Hammond in the online course Business Analytics . “In this world of big data, data literacy —the ability to analyze, interpret, and even question data—is an increasingly valuable skill.”

Leveraging descriptive analytics to communicate change based on current and historical data and as a foundation for diagnostic, predictive, and prescriptive analytics has the potential to take you and your organization far.


What Is Descriptive Analytics? A Complete Guide

When presented with new data, the first step a data analyst must take is always to understand what story it’s trying to tell. Data analytics is a complex beast, however, involving many different tools and analytical approaches. So which one should you use?

If you’re new to data and want to learn the basics, descriptive analytics is a good place to start. But what exactly is descriptive analytics, and how does it work? In this post, we’ll dive deep on the topic, answering all your questions, including:

  • How does descriptive analytics work?
  • How is descriptive analytics used?
  • Advantages of descriptive analytics
  • Disadvantages of descriptive analytics
  • Descriptive analytics use cases
  • Key takeaways

Ready to get the low-down on descriptive analytics? Let’s dive in.

1. How does descriptive analytics work?

Of all data analytics techniques , descriptive analytics is perhaps the most straightforward. It involves parsing (or breaking down) data and summarizing its main features and characteristics. In this way, descriptive analytics presents what has happened in the past without exploring why or how.        

Because it is purely descriptive, descriptive analytics uses basic descriptive statistics. This includes measures of distribution (frequency or count), central tendency (mean, mode, and median), and variability (such as variance and standard deviation). Where relevant, it also measures the position of various data points, including the interquartile or percentile range.
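All of those measures take only a few lines of Python (the data here are illustrative; any statistics package offers equivalents):

```python
import numpy as np
from statistics import mode

data = np.array([12, 15, 15, 18, 21, 24, 24, 24, 30, 41])  # illustrative values

print("mean:", data.mean())                     # central tendency
print("median:", np.median(data))
print("mode:", mode(data.tolist()))
print("variance:", round(data.var(ddof=1), 2))  # variability
print("std dev:", round(data.std(ddof=1), 2))

q1, q3 = np.percentile(data, [25, 75])          # position
print("interquartile range:", q3 - q1)
```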

Descriptive analytics often presents its findings using reports, pivot tables, and visualizations like histograms, line graphs, pie charts, and box and whisker plots . We won’t explore these further here, but you can learn more about descriptive statistics in this post . 

2. How is descriptive analytics used?

Data analysts can use descriptive statistics to summarize more or less any type of data, although it helps to think of it as the first step in a more protracted process. That’s because while descriptive statistics may describe trends or patterns, it won’t dig deeper. For this, we need tools like diagnostic and predictive analytics. Nevertheless, descriptive analytics is exceptionally useful for introducing yourself to unknown data.

The following kinds of data can all be summarized using descriptive analytics:

  • Financial statements
  • Social media engagement
  • Website traffic
  • Scientific findings
  • Weather reports
  • Traffic data

The list goes on! Essentially, any data set can be summarized in one way or another, meaning descriptive analytics has an almost endless number of applications. We’ll explore these in more depth in section five. First, let’s look at some of the benefits and drawbacks of descriptive analytics.

3. Advantages of descriptive analytics

Although relatively simplistic as analytical approaches go, descriptive analytics nevertheless has many advantages. Descriptive analytics:

  • Presents otherwise complex data in an easily digestible format.
  • Provides a direct measure of the incidence of key data points.
  • Is inexpensive and only requires basic mathematical skills to carry out.
  • Is faster to carry out, especially with help from tools like Python or MS Excel.
  • Relies on data that organizations already have access to, meaning there’s no need to source additional data.
  • Looks at a complete population (rather than data sampling), making it considerably more accurate than inferential statistics . 

But, of course, being so straightforward means descriptive analytics also has its limitations. Let’s explore some of these next.

4. Disadvantages of descriptive analytics

Okay, we’ve looked at the strengths of descriptive analytics—but where does it fall short? Some disadvantages of descriptive analytics include:

  • You can summarize data sets you have access to, but these may not tell a complete story.
  • You cannot use descriptive analytics to test a hypothesis or understand why data present the way they do.
  • You cannot use descriptive analytics to predict what may happen in the future.
  • You cannot generalize your findings to a broader population.
  • Descriptive analytics tells you nothing about the data collection methodology, meaning the data set may include errors.

As you may suspect, although descriptive analytics is useful, it’s important not to overstretch its capabilities. Fortunately, we have diagnostic and predictive analytics to help fill in the gaps where descriptive analytics falls short.

5. Descriptive analytics use cases

Now we’ve covered the theory around descriptive analytics, how can it be used in the real world? While descriptive analytics only focuses on what has happened, not why, it remains a valuable first step in the broader data analytics process. Let’s take a look.

Tracking social media engagement

Social media is a key touchpoint along the sales journey. The ability to measure and present engagement metrics across a complex constellation of campaigns and social networks is, therefore, vital for determining the most successful approaches to digital marketing. Fortunately, marketing reports on social media engagement will include descriptive analytics by default. Clicks, likes, shares, detail expands, bounce rates, and so on are all measures of social media engagement that can be easily summarized using descriptive techniques.

For instance, perhaps a company is interested in knowing which social media account is driving the most traffic to their website. Using descriptive statistics, visualizations, and dashboards, they can easily compare information about different channels. Similarly, marketing teams can look at specific shareable content, perhaps comparing videos with blog posts, to see which results in the most clicks.

While none of this information draws direct conclusions (in that it doesn’t measure cause and effect) it’s still valuable. It helps teams to devise hypotheses or make informed guesses about where to invest their time and budget.

Streaming and e-commerce

Subscription streaming services like Spotify and Netflix, and e-commerce sites like Amazon and eBay, all use descriptive analytics to identify trends. Descriptive measures help determine what’s currently most popular with users and buyers. Spotify, for example, uses descriptive analytics to learn which albums or artists subscribers are listening to. Meanwhile, Amazon uses descriptive analytics to compare customer purchases. In both cases, these insights inform their recommendation engines.

Netflix, meanwhile, takes this use of descriptive analytics even further. A highly data-driven company, Netflix uses descriptive analytics to see what genres and TV shows interest their subscribers most . These insights inform decision-making in areas from new content creation to marketing campaigns, and even which production companies they work with. 

Learning management systems

From traditional education to corporate training, many organizations and schools now use online/offline hybrid learning. Learning management systems (or LMSs for those in the know!) are a ubiquitous part of this. LMS platforms track everything from user participation and attendance to test scores, and—in the case of e-learning courses—even how long it takes learners to complete them. Summarizing this information, descriptive-analytical reports offer a high-level overview of what’s working and what’s not.

Using these data, teachers and training providers can track both individual and organization-level targets. They can analyze grade curves, or see which teaching resources are most popular. And while they won’t necessarily know why, it may be possible to infer from the data that videos, for example, are more popular than, say, written documents. Presenting this information is the first step towards improving course design and creating better learner outcomes.

6. Key takeaways

This post has offered a full introduction to descriptive analytics. We’ve learned that:

  • Descriptive analytics is the simplest form of data analysis, and involves summarizing a data set’s main features and characteristics.
  • Descriptive analytics relies on statistical measures of distribution, central tendency, and variability.
  • It provides an overview of varied data types, from financial statements to surveys, website traffic, and scientific data.
  • A key advantage of descriptive analytics is that it requires only basic math skills and allows you to present otherwise complex data in an easily digestible format.
  • The main disadvantage of descriptive analytics is that it only summarizes data; it doesn’t draw conclusions or test hypotheses.
  • We can use descriptive analytics to measure things like social media engagement, content curation, and learner outcomes.

To learn more about data analytics, or to try some test exercises, why not sign up for this free, 5-day data analytics short course? You can also supplement your knowledge with the following introductory topics:

  • Standard Error vs. Standard Deviation: What’s the Difference?
  • What Is Data Visualization and Why Is It Important? A Complete Introduction
  • The 7 Most Useful Data Analysis Methods and Techniques
What is descriptive research?


Descriptive research is a common investigatory model used by researchers in various fields, including social sciences, linguistics, and academia.

Read on to understand the characteristics of descriptive research and explore its underlying techniques, processes, and procedures.


Descriptive research is an exploratory research method. It enables researchers to precisely and methodically describe a population, circumstance, or phenomenon.

As the name suggests, descriptive research describes the characteristics of the group, situation, or phenomenon being studied without manipulating variables or testing hypotheses. It can be reported using surveys, observational studies, and case studies. You can use both quantitative and qualitative methods to compile the data.

Besides making observations and then comparing and analyzing them, descriptive studies often develop knowledge concepts and provide solutions to critical issues. They aim to answer how, when, and where an event occurred, and what the problem or phenomenon is.

  • Characteristics of descriptive research

The following are some of the characteristics of descriptive research:

Quantitativeness

Descriptive research can be quantitative as it gathers quantifiable data to statistically analyze a population sample. These numbers can show patterns, connections, and trends over time and can be discovered using surveys, polls, and experiments.

Qualitativeness

Descriptive research can also be qualitative. It gives meaning and context to the numbers supplied by quantitative descriptive research.

Researchers can use tools like interviews, focus groups, and ethnographic studies to illustrate why things are what they are and help characterize the research problem, because qualitative description is more explanatory than exploratory or experimental research.

Uncontrolled variables

Descriptive research differs from experimental research in that researchers cannot manipulate the variables. They are recognized, scrutinized, and quantified instead. This is one of its most prominent features.

Cross-sectional studies

Descriptive research is often cross-sectional, examining several characteristics of the same group at once. It involves obtaining data on multiple variables at the individual level during a certain period. It’s helpful when trying to understand a larger community’s habits or preferences.

Carried out in a natural environment

Descriptive studies are usually carried out in the participants’ everyday environment; collecting data in a natural setting allows researchers to avoid influencing respondents. You can use online surveys or survey questions to collect data, or simply observe.

Basis for further research

You can further dissect descriptive research’s outcomes and use them for different types of investigation. The outcomes also serve as a foundation for subsequent investigations and can guide future studies. For example, you can use the data obtained in descriptive research to help determine future research designs.

  • Descriptive research methods

There are three basic approaches for gathering data in descriptive research: observational, case study, and survey.

Surveys

You can use surveys to gather data in descriptive research. This involves gathering information from many people using a questionnaire and interview.

Surveys remain the dominant research tool for descriptive research design. Researchers can conduct various investigations and collect multiple types of data (quantitative and qualitative) using surveys with diverse designs.

You can conduct surveys over the phone, online, or in person. Your survey might be a brief interview or conversation with a set of prepared questions intended to obtain quick information from the primary source.

Observation

This descriptive research method involves observing and gathering data on a population or phenomenon without manipulating variables. It is employed in psychology, market research, and other social science studies to track and understand human behavior.

Observation is an essential component of descriptive research. It entails gathering data and analyzing it to see whether there is a relationship between the two variables in the study. This strategy usually allows for both qualitative and quantitative data analysis.

Case studies

A case study can outline a specific topic’s traits. The topic might be a person, group, event, or organization.

It involves using a subset of a larger group as a sample to characterize the features of that larger group.

You can generalize knowledge gained from studying a case study to benefit a broader audience.

This approach entails carefully examining a particular group, person, or event over time. You can learn something new about the study topic by using a small group to better understand the dynamics of the entire group.

  • Types of descriptive research

There are several types of descriptive study. The most well-known include cross-sectional studies, census surveys, sample surveys, case reports, and comparison studies.

Case reports and case series

In the healthcare and medical fields, a case report is used to explain a patient’s circumstances when suffering from an uncommon illness or displaying certain symptoms. A case series is a collection of related case reports. Both have aided the advancement of medical knowledge on countless occasions.

Descriptive–normative survey

The normative component is an addition to the descriptive survey. In the descriptive–normative survey, you compare the study’s results to the norm.

Descriptive survey

This descriptive type of research employs surveys to collect information on various topics. The aim is to determine the degree to which certain conditions may be attained.

You can extrapolate or generalize the information you obtain from sample surveys to the larger group being researched.

Correlative survey

Correlative surveys help establish if there is a positive, negative, or neutral connection between two variables.

Census survey

Performing census surveys involves gathering relevant data on several aspects of a given population. These units include individuals, families, organizations, objects, characteristics, and properties.

Cross-sectional studies

In a cross-sectional study, you gather data on several variables of interest from a specific population at a single point in time. Cross-sectional studies provide a glimpse of a phenomenon’s prevalence and features in a population. They pose few ethical challenges and are quite simple and inexpensive to carry out.

Comparative studies

These studies compare two subjects’ conditions or characteristics. The subjects may include research variables, organizations, plans, and people.

Comparison points, assumption of similarities, and criteria of comparison are three important variables that affect how well and accurately comparative studies are conducted.

For instance, descriptive research can help determine how many CEOs hold a bachelor’s degree and what proportion of low-income households receive government help.

  • Pros and cons

The primary advantage of descriptive research designs is that researchers can create a reliable and beneficial database for additional study. To conduct any inquiry, you need access to reliable information sources that can give you a firm understanding of a situation.

Quantitative studies are time- and resource-intensive, so knowing the hypotheses viable for testing is crucial. The basic overview of descriptive research provides helpful hints as to which variables are worth quantitatively examining. This is why it’s employed as a precursor to quantitative research designs.

Some experts view this research as untrustworthy and unscientific because there is no way to statistically assess the findings: no variables are manipulated or tested.

Cause-and-effect relationships also can’t be established through descriptive investigations. Additionally, observational findings are difficult to replicate, which prevents independent review and verification of the results.

The absence of statistical and in-depth analysis and the rather superficial character of the investigative procedure are drawbacks of this research approach.

  • Descriptive research examples and applications

Descriptive research examples can be grouped by type, purpose, and application; their research questions often begin with “What is …”. These studies help find solutions to practical issues in social science, physical science, and education.

Here are some examples and applications of descriptive research:

Determining consumer perception and behavior

Organizations use descriptive research designs to determine how various demographic groups react to a certain product or service.

For example, a business looking to sell to its target market should research the market’s behavior first. When researching human behavior in response to a cause or event, the researcher pays attention to the traits, actions, and responses before drawing a conclusion.

Scientific classification

Scientific descriptive research enables the classification of organisms and their traits and constituents.

Measuring data trends

A descriptive study design’s statistical capabilities allow researchers to track data trends over time. It’s frequently used to determine the study target’s current circumstances and underlying patterns.

Conduct comparison

Organizations can use a descriptive research approach to learn how various demographics react to a certain product or service. For example, you can study how the target market responds to a competitor’s product and use that information to infer their behavior.

  • Bottom line

A descriptive research design is suitable for exploring certain topics and serving as a prelude to larger quantitative investigations. It provides a comprehensive understanding of the “what” of the group or thing you’re investigating.

This research type acts as the cornerstone of other research methodologies. It is distinctive because it can use quantitative and qualitative research approaches at the same time.

What is descriptive research design?

Descriptive research design aims to systematically obtain information to describe a phenomenon, situation, or population. More specifically, it helps answer the what, when, where, and how questions regarding the research problem rather than the why.

How does descriptive research compare to qualitative research?

Despite certain parallels, descriptive research concentrates on describing phenomena, while qualitative research aims to understand people better.

How do you analyze descriptive research data?

Data analysis involves using various methodologies, enabling the researcher to evaluate and provide results regarding validity and reliability.


Chapter 14. Quantitative Analysis: Descriptive Statistics

Numeric data collected in a research project can be analyzed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarize themselves with one of these programs for understanding the concepts described in this chapter.

Data Preparation

In research projects, data may be collected from a variety of sources: mail-in surveys, interviews, pretest or posttest experimental data, observational data, and so forth. These data must be converted into a machine-readable, numeric format, such as a spreadsheet or a text file, so that they can be analyzed by computer programs like SPSS or SAS. Data preparation usually follows the steps below.

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale; whether such scale is a five-point, seven-point, or some other type of scale), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from “strongly disagree” to “strongly agree”, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, nominal data cannot be analyzed statistically). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, for measuring a construct such as “benefits of computers,” if a survey provided respondents with a checklist of benefits that they could select from (i.e., they could choose as many of those benefits as they wanted), then the total number of checked items can be used as an aggregate measure of benefits. Note that many other forms of data, such as interview transcripts, cannot be converted into a numeric format for statistical analysis. Coding is especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.
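
As an illustrative sketch of codebook-driven coding (the variable names and coding scheme below are invented, not taken from a real study), a mapping like this converts text responses to numbers:

```python
import pandas as pd

# Illustrative codebook: label-to-number mappings for two variables
likert_codes = {"strongly disagree": 1, "disagree": 2, "somewhat disagree": 3,
                "neutral": 4, "somewhat agree": 5, "agree": 6, "strongly agree": 7}
industry_codes = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

raw = pd.DataFrame({
    "satisfaction": ["agree", "strongly disagree", "neutral"],
    "industry": ["retailing", "healthcare", "manufacturing"],
})

# Apply the codebook to convert text responses into numeric format
coded = pd.DataFrame({
    "satisfaction": raw["satisfaction"].map(likert_codes),
    "industry": raw["industry"].map(industry_codes),
})
print(coded)
```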

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format (e.g., SPSS stores data as .sav files), which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database, where they can be reorganized as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller data sets with fewer than 65,000 observations and 256 items can be stored in a spreadsheet such as Microsoft Excel, while larger datasets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet and each measurement item can be represented as one column. The entered data should be frequently checked for accuracy, via occasional spot checks on a set of items or observations, during and after entry. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the “strongly agree” response to all items irrespective of content, including reverse-coded items. If so, such data can be entered but should be excluded from subsequent analysis.
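
A simple, hypothetical check for the “strongly agree to everything” pattern described above is to flag respondents whose answers show no variation across items (the column names are made up):

```python
import pandas as pd

# Each row is a respondent; q3 is assumed to be a reverse-coded item
responses = pd.DataFrame({
    "q1": [7, 4, 7], "q2": [7, 5, 2], "q3": [7, 3, 6],
})

# A respondent who gives one identical answer to every item,
# including reverse-coded ones, is a likely straight-liner
straight_liners = responses.nunique(axis=1) == 1
print(responses[straight_liners])   # rows to review and possibly exclude
```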

Missing values. Missing data is an inevitable part of any empirical data set. Respondents may not answer certain questions if they are ambiguously worded or too sensitive. Such problems should be detected earlier during pretests and corrected before the main data collection process begins. During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value. During data analysis, the default mode of handling missing values in most software programs is to simply drop the entire observation containing even a single missing value, in a technique called listwise deletion. Such deletion can significantly shrink the sample size and make it extremely difficult to detect small effects. Hence, some software programs allow the option of replacing missing values with an estimated value via a process called imputation. For instance, if the missing value is one item in a multi-item scale, the imputed value may be the average of the respondent’s responses to the remaining items on that scale. If the missing value belongs to a single-item scale, many researchers use the average of other respondents’ responses to that item as the imputed value. Such imputation may be biased if the missing value is of a systematic nature rather than a random nature. Two methods that can produce relatively unbiased estimates for imputation are maximum likelihood procedures and multiple imputation methods, both of which are supported in popular software programs such as SPSS and SAS.
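
The two approaches described above, listwise deletion and within-person mean imputation for a multi-item scale, might look like this in pandas (the data are invented; as noted, maximum likelihood or multiple imputation is preferable for serious work):

```python
import numpy as np
import pandas as pd

items = ["item1", "item2", "item3"]
df = pd.DataFrame({
    "item1": [5, 4, np.nan],
    "item2": [6, np.nan, 3],
    "item3": [5, 4, 2],
})

# Listwise deletion: drop any respondent with even one missing value
complete_cases = df.dropna()

# Simple imputation: replace a missing item with the mean of the
# respondent's answers to the remaining items on the same scale
row_means = df[items].mean(axis=1)          # mean() skips NaN by default
imputed = df.apply(lambda col: col.fillna(row_means))
print(complete_cases, imputed, sep="\n\n")
```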

Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse-coded items, which convey the opposite meaning of their underlying construct, should be reversed (e.g., on a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse-coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
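
For instance, the “8 minus the observed value” rule for reverse-coding a 1-7 scale is a one-line transformation (the item name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"q_reverse": [1, 4, 7]})   # 1-7 Likert responses

# Reverse-code: on a 1-7 scale, 8 minus the observed value flips the item
df["q_reverse_recoded"] = 8 - df["q_reverse"]
print(df)   # 1 -> 7, 4 -> 4, 7 -> 1
```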

Univariate Analysis

Univariate analysis, or analysis of a single variable, refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: (1) frequency distribution, (2) central tendency, and (3) dispersion. The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services (as a measure of their “religiosity”) using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for “did not answer.” If we count the number (or percentage) of observations within each category (except “did not answer” which is really a missing value rather than a category), and display it in the form of a table as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.


Figure 14.1. Frequency distribution of religiosity.
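
A frequency distribution like the one in Figure 14.1 amounts to a one-line tabulation in pandas; the category labels below mirror the religiosity scale in the text, with invented responses:

```python
import pandas as pd

# Hypothetical responses on the religiosity scale from the text
responses = pd.Series(["never", "once per year", "never", "about once a month",
                       "several times per year", "never", "several times per week"])

# Counts and percentages per category (a frequency distribution)
counts = responses.value_counts()
percentages = responses.value_counts(normalize=True) * 100
print(pd.DataFrame({"count": counts, "percent": percentages.round(1)}))
```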

With very large samples where observations are independent and random, the frequency distribution tends to follow a plot that looks like a bell-shaped curve (a smoothed bar chart of the frequency distribution) similar to that shown in Figure 14.2, where most observations are clustered toward the center of the range of values, with fewer and fewer observations toward the extreme ends of the range. Such a curve is called a normal distribution.

Central tendency is an estimate of the center of a distribution of values. There are three major estimates of central tendency: mean, median, and mode. The arithmetic mean (often simply called the “mean”) is the simple average of all values in a given distribution. Consider a set of eight test scores: 15, 22, 21, 18, 36, 15, 25, 15. The arithmetic mean of these values is (15 + 22 + 21 + 18 + 36 + 15 + 25 + 15)/8 = 167/8 = 20.875. Other types of means include the geometric mean (the nth root of the product of n numbers in a distribution) and the harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of the values in a distribution), but these means are not very popular for statistical analysis of social research data.

The second measure of central tendency, the median, is the middle value within a range of values in a distribution. This is computed by sorting all values in a distribution in increasing order and selecting the middle value. In case there are two middle values (if there is an even number of values in a distribution), the average of the two middle values represents the median. In the above example, the sorted values are: 15, 15, 15, 18, 21, 22, 25, 36. The two middle values are 18 and 21, and hence the median is (18 + 21)/2 = 19.5.

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value estimated from a sample, such as the mean, median, mode, or any of the estimates discussed later, is called a statistic.
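
All three estimates can be verified with Python’s standard library, using the eight test scores from the example:

```python
import statistics

scores = [15, 22, 21, 18, 36, 15, 25, 15]

print(statistics.mean(scores))    # 20.875
print(statistics.median(scores))  # 19.5 (average of the middle values 18 and 21)
print(statistics.mode(scores))    # 15, the most frequent value
```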

Dispersion refers to the way values are spread around the central tendency: for example, how tightly or how widely the values are clustered around the mean. Two common measures of dispersion are the range and the standard deviation. The range is the difference between the highest and lowest values in a distribution. The range in our previous example is 36 - 15 = 21.

The range is particularly sensitive to the presence of outliers. For instance, if the highest value in the above distribution was 85 and the other values remained the same, the range would be 85 - 15 = 70. Standard deviation, the second measure of dispersion, corrects for such outliers by using a formula that takes into account how close or how far each value is from the distribution mean:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

Figure 14.2. Normal distribution.
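
Continuing with the eight test scores from the example, the range and the sample standard deviation can be computed with Python’s standard library:

```python
import statistics

scores = [15, 22, 21, 18, 36, 15, 25, 15]

print(max(scores) - min(scores))   # range: 36 - 15 = 21
print(statistics.stdev(scores))    # sample standard deviation, about 7.16
```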

Bivariate Analysis

Table 14.1. Hypothetical data on age and self-esteem.

The two variables in this dataset are age (x) and self-esteem (y). Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from “strongly disagree” to “strongly agree.” The histogram of each variable is shown on the left side of Figure 14.3. The formula for calculating bivariate correlation is:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} $$

Figure 14.3. Histogram and correlation plot of age and self-esteem.
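
Table 14.1’s raw data are not reproduced here, so the sketch below applies the correlation formula to small invented stand-in vectors for age and self-esteem using NumPy:

```python
import numpy as np

# Invented stand-ins for the age (x) and self-esteem (y) columns of Table 14.1
age = np.array([21, 25, 30, 35, 40, 45, 50, 55])
self_esteem = np.array([4.2, 4.5, 4.1, 5.0, 5.3, 5.1, 5.8, 6.0])

# Pearson correlation coefficient between the two variables
r = np.corrcoef(age, self_esteem)[0, 1]
print(round(r, 2))
```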

After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

H₀: r = 0

H₁: r ≠ 0

H₀ is called the null hypothesis, and H₁ is called the alternative hypothesis (sometimes also represented as Hₐ). Although they may seem like two hypotheses, H₀ and H₁ actually represent a single hypothesis, since they are direct opposites of each other. We are interested in testing H₁ rather than H₀. Also note that H₁ is a non-directional hypothesis, since it does not specify whether r is greater than or less than zero. A directional hypothesis would be specified as H₀: r ≤ 0; H₁: r > 0 (if we are testing for a positive correlation). Significance testing of a directional hypothesis is done using a one-tailed t-test, while that of a non-directional hypothesis uses a two-tailed t-test.

In statistical testing, the alternative hypothesis cannot be tested directly. Rather, it is tested indirectly by rejecting the null hypothesis with a certain level of probability. Statistical testing is always probabilistic, because we are never sure whether our inferences, based on sample data, apply to the population, since our sample never equals the population. The probability that a statistical inference is caused by pure chance is called the p-value. The p-value is compared with the significance level (α), which represents the maximum level of risk that we are willing to take that our inference is incorrect. For most statistical analyses, α is set to 0.05. A p-value less than α = 0.05 indicates that we have enough statistical evidence to reject the null hypothesis and thereby indirectly accept the alternative hypothesis. If p > 0.05, then we do not have adequate statistical evidence to reject the null hypothesis or accept the alternative hypothesis.

The easiest way to test the above hypothesis is to look up critical values of r from statistical tables available in any standard textbook on statistics or on the Internet (most software programs also perform significance testing). The critical value of r depends on our desired significance level (α = 0.05), the degrees of freedom (df), and whether the desired test is one-tailed or two-tailed. The degrees of freedom is the number of values that can vary freely in any calculation of a statistic. In the case of correlation, the df simply equals n – 2, or for the data in Table 14.1, df = 20 – 2 = 18. There are different statistical tables for one-tailed and two-tailed tests. In the two-tailed table, the critical value of r for α = 0.05 and df = 18 is 0.44. For our computed correlation of 0.79 to be significant, it must be larger than 0.44 or less than -0.44. Since our computed value of 0.79 is greater than 0.44, we conclude that there is a significant correlation between age and self-esteem in our data set; in other words, the odds are less than 5% that this correlation is a chance occurrence. We can therefore reject the null hypothesis that r = 0, which is an indirect way of saying that the correlation is probably nonzero.
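
In practice, statistical software performs this test directly. As a sketch, SciPy’s pearsonr returns both r and a two-tailed p-value, and the critical value of r quoted above can be recovered from the t distribution (the data here are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 0.8 * x + rng.normal(scale=0.5, size=20)   # invented correlated data, n = 20

r, p = stats.pearsonr(x, y)          # two-tailed test of H0: r = 0
print(round(r, 2), round(p, 4))

# Critical r for alpha = 0.05, df = n - 2 = 18 (two-tailed)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=18)
r_crit = t_crit / np.sqrt(t_crit**2 + 18)
print(round(r_crit, 2))              # about 0.44, matching the table value
```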

Most research studies involve more than two variables. If there are n variables, then we will have a total of n*(n-1)/2 possible correlations between these n variables. Such correlations are easily computed using a software program like SPSS, rather than manually using the formula for correlation (as we did for Table 14.1), and represented using a correlation matrix, as shown in Table 14.2. A correlation matrix is a matrix that lists the variable names along the first row and the first column, and depicts bivariate correlations between pairs of variables in the appropriate cell in the matrix. The values along the principal diagonal (from the top left to the bottom right corner) of this matrix are always 1, because any variable is always perfectly correlated with itself. Further, since correlations are non-directional, the correlation between variables V1 and V2 is the same as that between V2 and V1. Hence, the lower triangular matrix (values below the principal diagonal) is a mirror reflection of the upper triangular matrix (values above the principal diagonal), and therefore, we often list only the lower triangular matrix for simplicity. If the correlations involve variables measured using interval scales, then this specific type of correlation is called a Pearson product moment correlation.
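
With pandas, a full correlation matrix for several variables is a single method call; the three variables below are invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "V1": rng.normal(size=50),
    "V2": rng.normal(size=50),
    "V3": rng.normal(size=50),
})

# Pearson correlations for all n*(n-1)/2 variable pairs, with 1s on the diagonal
print(df.corr().round(2))
```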

Another useful way of presenting bivariate data is cross-tabulation (often abbreviated to cross-tab, and sometimes more formally called a contingency table). A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables. As an example, let us assume that we have the following observations of gender and grade for a sample of 20 students, as summarized in Table 14.3. Gender is a nominal variable (male/female or M/F), and grade is a categorical variable with three levels (A, B, and C). A simple cross-tabulation of the data may display the joint distribution of gender and grades (i.e., how many students of each gender are in each grade category, as a raw frequency count or as a percentage) in a 2 x 3 matrix. This matrix will help us see if A, B, and C grades are equally distributed across male and female students. The cross-tab data in Table 14.3 shows that the distribution of A grades is biased heavily toward female students: in a sample of 10 male and 10 female students, five female students received the A grade compared to only one male student. In contrast, the distribution of C grades is biased toward male students: three male students received a C grade, compared to only one female student. However, the distribution of B grades was somewhat uniform, with six male students and four female students. The last row and the last column of this table are called marginal totals because they indicate the totals across each category and are displayed along the margins of the table.
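
A cross-tab with marginal totals can be built with pandas.crosstab; the 20 hypothetical observations below reproduce the cell counts described in the text (1/6/3 for male A/B/C grades and 5/4/1 for female):

```python
import pandas as pd

# Invented gender and grade observations for 20 students
gender = ["M"] * 10 + ["F"] * 10
grade = ["A", "B", "B", "B", "B", "B", "B", "C", "C", "C",
         "A", "A", "A", "A", "A", "B", "B", "B", "B", "C"]

# Joint frequency counts with marginal totals along the edges
table = pd.crosstab(pd.Series(gender, name="gender"),
                    pd.Series(grade, name="grade"), margins=True)
print(table)
```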


Table 14.2. A hypothetical correlation matrix for eight variables.


Table 14.3. Example of cross-tab analysis.

Although we can see a distinct pattern of grade distribution between male and female students in Table 14.3, is this pattern real or “statistically significant”? In other words, do the above frequency counts differ from what may be expected from pure chance? To answer this question, we should compute the expected count of observations in each cell of the 2 x 3 cross-tab matrix. This is done by multiplying the marginal column total and the marginal row total for each cell and dividing the product by the total number of observations. For example, for the male/A grade cell, the expected count = 6 * 10 / 20 = 3. In other words, we were expecting three male students to receive an A grade, but in reality, only one did. Whether this difference between expected and actual counts is significant can be tested using a chi-square test. The chi-square statistic is computed as the sum, across all cells, of the squared difference between the observed and expected counts divided by the expected count. We can then compare this number to the critical value associated with a desired probability level (p < 0.05) and the degrees of freedom, which is simply (m-1)*(n-1), where m and n are the number of rows and columns respectively. In this example, df = (2 – 1) * (3 – 1) = 2. From standard chi-square tables in any statistics book, the critical chi-square value for p=0.05 and df=2 is 5.99. The chi-square value computed from our observed data is 4.07, which is less than the critical value. Hence, we must conclude that the observed grade pattern is not statistically different from the pattern that can be expected by pure chance.
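
SciPy performs the expected-count calculation and the chi-square test in one call; a sketch using the same cell counts:

```python
from scipy import stats

# Observed counts: rows = male/female, columns = grades A/B/C
observed = [[1, 6, 3],
            [5, 4, 1]]

chi2, p, df, expected = stats.chi2_contingency(observed)
print(expected)                 # expected counts under independence
print(round(chi2, 2), df)       # about 4.07 with df = 2
# 4.07 is below the critical value of 5.99, so the pattern is not significant
```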

  • Social Science Research: Principles, Methods, and Practices. Authored by : Anol Bhattacherjee. Provided by : University of South Florida. Located at : http://scholarcommons.usf.edu/oa_textbooks/3/ . License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
Descriptive Statistics – Types, Methods and Examples

Descriptive statistics is a branch of statistics that deals with the summarization and description of collected data. This type of statistics is used to simplify and present data in a manner that is easy to understand, often through visual or numerical methods. Descriptive statistics is primarily concerned with measures of central tendency, variability, and distribution, as well as graphical representations of data.

Here are the main components of descriptive statistics:

  • Measures of Central Tendency : These provide a summary statistic that represents the center point or typical value of a dataset. The most common measures of central tendency are the mean (average), median (middle value), and mode (most frequent value).
  • Measures of Dispersion or Variability : These provide a summary statistic that represents the spread of values in a dataset. Common measures of dispersion include the range (difference between the highest and lowest values), variance (average of the squared differences from the mean), standard deviation (square root of the variance), and interquartile range (difference between the upper and lower quartiles).
  • Measures of Position : These are used to understand the distribution of values within a dataset. They include percentiles and quartiles.
  • Graphical Representations : Data can be visually represented using various methods like bar graphs, histograms, pie charts, box plots, and scatter plots. These visuals provide a clear, intuitive way to understand the data.
  • Measures of Association : These measures provide insight into the relationships between variables in the dataset, such as correlation and covariance.

Descriptive Statistics Types

Descriptive statistics can be classified into two types:

Measures of Central Tendency

These measures help describe the center point or average of a data set. There are three main types:

  • Mean : The average value of the dataset, obtained by adding all the data points and dividing by the number of data points.
  • Median : The middle value of the dataset, obtained by ordering all data points and picking out the one in the middle (or the average of the two middle numbers if the dataset has an even number of observations).
  • Mode : The most frequently occurring value in the dataset.

Measures of Variability (or Dispersion)

These measures describe the spread or variability of the data points in the dataset. There are four main types:

  • Range : The difference between the largest and smallest values in the dataset.
  • Variance : The average of the squared differences from the mean.
  • Standard Deviation : The square root of the variance, giving a measure of dispersion that is in the same units as the original dataset.
  • Interquartile Range (IQR) : The range between the first quartile (25th percentile) and the third quartile (75th percentile), which provides a measure of variability that is resistant to outliers.

Descriptive Statistics Formulas

Here are some of the most commonly used formulas in descriptive statistics:

Mean (μ or x̄) :

The average of all the numbers in the dataset. It is computed by summing all the observations and dividing by the number of observations.

Formula : μ = Σx/n or x̄ = Σx/n (where Σx is the sum of all observations and n is the number of observations)

Median :

The middle value in the dataset when the observations are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle numbers.

Mode :

The most frequently occurring number in the dataset. There’s no formula for this, as it’s determined by observation.

Range :

The difference between the highest (max) and lowest (min) values in the dataset.

Formula : Range = max – min

Variance (σ² or s²) :

The average of the squared differences from the mean. Variance is a measure of how spread out the numbers in the dataset are.

Population Variance formula : σ² = Σ(x – μ)² / N

Sample Variance formula : s² = Σ(x – x̄)² / (n – 1)

(where x is each individual observation, μ is the population mean, x̄ is the sample mean, N is the size of the population, and n is the size of the sample)

Standard Deviation (σ or s) :

The square root of the variance. It measures the amount of variability or dispersion for a set of data.

Population Standard Deviation formula : σ = √σ²

Sample Standard Deviation formula : s = √s²

Interquartile Range (IQR) :

The range between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile). It measures statistical dispersion, or how far apart the data points are.

Formula : IQR = Q3 – Q1
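
These formulas map directly onto NumPy calls, as the following sketch on an invented dataset shows:

```python
import numpy as np

x = np.array([4, 8, 6, 5, 3, 9, 7])

print(x.mean())                         # mean
print(np.median(x))                     # median
print(x.max() - x.min())                # range
print(x.var(ddof=0), x.var(ddof=1))     # population vs. sample variance
print(x.std(ddof=1))                    # sample standard deviation
q1, q3 = np.percentile(x, [25, 75])
print(q3 - q1)                          # interquartile range (IQR)
```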

Descriptive Statistics Methods

Here are some of the key methods used in descriptive statistics:

Tabulation

This method involves arranging data into a table format, making it easier to understand and interpret. Tables often show the frequency distribution of variables.

Graphical Representation

This method involves presenting data visually to help reveal patterns, trends, outliers, or relationships between variables. There are many types of graphs used, such as bar graphs, histograms, pie charts, line graphs, box plots, and scatter plots.

Calculation of Central Tendency Measures

This involves determining the mean, median, and mode of a dataset. These measures indicate where the center of the dataset lies.

Calculation of Dispersion Measures

This involves calculating the range, variance, standard deviation, and interquartile range. These measures indicate how spread out the data is.

Calculation of Position Measures

This involves determining percentiles and quartiles, which tell us about the position of particular data points within the overall data distribution.

Calculation of Association Measures

This involves calculating statistics like correlation and covariance to understand relationships between variables.

Summary Statistics

Often, a collection of several descriptive statistics is presented together in what’s known as a “summary statistics” table. This provides a comprehensive snapshot of the data at a glance.

Descriptive Statistics Examples

Descriptive Statistics Examples are as follows:

Example 1: Student Grades

Let’s say a teacher has the following set of grades for 7 students: 85, 90, 88, 92, 78, 88, and 94. The teacher could use descriptive statistics to summarize this data:

  • Mean (average) : (85 + 90 + 88 + 92 + 78 + 88 + 94)/7 = 615/7 ≈ 87.9
  • Median (middle value) : First, rearrange the grades in ascending order (78, 85, 88, 88, 90, 92, 94). The median grade is 88.
  • Mode (most frequent value) : The grade 88 appears twice, more frequently than any other grade, so it’s the mode.
  • Range (difference between highest and lowest) : 94 (highest) – 78 (lowest) = 16
  • Variance and Standard Deviation : These would be calculated using the appropriate formulas, providing a measure of the dispersion of the grades.
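
These figures are straightforward to check with Python’s standard library (note that the exact mean is 615/7 ≈ 87.86):

```python
import statistics

grades = [85, 90, 88, 92, 78, 88, 94]

print(statistics.mean(grades))    # 87.857..., roughly 88
print(statistics.median(grades))  # 88
print(statistics.mode(grades))    # 88
print(max(grades) - min(grades))  # 16
print(statistics.stdev(grades))   # sample standard deviation
```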

Example 2: Survey Data

A researcher conducts a survey on the number of hours of TV watched per day by people in a particular city. They collect data from 1,000 respondents and can use descriptive statistics to summarize this data:

  • Mean : Calculate the average hours of TV watched by adding all the responses and dividing by the total number of respondents.
  • Median : Sort the data and find the middle value.
  • Mode : Identify the most frequently reported number of hours watched.
  • Histogram : Create a histogram to visually display the frequency of responses. This could show, for example, that the majority of people watch 2-3 hours of TV per day.
  • Standard Deviation : Calculate this to find out how much variation there is from the average.

Importance of Descriptive Statistics

Descriptive statistics are fundamental in the field of data analysis and interpretation, as they provide the first step in understanding a dataset. Here are a few reasons why descriptive statistics are important:

  • Data Summarization : Descriptive statistics provide simple summaries about the measures and samples you have collected. With a large dataset, it’s often difficult to identify patterns or tendencies just by looking at the raw data. Descriptive statistics provide numerical and graphical summaries that can highlight important aspects of the data.
  • Data Simplification : They simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary, making it easier to understand and interpret the dataset.
  • Identification of Patterns and Trends : Descriptive statistics can help identify patterns and trends in the data, providing valuable insights. Measures like the mean and median can tell you about the central tendency of your data, while measures like the range and standard deviation tell you about the dispersion.
  • Data Comparison : By summarizing data into measures such as the mean and standard deviation, it’s easier to compare different datasets or different groups within a dataset.
  • Data Quality Assessment : Descriptive statistics can help identify errors or outliers in the data, which might indicate issues with data collection or entry.
  • Foundation for Further Analysis : Descriptive statistics are typically the first step in data analysis. They help create a foundation for further statistical or inferential analysis. In fact, advanced statistical techniques often assume that one has first examined their data using descriptive methods.

When to use Descriptive Statistics

They can be used in a wide range of situations, including:

  • Understanding a New Dataset : When you first encounter a new dataset, using descriptive statistics is a useful first step to understand the main characteristics of the data, such as the central tendency, dispersion, and distribution.
  • Data Exploration in Research : In the initial stages of a research project, descriptive statistics can help to explore the data, identify trends and patterns, and generate hypotheses for further testing.
  • Presenting Research Findings : Descriptive statistics can be used to present research findings in a clear and understandable way, often using visual aids like graphs or charts.
  • Monitoring and Quality Control : In fields like business or manufacturing, descriptive statistics are often used to monitor processes, track performance over time, and identify any deviations from expected standards.
  • Comparing Groups : Descriptive statistics can be used to compare different groups or categories within your data. For example, you might want to compare the average scores of two groups of students, or the variance in sales between different regions.
  • Reporting Survey Results : If you conduct a survey, you would use descriptive statistics to summarize the responses, such as calculating the percentage of respondents who agree with a certain statement.

Applications of Descriptive Statistics

Descriptive statistics are widely used in a variety of fields to summarize, represent, and analyze data. Here are some applications:

  • Business : Businesses use descriptive statistics to summarize and interpret data such as sales figures, customer feedback, or employee performance. For instance, they might calculate the mean sales for each month to understand trends, or use graphical representations like bar charts to present sales data.
  • Healthcare : In healthcare, descriptive statistics are used to summarize patient data, such as age, weight, blood pressure, or cholesterol levels. They are also used to describe the incidence and prevalence of diseases in a population.
  • Education : Educators use descriptive statistics to summarize student performance, like average test scores or grade distribution. This information can help identify areas where students are struggling and inform instructional decisions.
  • Social Sciences : Social scientists use descriptive statistics to summarize data collected from surveys, experiments, and observational studies. This can involve describing demographic characteristics of participants, response frequencies to survey items, and more.
  • Psychology : Psychologists use descriptive statistics to describe the characteristics of their study participants and the main findings of their research, such as the average score on a psychological test.
  • Sports : Sports analysts use descriptive statistics to summarize athlete and team performance, such as batting averages in baseball or points per game in basketball.
  • Government : Government agencies use descriptive statistics to summarize data about the population, such as census data on population size and demographics.
  • Finance and Economics : In finance, descriptive statistics can be used to summarize past investment performance or economic data, such as changes in stock prices or GDP growth rates.
  • Quality Control : In manufacturing, descriptive statistics can be used to summarize measures of product quality, such as the average dimensions of a product or the frequency of defects.

Limitations of Descriptive Statistics

While descriptive statistics are a crucial part of data analysis and provide valuable insights about a dataset, they do have certain limitations:

  • Lack of Depth : Descriptive statistics provide a summary of your data, but they can oversimplify the data, resulting in a loss of detail and potentially significant nuances.
  • Vulnerability to Outliers : Some descriptive measures, like the mean, are sensitive to outliers. A single extreme value can significantly skew your mean, making it less representative of your data.
  • Inability to Make Predictions : Descriptive statistics describe what has been observed in a dataset. They don’t allow you to make predictions or generalizations about unobserved data or larger populations.
  • No Insight into Correlations : While some descriptive statistics can hint at potential relationships between variables, they don’t provide detailed insights into the nature or strength of these relationships.
  • No Causality or Hypothesis Testing : Descriptive statistics cannot be used to determine cause and effect relationships or to test hypotheses. For these purposes, inferential statistics are needed.
  • Can Mislead : When used improperly, descriptive statistics can be used to present a misleading picture of the data. For instance, choosing to only report the mean without also reporting the standard deviation or range can hide a large amount of variability in the data.

Descriptive Statistics: Definition, Overview, Types, and Example


Descriptive statistics are brief informational coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread). Measures of central tendency include the mean, median, and mode, while measures of variability include the standard deviation, variance, minimum and maximum variables, kurtosis, and skewness.

Key Takeaways

  • Descriptive statistics summarizes or describes the characteristics of a data set.
  • Descriptive statistics consists of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.
  • Measures of central tendency describe the center of the data set (mean, median, mode).
  • Measures of variability describe the dispersion of the data set (variance, standard deviation).
  • Measures of frequency distribution describe the occurrence of data within the data set (count).


Understanding Descriptive Statistics

Descriptive statistics help describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data. The most recognized types of descriptive statistics are measures of center. For example, the mean, median, and mode, which are used at almost all levels of math and statistics, are used to define and describe a data set. The mean, or the average, is calculated by adding all the figures within the data set and then dividing by the number of figures within the set.

For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5). The mode of a data set is the value appearing most often, and the median is the figure situated in the middle of the data set. It is the figure separating the higher figures from the lower figures within a data set. However, there are less common types of descriptive statistics that are still very important.

People use descriptive statistics to repurpose hard-to-understand quantitative insights across a large data set into bite-sized descriptions. A student's grade point average (GPA), for example, provides a good understanding of descriptive statistics. The idea of a GPA is that it takes data points from a wide range of exams, classes, and grades, and averages them together to provide a general understanding of a student's overall academic performance. A student's personal GPA reflects their mean academic performance.

Descriptive statistics, especially in fields such as medicine, often visually depict data using scatter plots, histograms, line graphs, or stem and leaf displays. We'll talk more about visuals later in this article.

Types of Descriptive Statistics

All descriptive statistics are either measures of central tendency or measures of variability, also known as measures of dispersion.

Central Tendency

Measures of central tendency focus on the average or middle values of data sets, whereas measures of variability focus on the dispersion of data. These two measures use graphs, tables and general discussions to help people understand the meaning of the analyzed data.

Measures of central tendency describe the center position of a distribution for a data set. A person analyzes the frequency of each data point in the distribution and describes it using the mean, median, or mode, which measures the most common patterns of the analyzed data set.

Measures of Variability

Measures of variability (or the measures of spread) aid in analyzing how dispersed the distribution is for a set of data. For example, while the measures of central tendency may give a person the average of a data set, it does not describe how the data is distributed within the set.

So while the average of the data may be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute deviation, and variance are all examples of measures of variability.

Consider the following data set: 5, 19, 24, 62, 91, 100. The range of that data set is 95, which is calculated by subtracting the lowest number (5) in the data set from the highest (100).

Distribution

Distribution (or frequency distribution) refers to the number of times a data point occurs, or, alternatively, the number of times a data point fails to occur. Consider a data set: male, male, female, female, female, other. The distribution of this data can be classified as follows:

  • The number of males in the data set is 2.
  • The number of females in the data set is 3.
  • The number of individuals identifying as other is 1.
  • The number of non-males is 4.

Univariate vs. Bivariate

In descriptive statistics, univariate analysis examines only one variable. It is used to identify the characteristics of a single trait and is not used to analyze any relationships or causations.

For example, imagine a room full of high school students. Say you wanted to gather the average age of the individuals in the room. This univariate data is only dependent on one factor: each person's age. By gathering this one piece of information from each person and dividing by the total number of people, you can determine the average age.

Bivariate data, on the other hand, attempts to link two variables by searching for correlation. Two types of data are collected, and the relationship between the two pieces of information is analyzed together. Because more than one variable is analyzed, this approach is sometimes also referred to as multivariate analysis.

Let's say each high school student in the example above takes a college assessment test, and we want to see whether older students are testing better than younger students. In addition to gathering the age of the students, we need to gather each student's test score. Then, using data analytics, we mathematically or graphically depict whether there is a relationship between student age and test scores.
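As a minimal sketch of that bivariate step, assuming hypothetical ages and scores, Python's statistics.correlation (available from Python 3.10) returns the Pearson correlation coefficient:

```python
import statistics

ages = [15, 16, 16, 17, 17, 18]    # hypothetical student ages
scores = [61, 64, 70, 71, 75, 80]  # hypothetical test scores

# Pearson r: +1 perfect positive, 0 none, -1 perfect negative relationship
r = statistics.correlation(ages, scores)
print(round(r, 2))  # ~0.94 for this made-up data: a strong positive relationship
```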

The preparation and reporting of financial statements is an example of descriptive statistics. Analyzing that financial information to make decisions on the future is inferential statistics.

One essential aspect of descriptive statistics is graphical representation. Visualizing data distributions effectively can be incredibly powerful, and this is done in several ways.

Histograms are tools for displaying the distribution of numerical data. They divide the data into bins or intervals and represent the frequency or count of data points falling into each bin through bars of varying heights. Histograms help identify the shape of the distribution, central tendency, and variability of the data.

Another visualization is boxplots. Boxplots, also known as box-and-whisker plots, provide a concise summary of a data distribution by highlighting key summary statistics including the median (middle line inside the box), quartiles (edges of the box), and potential outliers (points outside the "whiskers"). Boxplots visually depict the spread and skewness of the data and are particularly useful for comparing distributions across different groups or variables.
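As a minimal sketch of both visuals, assuming a hypothetical data set and the matplotlib library:

```python
import matplotlib.pyplot as plt

data = [5, 19, 24, 45, 48, 50, 52, 55, 60, 62, 91, 100]  # hypothetical values

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.hist(data, bins=5)  # bar heights show how many values fall in each bin
ax1.set_title("Histogram")

ax2.boxplot(data)  # box = quartiles, middle line = median, points = outliers
ax2.set_title("Boxplot")

plt.tight_layout()
plt.show()
```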

Anytime descriptive statistics are being discussed, it's important to note outliers. Outliers are data points that significantly differ from other observations in a dataset. These could be errors, anomalies, or rare events within the data.

Detecting and managing outliers is a step in descriptive statistics to ensure accurate and reliable data analysis. To identify outliers, you can use graphical techniques (such as boxplots or scatter plots) or statistical methods (such as Z-score or IQR method). These approaches help pinpoint observations that deviate substantially from the overall pattern of the data.
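A minimal sketch of both approaches, assuming a small hypothetical data set (the 1.5 × IQR rule and a z-score cutoff of 2 are common conventions, not fixed rules):

```python
import statistics

data = [1, 1, 1, 2, 2, 3, 3, 4, 997]  # hypothetical data with one extreme value

# IQR method: flag values beyond 1.5 * IQR from the quartiles
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print([x for x in data if x < lower or x > upper])  # [997]

# Z-score method: flag values far from the mean in standard-deviation units
mean, sd = statistics.mean(data), statistics.stdev(data)
print([x for x in data if abs((x - mean) / sd) > 2])  # [997]
```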

The presence of outliers can have a notable impact on descriptive statistics, as they can skew results and affect the interpretation of data. Outliers can disproportionately influence measures of central tendency, such as the mean, pulling it toward their extreme values. For example, the mean of the dataset (1, 1, 1, 997) is 250, even though that value is hardly representative of the dataset. This distortion can lead to misleading conclusions about the typical behavior of the dataset.

Depending on the context, outliers can be treated by removing them (if they are genuinely erroneous or irrelevant) or by keeping them, since they may hold important information. As you analyze your data, consider what the outliers can contribute and whether it makes more sense to strike those data points from your descriptive statistic calculations.

Descriptive Statistics vs. Inferential Statistics

Descriptive statistics serve a different function than inferential statistics, which use data sets to make decisions or to apply characteristics from one data set to another.

Imagine another example where a company sells hot sauce. The company gathers data such as the count of sales, average quantity purchased per transaction, and average sale per day of the week. All of this information is descriptive, as it tells a story of what actually happened in the past. In this case, it is not being used beyond being informational.

Let's say the same company wants to roll out a new hot sauce. It gathers the same sales data above, but it uses the information to make predictions about what the sales of the new hot sauce will be. The act of using descriptive statistics and applying their characteristics to a different data set makes them inferential statistics. We are no longer simply summarizing data; we are using it to predict what will happen with an entirely different body of data (the new hot sauce product).

What Is Descriptive Statistics?

Descriptive statistics is a means of describing features of a data set by generating summaries about data samples. It is often presented as a summary of the data that explains its contents. For example, a population census may include descriptive statistics regarding the ratio of men and women in a specific city.

What Are Examples of Descriptive Statistics?

Descriptive statistics are informational and meant to describe the actual characteristics of a data set. When analyzing numbers regarding the prior Major League Baseball season, descriptive statistics include the highest batting average for a single player, the number of runs allowed per team, and the average wins per division.

What Is the Main Purpose of Descriptive Statistics?

The main purpose of descriptive statistics is to provide information about a data set. In the example above, there are hundreds of baseball players who play in thousands of games. Descriptive statistics summarize that large amount of data into several useful bits of information.

What Are the Types of Descriptive Statistics?

The three main types of descriptive statistics are frequency distribution, central tendency, and variability of a data set. The frequency distribution records how often data occurs, central tendency records the data's center point of distribution, and variability of a data set records its degree of dispersion.

Can Descriptive Statistics Be Used to Make Inference or Predictions?

No. While these descriptive measures help in understanding data attributes, inferential statistical techniques (a separate branch of statistics) are required to understand how variables interact with one another in a data set.

Descriptive statistics refers to the analysis, summary, and communication of findings that describe a data set. Although often not used for decision-making on their own, descriptive statistics still hold value in providing high-level summaries of a set of information, such as the mean, median, mode, variance, range, and count.



Bridging the Gap: Overcome these 7 flaws in descriptive research design


Descriptive research design is a powerful tool used by scientists and researchers to gather information about a particular group or phenomenon. This type of research provides a detailed and accurate picture of the characteristics and behaviors of a particular population or subject. By observing and collecting data on a given topic, descriptive research helps researchers gain a deeper understanding of a specific issue and provides valuable insights that can inform future studies.

In this blog, we will explore the definition, characteristics, and common flaws in descriptive research design, and provide tips on how to avoid these pitfalls to produce high-quality results. Whether you are a seasoned researcher or a student just starting, understanding the fundamentals of descriptive research design is essential to conducting successful scientific studies.


What Is Descriptive Research Design?

The descriptive research design involves observing and collecting data on a given topic without attempting to infer cause-and-effect relationships. The goal of descriptive research is to provide a comprehensive and accurate picture of the population or phenomenon being studied and to describe the relationships, patterns, and trends that exist within the data.

Descriptive research methods can include surveys, observational studies, and case studies, and the data collected can be qualitative or quantitative. The findings from descriptive research provide valuable insights and inform future research, but do not establish cause-and-effect relationships.

Importance of Descriptive Research in Scientific Studies

1. Understanding of a Population or Phenomenon

Descriptive research provides a comprehensive picture of the characteristics and behaviors of a particular population or phenomenon, allowing researchers to gain a deeper understanding of the topic.

2. Baseline Information

The information gathered through descriptive research can serve as a baseline for future research and provide a foundation for further studies.

3. Informative Data

Descriptive research can provide valuable information and insights into a particular topic, which can inform future research, policy decisions, and programs.

4. Sampling Validation

Descriptive research can be used to validate sampling methods and to help researchers determine the best approach for their study.

5. Cost Effective

Descriptive research is often less expensive and less time-consuming than other research methods, making it a cost-effective way to gather information about a particular population or phenomenon.

6. Easy to Replicate

Descriptive research is straightforward to replicate, making it a reliable way to gather and compare information from multiple sources.

Key Characteristics of Descriptive Research Design

1. Purpose

The primary purpose of descriptive research is to describe the characteristics, behaviors, and attributes of a particular population or phenomenon.

2. Participants and Sampling

Descriptive research studies a particular population or sample that is representative of the larger population being studied. Furthermore, sampling methods can include convenience, stratified, or random sampling.
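As a minimal sketch of the difference between simple random and stratified sampling, assuming a hypothetical population of 1,000 numbered units, 700 of them "urban" and 300 "rural":

```python
import random

population = list(range(1, 1001))  # hypothetical sampling frame of 1,000 units

# Simple random sampling: every unit has an equal chance of selection
simple = random.sample(population, k=50)

# Stratified sampling: draw proportionally (here 5%) within each stratum
strata = {"urban": population[:700], "rural": population[700:]}
stratified = [unit
              for group in strata.values()
              for unit in random.sample(group, k=len(group) // 20)]

print(len(simple), len(stratified))  # 50 50
```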

3. Data Collection Techniques

Descriptive research typically involves the collection of both qualitative and quantitative data through methods such as surveys, observational studies, case studies, or focus groups.

4. Data Analysis

Descriptive research data is analyzed to identify patterns, relationships, and trends within the data. Statistical techniques, such as frequency distributions and descriptive statistics, are commonly used to summarize and describe the data.

5. Focus on Description

Descriptive research is focused on describing and summarizing the characteristics of a particular population or phenomenon. It does not make causal inferences.

6. Non-Experimental

Descriptive research is non-experimental, meaning that the researcher does not manipulate variables or control conditions. The researcher simply observes and collects data on the population or phenomenon being studied.

When Can a Researcher Conduct Descriptive Research?

A researcher can conduct descriptive research in the following situations:

  • To better understand a particular population or phenomenon
  • To describe the relationships between variables
  • To describe patterns and trends
  • To validate sampling methods and determine the best approach for a study
  • To compare data from multiple sources.

Types of Descriptive Research Design

1. Survey Research

Surveys are a type of descriptive research that involves collecting data through self-administered or interviewer-administered questionnaires. Additionally, they can be administered in-person, by mail, or online, and can collect both qualitative and quantitative data.

2. Observational Research

Observational research involves observing and collecting data on a particular population or phenomenon without manipulating variables or controlling conditions. It can be conducted in naturalistic settings or controlled laboratory settings.

3. Case Study Research

Case study research is a type of descriptive research that focuses on a single individual, group, or event. It involves collecting detailed information on the subject through a variety of methods, including interviews, observations, and examination of documents.

4. Focus Group Research

Focus group research involves bringing together a small group of people to discuss a particular topic or product. Furthermore, the group is usually moderated by a researcher and the discussion is recorded for later analysis.

5. Ethnographic Research

Ethnographic research involves conducting detailed observations of a particular culture or community. It is often used to gain a deep understanding of the beliefs, behaviors, and practices of a particular group.

Advantages of Descriptive Research Design

1. Provides a Comprehensive Understanding

Descriptive research provides a comprehensive picture of the characteristics, behaviors, and attributes of a particular population or phenomenon, which can be useful in informing future research and policy decisions.

2. Non-invasive

Descriptive research is non-invasive and does not manipulate variables or control conditions, making it a suitable method for research involving sensitive topics or ethical constraints.

3. Flexibility

Descriptive research allows for a wide range of data collection methods, including surveys, observational studies, case studies, and focus groups, making it a flexible and versatile research method.

4. Cost-effective

Descriptive research is often less expensive and less time-consuming than other research methods, making it a cost-effective option for many researchers.

5. Easy to Replicate

Descriptive research is easy to replicate, making it a reliable way to gather and compare information from multiple sources.

6. Informs Future Research

The insights gained from descriptive research can inform future studies, policy decisions, and programs.

Disadvantages of Descriptive Research Design

1. Limited Scope

Descriptive research only provides a snapshot of the current situation and cannot establish cause-and-effect relationships.

2. Dependence on Existing Data

Descriptive research relies on existing data, which may not always be comprehensive or accurate.

3. Lack of Control

Researchers have no control over the variables in descriptive research, which can limit the conclusions that can be drawn.

4. Researcher Bias

The researcher’s own biases and preconceptions can influence the interpretation of the data.

5. Lack of Generalizability

Descriptive research findings may not be applicable to other populations or situations.

6. Lack of Depth

Descriptive research provides a surface-level understanding of a phenomenon, rather than a deep understanding.

7. Time-consuming

Descriptive research often requires a large amount of data collection and analysis, which can be time-consuming and resource-intensive.

7 Ways to Avoid Common Flaws While Designing Descriptive Research


1. Clearly define the research question

A clearly defined research question is the foundation of any research study, and it is important to ensure that the question is both specific and relevant to the topic being studied.

2. Choose the appropriate research design

Choosing the appropriate research design for a study is crucial to the success of the study. Moreover, researchers should choose a design that best fits the research question and the type of data needed to answer it.

3. Select a representative sample

Selecting a representative sample is important to ensure that the findings of the study are generalizable to the population being studied. Researchers should use a sampling method that provides a random and representative sample of the population.

4. Use valid and reliable data collection methods

Using valid and reliable data collection methods is important to ensure that the data collected is accurate and can be used to answer the research question. Researchers should choose methods that are appropriate for the study and that can be administered consistently and systematically.

5. Minimize bias

Bias can significantly impact the validity and reliability of research findings.  Furthermore, it is important to minimize bias in all aspects of the study, from the selection of participants to the analysis of data.

6. Ensure adequate sample size

An adequate sample size is important to ensure that the results of the study are statistically significant and can be generalized to the population being studied.

7. Use appropriate data analysis techniques

The appropriate data analysis technique depends on the type of data collected and the research question being asked. Researchers should choose techniques that are appropriate for the data and the question being asked.

Have you worked on descriptive research designs? How was your experience creating a descriptive design? What challenges did you face? Do write to us or leave a comment below and share your insights on descriptive research designs!


Descriptive Statistics: Reporting the Answers to the 5 Basic Questions of Who, What, Why, When, Where, and a Sixth, So What?

Affiliation: Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas. PMID: 28891910. DOI: 10.1213/ANE.0000000000002471.

Descriptive statistics are specific methods used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting.

The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median.

Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides.

There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"
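As a minimal sketch of the confidence-interval idea described above, assuming a small hypothetical sample and a hardcoded two-sided t critical value (2.365 for 95% confidence with 7 degrees of freedom):

```python
import math
import statistics

data = [4.1, 5.0, 4.6, 5.2, 4.8, 5.5, 4.3, 4.9]  # hypothetical measurements

n = len(data)
mean = statistics.mean(data)
sem = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean

t = 2.365  # t critical value for 95% confidence, df = n - 1 = 7
low, high = mean - t * sem, mean + t * sem
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```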

  • Open access
  • Published: 13 May 2024

Patient medication management, understanding and adherence during the transition from hospital to outpatient care - a qualitative longitudinal study in polymorbid patients with type 2 diabetes

  • Léa Solh Dost   ORCID: orcid.org/0000-0001-5767-1305 1 , 2 ,
  • Giacomo Gastaldi   ORCID: orcid.org/0000-0001-6327-7451 3 &
  • Marie P. Schneider   ORCID: orcid.org/0000-0002-7557-9278 1 , 2  

BMC Health Services Research, volume 24, Article number: 620 (2024)


Continuity of care is under great pressure during the transition from hospital to outpatient care. Medication changes during hospitalization may be poorly communicated and understood, compromising patient safety during the transition from hospital to home. The main aims of this study were to investigate the perspectives of patients with type 2 diabetes and multimorbidities on their medications from hospital discharge to outpatient care, and their healthcare journey through the outpatient healthcare system. In this article, we present the results focusing on patients’ perspectives of their medications from hospital to two months after discharge.

Patients with type 2 diabetes, with at least two comorbidities and who returned home after discharge, were recruited during their hospitalization. A descriptive qualitative longitudinal research approach was adopted, with four in-depth semi-structured interviews per participant over a period of two months after discharge. Interviews were based on semi-structured guides, transcribed verbatim, and a thematic analysis was conducted.

Twenty-one participants were included from October 2020 to July 2021. Seventy-five interviews were conducted. Three main themes were identified: (A) Medication management, (B) Medication understanding, and (C) Medication adherence, during three periods: (1) Hospitalization, (2) Care transition, and (3) Outpatient care. Participants had varying levels of need for medication information and involvement in medication management during hospitalization and in outpatient care. The transition from hospital to autonomous medication management was difficult for most participants, who quickly returned to their routines with some participants experiencing difficulties in medication adherence.

Conclusions

The transition from hospital to outpatient care is a challenging process during which discharged patients are vulnerable and are willing to take steps to better manage, understand, and adhere to their medications. The resulting tension between patients’ difficulties with their medications and lack of standardized healthcare support calls for interprofessional guidelines to better address patients’ needs, increase their safety, and standardize physicians’, pharmacists’, and nurses’ roles and responsibilities.


Introduction

Continuity of patient care is characterized as the collaborative engagement between the patient and their physician-led care team in the ongoing management of healthcare, with the mutual objective of delivering high-quality and cost-effective medical care [ 1 ]. Continuity of care is under great pressure during the transition of care from hospital to outpatient care, with a risk of compromising patients’ safety [ 2 , 3 ]. The early post-discharge period is a high-risk and fragile transition: once discharged, one in five patients experience at least one adverse event during the first three weeks following discharge, and more than half of these adverse events are drug-related [ 4 , 5 ]. A retrospective study examining all discharged patients showed that adverse drug events (ADEs) account for up to 20% of 30-day hospital emergency readmissions [ 6 ]. During hospitalization, patients’ medications are generally modified, with an average of nearly four medication changes per patient [ 7 ]. Information regarding medications such as medication changes, the expected effect, side effects, and instructions for use are frequently poorly communicated to patients during hospitalization and at discharge [ 8 , 9 , 10 , 11 ]. Between 20 and 60% of discharged patients lack knowledge of their medications [ 12 , 13 ]. Consideration of patients’ needs and their active engagement in decision-making during hospitalization regarding their medications are often lacking [ 11 , 14 , 15 ]. This can lead to unsafe discharge and contribute to medication adherence difficulties, such as non-implementation of newly prescribed medications [ 16 , 17 ].

Patients with multiple comorbidities and polypharmacy are at higher risk of ADEs [ 18 ]. Type 2 diabetes is one of the chronic health conditions most frequently associated with comorbidities, and patients with type 2 diabetes often lack continuity of care [ 19 , 20 , 21 ]. The prevalence of type 2 diabetes among hospitalized patients can exceed 40% [ 22 ], and these patients are at higher risk of readmission due to their comorbidities and their medications, such as insulin and oral hypoglycemic agents [ 23 , 24 , 25 ].

Interventions and strategies to improve patient care and safety at transition have shown mixed results worldwide in reducing cost, rehospitalization, ADE, and non-adherence [ 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 ]. However, interventions that are patient-centered, with a patient follow-up and led by interprofessional healthcare teams showed promising results [ 34 , 35 , 36 ]. Most of these interventions have not been implemented routinely due to the extensive time to translate research into practice and the lack of hybrid implementation studies [ 37 , 38 , 39 , 40 , 41 ]. In addition, patient-reported outcomes and perspectives have rarely been considered, yet patients’ involvement is essential for seamless and integrated care [ 42 , 43 ]. Interprofessional collaboration in which patients are full members of the interprofessional team, is still in its infancy in outpatient care [ 44 ]. Barriers and facilitators regarding medications at the transition of care have been explored in multiple qualitative studies at one given time in a given setting (e.g., at discharge, one-month post-discharge) [ 8 , 45 , 46 , 47 , 48 ]. However, few studies have adopted a holistic methodology from the hospital to the outpatient setting to explore changes in patients’ perspectives over time [ 49 , 50 , 51 ]. Finally, little is known about whether, how, and when patients return to their daily routine following hospitalization and the impact of hospitalization weeks after discharge.

In Switzerland, continuity of care after hospital discharge is still poorly documented, both in terms of contextual analysis and interventional studies, and is mainly conducted in the hospital setting [ 31 , 35 , 52 , 53 , 54 , 55 , 56 ]. The first step of an implementation science approach is to perform a contextual analysis to set up effective interventions adapted to patients’ needs and aligned to healthcare professionals’ activities in a specific context [ 41 , 57 ]. Therefore, the main aims of this study were to investigate the perspectives of patients with type 2 diabetes and multimorbidities on their medications from hospital discharge to outpatient care, and on their healthcare journey through the outpatient healthcare system. In this article, we present the results focusing on patients’ perspectives of their medications from hospital to two months after discharge.

Study design

This qualitative longitudinal study, conducted from October 2020 to July 2021, used a qualitative descriptive methodology through four consecutive in-depth semi-structured interviews per participant at three, 10, 30, and 60 days post-discharge, as illustrated in Fig.  1 . Longitudinal qualitative research is characterized by qualitative data collection at different points in time and focuses on temporality, such as time and change [ 58 , 59 ]. Qualitative descriptive studies aim to explore and describe the depth and complexity of human experiences or phenomena [ 60 , 61 , 62 ]. We focused our qualitative study on the first 60 days after discharge, as this period is considered highly vulnerable and because studies often use 30- or 60-day readmission as an outcome measure [ 5 , 63 ].

This qualitative study follows the Consolidated Criteria for Reporting Qualitative Research (COREQ). Ethics committee approval was sought and granted by the Cantonal Research Ethics Commission, Geneva (CCER) (2020-01779).

Recruitment took place during participants’ hospitalization in the general internal medicine divisions at the Geneva University Hospitals in the canton of Geneva (500 000 inhabitants), Switzerland. Interviews took place at participants’ homes, in a private office at the University of Geneva, by telephone or by secure video call, according to participants’ preference. Informal caregivers could also participate alongside the participants.

[Figure 1: Study flowchart]

Researcher characteristics

All the researchers were trained in qualitative studies. The diabetologist and researcher (GG) who enrolled the patients in the study was involved, directly or indirectly (through advice sought from the Geneva University Hospitals diabetes team, of which he was a member), in most participants’ care during hospitalization. LS (Ph.D. student and community pharmacist) was unknown to participants and presented herself during hospitalization as a “researcher” and not as a healthcare professional, to avoid any risk of influencing participants’ answers. This study was not interventional, and the interviewer (LS) invited participants to contact a healthcare professional for any questions related to their medication or medical issues.

Population and sampling strategy

Patients with type 2 diabetes were chosen as an example population to describe polypharmacy patients, as these patients usually have several health issues and polypharmacy [ 20 , 22 , 25 ]. Inclusion criteria for the study were: adult patients with type 2 diabetes, with at least two other comorbidities, hospitalized for at least three days in a general internal medicine ward, with a minimum of one medication change during the hospital stay, and who self-managed their medications once discharged home. Exclusion criteria were patients not reachable by telephone following discharge, unable to give consent (patients with schizophrenia, dementia, brain damage, or drug/alcohol misuse), or who could not communicate in French. A purposive sampling methodology was applied, aiming to include participants of different ages, genders, and types and numbers of health conditions by listing participants’ characteristics in a double-entry table, available in Supplementary Material 1 , until thematic saturation was reached. Thematic saturation was considered achieved when no new code or theme emerged and new data repeated previously coded information [ 64 ]. Participants were identified if they were hospitalized in the ward dedicated to diabetes care or when the diabetes team was contacted for advice. The senior ward physician (GG) screened eligible patients and the interviewer (LS) obtained written consent before hospital discharge.

Data collection and instruments

Sociodemographic (age, gender, educational level, living arrangement) and clinical characteristics (reason for hospitalization, date of admission, health conditions, diabetes diagnosis, medications before and during hospitalization) were collected by interviewing participants before their discharge and by extracting participants’ data from electronic hospital files by GG and LS. Participants’ pharmacies were contacted with the participant’s consent to obtain medication records from the last three months if information regarding medications before hospitalization was missing in the hospital files.

Semi-structured interview guides for each interview (at three, 10, 30, and 60 days post-discharge) were developed based on different theories and components of health behavior and medication adherence: the World Health Organization’s (WHO) five dimensions for adherence, the Information-Motivation-Behavioral skills model, and the Social Cognitive Theory [ 65 , 66 , 67 ]. Each interview explored participants’ itinerary in the healthcare system and their perspectives on their medications. Regarding medications, the following themes were addressed at each interview: changes in medications; patients’ understanding and involvement; information on their medications; self-management of their medications; and medication adherence. Other aspects were addressed in specific interviews: patients’ hospitalization and experience of their return home (interview 1), motivation (interviews 2 and 4), and patients’ feedback on the past two months (interview 4). Interview guides translated from French are available in Supplementary Material 2 . The participants completed self-administered questionnaires at the end of selected interviews to obtain descriptive information on factors that may affect medication management and adherence, and to identify trends in these determinants over time: quality of life (EQ-5D-5 L) [ 68 ], literacy (Schooling-Opinion-Support questionnaire) [ 69 ], medication adherence (Adherence Visual Analogue Scale, A-VAS) [ 70 ], and the Beliefs about Medicines Questionnaire (BMQ) [ 71 ]. The BMQ contains two subscores, Specific-Necessity and Specific-Concerns, addressing respectively patients’ perceived need for their medications and their concerns about adverse consequences associated with taking them [ 72 ].

Data management

Informed consent forms, including consent to obtain health data, were securely stored in a private office at the University of Geneva. The participants’ identification key was protected by a password known only by MS and LS. Confidentiality was guaranteed by pseudonymization of participants’ information and audio-recordings were destroyed once analyzed. Sociodemographic and clinical characteristics, medication changes, and answers to questionnaires were securely collected by electronic case report forms (eCRFs) on RedCap®. Interviews were double audio-recorded and field notes were taken during interviews. Recorded interviews were manually transcribed verbatim in MAXQDA® (2018.2) by research assistants and LS and transcripts were validated for accuracy by LS. A random sample of 20% of questionnaires was checked for accuracy for the transcription from the paper questionnaires to the eCRFs. Recorded sequences with no link to the discussed topics were not transcribed and this was noted in the transcripts.

Data analysis

A descriptive statistical analysis of sociodemographic and clinical characteristics and of self-reported questionnaire data was carried out. A thematic analysis of transcripts was performed, as described by Braun and Clarke [ 73 ], following these steps: the raw data were read; text segments related to the study objectives were identified; text segments were grouped to create new categories; similar or redundant categories were merged; and a model integrating all significant categories was created. The analysis was conducted in parallel with patient enrolment to ensure data saturation. To ensure the validity of the coding method, transcripts were double-coded independently and discussed by the research team until similar themes were obtained. The research group developed and validated an analysis grid, with which LS systematically coded the transcripts, and met regularly with the research team to discuss questions on data analysis and to ensure the quality of coding. The analysis was carried out in French, and the verbatims of interest cited in the manuscript were translated and validated by a native English-speaking researcher to preserve the meaning.

In this analysis, we used the term “healthcare professionals” when more than one profession could be involved in participants’ medication management. Otherwise, when a specific healthcare professional was involved, we used the designated profession (e.g. physicians, pharmacists).

Patient and public involvement

During the development phase of the study, interview guides and questionnaires were reviewed for clarity and validity and adapted by two patient partners with multiple health conditions who had previously experienced a hospital discharge. They are part of the HUG Patients Partners + 3P platform for research and patient and public involvement.

Interviews and participants’ descriptions

A total of 75 interviews were conducted with 21 participants. In total, 31 patients were contacted: seven refused to participate (four at the project presentation and three at consent), two no longer met the selection criteria at discharge, and one was unreachable after discharge. Among the 21 participants, 15 participated in all interviews, four in three interviews, one in two interviews, and one in one interview, due to scheduling constraints. Details regarding interviews and participants’ characteristics are presented in Tables  1 and 2 .

The median length of time between hospital discharge and interviews 1, 2, 3, and 4 was 5 (IQR: 4–7), 14 (13–20), 35 (22–38), and 63 (61–68) days, respectively. Comparing medications at hospital admission and discharge, a median of 7 medication changes (IQR: 6–9; range: 2–17) occurred per participant during hospitalization, and a median of 7 changes (5–12) during the two months following discharge. Details regarding participants’ medications are described in Table  3 .

Patients’ self-reported adherence over the past week for their three most challenging medications is available in Supplementary Material 3 .

Qualitative analysis

We defined care transition as the period from discharge until the first medical appointment post-discharge, and outpatient care as the period starting after the first medical appointment. Data was organized into three key themes (A. Medication management, B. Medication understanding, and C. Medication adherence) divided into subthemes at three time points (1. Hospitalization, 2. Care transition and 3. Outpatient care). Figure  2 summarizes and illustrates the themes and subthemes with their influencing factors as bullet points.

[Figure 2: Participants’ medication management, understanding and adherence during hospitalization, care transition and outpatient care]

A. Medication management

A.1 Medication management during hospitalization: medication management by hospital staff

Medications during hospitalization were mainly managed by hospital healthcare professionals (i.e. nurses and physicians) with varying degrees of patient involvement: “At the hospital, they prepared the medications for me. […] I didn’t even know what the packages looked like.” Participant 22; interview 1 (P22.1) Some participants reported having therapeutic education sessions with specialized nurses and physicians, such as the explanation and demonstration of insulin injection and glucose monitoring. A patient reported that he was given the choice of several treatments and was involved in shared decision-making. Other participants had an active role in managing and optimizing dosages, such as rapid insulin, due to prior knowledge and use of medications before hospitalization.

A.2 Medication management at transition: obtaining the medication and initiating self-management

Once discharged, some participants had difficulties obtaining their medications at the pharmacy because some medications were not stocked and had to be ordered, delaying medication initiation. To counter this problem upstream, a few participants were provided a 24-to-48-hour supply of medications at discharge; this was sometimes requested by the patient or suggested by healthcare professionals, but was not systematic. The transition from medication management by hospital staff to self-management was exhausting for most participants, who were faced with a large amount of new information and changes in their medications: “When I was in the hospital, I didn’t even realize all the changes. When I came back home, I took away the old medication packages and got out the new ones. And then I thought: ‘My God, all this… I didn’t know I had all these changes.’” P2.1 Written documentation, such as the discharge prescription or dosage labels on medication packages, was helpful in managing medications at home. Most participants used weekly pill organizers, which were either already in use before hospitalization or were introduced post-discharge. The help of a family caregiver in managing and obtaining medications was reported as a facilitator.

A.3 Medication management in outpatient care: daily self-management and medication burden

A couple of days or weeks after discharge, most participants had acquired a routine, so medication management was less demanding, but the medication burden varied among participants. For some, medication management became a simple action well integrated into their routine (“It has become automatic”, P23.4), while for others, the number of medications and the fact that the medications reminded them of the disease was a heavy burden to bear on a daily basis (“During the first few days after getting out of the hospital, I thought I was going to do everything right. In the end, well [laughs], it’s complicated. I ended up not always taking the medication, not monitoring the blood sugar.” P12.2). To support medication self-management, some participants had written documentation such as treatment plans, medication lists, and pictures of their medication packages on their phones. Some participants had difficulties obtaining medications weeks after discharge, as discharge prescriptions were not renewable and participants did not see their physician in time. Others had to visit multiple physicians to have their prescriptions updated. A few participants were faced with prescription or dispensing errors, such as the wrong dosage being prescribed or dispensed, which affected medication management and decreased trust in healthcare professionals. In most cases, according to participants, the pharmacy staff worked in interprofessional collaboration with physicians to provide new and updated prescriptions.

B. Medication understanding

B.1 Medication understanding during hospitalization: new information and instructions

The amount of information received during hospitalization varied considerably among participants, with some reporting they had received too much, while others said they received too little information regarding medication changes, the reasons for changes, or the reasons for introducing new medications: “They told me I had to take this medication all my life, but they didn’t tell me what the effects were or why I was taking it.” P5.3

Hospitalization was seen by some participants as a vulnerable and tiring period during which they were less receptive to information. Information and explanations were generally given verbally, making it complicated for most participants to recall them. Some participants reported that hospital staff were attentive to their needs for information and used communication techniques such as teach-back (a way of checking understanding by asking participants to say in their own words what they need to know or do about their health or medications). Some participants were willing to be proactive in understanding their medications, while others were more passive, had no specific needs for information, and did not see how they could be more engaged.

B.2 Medication understanding at transition: facing medication changes

At hospital discharge, the most challenging difficulty for participants was understanding the changes made to their medications. For newly diagnosed participants, the addition of new medications was more difficult to understand, whereas for experienced participants, changes in known medications, such as dosage modifications, changes within a therapeutic class, and generic substitutions, were the most difficult to understand. Not having been informed about changes caused confusion and misunderstanding. Therefore, medication reconciliation done by the patient was time-consuming, especially for participants with multiple medications: “They didn’t tell me at all that they had changed my treatment completely. They just told me: ‘We’ve changed a few things.’ But it was the whole treatment.” P2.3 Written information, such as the discharge prescription, the discharge report (a brief letter summarizing information about the hospitalization, given to the patient at discharge), or the label on the medication box (written by the pharmacist with instructions on dosage), helped them find or recall information about their medications and diagnoses. However, technical terms were used in hospital documents and were not always understandable. For example, one participant said: “On the prescription of valsartan, they wrote: ‘resume in the morning once profile…’ [once hypertension profile allows]… I don’t know what that means.” P8.1 In addition, some documents were incomplete, as mentioned by a patient who did not have the insulin dosage mentioned on the hospital prescription. Some participants sought help from healthcare professionals, such as pharmacists, hospital physicians, or general practitioners, a few days after discharge to review medications, answer questions, or obtain additional information.

B.3 Medication understanding in outpatient care: concerns and knowledge

Weeks after discharge, most participants had concerns about the long-term use of their medications, their usefulness, and the possible risk of interactions or side effects. Some participants also reported a lack of knowledge regarding indications, names, or how the medication worked: “I don’t even know what Brilique® [ticagrelor, antiplatelet agent] is for. It’s for blood pressure, isn’t it? I don’t know.” P11.4 According to participants, the main reasons for the lack of understanding were the lack of information at the time of prescribing and the large number of medications, making it difficult to search for information and remember it. Participants sought information from different healthcare professionals or by themselves, from package inserts, through the internet, or from family and friends. Others reported having had all the information they needed or were not interested in having more. In addition, participants with low medication literacy, such as non-native speakers or elderly people, struggled more with medication understanding and sought help from family caregivers or healthcare professionals, even weeks after discharge: “I don’t understand French very well […] [The doctor] explained it very quickly… […] I didn’t understand everything he was saying.” P16.2

C. Medication adherence

C.2 Medication adherence at transition: adopting new behaviors

Medication adherence was not mentioned as a concern during hospitalization, but a few participants reported difficulties with medication initiation once back home: “I have an injection of Lantus® [insulin] in the morning, but obviously, the first day [after discharge], I forgot to do it because I was not used to it.” P23.1 Participants had to quickly adopt new behaviors in the first few days after discharge, especially participants with few medications pre-hospitalization. The use of weekly pill organizers, alarms, and specific storage spaces was reported as a facilitator of adherence. One patient did not initiate one of his medications because he did not understand the medication’s indication, and another patient took her old medications because she was used to them. Moreover, most participants experienced their hospitalization as a turning point, a time when they focused on their health, thought about the importance of their medications, and discussed any new lifestyle or dietary measures that might be implemented.

C.3 Medication adherence in outpatient care: ongoing medication adherence

More medication adherence difficulties appeared a few weeks after hospital discharge, when most participants reported nonadherence behaviors, such as difficulties implementing the dosage regimen, or intentionally discontinuing the medication or modifying the medication regimen on their own initiative. Determinants positively influencing medication adherence were the establishment of a routine; organizing medications in weekly pill organizers; preparing pocket doses (medications for a short period that participants take with them when away from home); seeking support from family caregivers; using alarm clocks; and using specific storage places. Reasons for nonadherence were changes in daily routine; intake times that were not convenient for the patient; the large number of medications; and poor knowledge of the medication or side effects. Healthcare professionals’ assistance with medication management, such as the help of home nurses or pharmacists in preparing weekly pill organizers, was requested by participants or offered by healthcare professionals to support medication adherence: “I needed [a home nurse] to put my pills in the pillbox. […] I felt really weak […] and I was making mistakes. So, I’m very happy [the doctor] offered me [home care]. […] I have so many medications.” P22.3 Some participants who had experienced pre-hospitalization non-adherence were more aware of it and implemented strategies, such as modifying the timing of intake: “I said to my doctor: ‘I forget one time out of two […], can I take them in the morning?’ We looked it up and yes, I can take it in the morning.” P11.2 In contrast, some participants were still struggling with adherence difficulties that they had had before hospitalization. Motivations for taking medications two months after discharge were to improve health, avoid complications, reduce symptoms, reduce the number of medications in the future, or out of obligation: “I force myself to take them because I want to get to the end of my diabetes, I want to reduce the number of pills as much as possible.” P14.2 After a few weeks post-hospitalization, for some participants, health and illness were no longer the priority because of other life imperatives (e.g., family or financial situation).

This longitudinal study provided a multi-faceted representation of how patients manage, understand, and adhere to their medications from hospital discharge to two months after discharge. Our findings highlighted the varying degrees of participants’ involvement in managing their medications during their hospitalization, the individualized needs for information during and after hospitalization, the complicated transition from hospital to autonomous medication management, the adaptation of daily routines around medication once back home, and the adherence difficulties that surfaced in outpatient care, with nonadherence prior to hospitalization being an indicator of behavior after discharge. Finally, our results confirmed the lack of continuity in care and showed the lack of patient care standardization experienced by the participants during the transition from hospital to outpatient care.

This in-depth analysis of patients’ experiences reinforces common challenges identified in the existing literature, such as the lack of personalized information [ 9 , 10 , 11 ], loss of autonomy during hospitalization [ 14 , 74 , 75 ], difficulties in obtaining medication at discharge [ 11 , 45 , 76 ], and challenges in understanding treatment modifications and generic substitutions [ 11 , 32 , 77 , 78 ]. Some of these studies were conducted during patients’ hospitalization [ 10 , 75 , 79 ] or up to 12 months after discharge [ 80 , 81 ], but most focused on the few days following hospital discharge [ 9 , 11 , 14 , 82 ]. Qualitative studies on medications at transition often focused on a specific topic, such as medication information, or a specific moment in time, and often included healthcare professionals, which muted patients’ voices [ 9 , 10 , 11 , 47 , 49 ]. Our qualitative longitudinal methodology aimed to capture the temporal dynamics, in-depth narratives, and contextual nuances of patients’ medication experiences during transitions of care [ 59 , 83 ]. This approach provided a comprehensive understanding of how patients’ perspectives and behaviors evolved over time, offering insights into the complex interactions of medication management, understanding, and adherence, and into turning points within their medication journeys. A qualitative longitudinal design was used by Fylan et al. to underline patients’ resilience in medication management during and after discharge, by Brandberg et al. to show the dynamic process of self-management during the four weeks post-discharge, and by Lawton et al. to examine how patients with type 2 diabetes perceived their care after discharge over a period of four years [ 49 , 50 , 51 ]. Our study focused on the first two months following hospitalization; future studies should follow discharged, at-risk patients over a longer period, as “transitions of care do not comprise linear trajectories of patients’ movements, with a starting and finishing point. Instead, they are endless loops of movements” [ 47 ].

Our results provide a particularly thorough description of how participants move from a state of total dependency regarding their medication management during hospitalization to sudden and complete autonomy after hospital discharge, impacting medication management, understanding, and adherence in the first days after discharge for some participants. Several qualitative studies have described the lack of shared decision-making and the loss of patient autonomy during hospitalization, which had an impact on self-management and created conflicts with healthcare professionals [ 75 , 81 , 84 ]. Our study also highlights nuanced patient experiences, including varying levels of patient needs, involvement, and proactivity during hospitalization and outpatient care, and our results contribute to capturing different perspectives that contrast with some literature that often portrays patients as more passive recipients of care [ 14 , 15 , 74 , 75 ]. Shared decision-making and proactive medication management are key elements, as they contribute to a smoother transition and better outcomes for patients post-discharge [ 85 , 86 , 87 ].

Consistent with the literature, this study identifies challenges in medication initiation post-discharge [16, 17, 88], but our results also describe how daily routine rapidly takes over, either solidifying adherence behavior or generating barriers to medication adherence. Participants’ nonadherence prior to hospitalization was a factor influencing their adherence post-hospitalization, and this association should be further investigated, as the literature shows that hospitalized patients have high rates of nonadherence [89]. Mortelmans et al. showed that more than 20% of discharged patients stopped their medications earlier than agreed with the physician and 25% adapted their medication intake [90]. Furthermore, patients who self-managed their medications had a lower perception of the necessity of their medication than patients who received help, which could negatively impact medication adherence [91]. Although participants in our study had high BMQ scores for necessity and lower scores for concerns, some participants expressed doubts about the need for their medications and a lack of motivation a few weeks after discharge. Targeted pharmacy interventions for newly prescribed medications have been shown to improve medication adherence, and hospital discharge is an opportune moment to implement this service [92, 93].

Many medication changes were made during the transition of care (a median of 7 changes during hospitalization and 7 during the two months after discharge), especially medication additions during hospitalization and interruptions after hospitalization. While medication changes during hospitalization are well described, the many changes following discharge are less discussed [7, 94]. A Danish study showed that approximately 65% of changes made during hospitalization were accepted by primary healthcare professionals but only 43% of new medications initiated during hospitalization were continued after discharge [95]. The numerous changes after discharge may be caused by unnecessary intensification of medications during hospitalization, delayed discharge letters, lack of standardized procedures, miscommunication, patient self-management difficulties, or responses to an acute situation [96, 97, 98]. In our study, during the transition of care, both new and experienced participants faced difficulties in managing and understanding medication changes, whether for newly prescribed medications or for changes to previous medications. These difficulties corroborate the findings of the literature [9, 10, 47], and our results showed that a lack of understanding during hospitalization left participants with questions about their medications even weeks after discharge. Physicians, nurses and pharmacists should jointly give particular attention to patients’ understanding of medication changes during the transition of care and in the months that follow, as medications are likely to undergo as many changes after discharge as during hospitalization.

Implications for practice and future research

The patients’ perspectives in this study showed, at a system level, a lack of standardization in healthcare professionals’ practices regarding medication dispensing and follow-up. In Switzerland, there are currently no official guidelines on medication prescription and dispensing during the transition of care, although some international guidelines have been developed for outpatient healthcare professionals [3, 99, 100, 101, 102]. Our results suggest several improvements. Patients should be included as partners, and healthcare professionals should systematically assess (i) previous medication adherence, (ii) patients’ desired level of involvement and (iii) their needs for information during hospitalization. Hospital discharge processes should be routinely implemented to standardize discharge preparation, medication prescribing, and dispensing. Discharge from the hospital should be planned with community pharmacies to ensure that all medications are available and, if necessary, doses should be supplied by the hospital to bridge the gap. A partnership with outpatient healthcare professionals, such as general practitioners, community pharmacists, and homecare nurses, should be set up for effective asynchronous interprofessional collaboration to consolidate patients’ medication management, knowledge, and adherence, and to monitor signs of deterioration or adverse drug events.

Future research should consolidate our first attempt to develop a framework to better characterize medication at the transition of care, using Fig. 2 as a starting point. Contextualized interventions, co-designed by health professionals, patients and stakeholders, should then be evaluated in a hybrid implementation study to assess both the implementation and the effectiveness of the intervention for the health system [103].

Limitations

This study has some limitations. First, the transcripts were validated for accuracy by the interviewer but not by a third party, which could have increased the robustness of the transcription; nevertheless, the interviewer followed all methodological recommendations for transcription. Second, patient inclusion took place during the COVID-19 pandemic, which may have affected patient care and the availability of healthcare professionals. Third, we cannot guarantee the accuracy of some participants’ medication history before hospitalization, even though we contacted each participant’s main pharmacy, as participants could have gone to different pharmacies to obtain their medications. Fourth, our findings may not be generalizable to other populations and healthcare systems, because some issues may be specific to multimorbid patients with type 2 diabetes or to the Swiss healthcare setting; nevertheless, the medication issues encountered by our participants correlate with findings in the literature. Fifth, only 15 of 21 participants took part in all the interviews, but most took part in at least three interviews, and data saturation was reached. Lastly, owing to the qualitative longitudinal design, it is possible that the discussions during interviews and participants’ reflections between interviews influenced participants’ management, knowledge, and adherence, even though this study was observational and the interviewer gave no advice or recommendations during interviews.

Conclusion

Discharged patients are willing to take steps to better manage, understand, and adhere to their medications, yet they face difficulties in hospital and outpatient care. Furthermore, extensive medication changes occur not only during hospitalization but also during the two months following discharge, to which healthcare professionals should give particular attention. Patients’ different degrees of involvement, needs and resources should be carefully considered to enable them to better manage, understand and adhere to their medications. At a system level, patients’ experiences revealed a lack of standardization of medication practices during the transition of care. The healthcare system should provide the ecosystem needed for healthcare professionals responsible for or involved in the management of patients’ medications during the hospital stay, discharge, and outpatient care to standardize their practices while considering the patient as an active partner.

Data availability

The anonymized quantitative survey datasets and the qualitative codes are available in French from the corresponding author on reasonable request.

Abbreviations

ADE: adverse drug events

VAS: Adherence Visual Analogue Scale

BMQ: Belief in Medication Questionnaire

COREQ: Consolidated Criteria for Reporting Qualitative Research

CRF: case report form

SD: standard deviation

WHO: World Health Organization

References

American Academy of Family Physicians. Continuity of Care, Definition of. 2020. Accessed 10 July 2022. https://www.aafp.org/about/policies/all/continuity-of-care-definition.html

Kripalani S, LeFevre F, Phillips CO, Williams MV, Basaviah P, Baker DW. Deficits in communication and information transfer between hospital-based and primary care physicians: implications for patient safety and continuity of care. JAMA. 2007;297(8):831–41.

World Health Organization (WHO). Medication Safety in Transitions of Care. 2019.

Forster AJ, Murff HJ, Peterson JF, Gandhi TK, Bates DW. The incidence and severity of adverse events affecting patients after discharge from the hospital. Ann Intern Med. 2003;138(3):161–7.

Krumholz HM. Post-hospital syndrome–an acquired, transient condition of generalized risk. N Engl J Med. 2013;368(2):100–2.

Banholzer S, Dunkelmann L, Haschke M, Derungs A, Exadaktylos A, Krähenbühl S, et al. Retrospective analysis of adverse drug reactions leading to short-term emergency hospital readmission. Swiss Med Wkly. 2021;151:w20400.

Blozik E, Signorell A, Reich O. How does hospitalization affect continuity of drug therapy: an exploratory study. Ther Clin Risk Manag. 2016;12:1277–83.

Allen J, Hutchinson AM, Brown R, Livingston PM. User experience and care for older people transitioning from hospital to home: patients’ and carers’ perspectives. Health Expect. 2018;21(2):518–27.

Daliri S, Bekker CL, Buurman BM, Scholte Op Reimer WJM, van den Bemt BJF, Karapinar-Çarkit F. Barriers and facilitators with medication use during the transition from hospital to home: a qualitative study among patients. BMC Health Serv Res. 2019;19(1):204.

Bekker CL, Mohsenian Naghani S, Natsch S, Wartenberg NS, van den Bemt BJF. Information needs and patient perceptions of the quality of medication information available in hospitals: a mixed method study. Int J Clin Pharm. 2020;42(6):1396–404.

Foulon V, Wuyts J, Desplenter F, Spinewine A, Lacour V, Paulus D, et al. Problems in continuity of medication management upon transition between primary and secondary care: patients’ and professionals’ experiences. Acta Clin Belgica: Int J Clin Lab Med. 2019;74(4):263–71.

Micheli P, Kossovsky MP, Gerstel E, Louis-Simonet M, Sigaud P, Perneger TV, et al. Patients’ knowledge of drug treatments after hospitalisation: the key role of information. Swiss Med Wkly. 2007;137(43–44):614–20.

Ziaeian B, Araujo KL, Van Ness PH, Horwitz LI. Medication reconciliation accuracy and patient understanding of intended medication changes on hospital discharge. J Gen Intern Med. 2012;27(11):1513–20.

Allen J, Hutchinson AM, Brown R, Livingston PM. User experience and care integration in Transitional Care for older people from hospital to home: a Meta-synthesis. Qual Health Res. 2016;27(1):24–36.

Mackridge AJ, Rodgers R, Lee D, Morecroft CW, Krska J. Cross-sectional survey of patients’ need for information and support with medicines after discharge from hospital. Int J Pharm Pract. 2018;26(5):433–41.

Mulhem E, Lick D, Varughese J, Barton E, Ripley T, Haveman J. Adherence to medications after hospital discharge in the elderly. Int J Family Med. 2013;2013:901845.

Fallis BA, Dhalla IA, Klemensberg J, Bell CM. Primary medication non-adherence after discharge from a general internal medicine service. PLoS ONE. 2013;8(5):e61735.

Zhou L, Rupa AP. Categorization and association analysis of risk factors for adverse drug events. Eur J Clin Pharmacol. 2018;74(4):389–404.

Moreau-Gruet F. La multimorbidité chez les personnes de 50 ans et plus. Résultats basés sur l’enquête SHARE (Survey of Health, Ageing and Retirement in Europe). Obsan Bulletin 4/2013. Neuchâtel: Observatoire suisse de la santé; 2013.

Iglay K, Hannachi H, Joseph Howie P, Xu J, Li X, Engel SS, et al. Prevalence and co-prevalence of comorbidities among patients with type 2 diabetes mellitus. Curr Med Res Opin. 2016;32(7):1243–52.

Sibounheuang P, Olson PS, Kittiboonyakun P. Patients’ and healthcare providers’ perspectives on diabetes management: a systematic review of qualitative studies. Res Social Adm Pharm. 2020;16(7):854–74.

Müller-Wieland D, Merkel M, Hamann A, Siegel E, Ottillinger B, Woker R, et al. Survey to estimate the prevalence of type 2 diabetes mellitus in hospital patients in Germany by systematic HbA1c measurement upon admission. Int J Clin Pract. 2018;72(12):e13273.

Blanc AL, Fumeaux T, Stirnemann J, Dupuis Lozeron E, Ourhamoune A, Desmeules J, et al. Development of a predictive score for potentially avoidable hospital readmissions for general internal medicine patients. PLoS ONE. 2019;14(7):e0219348.

Hansen LO, Greenwald JL, Budnitz T, Howell E, Halasyamani L, Maynard G, et al. Project BOOST: effectiveness of a multihospital effort to reduce rehospitalization. J Hosp Med. 2013;8(8):421–7.

Khalid JM, Raluy-Callado M, Curtis BH, Boye KS, Maguire A, Reaney M. Rates and risk of hospitalisation among patients with type 2 diabetes: retrospective cohort study using the UK General Practice Research Database linked to English Hospital Episode statistics. Int J Clin Pract. 2014;68(1):40–8.

Lussier ME, Evans HJ, Wright EA, Gionfriddo MR. The impact of community pharmacist involvement on transitions of care: a systematic review and meta-analysis. J Am Pharm Assoc. 2020;60(1):153–.

van der Heijden A, de Bruijne MC, Nijpels G, Hugtenburg JG. Cost-effectiveness of a clinical medication review in vulnerable older patients at hospital discharge, a randomized controlled trial. Int J Clin Pharm. 2019;41(4):963–71.

Bingham J, Campbell P, Schussel K, Taylor AM, Boesen K, Harrington A, et al. The Discharge Companion Program: an interprofessional collaboration in Transitional Care Model Delivery. Pharm (Basel). 2019;7(2):68.

Farris KB, Carter BL, Xu Y, Dawson JD, Shelsky C, Weetman DB, et al. Effect of a care transition intervention by pharmacists: an RCT. BMC Health Serv Res. 2014;14:406.

Meslot C, Gauchet A, Hagger MS, Chatzisarantis N, Lehmann A, Allenet B. A Randomised Controlled Trial to test the effectiveness of planning strategies to improve Medication Adherence in patients with Cardiovascular Disease. Appl Psychol Health Well Being. 2017;9(1):106–29.

Garnier A, Rouiller N, Gachoud D, Nachar C, Voirol P, Griesser AC, et al. Effectiveness of a transition plan at discharge of patients hospitalized with heart failure: a before-and-after study. ESC Heart Fail. 2018;5(4):657–67.

Daliri S, Bekker CL, Buurman BM, Scholte Op Reimer WJM, van den Bemt BJF, Karapinar-Çarkit F. Medication management during transitions from hospital to home: a focus group study with hospital and primary healthcare providers in the Netherlands. Int J Clin Pharm. 2020.

Hansen LO, Young RS, Hinami K, Leung A, Williams MV. Interventions to reduce 30-day rehospitalization: a systematic review. Ann Intern Med. 2011;155(8):520–8.

Leppin AL, Gionfriddo MR, Kessler M, Brito JP, Mair FS, Gallacher K, et al. Preventing 30-day hospital readmissions: a systematic review and meta-analysis of randomized trials. JAMA Intern Med. 2014;174(7):1095–107.

Donzé J, John G, Genné D, Mancinetti M, Gouveia A, Méan M et al. Effects of a Multimodal Transitional Care Intervention in patients at high risk of readmission: the TARGET-READ Randomized Clinical Trial. JAMA Intern Med. 2023.

Rodrigues CR, Harrington AR, Murdock N, Holmes JT, Borzadek EZ, Calabro K, et al. Effect of pharmacy-supported transition-of-care interventions on 30-Day readmissions: a systematic review and Meta-analysis. Ann Pharmacother. 2017;51(10):866–89.

Lam MYY, Dodds LJ, Corlett SA. Engaging patients to access the community pharmacy medicine review service after discharge from hospital: a cross-sectional study in England. Int J Clin Pharm. 2019;41(4):1110–7.

Hossain LN, Fernandez-Llimos F, Luckett T, Moullin JC, Durks D, Franco-Trigo L, et al. Qualitative meta-synthesis of barriers and facilitators that influence the implementation of community pharmacy services: perspectives of patients, nurses and general medical practitioners. BMJ Open. 2017;7(9):e015471.

En-Nasery-de Heer S, Uitvlugt EB, Bet PM, van den Bemt BJF, Alai A, van den Bemt P et al. Implementation of a pharmacist-led transitional pharmaceutical care programme: process evaluation of medication actions to reduce hospital admissions through a collaboration between Community and Hospital pharmacists (MARCH). J Clin Pharm Ther. 2022.

Morris ZS, Wooding S, Grant J. The answer is 17 years, what is the question: understanding time lags in translational research. J R Soc Med. 2011;104(12):510–20.

De Geest S, Zúñiga F, Brunkert T, Deschodt M, Zullig LL, Wyss K, et al. Powering Swiss health care for the future: implementation science to bridge the valley of death. Swiss Med Wkly. 2020;150:w20323.

Noonan VK, Lyddiatt A, Ware P, Jaglal SB, Riopelle RJ, Bingham CO 3rd, et al. Montreal Accord on patient-reported outcomes (PROs) use series - paper 3: patient-reported outcomes can facilitate shared decision-making and guide self-management. J Clin Epidemiol. 2017;89:125–35.

Hesselink G, Schoonhoven L, Barach P, Spijker A, Gademan P, Kalkman C, et al. Improving patient handovers from hospital to primary care: a systematic review. Ann Intern Med. 2012;157(6):417–28.

Office fédéral de la santé publique (OFSP). Interprofessionnalité dans le domaine de la santé: soins ambulatoires. Accessed 4 January 2024. https://www.bag.admin.ch/bag/fr/home/strategie-und-politik/nationale-gesundheitspolitik/foerderprogramme-der-fachkraefteinitiative-plus/foerderprogramme-interprofessionalitaet.html

Mitchell SE, Laurens V, Weigel GM, Hirschman KB, Scott AM, Nguyen HQ, et al. Care transitions from patient and caregiver perspectives. Ann Fam Med. 2018;16(3):225–31.

Davoody N, Koch S, Krakau I, Hägglund M. Post-discharge stroke patients’ information needs as input to proposing patient-centred eHealth services. BMC Med Inf Decis Mak. 2016;16:66.

Ozavci G, Bucknall T, Woodward-Kron R, Hughes C, Jorm C, Joseph K, et al. A systematic review of older patients’ experiences and perceptions of communication about managing medication across transitions of care. Res Social Adm Pharm. 2021;17(2):273–91.

Fylan B, Armitage G, Naylor D, Blenkinsopp A. A qualitative study of patient involvement in medicines management after hospital discharge: an under-recognised source of systems resilience. BMJ Qual Saf. 2018;27(7):539–46.

Fylan B, Marques I, Ismail H, Breen L, Gardner P, Armitage G, et al. Gaps, traps, bridges and props: a mixed-methods study of resilience in the medicines management system for patients with heart failure at hospital discharge. BMJ Open. 2019;9(2):e023440.

Brandberg C, Ekstedt M, Flink M. Self-management challenges following hospital discharge for patients with multimorbidity: a longitudinal qualitative study of a motivational interviewing intervention. BMJ Open. 2021;11(7):e046896.

Lawton J, Rankin D, Peel E, Douglas M. Patients’ perceptions and experiences of transitions in diabetes care: a longitudinal qualitative study. Health Expect. 2009;12(2):138–48.

Mabire C, Bachnick S, Ausserhofer D, Simon M. Patient readiness for hospital discharge and its relationship to discharge preparation and structural factors: a cross-sectional study. Int J Nurs Stud. 2019;90:13–20.

Meyers DC, Durlak JA, Wandersman A. The quality implementation framework: a synthesis of critical steps in the implementation process. Am J Community Psychol. 2012;50(3–4):462–80.

Meyer-Massetti C, Hofstetter V, Hedinger-Grogg B, Meier CR, Guglielmo BJ. Medication-related problems during transfer from hospital to home care: baseline data from Switzerland. Int J Clin Pharm. 2018;40(6):1614–20.

Neeman M, Dobrinas M, Maurer S, Tagan D, Sautebin A, Blanc AL, et al. Transition of care: a set of pharmaceutical interventions improves hospital discharge prescriptions from an internal medicine ward. Eur J Intern Med. 2017;38:30–7.

Geese F, Schmitt KU. Interprofessional Collaboration in Complex Patient Care Transition: a qualitative multi-perspective analysis. Healthc (Basel). 2023;11(3).

Craig P, Dieppe P, Macintyre S, Michie S, Nazareth I, Petticrew M. Developing and evaluating complex interventions: the new Medical Research Council guidance. Int J Nurs Stud. 2013;50(5):587–92.

Thomson R, Plumridge L, Holland J. Editorial. Int J Soc Res Methodol. 2003;6(3):185–7.

Audulv Å, Hall EOC, Kneck Å, Westergren T, Fegran L, Pedersen MK, et al. Qualitative longitudinal research in health research: a method study. BMC Med Res Methodol. 2022;22(1):255.

Kim H, Sefcik JS, Bradway C. Characteristics of qualitative descriptive studies: a systematic review. Res Nurs Health. 2017;40(1):23–42.

Sandelowski M. Whatever happened to qualitative description? Res Nurs Health. 2000;23(4):334–40.

Bradshaw C, Atkinson S, Doody O. Employing a qualitative description Approach in Health Care Research. Glob Qual Nurs Res. 2017;4:2333393617742282.

Bellone JM, Barner JC, Lopez DA. Postdischarge interventions by pharmacists and impact on hospital readmission rates. J Am Pharm Assoc (2003). 2012;52(3):358–62.

Hennink MM, Kaiser BN, Marconi VC. Code saturation versus meaning saturation: how many interviews are Enough? Qual Health Res. 2016;27(4):591–608.

World Health Organization. Adherence to long-term therapies: evidence for action. 2003.

Fisher JD, Fisher WA, Amico KR, Harman JJ. An information-motivation-behavioral skills model of adherence to antiretroviral therapy. Health Psychol. 2006;25(4):462–73.

Bandura A. Health promotion from the perspective of social cognitive theory. Psychol Health. 1998;13(4):623–49.

EuroQol Research Foundation. EQ-5D Instruments. Accessed 30 July 2022. https://euroqol.org/eq-5d-instruments/sample-demo/

Jeppesen KM, Coyle JD, Miser WF. Screening questions to predict limited health literacy: a cross-sectional study of patients with diabetes mellitus. Ann Fam Med. 2009;7(1):24–31.

Giordano TP, Guzman D, Clark R, Charlebois ED, Bangsberg DR. Measuring adherence to antiretroviral therapy in a diverse population using a visual analogue scale. HIV Clin Trials. 2004;5(2):74–9.

Horne R, Weinman J, Hankins M. The beliefs about medicines questionnaire: the development and evaluation of a new method for assessing the cognitive representation of medication. Psychol Health. 1999;14(1):1–24.

Horne R, Chapman SC, Parham R, Freemantle N, Forbes A, Cooper V. Understanding patients’ adherence-related beliefs about medicines prescribed for long-term conditions: a meta-analytic review of the necessity-concerns Framework. PLoS ONE. 2013;8(12):e80633.

Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Res Psychol. 2006;3(2):77–101.

Waibel S, Henao D, Aller M-B, Vargas I, Vázquez M-L. What do we know about patients’ perceptions of continuity of care? A meta-synthesis of qualitative studies. Int J Qual Health Care. 2011;24(1):39–48.

Rognan SE, Jørgensen MJ, Mathiesen L, Druedahl LC, Lie HB, Bengtsson K, et al. ‘The way you talk, do I have a choice?’ Patient narratives of medication decision-making during hospitalization. Int J Qualitative Stud Health Well-being. 2023;18(1):2250084.

Michel B, Hemery M, Rybarczyk-Vigouret MC, Wehrle P, Beck M. Drug-dispensing problems community pharmacists face when patients are discharged from hospitals: a study about 537 prescriptions in Alsace. Int J Qual Health Care. 2016;28(6):779–84.

Bruhwiler LD, Hersberger KE, Lutters M. Hospital discharge: what are the problems, information needs and objectives of community pharmacists? A mixed method approach. Pharm Pract (Granada). 2017;15(3):1046.

Knight DA, Thompson D, Mathie E, Dickinson A. ‘Seamless care? Just a list would have helped!’ Older people and their carer’s experiences of support with medication on discharge home from hospital. Health Expect. 2013;16(3):277–91.

Gualandi R, Masella C, Viglione D, Tartaglini D. Exploring the hospital patient journey: what does the patient experience? PLoS ONE. 2019;14(12):e0224899.

Norberg H, Håkansson Lindqvist M, Gustafsson M. Older individuals’ experiences of Medication Management and Care after Discharge from Hospital: an interview study. Patient Prefer Adherence. 2023;17:781–92.

Jones KC, Austad K, Silver S, Cordova-Ramos EG, Fantasia KL, Perez DC, et al. Patient perspectives of the hospital discharge process: a qualitative study. J Patient Exp. 2023;10:23743735231171564.

Hesselink G, Flink M, Olsson M, Barach P, Dudzik-Urbaniak E, Orrego C, et al. Are patients discharged with care? A qualitative study of perceptions and experiences of patients, family members and care providers. BMJ Qual Saf. 2012;21(Suppl 1):i39–49.

Murray SA, Kendall M, Carduff E, Worth A, Harris FM, Lloyd A, et al. Use of serial qualitative interviews to understand patients’ evolving experiences and needs. BMJ. 2009;339:b3702.

Berger ZD, Boss EF, Beach MC. Communication behaviors and patient autonomy in hospital care: a qualitative study. Patient Educ Couns. 2017;100(8):1473–81.

Davis RE, Jacklin R, Sevdalis N, Vincent CA. Patient involvement in patient safety: what factors influence patient participation and engagement? Health Expect. 2007;10(3):259–67.

Greene J, Hibbard JH. Why does patient activation matter? An examination of the relationships between patient activation and health-related outcomes. J Gen Intern Med. 2012;27(5):520–6.

Mitchell SE, Gardiner PM, Sadikova E, Martin JM, Jack BW, Hibbard JH, et al. Patient activation and 30-day post-discharge hospital utilization. J Gen Intern Med. 2014;29(2):349–55.

Weir DL, Motulsky A, Abrahamowicz M, Lee TC, Morgan S, Buckeridge DL, et al. Failure to follow medication changes made at hospital discharge is associated with adverse events in 30 days. Health Serv Res. 2020;55(4):512–23.

Kripalani S, Goggins K, Nwosu S, Schildcrout J, Mixon AS, McNaughton C, et al. Medication nonadherence before hospitalization for Acute Cardiac events. J Health Commun. 2015;20(Suppl 2):34–42.

Mortelmans L, De Baetselier E, Goossens E, Dilles T. What happens after Hospital Discharge? Deficiencies in Medication Management encountered by geriatric patients with polypharmacy. Int J Environ Res Public Health. 2021;18(13).

Mortelmans L, Goossens E, Dilles T. Beliefs about medication after hospital discharge in geriatric patients with polypharmacy. Geriatr Nurs. 2022;43:280–7.

Bandiera C, Ribaut J, Dima AL, Allemann SS, Molesworth K, Kalumiya K et al. Swiss Priority setting on implementing Medication Adherence interventions as Part of the European ENABLE COST action. Int J Public Health. 2022;67.

Elliott R, Boyd M, Salema NE, et al. Supporting adherence for people starting a new medication for a long-term condition through community pharmacies: a pragmatic randomised controlled trial of the New Medicine Service. 2015.

Grimmsmann T, Schwabe U, Himmel W. The influence of hospitalisation on drug prescription in primary care–a large-scale follow-up study. Eur J Clin Pharmacol. 2007;63(8):783–90.

Larsen MD, Rosholm JU, Hallas J. The influence of comprehensive geriatric assessment on drug therapy in elderly patients. Eur J Clin Pharmacol. 2014;70(2):233–9.

Viktil KK, Blix HS, Eek AK, Davies MN, Moger TA, Reikvam A. How are drug regimen changes during hospitalisation handled after discharge: a cohort study. BMJ Open. 2012;2(6):e001461.

Strehlau AG, Larsen MD, Søndergaard J, Almarsdóttir AB, Rosholm J-U. General practitioners’ continuation and acceptance of medication changes at sectorial transitions of geriatric patients - a qualitative interview study. BMC Fam Pract. 2018;19(1):168.

Anderson TS, Lee S, Jing B, Fung K, Ngo S, Silvestrini M, et al. Prevalence of diabetes medication intensifications in older adults discharged from US Veterans Health Administration Hospitals. JAMA Netw Open. 2020;3(3):e201511.

Royal Pharmaceutical Society. Keeping patients safe when they transfer between care providers – getting the medicines right. June 2012. Accessed 27 October 2023. https://www.rpharms.com/Portals/0/RPS%20document%20library/Open%20access/Publications/Keeping%20patients%20safe%20transfer%20of%20care%20report.pdf

International Pharmaceutical Federation (FIP). Medicines reconciliation: A toolkit for pharmacists. Accessed 23 September 2023 https://www.fip.org/file/4949

California Pharmacists Association. Transitions of Care Resource Guide. https://cdn.ymaws.com/www.cshp.org/resource/resmgr/Files/Practice-Policy/For_Pharmacists/transitions_of_care_final_10.pdf

Royal College of Physicians. Medication safety at hospital discharge: improvement guide and resource. Accessed 18 September 2023. https://www.rcplondon.ac.uk/file/33421/download

Douglas N, Campbell W, Hinckley J. Implementation science: buzzword or game changer? J Speech Lang Hear Res. 2015;58.

Acknowledgements

The authors would like to thank all the patients who took part in this study. We would also like to thank the Geneva University Hospitals Patients Partners + 3P platform as well as Mrs. Tourane Corbière and Mr. Joël Mermoud, patient partners, who reviewed interview guides for clarity and significance. We would like to thank Samuel Fabbi, Vitcoryavarman Koh, and Pierre Repiton for the transcriptions of the audio recordings.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Open access funding provided by the University of Geneva.

Author information

Authors and affiliations

School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland

Léa Solh Dost & Marie P. Schneider

Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland

Division of Endocrinology, Diabetes, Hypertension and Nutrition, Department of Medicine, Geneva University Hospitals, Geneva, Switzerland

Giacomo Gastaldi

Contributions

LS, GG, and MS conceptualized and designed the study. LS and GG screened and recruited participants. LS conducted the interviews. LS, GG, and MS performed data analysis and interpretation. LS drafted the manuscript and LS and MS worked on the different versions. MS and GG approved the final manuscript.

Corresponding authors

Correspondence to Léa Solh Dost or Marie P. Schneider.

Ethics declarations

Ethics approval and consent to participate

Ethics approval was sought and granted by the Cantonal Research Ethics Commission, Geneva (CCER) (2020-01779), and informed consent to participate was obtained from all participants.

Consent for publication

Informed consent for publication was obtained from all participants.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Solh Dost, L., Gastaldi, G. & Schneider, M. Patient medication management, understanding and adherence during the transition from hospital to outpatient care - a qualitative longitudinal study in polymorbid patients with type 2 diabetes. BMC Health Serv Res 24 , 620 (2024). https://doi.org/10.1186/s12913-024-10784-9

Received : 28 June 2023

Accepted : 26 February 2024

Published : 13 May 2024

DOI : https://doi.org/10.1186/s12913-024-10784-9

Keywords

  • Continuity of care
  • Transition of care
  • Patient discharge
  • Medication management
  • Medication adherence
  • Qualitative research
  • Longitudinal studies
  • Patient-centered care
  • Interprofessional collaboration
  • Type 2 diabetes


Original quantitative research – Substance-related poisoning hospitalizations and homelessness in Canada: a descriptive study 


Rebecca Plouffe, MPH; Rochelle White, MPH; Heather Orpana, PhD; Vera Grywacheski, MPH

https://doi.org/10.24095/hpcdp.44.5.02

This article has been peer reviewed.

Recommended Attribution

Research article by Plouffe R et al. in the HPCDP Journal licensed under a Creative Commons Attribution 4.0 International License

Rebecca Plouffe, Centre for Surveillance and Applied Research, Health Promotion and Chronic Disease Prevention Branch, Public Health Agency of Canada, 785 Carling Ave, Ottawa, ON  K1A 0K9; Tel: 437-326-9306; Email: [email protected]

Plouffe R, White R, Orpana H, Grywacheski V. Substance-related poisoning hospitalizations and homelessness in Canada: a descriptive study. Health Promot Chronic Dis Prev Can. 2024;44(5):208-17. https://doi.org/10.24095/hpcdp.44.5.02

Introduction: The objective of this analysis is to describe patient demographics; the context, characteristics and outcomes of substance-related poisonings; and the recorded mental disorders of people with housing and of those experiencing homelessness.

Methods: Hospitalization data for Canada (except Quebec) from 1 April 2019 to 31 March 2020 were retrieved from the Canadian Institute for Health Information (CIHI) Discharge Abstract Database using ICD-10-CA codes for up to 25 diagnoses for substance-related poisonings, homelessness status and other characteristics relevant to the patient’s hospitalization. We compared the characteristics of people experiencing homelessness, and of their substance-related poisoning hospitalizations, with those of people who were housed, using chi-square tests, t tests and Fisher exact tests.

Results: There was a higher proportion of males, younger individuals and people with recorded mental disorders among people experiencing homelessness hospitalized for a substance-related poisoning than among their housed counterparts. Substance-related poisonings among people experiencing homelessness were more likely to be accidental, involve opioids and stimulants (most frequently fentanyl and its analogues and heroin), result in lengthier hospitalizations and end with leaving the hospital against medical advice.

Conclusion: These findings can be used to strengthen strategies and interventions to reduce substance-related harms in priority populations, particularly those experiencing homelessness.

Keywords: opioids, overdose, fentanyl, housing, mental disorder, hospitalization

  • People who are homeless were vastly overrepresented among people hospitalized for substance-related poisonings.
  • In fiscal year 2019/2020, people experiencing homelessness who were hospitalized for substance-related poisonings spent, on average, about 4 days longer in hospital than people with housing.
  • Almost one-quarter (23%) of the hospitalizations of people experiencing homelessness ended with the patients leaving against medical advice, compared to 8% of hospitalizations for people with housing.
  • An important area for future research would be to identify ways in which hospitals can retain and treat this at-risk population.
  • Research can also help inform additional prevention and harm reduction activities.

Introduction

Canada continues to experience an overdose crisis, with substance-related morbidity and mortality increasing significantly since 2016. Footnote 1 Between January 2016 and December 2020, there were 24 671 opioid-related and 11 176 stimulant-related poisoning hospitalizations in Canada (excluding Quebec). Footnote 1 Although most regions of the country have been affected, British Columbia, Alberta and Ontario continue to have the most opioid and stimulant-related poisoning hospitalizations. Footnote 1 Some subpopulations appear to be disproportionately affected by the overdose crisis, including people experiencing homelessness and housing insecurity. Footnote 2

The rates of substance use are disproportionally high among people experiencing homelessness, and they are at a greater risk of substance-related harms compared to people with housing. Footnote 3 Footnote 4 Footnote 5 Footnote 6 Footnote 7 People who are homeless are also more likely than people with housing to be diagnosed with a mental health disorder, remain hospitalized for longer, and be readmitted within 30 days following discharge. Footnote 4 Footnote 5 Footnote 6 Footnote 7 Footnote 8 Footnote 9

On average, at least 235 000 people experience homelessness in a given year in Canada, and at least 35 000 on a given night. Footnote 2 Across the country, an additional 50 000 people could be experiencing hidden homelessness every night, that is, staying temporarily with friends, relatives or others because they have no other housing option and no immediate prospect of permanent housing. Footnote 2 The number of people experiencing homelessness in Canada is very difficult to estimate, but it is thought to be increasing, possibly also as a result of job losses and evictions during and since the COVID-19 pandemic. Footnote 2 Footnote 10 Footnote 11

The objective of this analysis is to describe patterns of substance-related poisoning hospitalizations in Canada (excluding Quebec) among people with housing and people experiencing homelessness, using the Canadian Institute for Health Information (CIHI) Discharge Abstract Database (DAD) during the pre-pandemic year of 1 April 2019 to 31 March 2020. This study also examines patterns by patient demographics (sex and age); context of the poisoning (substances involved and intention of the poisoning); hospitalization characteristics and outcomes (length of stay, intensive care unit admission and discharge disposition); and recorded mental disorders.

To our knowledge, this is the first study comparing the characteristics of those experiencing homelessness and those who are housed among people hospitalized for a substance-related poisoning across Canada using this data source. The results of this study can be used to better understand the intersection of homelessness, mental health and substance-related harms and how hospital care is experienced differently by people who are homeless.

Methods

Data source

We obtained data from the DAD, which captures acute inpatient discharge records for hospitalizations across Canada, excluding Quebec. In 2019–2020, the DAD had full coverage of acute inpatient care, except for one facility that did not submit data for six periods (an estimated total of 1100 missing abstracts). Footnote 12 Data are presented for 1 April 2019 to 31 March 2020. International Classification of Diseases and Related Health Problems, Tenth Revision, Canada (ICD-10-CA) codes were used to capture up to 25 diagnoses from the patient’s hospitalization.

Identifying the study sample

The methodology used to identify substance-related poisonings was adapted from existing CIHI methods. Footnote 13 Footnote 14 Trained coders reviewed medical records and assigned substance-specific ICD-10-CA codes according to CIHI coding directives. Footnote 13 Footnote 15 Substance-related poisonings may be recorded in a patient’s chart based on toxicological analyses, self-report and/or responsiveness to treatment received (for instance, reversal of an opioid poisoning after the administration of naloxone). Poisonings of interest were included if they were due to the following substances: opioids (T40.0, T40.1, T40.2, T40.20–T40.23, T40.28, T40.3, T40.4, T40.40, T40.41, T40.48, T40.6); stimulants (T40.5, T43.6); cannabis (T40.7); hallucinogens (T40.8, T40.9); alcohol (T51); other depressants (T42.3, T42.4, T42.6, T42.7); and psychotropic drugs (T43.8, T43.9).
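
To make this case-selection rule concrete, here is a minimal Python sketch of how the ICD-10-CA lists above could be applied to discharge abstracts. The record layout and the helper name `poisoning_categories` are illustrative assumptions, not CIHI's actual extraction code; note that subcodes such as T40.20–T40.23 are covered by their parent T40.2 prefix.

```python
# Illustrative only: substance categories keyed to the ICD-10-CA prefixes
# listed in the Methods text (subcodes fall under their parent prefix).
POISONING_PREFIXES = {
    "opioids": ("T40.0", "T40.1", "T40.2", "T40.3", "T40.4", "T40.6"),
    "stimulants": ("T40.5", "T43.6"),
    "cannabis": ("T40.7",),
    "hallucinogens": ("T40.8", "T40.9"),
    "alcohol": ("T51",),
    "other depressants": ("T42.3", "T42.4", "T42.6", "T42.7"),
    "psychotropic drugs": ("T43.8", "T43.9"),
}

def poisoning_categories(diagnosis_codes):
    """Return the substance categories present among a record's diagnoses
    (the DAD carries up to 25 diagnosis codes per discharge abstract)."""
    return {
        category
        for category, prefixes in POISONING_PREFIXES.items()
        for code in diagnosis_codes
        if code.startswith(prefixes)  # str.startswith accepts a tuple
    }

# A hypothetical abstract with an opioid poisoning (T40.21, a T40.2 subcode).
print(poisoning_categories(["T40.21", "E11.9", "Z59.0"]))
# {'opioids'}
```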

This analysis was limited to significant poisonings, defined as cases where the poisoning influenced the duration of the time the patient spent in hospital and the treatment they received. Secondary diagnoses and unconfirmed or query diagnoses were excluded.

Additional variables

Homelessness status

Any mention of the ICD-10-CA code Z59.0 on a patient discharge abstract was used to identify confirmed, unconfirmed or suspected homelessness. Homelessness status upon admission is mandatory to code when it is mentioned in physician documentation or noted on routine review of the medical record.

Intention of poisoning

Intention of the poisoning was identified in line with CIHI coding standards, whereby coders assign an external cause ICD-10-CA code indicating whether the poisoning was accidental (X41, X42, X45), intentional (X61, X62, X65) or undetermined (Y11, Y12, Y15). Confirmed and suspected diagnoses were included in the intention analysis. Records containing one or more poisonings with a missing associated external cause code were excluded from analyses of intention.
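
A sketch of that intention logic, under the same illustrative assumptions: the external-cause codes listed above map to an intention label, and records lacking any such code are flagged for exclusion.

```python
# Illustrative mapping of the external cause codes named in the text.
INTENTION_BY_CODE = {
    "X41": "accidental", "X42": "accidental", "X45": "accidental",
    "X61": "intentional", "X62": "intentional", "X65": "intentional",
    "Y11": "undetermined", "Y12": "undetermined", "Y15": "undetermined",
}

def poisoning_intentions(external_cause_codes):
    """Return the intention labels on a record, or None when no external
    cause code is present (such records are excluded from intention analyses)."""
    labels = {INTENTION_BY_CODE[c] for c in external_cause_codes if c in INTENTION_BY_CODE}
    return labels or None

print(poisoning_intentions(["X42"]))  # {'accidental'}
print(poisoning_intentions(["W19"]))  # None: no poisoning external cause code
```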

Recorded mental disorders

Consistent with CIHI methodology, recorded mental disorders were identified using any relevant ICD-10-CA diagnoses recorded on the patient discharge abstract during the stay for the substance-related poisoning. Footnote 15 Footnote 16 It is mandatory to record the diagnosis of a mental disorder if having this disorder significantly affects the treatment received, requires treatment beyond maintenance of the pre-existing disorder or increases the length of stay in hospital by at least 24 hours.

All ICD-10-CA codes for a mental disorder on the patient discharge abstract were captured, including confirmed and suspected diagnoses. The following were included: substance-related and addictive disorders (F10–F19, F55, F63.0); schizophrenia and other psychotic disorders (F20–F25, F28, F29); mood disorders (F30–F34, F38, F39, F53.0, F53.1); anxiety disorders (F40, F41, F93.0–F93.2, F94.0); selected disorders of personality and behaviour (F60–F62, F68 [excluding F68.1], F69); and other mental disorders (F42–F45, F48.0, F48.1, F48.8, F48.9, F50–F52, F53.8, F53.9, F54, F59, F63 [excluding F63.0], F68.1, F90–F92, F93.3, F93.8, F93.9, F94.1, F94.2, F94.8, F94.9, F95, F98.0, F98.1–F98.5, F98.8, F98.9, F99, O99.3). Some examples of “other mental disorders” covered by these ICD-10-CA codes include hypochondriacal disorder, eating disorders, nonorganic sleep disorders, conduct disorders, and posttraumatic stress disorder.
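
Unlike the poisoning codes, several of these groupings are ranges of three-character categories (e.g. F10–F19), so a sketch needs range matching as well as prefixes. Only a subset of the groups is shown, and all names and the data layout are illustrative:

```python
def in_icd_range(code, start, end):
    """True if the three-character category of an ICD-10-CA code falls
    within the inclusive range start..end (e.g. 'F13.2' is in F10-F19)."""
    return start <= code[:3] <= end

# Illustrative subset of the groupings listed in the text (remaining
# groups omitted for brevity).
MENTAL_DISORDER_GROUPS = {
    "substance-related and addictive": lambda c: in_icd_range(c, "F10", "F19")
        or c.startswith(("F55", "F63.0")),
    "schizophrenia and other psychotic": lambda c: in_icd_range(c, "F20", "F25")
        or c.startswith(("F28", "F29")),
    "mood": lambda c: in_icd_range(c, "F30", "F34")
        or c.startswith(("F38", "F39", "F53.0", "F53.1")),
    "anxiety": lambda c: c.startswith(("F40", "F41", "F93.0", "F93.1", "F93.2", "F94.0")),
}

def recorded_disorder_groups(diagnosis_codes):
    """Return the mental disorder groupings recorded on one abstract."""
    return {
        group
        for group, matches in MENTAL_DISORDER_GROUPS.items()
        for code in diagnosis_codes
        if matches(code)
    }

print(recorded_disorder_groups(["F13.2", "T40.21", "F41.1"]))
# {'substance-related and addictive', 'anxiety'} (set order may vary)
```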

Length of stay in hospital and discharge disposition

Total length of stay in hospital was calculated as the sum of the number of days a patient was in acute inpatient care and alternate level of care. Acute inpatient care length of stay describes when a patient is receiving necessary treatment for a disease or severe episode of illness for a short period; alternate level of care describes when a patient is occupying a bed, but not requiring the intensity of services provided in that care setting.
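
As a worked illustration of this definition, total length of stay is simply the sum of acute and alternate-level-of-care days on each record. The column names and numbers below are assumptions, not study data:

```python
import pandas as pd

# Hypothetical records: one row per substance-related poisoning hospitalization.
stays = pd.DataFrame({
    "homeless": [True, True, False, False],
    "acute_days": [6, 9, 5, 4],  # acute inpatient care
    "alc_days": [4, 3, 1, 0],    # alternate level of care
})
stays["total_los"] = stays["acute_days"] + stays["alc_days"]

# Mean total length of stay by housing status, mirroring the Table 2 comparison.
print(stays.groupby("homeless")["total_los"].mean())
```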

Discharge disposition refers to the status of the patient upon discharge or where the patient is discharged to, and is identified by examining the patient’s hospitalization record.

Statistical analysis

We conducted descriptive analyses of substance-related poisoning hospitalizations among people experiencing homelessness as well as among housed people (as a reference category). Percentages of substance-related poisoning hospitalizations with a specific recorded mental disorder were calculated using the total study population as the denominator; these may exceed 100% when summed because of polysubstance poisonings and diagnoses of multiple mental disorders. Counts of less than five per disaggregated category were suppressed in accordance with the CIHI privacy policy. Footnote 17
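
A small-cell suppression rule of this kind might be sketched as follows; the threshold of five comes from the text, while the function itself is an illustration rather than CIHI's implementation:

```python
def suppress_small_cells(counts, threshold=5):
    """Replace cell counts below the threshold with None so they are
    not reported in disaggregated tables."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

print(suppress_small_cells({"opioids": 380, "hallucinogens": 3}))
# {'opioids': 380, 'hallucinogens': None}
```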

We used a Pearson chi-square test to determine significant associations between housing status and categorical variables, and a Fisher exact test when expected cell counts were less than five. A Satterthwaite t test was used to test differences by housing status for continuous variables.
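
These three tests correspond to standard library routines. The study itself used SAS (see below), but an equivalent sketch with scipy.stats, on made-up numbers, looks like this; passing `equal_var=False` to `ttest_ind` gives the Satterthwaite/Welch version of the t test.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: housing status (rows) by a categorical outcome (columns).
table = np.array([[143, 480],
                  [803, 9233]])

chi2, p_chi2, dof, expected = stats.chi2_contingency(table)  # Pearson chi-square
odds_ratio, p_fisher = stats.fisher_exact(table)             # for small expected counts

# Welch (Satterthwaite) t test on a continuous variable such as length of stay.
los_homeless = np.array([11.0, 14.0, 9.0, 12.0, 8.0])
los_housed = np.array([6.0, 7.0, 5.0, 8.0, 7.0])
t_stat, p_t = stats.ttest_ind(los_homeless, los_housed, equal_var=False)

print(p_chi2, p_fisher, p_t)
```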

All analyses were completed using SAS Enterprise Guide version 7.1 (SAS Institute Inc., Cary, NC, US).

Results

Between April 2019 and March 2020, there were 10 659 substance-related poisoning hospitalizations in Canada (excluding Quebec). Approximately 6% (623) of these were recorded among people experiencing homelessness.

Patient demographics

Among those hospitalized for substance-related poisonings, people experiencing homelessness included a higher proportion of males (71%) than females (29%), while among those with housing, slightly more females (53%) than males (47%) were hospitalized (Table 1). The mean age of people experiencing homelessness hospitalized for a substance-related poisoning was lower than that of their housed counterparts (39.2 vs. 42.5 years; p < 0.001) (Figure 1).

Hospitalization characteristics and outcomes

People who were homeless stayed significantly longer in hospital for a substance-related poisoning than those with housing (11.0 vs. 6.6 days; p < 0.05) (Table 2). The proportion of hospitalizations with an intensive care admission did not differ between the two population groups, but people experiencing homelessness had a higher mean length of stay in alternate level of care than those with housing (3.7 vs. 0.8 days; p < 0.05). Among individuals with housing, 8% discharged themselves from the hospital against medical advice, whereas 23% of individuals who were homeless did so (p < 0.001). There was no difference between the two population groups in the proportion who died while hospitalized for a substance-related poisoning.

The majority (68%) of people with housing who were hospitalized for substance-related poisoning were discharged home. In comparison, 49% of hospitalizations of people who were experiencing homelessness on admission, including those who refused shelter upon discharge, were recorded as “discharged home,” so this finding should be interpreted with caution.

Substances involved in poisoning hospitalization

Opioids were the most common type of substance involved in hospitalizations for a substance-related poisoning (Table 3), but to a greater extent among people experiencing homelessness than among people with housing (61% vs. 40%; p < 0.001). Stimulants, such as cocaine and methamphetamine, were also involved in a greater proportion of hospitalizations of people who were homeless than of people with housing (29% vs. 19%; p < 0.001). In contrast, other depressants, for example, benzodiazepines and other sedatives, were more common in hospitalizations of people with housing than of those experiencing homelessness (39% vs. 19%; p < 0.001).

Where an opioid was involved in a poisoning hospitalization, fentanyl and its analogues (34% vs. 20%; p < 0.001) and heroin (15% vs. 7%; p < 0.001) were involved in higher proportions of hospitalizations of people experiencing homelessness than of people with housing. In contrast, oxycodone, codeine and hydromorphone were involved in significantly higher proportions of hospitalizations of people with housing.

The percentage of substance-related poisoning hospitalizations that involved one, two or three or more substances did not differ by housing status.

Intention of the poisoning

Higher proportions of substance-related poisoning hospitalizations were recorded as accidental among people who were homeless than among people with housing (62% vs. 45%; p < 0.001) (Table 4). People with housing had a higher proportion of such hospitalizations recorded as intentional self-harm (46% vs. 26% for people experiencing homelessness; p < 0.001). This pattern was also observed among females and males separately, although the magnitude of the differences varied.

Recorded mental disorders

People experiencing homelessness who were hospitalized for substance-related poisonings had a higher proportion of mental disorders recorded during their hospital stay than those with housing (61% vs. 52%; p < 0.001) (Table 5). The most commonly recorded mental disorders for both populations were substance-related and addictive disorders, although a significantly greater proportion of people who were homeless than of people with housing had this diagnosis (51% vs. 25%; p < 0.001). People with housing who were hospitalized for substance-related poisonings had higher proportions of recorded mood disorders (21% vs. 11%; p < 0.001) and recorded anxiety disorders (9% vs. 3%; p < 0.001) than their counterparts who were experiencing homelessness.

Stratification by sex showed significant differences in the distribution of substance-related poisoning hospitalizations with various mental disorders. Females experiencing homelessness were significantly more likely to have substance-related and addictive disorders (48% vs. 21%; p < 0.001) and schizophrenia and other psychotic disorders (5% vs. 2%; p < 0.05) recorded than their housed counterparts. Conversely, housed females were more likely to have mood disorders (26% vs. 15%; p < 0.001) and anxiety disorders (11% vs. 6%; p < 0.05) recorded than females who were homeless.

A similar trend was observed among males, with the most substantial difference between the two populations in diagnosed substance-related and addictive disorders. Among males experiencing homelessness, 53% had such a diagnosis compared to 29% of housed males (p < 0.001). Housed males were more likely to have mood disorders (15% vs. 9%; p < 0.001) and anxiety disorders (6% vs. 2%; p < 0.001) recorded, and less likely to have selected disorders of personality and behaviour recorded (3% vs. 6%; p < 0.001), compared to males who were homeless.

Discussion

Among hospitalizations for substance-related poisonings, males and younger adults were disproportionately represented among people experiencing homelessness compared to the housed population. Higher proportions of homelessness among men than among women have been previously reported. Footnote 2 Footnote 18 However, recent evidence suggests that many more women than men may be experiencing hidden homelessness, resulting in misclassification of housing status among females. Footnote 19 The younger mean age of people experiencing homelessness hospitalized for substance-related poisoning observed in this study likely reflects the younger age of the population of people who are homeless. Footnote 2

We found that a higher proportion of substance-related poisoning hospitalizations among people experiencing homelessness were recorded as accidental rather than intentional self-harm, and that opioids and stimulants, notably fentanyl and its analogues and heroin, were most commonly involved in poisonings leading to hospitalizations. The large proportion of these poisonings being accidental is likely due to the increase in fentanyl and its analogues in the illicit (unregulated) drug supply. These substances have high potencies and are increasingly being combined with other controlled substances. Footnote 20 Footnote 21 Footnote 22 The people who were hospitalized may not have known that the substance they were taking also contained fentanyl and/or its analogues, they may have combined substances to alleviate withdrawal symptoms or to enhance their experience, Footnote 23 or the dose may have been higher than expected, leading to an accidental poisoning.

Hospitalizations for substance-related poisonings among people experiencing homelessness were longer than those of people who were housed, with total lengths of stay averaging 11.0 and 6.6 days, respectively. This finding may be explained by the higher rates of infectious diseases, chronic diseases and long-term physical health conditions among people who are homeless, Footnote 24 Footnote 25 as well as the higher rates of mental disorders observed in our study. Hospitalized individuals may also have received treatment for comorbidities, resulting in longer lengths of stay. Further, lengths of stay in alternate level of care may have differed between the two populations because hospitals may lack options for discharging patients experiencing homelessness.

Lastly, people who were homeless were more likely than those with housing to leave the hospital against medical advice or before being formally discharged by a health care professional. This finding is consistent with previous literature showing that people who are discharged against medical advice are more likely to be young, male and experiencing homelessness. Footnote 26 Footnote 27 Choi et al. Footnote 26 found that people discharged against medical advice had higher rates of both readmission within 14 days and mortality within 12 months. This finding has important implications for clinical care settings looking for ways to decrease the number of patients leaving hospital against medical advice and thus reduce harms, mortality and associated costs and increase satisfaction with health care.

Strengths and limitations

To our knowledge, this analysis is the first to examine characteristics of substance-related poisoning hospitalizations of people experiencing homelessness across Canada. The DAD includes inpatient acute hospitalization discharges from all provinces and territories except Quebec and therefore has substantial coverage of the population of interest in this study.

There are, however, limitations. First, this analysis only examined acute inpatient hospitalizations, and patterns of substance-related poisonings may vary across different health care settings. For example, individuals with less severe poisonings may be treated through emergency medical services or in the emergency department; not including these health care settings could lead to underestimating the overall prevalence of substance-related poisonings. Moreover, if the pattern of where people seek health care for such poisonings and who is admitted to hospital varies by housing status, these results may not adequately reflect true differences.

Also not captured were data on people who died before being admitted to hospital, which potentially focused this analysis on less severe cases or instances where help was more readily available.

The unit of analysis was the hospital discharge, not the person or the entire episode of care. People could have been readmitted multiple times during the study period, and each readmission would be counted as a separate hospitalization. People with multiple admissions may have unique characteristics that are not presented in this study.

Another limitation is that identification of homelessness status may have relied on self-reported information. Some patients may not have disclosed their homelessness status, or may have been unable to because of disability or death, which could have resulted in their being misclassified. Similarly, it was only possible to examine housing status as a binary variable: either experiencing homelessness or not. More nuance, including unstable housing, poor housing quality, overcrowding and former homelessness, is required to fully understand the impact of housing status. Recording homelessness status on hospital discharge records is a relatively new requirement, and therefore a trend analysis was not possible.

Identifying intention of the poisoning also relied on self-reported information, which can introduce bias if patients are unwilling or unable to disclose this information. Poisonings were classified as accidental unless other intentions were clearly documented, potentially leading to an overrepresentation of accidental poisonings. Throughout this analysis it was not possible to determine which poisonings were a result of pharmaceutical or illicit (or unregulated) opioids, or a combination of both, which hinders the ability to develop targeted interventions to reduce harms associated with substances from different sources.

The estimates of recorded mental disorders did not reflect the overall prevalence of mental disorders among those hospitalized for substance-related poisonings; rather, the mental disorders that were recorded were relevant to the patient’s stay in hospital.

Lastly, Canadian Armed Forces veterans are two to three times more likely to experience homelessness than the general population, and the absence of military status in these data hinders the ability to provide a comprehensive understanding of the relationships between military service, housing status and substance-related poisonings. Footnote 28

Implications

The COVID-19 pandemic has widened health disparities, particularly among hard-to-reach populations. Footnote 2 Footnote 3 Footnote 4 There has also been an increase in the number of people experiencing homelessness, as well as an increase in the number of substance-related poisonings across the country. Footnote 2 Footnote 10 Footnote 11 Although we examined a pre-pandemic period, the results of this study could be used to support actions to reduce substance-related harms by strengthening public health and social infrastructure as people continue to experience the long-term impacts associated with the COVID-19 pandemic as well as other economic impacts.

These findings highlight the need for health care professionals, researchers and policy makers to better understand the intersection of homelessness, mental illness and substance-related harms. They can also inform sectors that interact with vulnerably housed individuals. In particular, these findings demonstrate how substance-related harms and care in hospital settings may differ for people with housing compared to people experiencing homelessness, as exhibited by the high proportion of substance-related poisoning hospitalizations that ended with the patient leaving against medical advice. This difference in care may be due to a variety of factors, such as care not meeting the needs of this population, a lack of trust, or stigma, and may warrant further investigation to reduce barriers to care for people who are homeless.

Compared to people with housing, unhoused people hospitalized for a substance-related poisoning were more likely to be younger, male and to have a recorded mental disorder. A higher proportion of substance-related poisoning hospitalizations of unhoused people were accidental and involved opioids and stimulants, particularly fentanyl and its analogues and heroin. Lastly, substance-related poisoning hospitalizations of unhoused people lasted longer and were more likely to end with the patient leaving the hospital against medical advice.

These findings emphasize the importance of acknowledging the intersectionality of mental illness, substance use and housing status when considering options to address substance-related harms. Future studies should aim to determine how care in hospital settings and other social services can optimize support, in order to prevent further substance-related harms.

Acknowledgements

We thank the Canadian Institute for Health Information for collecting and providing the data used in this study, and Patrick Hunter and Nan Zhou from Infrastructure Canada for their input and support of this project.

Some material from this report has been previously published by the Government of Canada; permission was obtained to reprint it. All major contributors were contacted and agreed to this publication.

This research did not receive any funding from agencies in the public, commercial or not-for-profit sectors.

Conflicts of interest

None to declare.

Authors’ contributions and statement

  • RP : Investigation, data curation, methodology, formal analysis, writing – original draft.
  • RW : Investigation, data curation, methodology, formal analysis, writing – review & editing.
  • HO : Conceptualization, supervision, writing – review & editing.
  • VG : Conceptualization, supervision, validation, writing – review & editing.

Parts of this material are based on data and information compiled and provided by the Canadian Institute for Health Information ( CIHI ). However, the analyses, conclusions, opinions and statements expressed herein are those of the authors and not necessarily those of CIHI or of the Government of Canada.

Systematic analysis of approaches used in cardiac arrest trials to inform relatives about trial enrolment of non-surviving patients

  • Helen Pocock 1,2 (http://orcid.org/0000-0001-7648-5313),
  • Abigail Dove 1,
  • Laura Pointeer 1,
  • Keith Couper 1,3 (http://orcid.org/0000-0003-2123-2022),
  • Gavin D Perkins 1,3
  • 1 University of Warwick Medical School, Coventry, UK
  • 2 South Central Ambulance Service NHS Foundation Trust, Bicester, Oxfordshire, UK
  • 3 Critical Care Unit, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  • Correspondence to Helen Pocock, University of Warwick Medical School, Coventry, UK; helen.pocock{at}scas.nhs.uk

Background The recruitment of patients to emergency research studies without the requirement for prior informed consent has furthered the conduct of randomised studies in cardiac arrest. Frameworks enabling this vary around the world depending on local legal or ethical requirements. When an enrolled patient does not survive, researchers may take one of three approaches to inform relatives of their enrolment: a direct (active) approach, providing information indirectly (passively) and inviting relatives to seek further information if they choose, or providing no information about the trial (no attempt). Previous studies have described the experiences of US researchers with an active approach, but little is known about approaches elsewhere.

We aimed to conduct a structured investigation of methods used in cardiac arrest trials to provide information about trial enrolment to relatives of non-surviving patients.

Methods We systematically searched trial registries to identify randomised clinical trials that recruited cardiac arrest patients. Trials were eligible for inclusion if they recruited adults during cardiac arrest (or within 1 hour of return of spontaneous circulation) between 2010 and 2022 (in the decade prior to study conception). We extracted data from trial registries and, where relevant, published papers and protocols. Investigators were contacted and asked to describe the style, rationale and timing of approach to relatives of non-surviving patients. We present descriptive statistics.

Results Our trial registry search identified 710 unique trials, of which 108 were eligible for inclusion. We obtained information from investigators for 64 (62%) trials. Approximately equal numbers of trials attempted to actively inform relatives of non-survivors (n=28 (44% (95% CI; 31% to 57%))), or made no attempt (n=25 (39% (95% CI; 27% to 52%))). The remaining studies provided general information about the trial to relatives but did not actively inform them (n=11 (17% (95% CI; 8% to 29%))).

Conclusions There is wide variability in the approach taken to informing relatives of non-surviving patients enrolled in cardiac arrest randomised clinical trials.

  • Out-of-Hospital Cardiac Arrest
  • heart arrest
  • Randomized Controlled Trial
  • research design

Data availability statement

Data sharing not applicable as no data sets were generated or analysed for this study.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:  https://creativecommons.org/licenses/by/4.0/ .

https://doi.org/10.1136/emermed-2023-213648


WHAT IS ALREADY KNOWN ON THIS TOPIC

In the UK, cardiac arrest researchers do not routinely attempt to actively inform relatives of non-surviving patients of their enrolment in a trial. This contrasts with the US approach, which is mandated in law; elsewhere, other factors may determine the approach taken.

WHAT THIS STUDY ADDS

This study systematically investigates the methods by which information is provided to relatives by researchers. We report the various approaches, influences and issues in 62% of the cardiac arrest trials registered around the world in the period 2010–2022. We found wide variability in practice.

HOW THIS STUDY MIGHT AFFECT RESEARCH PRACTICE OR POLICY

We suggest further research is needed into the relatives’ perspective to inform a best practice recommendation.

Introduction

The recruitment of patients to emergency research studies without the requirement for prior informed consent has extended the possibility of improving care through research to this previously neglected patient group. The legal and ethical frameworks governing this, and associated practices vary around the world.

Similarly, the approach to notifying relatives when a trial participant does not survive also varies. We are aware, for instance, of major differences in approach between UK and US settings. To date, it has not been common practice in UK cardiac arrest trials to actively inform the relatives of non-surviving patients of their inclusion in research. This practice has been based on concerns about the potential burden on the recipients of this information and the practicalities of delivering such communications, and it has been supported by patient and public advisors. 1–3 Previously, either no approach or a 'passive' approach has been made to relatives, to avoid or minimise additional emotional burden at a time of great distress. 1 2 4 A passive approach is one where general trial information is targeted to locations where relatives are likely to come across it. This information includes an invitation to seek further information should they wish, placing the choice of whether or not to find out more with the relatives. This is in direct contrast to practice elsewhere.

In the USA the approach is mandated in law. Patients without capacity may be enrolled in drug and device research under the exception from informed consent regulations or the waiver of informed consent regulations for other types of study. 5 6 The former requires prior community consultation, whereby information is provided and opinions sought in advance of the research to inform the ethics board and to demonstrate the requisite respect for community members. In both types of study, if the patient dies before a legally authorised person can be contacted, the enrolment must be disclosed to them if possible. 7 Researchers are required to attempt to notify relatives and this is usually communicated via letter. 8 9 Difficulties with reliably obtaining valid contact information have been identified but studies report between 83% and 91% success in acquiring correct details and sending such letters. 8–10 Despite some difficulties, the information contained in the letters appears to have been acceptable to recipients since consent for data use was refused in fewer than 1% of cases. 9 10

During our recent cardiac arrest trial, our patient and public partners cautiously stated their preference for actively informing the relatives of non-surviving patients. 11 With no UK precedent, we sought the experience of researchers elsewhere. A structured investigation is needed to describe the different approaches taken by researchers across the world to inform the relatives of non-survivors, which may guide others and inform practice. The primary objective was to determine the approach taken to informing the relatives of non-survivors enrolled in cardiac arrest trials recently published, registered or currently being conducted. The secondary objectives were to describe the methods used, to establish what influenced the selection of the approach, and to determine whether researchers experienced any specific issues.

Study design and setting

We conducted a systematic analysis of the practices of researchers in cardiac arrest trials using an online data collection tool (see online supplemental materials ).


The project was carried out according to the Helsinki declaration and principles of Good Clinical Practice. We report our study in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology checklist for observational studies. 12

Selection of participants

To identify researchers, we searched international trial databases containing details of trials that meet WHO and International Committee of Medical Journal Editors registration criteria and have been issued with a registration number, whether planned, underway or completed. 13 14 Specifically, in line with current best-practice guidance, we searched ClinicalTrials.gov, a US-hosted resource including studies from 221 countries, and the WHO International Clinical Trials Registry Platform meta-register, which contains trial records from 18 registries from across the world. 15

We contacted the lead investigators of studies fulfilling the following eligibility criteria:

Inclusion criteria

Randomised or quasi-randomised clinical trials registered, currently being undertaken, or published since 1 January 2010.

Effectiveness or efficacy trial of an intervention delivered intra-arrest or within 1 hour of return of spontaneous circulation.

Trial includes adult patients only (as defined by local legal or clinical definition).

Patients enrolled without prior informed consent.

Exclusion criteria

Unable to determine eligibility status from registry record and trial publication/protocol not available in English.

We searched trial registries on 5 October 2022 for interventional studies using the term 'cardiac arrest'. To capture studies reporting from 2010, the search included studies first posted from 1 January 2007; this three-year buffer balanced sensitivity and specificity for identifying studies within resource limitations. Screening for relevance was conducted by title by a single author (HP). Eligibility was determined by a full review of the registry entry, and 10% of records were checked by a second author (KC). Discrepancies were resolved by a third author (GDP).

Data collection

We sought information regarding the approach to relatives of non-surviving patients from the registry record or from published papers and protocols. We identified publications and protocols by searching on the study title or acronym from the registry, or on keywords from that title. If required, publications were searched for on ResearchGate. 16 If this information was not identified in these sources, we contacted researchers directly, obtaining their contact information either from the registry record or from other publications they had coauthored. We sent emails to individual recipients, customised with their names and titles, to minimise the chance of their being perceived as spam and to make the invitation more personal. 17 Separate emails were sent for each trial and researchers were invited to respond separately to each. If we received no response after 7 days, the invitation email was sent again.

We constructed the data collection tool using Qualtrics XM (2005–2022, Provo, Utah, USA). 18 The content of the tool was informed by experienced critical care researchers and items were carefully worded for ease of interpretation and international applicability. To maximise online readability and ease of response each item was presented on a new page with a consistent spatial arrangement, colour and font. 19 20 We applied skip logic so that respondents viewed only relevant questions depending on their previous responses. 18 To minimise the cognitive load on respondents, the number of items was reduced to a minimum and no responses were forced. 17 21 Clinical sensibility testing of questions was conducted among experienced cardiac arrest researchers to establish face validity. 19 The tool was piloted for comprehension and performance, such as ensuring that skip logic and multiple response options were functional. A participant information sheet was embedded, and consent was sought prior to accessing the tool. We conducted data collection from December 2022 to January 2023 and de-identified all responses (by respondent) prior to analysis.
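To illustrate the skip logic described above, here is a small sketch in Python of how earlier responses can gate follow-up questions. The question wording, response options and branch conditions are hypothetical, not the study's actual Qualtrics instrument.

```python
# Illustrative sketch of survey skip logic, analogous to the branching
# configured in an online survey platform. All items are hypothetical.

def run_survey(ask):
    """ask(question, options) -> chosen option; returns collected answers."""
    answers = {}
    answers["approach"] = ask(
        "How were relatives of non-surviving patients informed?",
        ["active", "passive", "no attempt"],
    )
    # Skip logic: only respondents reporting an active approach see
    # follow-up items about contact method and timing.
    if answers["approach"] == "active":
        answers["method"] = ask(
            "How was first contact made?",
            ["letter/email", "telephone", "personal visit", "other"],
        )
        answers["timing"] = ask(
            "How soon after the event were relatives contacted?",
            ["<24 h", "1-7 days", "8-30 days", ">30 days"],
        )
    return answers

# Example: a scripted respondent reporting an active, letter-based approach.
scripted = iter(["active", "letter/email", "<24 h"])
print(run_survey(lambda question, options: next(scripted)))
```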

Data analysis

We summarised and presented quantitative survey responses using descriptive statistics. Qualitative responses were subject to summative content analysis, whereby keywords were identified to interpret the contextual meaning of the content. 22 We compared characteristics of responder trials and non-responder trials using a χ² test. Data analysis was performed by a single researcher (HP) and checked by a second (KC).
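As a sketch of the responder versus non-responder comparison, the snippet below applies a χ² test of independence to a 2×2 table in Python. The counts are illustrative placeholders, not the study's actual table 1 values.

```python
from scipy.stats import chi2_contingency

# Chi-squared test of independence between response status and trial setting.
# Counts are illustrative placeholders only.
#                       out-of-hospital  in-hospital
contingency = [[48, 16],   # responding trials (n=64)
               [24, 20]]   # non-responding trials (n=44, hypothetical split)

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
```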

Patient and public involvement

Discussion with patient and public partners inspired this study, but as this was an investigation of researcher practice, they were not involved in the study design.

Characteristics of study participants

We identified 710 unique records through registry searches ( figure 1 ). After a review of trial records, we retained 108 for analysis. A summary of the study characteristics (population, intervention, comparator, primary outcome) is provided in the online supplemental material .


Figure 1. Study flow diagram. ICTRP, International Clinical Trials Registry Platform.

Since it was not possible to obtain the required information from the registries, publications or protocols, invitations to participate in the study were sent out for all 108 studies. Five email addresses were returned as unknown and alternative addresses could not be identified. Researchers provided information relating to 64 studies (62% response rate). There was one 'duplicate' study (entered with different titles in different registries); because the two responses did not conflict, the second was removed from the data set.

Most trials that responded were conducted in the out-of-hospital setting (n=48, 75%), in Europe (n=40, 63%) and were not drug trials (n=46, 72%) ( table 1 ). We observed some differences between responding and non-responding trials in relation to cardiac arrest setting and location of recruitment.

Table 1. Characteristics of responding and non-responding studies

Main results

In 28 studies (44% (95% CI; 31% to 57%)), researchers took an active approach to informing relatives of non-surviving patients about study enrolment ( figure 2 ). In an approximately equal number of studies (n=25 (39% (95% CI; 27% to 52%))) no information was provided to relatives. A passive approach to information provision was taken in 11 studies (17% (95% CI; 8% to 29%)).
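The paper does not state which interval method produced these confidence limits; an exact (Clopper–Pearson) binomial interval, sketched below in Python with scipy, closely reproduces the reported bounds for the active-approach proportion. The choice of method is an assumption for illustration.

```python
from scipy.stats import binomtest

# Exact (Clopper-Pearson) 95% CI for the proportion of responding trials
# taking an active approach: 28 of 64. The interval method is assumed;
# the paper does not say which method was used.
result = binomtest(k=28, n=64)
ci = result.proportion_ci(confidence_level=0.95)  # default method is 'exact'
print(f"proportion={28/64:.2f}, 95% CI: {ci.low:.2f} to {ci.high:.2f}")
# -> proportion 0.44, with an interval close to the reported 31% to 57%
```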

Figure 2. Type, method and rationale of approach taken, with associated resulting issues.

The relative proportions of each approach and its variation across the world are shown in figure 3 . None of the Asian studies provided information to relatives. Providing no information was the predominant approach in Australasian studies, whereas an active approach was favoured in North America.

Figure 3. Approach taken to informing relatives, by continent.

Figure 4 shows how the approach has varied according to the date that trials opened for recruitment. No particular approach has consistently predominated. The reduction in the active approach seen post-pandemic is not related to the continent of origin of the studies.

Figure 4. Approach taken to informing relatives in studies opened between 2008 and 2023.

Of the 64 studies, 23% (n=15) used a model of community consent, whereby researchers engaged with the local community prior to data collection and the community supported the trial. This activity appeared to be independent of the method used to inform relatives.

Method of active approach to informing relatives

Of the 28 studies where relatives were actively informed of the patient's inclusion, 27 respondents indicated how first contact was made with relatives ( figure 2 ). Among those who responded, contact was most frequently made by letter or email after the event (n=11). The least frequently used strategy was a personal visit by a researcher after the event (n=2). Twenty-six researchers submitted responses regarding the timing of information provision, which ranged from immediately following the resuscitation attempt to 90 days post event. The most frequent timing was within 24 hours; 15 studies delivered information during this period. These studies were mostly European (n=13), but none recruited in the UK. In four studies, at least a month elapsed before relatives were informed. The timing was protocolised in three studies (4–6 weeks in one study, 2 months in two others) to give relatives sufficient time to recover from the immediate grieving period. 11 23 24 Most of the studies (n=8) that opted to delay informing relatives beyond 24 hours recruited patients in North America or Australia.

Method of passive approach to informing relatives

In 11 studies researchers took a passive approach to informing relatives. In these cases, general information about the study and contact details for the research team were shared via public channels. This was designed to enable relatives to decide whether to seek further information about the study, such as whether their relative was enrolled. Most studies shared information via more than one medium. The most popular means, cited in 10 studies, was posters placed in community locations likely to be frequented by the relatives of non-surviving patients, such as emergency department waiting rooms and register offices. Except for one study that opened in 2011, the studies using electronic media opened more recently (2018–2023).

What influenced choice of approach

Researchers reported many different influences on their choice of approach to informing relatives ( table 2 ). Of the 62 studies where an influence was reported, 45 indicated it was a requirement of the Research Ethics Committee (REC). However, this was not the dominant influence for every approach: previous research experience (n=8) and advice from public groups (n=7) were more frequently indicated as determinants of the adoption of a passive approach, while studies taking a 'no information' approach were most strongly influenced by the REC.

Table 2. Influences on approach to informing relatives of non-survivors

When asked for their rationale for opting to provide no information to relatives, researchers often cited more than one reason ( figure 2 ). Most often it was simply that there was no requirement to inform relatives (n=16). Researchers commonly indicated that they felt it would be inappropriate to inform relatives that their loved one had been enrolled in a research study (n=12).

Concerns/issues resulting from chosen approach

The investigators who used an active approach to informing relatives did not report any issues or concerns from relatives ( figure 2 ). In one study, relatives contacted researchers to say that they viewed the study enrolment positively and, in another, relatives got in touch to ask whether any action was required on their part. In the 10 studies that had recruited participants, no issues were encountered with a passive approach. In most (n=20) of the studies where no information was provided, investigators reported no concerns. In four studies, researchers reported concerns or issues arising from the approach taken: in two studies, relatives raised concerns, and in a further two, relatives had misunderstandings about the death. This question was not answered in one study.

In this systematic analysis of the approach taken to informing the relatives of non-surviving patients enrolled in cardiac arrest studies, we found substantial variability in practice. Of the studies from the last 15 years for which we received a response, 28 actively informed relatives, 25 provided no information and 11 provided information passively. The most common means of active contact was by letter or email. The most frequent timing for notification was within the first 24 hours after death, although in one case up to 90 days elapsed before this news was delivered. We are moderately confident that our findings are generalisable, as we obtained information on 62% of the target population and the sample was diverse.

RECs commonly influenced the approach taken. However, it is not clear whether the REC or the local legal and regulatory requirements were the key determinant. In Australia, legal requirements vary by jurisdiction: in some states there is no legal precedent for enrolling patients with impaired consent in research, though it is recognised that such enrolments do occur. 25 Even where legal processes do exist, there is no specific requirement to inform relatives of non-survivors. 25

In the UK, patients may be enrolled in emergency medicines trials without prior informed consent under the provisions of the Medicines for Human Use (Clinical Trials) (Amendment) Regulations (2019), or in other research in England and Wales under the Mental Capacity Act (2005). 26 27 Neither of these gives explicit guidance on whether to advise the bereaved family of their relative's enrolment. 28 In the absence of legal guidance, practice has been found to vary: some researchers reported routinely seeking permission from family members for the use of data; others did not, since a valid professional legal consultee declaration negated the need to put relatives through additional distress. 28 In Scotland, patients may be enrolled in medicines trials, but the Adults With Incapacity (Scotland) Act 2000 does not make provision for patients to be enrolled in other types of research. 29 Although a Scottish REC may consider such research, there would likely be subtle differences in practice within a UK-wide study.

The variability in approach reflects the fact that only 36% of studies’ approaches were mandated in law. It is also likely to be influenced by the prevailing culture. For example, talking about death is considered disrespectful or blasphemous in Chinese culture. 30 In this context, actively informing relatives about research participation would likely be extremely uncomfortable for both researchers and relatives. This may help to explain the consistent ‘no information’ approach we found in the Asian studies. Where researchers do not actively inform relatives, there is a risk that they may find out through ‘uncontrolled’ channels such as the media. Disclosure through such channels may lead to inaccuracies. 31 Where researchers used a passive strategy, this was most frequently via posters in targeted locations such as the waiting rooms of primary care physicians. However, variable effectiveness of this strategy has been reported, with between 5% and 78% of patients noticing posters in the waiting rooms of General Practitioner (GP) practices or hospital emergency departments. 32 33

A widespread passive provision of trial information may require significant research funding, with no guarantee that the information has reached the relevant target audience. 32 Active direct notification by writing to family members may be more specific but researchers have warned that the process is cumbersome, costly and resource-intensive. 8 Additionally, this precise targeting is limited by the ability of researchers to reliably obtain contact information for relatives. Researchers using postal notification have previously reported finding either incorrect or no contact details for relatives in 9–18% of cases. 8 9 Timing of the notification also needs careful consideration. We found that most studies contacted relatives within 24 hours of death. This contrasts with previous studies which report median notification delivery times of 6–8 days. 9 10 There is a risk that high withdrawal rates following notification of research participation may bias the research findings. In practice, this rarely occurs; between 0% and 0.91% of withdrawals have followed active notifications. 8–10

Our study has several important limitations. First, we might have increased our response rate had we followed up by phone as well as email, although this was precluded by resource constraints. Second, we observed some differences between trials that responded and trials that did not, which may reflect selection bias; for example, most of our responses were from Europe, which may introduce a cultural bias into the findings. Third, some researchers represented multiple studies (a cluster), and the approach in these studies was usually homogeneous. This clustering may have introduced bias, since a researcher who responded for one study was more likely to respond for the others. Finally, we did not ask whether regulations regarding emergency research had changed during the period of interest.

The most important perspective missing from the literature at this time is that of the recipients of these notifications, in whatever form they take. Qualitative research exploring their experiences would be an important contribution to the literature and could form a precursor to subsequent work to standardise the approach.

In summary, we found wide variability in the approach taken to informing relatives of non-surviving patients enrolled in cardiac arrest studies and researchers cited a variety of influences on their selection of approach.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

This study involves human participants and was approved by The University of Warwick Biomedical and Scientific Research Ethics Committee (BSREC 133/20-21). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

For the purpose of open access, the author has applied a Creative Commons Attribution (CC-BY) licence to any Author Accepted Manuscript version arising from this submission. Pilot work for this project was presented as a poster at the European Resuscitation Council conference in Antwerp in June 2022. The abstract was subsequently published: Dove A, Pointeer L, Couper K, Perkins GD, Pocock H. Variability in approach to informing the relatives of non-surviving participants in cardiac arrest research: a questionnaire study. Resuscitation. 2022;175(S1):S73.

References

  • Davies S , et al
  • Perkins GD ,
  • Deakin CD , et al
  • Lall R , et al
  • Benger J , et al
  • US Government Food and Drug Administration
  • Spence JM ,
  • Notarangelo V ,
  • Frank J , et al
  • Brienza AM ,
  • Sylvester R ,
  • Ryan CM , et al
  • Russell R , et al
  • Deakin CD ,
  • von Elm E ,
  • Altman DG ,
  • Egger M , et al
  • International Committee of Medical Journal Editors
  • World Health Organisation
  • Hunter KE ,
  • Webster AC ,
  • Page MJ , et al
  • ResearchGate
  • Burns KEA ,
  • Duffett M ,
  • Kho ME , et al
  • Stefkovics Á
  • Hsieh H-F ,
  • Bernard S ,
  • Cameron P , et al
  • Hein C , et al
  • Rallis Legal
  • UK Government Department for Health
  • Paddock K ,
  • Woolfall K ,
  • Frith L , et al
  • UK Government (Scottish Parliament)
  • Cheng H-WB ,
  • Shek P-SK ,
  • Man C-W , et al
  • Knowles T ,
  • Perkins GD , et al
  • O’Malley GF ,
  • Giraldo P ,
  • Deitch K , et al
  • Maskell K ,
  • McDonald P ,

Supplementary materials

Data supplement 1. This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Handling editor Edward Carlton


Contributors HP and GDP conceived the study. HP, KC and GDP designed the study. AD and LP designed and piloted the data collection tool with the advice and supervision of HP and KC. HP revised the data collection tool for the main study. HP conducted registry searches. HP, KC and GDP determined study eligibility. HP sought information from study publications. HP contacted researchers for further information, supported by KC and GDP. HP collated the data and analysed the data supported by KC and GDP. HP drafted the manuscript and all authors contributed substantively to its revision. GDP takes responsibility for the paper as a whole (guarantor).

Funding HP, Clinical Doctoral Research Fellow (ICA-CDRF-2018-04-ST2-005) is funded by National Health Service (NHS) England (NHSE)/National Institute for Health and Care Research (NIHR) for this research project. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR, University of Warwick, NHS or the UK Department of Health and Social Care. The funder had no input into study design, collection, analysis or interpretation of data, writing of the report or the decision to submit the article for publication. GDP is supported by the National Institute for Health and Care Research (NIHR) Applied Research Collaboration (ARC) West Midlands. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
