
Content Analysis

Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e., text). Using content analysis, researchers can quantify and analyze the presence, meanings, and relationships of such words, themes, or concepts. As an example, researchers can evaluate language used within a news article to search for bias or partiality. Researchers can then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time surrounding the text.

Description

Data can come from interviews, open-ended questions, field research notes, conversations, or virtually any occurrence of communicative language (such as books, essays, discussions, newspaper headlines, speeches, media, historical documents). A single study may analyze various forms of text. To analyze the text using content analysis, the text must be coded, or broken down, into manageable units for analysis (i.e., “codes”). Once the text is coded, the codes can be grouped into “code categories” to summarize the data even further.

Three different definitions of content analysis are provided below.

Definition 1: “Any technique for making inferences by systematically and objectively identifying special characteristics of messages.” (from Holsti, 1968)

Definition 2: “An interpretive and naturalistic approach. It is both observational and narrative in nature and relies less on the experimental elements normally associated with scientific research (reliability, validity, and generalizability).” (from Ethnography, Observational Research, and Narrative Inquiry, 1994-2012)

Definition 3: “A research technique for the objective, systematic and quantitative description of the manifest content of communication.” (from Berelson, 1952)

Uses of Content Analysis

Identify the intentions, focus or communication trends of an individual, group or institution

Describe attitudinal and behavioral responses to communications

Determine the psychological or emotional state of persons or groups

Reveal international differences in communication content

Reveal patterns in communication content

Pre-test and improve an intervention or survey prior to launch

Analyze focus group interviews and open-ended questions to complement quantitative data

Types of Content Analysis

There are two general types of content analysis: conceptual analysis and relational analysis. Conceptual analysis determines the existence and frequency of concepts in a text. Relational analysis develops the conceptual analysis further by examining the relationships among concepts in a text. Each type of analysis may lead to different results, conclusions, interpretations and meanings.

Conceptual Analysis

Typically people think of conceptual analysis when they think of content analysis. In conceptual analysis, a concept is chosen for examination and the analysis involves quantifying and counting its presence. The main goal is to examine the occurrence of selected terms in the data. Terms may be explicit or implicit. Explicit terms are easy to identify. Coding of implicit terms is more complicated: you need to decide the level of implication and base judgments on subjectivity (an issue for reliability and validity). Therefore, coding of implicit terms involves using a dictionary or contextual translation rules or both.

To begin a conceptual content analysis, first identify the research question and choose a sample or samples for analysis. Next, the text must be coded into manageable content categories. This is basically a process of selective reduction. By reducing the text to categories, the researcher can focus on and code for specific words or patterns that inform the research question.

General steps for conducting a conceptual content analysis:

1. Decide the level of analysis: word, word sense, phrase, sentence, themes

2. Decide how many concepts to code for: develop a pre-defined or interactive set of categories or concepts. Decide either: A. to allow flexibility to add categories through the coding process, or B. to stick with the pre-defined set of categories.

Option A allows for the introduction and analysis of new and important material that could have significant implications to one’s research question.

Option B allows the researcher to stay focused and examine the data for specific concepts.

3. Decide whether to code for existence or frequency of a concept. The decision changes the coding process.

When coding for the existence of a concept, the researcher would count a concept only once if it appeared at least once in the data and no matter how many times it appeared.

When coding for the frequency of a concept, the researcher would count the number of times a concept appears in a text.

4. Decide on how you will distinguish among concepts:

Should words be coded exactly as they appear, or coded as the same when they appear in different forms? For example, “dangerous” vs. “dangerousness”. The point here is to create coding rules so that these word segments are transparently categorized in a logical fashion. The rules could make all of these word segments fall into the same category, or the rules can be formulated so that the researcher can distinguish these word segments into separate codes.

What level of implication is to be allowed? Words that imply the concept or words that explicitly state the concept? For example, “dangerous” vs. “the person is scary” vs. “that person could cause harm to me”. These word segments may not merit separate categories, due to the implicit meaning of “dangerous”.

5. Develop rules for coding your texts. After the decisions in steps 1-4 are complete, a researcher can begin developing rules for translating text into codes. This will keep the coding process organized and consistent. The researcher can code for exactly what they want to code. Validity of the coding process is ensured when the researcher is consistent and coherent in their codes, meaning that they follow their translation rules. In content analysis, abiding by the translation rules is equivalent to validity.

6. Decide what to do with irrelevant information: should this be ignored (e.g. common English words like “the” and “and”), or used to reexamine the coding scheme in the case that it would add to the outcome of coding?

7. Code the text: This can be done by hand or by using software. By using software, researchers can input categories and have coding done automatically, quickly and efficiently, by the software program. When coding is done by hand, a researcher can recognize errors far more easily (e.g. typos, misspelling). If using computer coding, text could be cleaned of errors to include all available data. This decision of hand vs. computer coding is most relevant for implicit information where category preparation is essential for accurate coding.

8. Analyze your results: Draw conclusions and generalizations where possible. Determine what to do with irrelevant, unwanted, or unused text: reexamine, ignore, or reassess the coding scheme. Interpret results carefully as conceptual content analysis can only quantify the information. Typically, general trends and patterns can be identified.
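The steps above can be sketched in a few lines of code. This is only a minimal illustration, assuming a word-level analysis with a pre-defined set of concepts (Option B); the concept names, word forms, stop-word list, and sample sentence are all hypothetical and are not taken from any particular study.

```python
import re
from collections import Counter

# Hypothetical coding rules (steps 4-5): each concept lists the word forms
# that are treated as the same code.
CODING_RULES = {
    "danger": {"danger", "dangerous", "dangerousness"},
    "fear": {"fear", "afraid", "scary"},
}

# Hypothetical list of irrelevant words to ignore (step 6).
STOP_WORDS = {"the", "and", "a", "is", "of", "to"}


def code_frequency(text):
    """Count how many times each concept appears (frequency coding)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        if word in STOP_WORDS:
            continue
        for concept, forms in CODING_RULES.items():
            if word in forms:
                counts[concept] += 1
    return counts


def code_existence(text):
    """Record each concept once if it appears at all (existence coding)."""
    return set(code_frequency(text))


sample = "The dog is dangerous. Its dangerousness is scary, and the danger is real."
frequency = code_frequency(sample)
existence = code_existence(sample)
print(frequency)
print(sorted(existence))
```

Note that this sketch only handles explicit terms. Implicit phrases such as “that person could cause harm to me” would need a richer dictionary or contextual rules, as described in step 4.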

Relational Analysis

Relational analysis begins like conceptual analysis, where a concept is chosen for examination. However, the analysis involves exploring the relationships between concepts. Individual concepts are viewed as having no inherent meaning and rather the meaning is a product of the relationships among concepts.

To begin a relational content analysis, first identify a research question and choose a sample or samples for analysis. The research question must be focused so the concept types are not open to interpretation and can be summarized. Next, select text for analysis. Select the text carefully, balancing two needs: enough information for a thorough analysis, so that results are not limited, but not so much that the coding process becomes too arduous to supply meaningful and worthwhile results.

There are three subcategories of relational analysis to choose from prior to going on to the general steps.

Affect extraction: an emotional evaluation of concepts explicit in a text. A challenge to this method is that emotions can vary across time, populations, and space. However, it could be effective at capturing the emotional and psychological state of the speaker or writer of the text.

Proximity analysis: an evaluation of the co-occurrence of explicit concepts in the text. Text is defined as a string of words called a “window” that is scanned for the co-occurrence of concepts. The result is the creation of a “concept matrix”, or a group of interrelated co-occurring concepts that would suggest an overall meaning.

Cognitive mapping: a visualization technique for either affect extraction or proximity analysis. Cognitive mapping attempts to create a model of the overall meaning of the text such as a graphic map that represents the relationships between concepts.
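As a rough illustration of proximity analysis, the sketch below scans a text in fixed-size windows and counts which concepts co-occur in the same window. The concept list, the sample sentence, the window size, and the choice of non-overlapping windows are all assumptions made for this example; overlapping (sliding) windows are another common choice and would give different counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical explicit concepts to look for.
CONCEPTS = {"smoking", "cancer", "exercise", "health"}


def concept_matrix(words, window_size):
    """Count concept pairs that co-occur inside each non-overlapping window."""
    pairs = Counter()
    for start in range(0, len(words), window_size):
        window = words[start:start + window_size]
        present = sorted(CONCEPTS.intersection(window))
        for pair in combinations(present, 2):
            pairs[pair] += 1
    return pairs


text = "smoking causes cancer while exercise improves health and smoking harms health"
matrix = concept_matrix(text.split(), window_size=4)
for pair, count in sorted(matrix.items()):
    print(pair, count)
```

The resulting pair counts are one simple form of the “concept matrix” described above; they could also feed a graphic map of the kind used in cognitive mapping.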

General steps for conducting a relational content analysis:

1. Determine the type of analysis: Once the sample has been selected, the researcher needs to determine what types of relationships to examine and the level of analysis: word, word sense, phrase, sentence, themes.

2. Reduce the text to categories and code for words or patterns. A researcher can code for existence of meanings or words.

3. Explore the relationship between concepts: once the words are coded, the text can be analyzed for the following:

Strength of relationship: degree to which two or more concepts are related.

Sign of relationship: are concepts positively or negatively related to each other?

Direction of relationship: the types of relationship that categories exhibit. For example, “X implies Y” or “X occurs before Y” or “if X then Y” or if X is the primary motivator of Y.

4. Code the relationships: a difference between conceptual and relational analysis is that the statements or relationships between concepts are coded.

5. Perform statistical analyses: explore differences or look for relationships among the identified variables during coding.

6. Map out representations: such as decision mapping and mental models.
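One possible way to record coded relationships is shown below. It is a sketch under stated assumptions, not a standard scheme: the record layout, the numeric sign (+1 or -1), the 0 to 1 strength rating, and the example concepts are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedRelationship:
    source: str      # concept X
    target: str      # concept Y
    direction: str   # e.g. "implies" or "occurs before"
    sign: int        # +1 if positively related, -1 if negatively related
    strength: float  # coder's rating from 0 (weak) to 1 (strong)


# Hypothetical relationships coded from a text (step 4).
coded = [
    CodedRelationship("stress", "smoking", "implies", +1, 0.75),
    CodedRelationship("stress", "smoking", "implies", +1, 0.25),
    CodedRelationship("exercise", "stress", "occurs before", -1, 0.5),
]


def mean_signed_strength(relationships):
    """Average sign * strength for each (source, target) pair (step 5)."""
    grouped = defaultdict(list)
    for rel in relationships:
        grouped[(rel.source, rel.target)].append(rel.sign * rel.strength)
    return {pair: sum(values) / len(values) for pair, values in grouped.items()}


summary = mean_signed_strength(coded)
for pair, value in summary.items():
    print(pair, value)
```

Keeping direction, sign, and strength as separate fields means each can be analyzed on its own later, or combined as in the simple average here.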

Reliability and Validity

Reliability: Because of the human nature of researchers, coding errors can never be eliminated but only minimized. Generally, 80% agreement is considered an acceptable margin for reliability. Three criteria comprise the reliability of a content analysis:

Stability: the tendency for coders to consistently re-code the same data in the same way over a period of time.

Reproducibility: the tendency for a group of coders to classify category membership in the same way.

Accuracy: extent to which the classification of text corresponds to a standard or norm statistically.
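A simple way to check stability or reproducibility is to compare two sets of codes assigned to the same units and compute the share on which they agree. The sketch below assumes one code per unit and reads the 80% figure as a minimum agreement rate; the code lists are invented for illustration.

```python
def percent_agreement(codes_a, codes_b):
    """Share of units that received the same code in both coding passes."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both passes must code the same number of units")
    if not codes_a:
        raise ValueError("At least one coded unit is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for ten text units from two coders (reproducibility).
# The same function works for one coder at two points in time (stability).
coder_1 = ["danger", "fear", "danger", "none", "fear",
           "danger", "none", "fear", "danger", "none"]
coder_2 = ["danger", "fear", "danger", "none", "danger",
           "danger", "none", "fear", "none", "none"]

agreement = percent_agreement(coder_1, coder_2)
meets_threshold = agreement >= 0.80
print(agreement, meets_threshold)
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best treated as a first check rather than a complete reliability assessment.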

Validity: Three criteria comprise the validity of a content analysis:

Closeness of categories: this can be achieved by utilizing multiple classifiers to arrive at an agreed upon definition of each specific category. Using multiple classifiers, a concept category that may be an explicit variable can be broadened to include synonyms or implicit variables.

Conclusions: What level of implication is allowable? Do conclusions correctly follow the data? Are results explainable by other phenomena? This becomes especially problematic when using computer software for analysis and distinguishing between synonyms. For example, the word “mine” variously denotes a personal pronoun, an explosive device, and a deep hole in the ground from which ore is extracted. Software can obtain an accurate count of that word’s occurrence and frequency, but cannot produce an accurate accounting of the meaning inherent in each particular usage. This problem could throw off one’s results and make any conclusion invalid.

Generalizability of the results to a theory: dependent on the clear definitions of concept categories, how they are determined and how reliable they are at measuring the idea one is seeking to measure. Generalizability parallels reliability as much of it depends on the three criteria for reliability.
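The “mine” problem described above is easy to reproduce. In this small sketch (the three sentences are made up for illustration), a plain word count treats all three meanings as the same term.

```python
import re
from collections import Counter

# Three hypothetical sentences using "mine" in three different senses.
sentences = [
    "That book is mine.",
    "The soldier stepped on a mine.",
    "The workers went down into the mine.",
]

counts = Counter(
    word
    for sentence in sentences
    for word in re.findall(r"[a-z]+", sentence.lower())
)
mine_count = counts["mine"]
print(mine_count)
```

The count is accurate as a count, but it says nothing about which sense is used in each sentence. That distinction would have to come from a human coder or from additional context rules.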

Advantages of Content Analysis

Directly examines communication using text

Allows for both qualitative and quantitative analysis

Provides valuable historical and cultural insights over time

Allows a closeness to data

Coded form of the text can be statistically analyzed

Unobtrusive means of analyzing interactions

Provides insight into complex models of human thought and language use

When done well, is considered a relatively “exact” research method

Content analysis is a readily understood and inexpensive research method

A more powerful tool when combined with other research methods such as interviews, observation, and use of archival records. It is very useful for analyzing historical material, especially for documenting trends over time.

Disadvantages of Content Analysis

Can be extremely time consuming

Is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation

Is often devoid of theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study

Is inherently reductive, particularly when dealing with complex texts

Tends too often to simply consist of word counts

Often disregards the context that produced the text, as well as the state of things after the text is produced

Can be difficult to automate or computerize

Textbooks & Chapters  

Berelson, Bernard. Content Analysis in Communication Research. New York: Free Press, 1952.

Busha, Charles H. and Stephen P. Harter. Research Methods in Librarianship: Techniques and Interpretation. New York: Academic Press, 1980.

de Sola Pool, Ithiel. Trends in Content Analysis. Urbana: University of Illinois Press, 1959.

Krippendorff, Klaus. Content Analysis: An Introduction to its Methodology. Beverly Hills: Sage Publications, 1980.

Fielding, NG & Lee, RM. Using Computers in Qualitative Research. SAGE Publications, 1991. (Refer to Chapter by Seidel, J. ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’.)

Methodological Articles  

Hsieh HF & Shannon SE. (2005). Three Approaches to Qualitative Content Analysis. Qualitative Health Research. 15(9): 1277-1288.

Elo S, Kaarianinen M, Kanste O, Polkki R, Utriainen K, & Kyngas H. (2014). Qualitative Content Analysis: A focus on trustworthiness. Sage Open. 4:1-10.

Application Articles  

Abroms LC, Padmanabhan N, Thaweethai L, & Phillips T. (2011). iPhone Apps for Smoking Cessation: A content analysis. American Journal of Preventive Medicine. 40(3):279-285.

Ullstrom S, Sachs MA, Hansson J, Ovretveit J, & Brommels M. (2014). Suffering in Silence: a qualitative study of second victims of adverse events. British Medical Journal, Quality & Safety Issue. 23:325-331.

Owen P. (2012). Portrayals of Schizophrenia by Entertainment Media: A Content Analysis of Contemporary Movies. Psychiatric Services. 63:655-659.

Choosing whether to conduct a content analysis by hand or by using computer software can be difficult. Refer to ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’ listed above in “Textbooks and Chapters” for a discussion of the issue.

QSR NVivo:  http://www.qsrinternational.com/products.aspx

Atlas.ti:  http://www.atlasti.com/webinars.html

R- RQDA package:  http://rqda.r-forge.r-project.org/

Rolly Constable, Marla Cowell, Sarita Zornek Crawford, David Golden, Jake Hartvigsen, Kathryn Morgan, Anne Mudgett, Kris Parrish, Laura Thomas, Erika Yolanda Thompson, Rosie Turner, and Mike Palmquist. (1994-2012). Ethnography, Observational Research, and Narrative Inquiry. Writing@CSU. Colorado State University. Available at: https://writing.colostate.edu/guides/guide.cfm?guideid=63 .

As an introduction to Content Analysis by Michael Palmquist, this is the main resource on Content Analysis on the Web. It is comprehensive, yet succinct. It includes examples and an annotated bibliography. The information contained in the narrative above draws heavily from and summarizes Michael Palmquist’s excellent resource on Content Analysis but was streamlined for the purpose of doctoral students and junior researchers in epidemiology.

At Columbia University Mailman School of Public Health, more detailed training is available through the Department of Sociomedical Sciences- P8785 Qualitative Research Methods.


Content Analysis | A Step-by-Step Guide with Examples

Published on 5 May 2022 by Amy Luo. Revised on 5 December 2022.

Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual:

  • Books, newspapers, and magazines
  • Speeches and interviews
  • Web content and social media posts
  • Photographs and films

Content analysis can be both quantitative (focused on counting and measuring) and qualitative (focused on interpreting and understanding). In both types, you categorise or ‘code’ words, themes, and concepts within the texts and then analyse the results.

Table of contents

  • What is content analysis used for?
  • Advantages of content analysis
  • Disadvantages of content analysis
  • How to conduct content analysis

Researchers use content analysis to find out about the purposes, messages, and effects of communication content. They can also make inferences about the producers and audience of the texts they analyse.

Content analysis can be used to quantify the occurrence of certain words, phrases, subjects, or concepts in a set of historical or contemporary texts.

In addition, content analysis can be used to make qualitative inferences by analysing the meaning and semantic relationship of words and concepts.

Because content analysis can be applied to a broad range of texts, it is used in a variety of fields, including marketing, media studies, anthropology, cognitive science, psychology, and many social science disciplines. It has various possible goals:

  • Finding correlations and patterns in how concepts are communicated
  • Understanding the intentions of an individual, group, or institution
  • Identifying propaganda and bias in communication
  • Revealing differences in communication in different contexts
  • Analysing the consequences of communication content, such as the flow of information or audience responses


  • Unobtrusive data collection

You can analyse communication and social interaction without the direct involvement of participants, so your presence as a researcher doesn’t influence the results.

  • Transparent and replicable

When done well, content analysis follows a systematic procedure that can easily be replicated by other researchers, yielding results with high reliability.

  • Highly flexible

You can conduct content analysis at any time, in any location, and at low cost. All you need is access to the appropriate sources.

Focusing on words or phrases in isolation can sometimes be overly reductive, disregarding context, nuance, and ambiguous meanings.

Content analysis almost always involves some level of subjective interpretation, which can affect the reliability and validity of the results and conclusions.

  • Time intensive

Manually coding large volumes of text is extremely time-consuming, and it can be difficult to automate effectively.

If you want to use content analysis in your research, you need to start with a clear, direct research question.

Next, you follow these five steps.

Step 1: Select the content you will analyse

Based on your research question, choose the texts that you will analyse. You need to decide:

  • The medium (e.g., newspapers, speeches, or websites) and genre (e.g., opinion pieces, political campaign speeches, or marketing copy)
  • The criteria for inclusion (e.g., newspaper articles that mention a particular event, speeches by a certain politician, or websites selling a specific type of product)
  • The parameters in terms of date range, location, etc.

If there are only a small number of texts that meet your criteria, you might analyse all of them. If there is a large volume of texts, you can select a sample.
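If you do need a sample, one option is a simple random sample drawn with a fixed seed so that the selection can be reproduced. The sketch below is only an illustration: the article identifiers, the population size of 200, the sample size of 20, and the seed value are all hypothetical, and other sampling designs may suit your research question better.

```python
import random

# Hypothetical list of every text that meets the inclusion criteria.
articles = [f"article_{number:03d}" for number in range(1, 201)]

# A fixed seed (arbitrary value) makes the selection reproducible.
rng = random.Random(42)
sample = rng.sample(articles, k=20)

print(len(sample))
print(sorted(sample)[:3])
```

Recording the seed and the full list of eligible texts alongside the sample makes it possible for another researcher to check exactly which texts were analysed.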

Step 2: Define the units and categories of analysis

Next, you need to determine the level at which you will analyse your chosen texts. This means defining:

  • The unit(s) of meaning that will be coded. For example, are you going to record the frequency of individual words and phrases, the characteristics of people who produced or appear in the texts, the presence and positioning of images, or the treatment of themes and concepts?
  • The set of categories that you will use for coding. Categories can be objective characteristics (e.g., aged 30–40, lawyer, parent) or more conceptual (e.g., trustworthy, corrupt, conservative, family-oriented).

Step 3: Develop a set of rules for coding

Coding involves organising the units of meaning into the previously defined categories. Especially with more conceptual categories, it’s important to clearly define the rules for what will and won’t be included to ensure that all texts are coded consistently.

Coding rules are especially important if multiple researchers are involved, but even if you’re coding all of the text by yourself, recording the rules makes your method more transparent and reliable.

Step 4: Code the text according to the rules

You go through each text and record all relevant data in the appropriate categories. This can be done manually or aided with computer programs, such as QSR NVivo, Atlas.ti, and Diction, which can help speed up the process of counting and categorising words and phrases.

Step 5: Analyse the results and draw conclusions

Once coding is complete, the collected data is examined to find patterns and draw conclusions in response to your research question. You might use statistical analysis to find correlations or trends, discuss your interpretations of what the results mean, and make inferences about the creators, context, and audience of the texts.
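To make the five steps concrete, here is a minimal sketch that codes a handful of texts against a small codebook and tallies the results by source. Everything in it is hypothetical: the sources, the sentences, the two categories, and the rule that a category applies when any of its listed terms appears as a whole word.

```python
from collections import Counter

# Step 1: hypothetical selected content, with the source recorded.
texts = [
    {"source": "Paper A", "text": "The minister is honest and reliable."},
    {"source": "Paper A", "text": "Critics call the minister corrupt."},
    {"source": "Paper B", "text": "A dishonest, corrupt politician."},
]

# Steps 2-3: categories and explicit rules for what counts as each one.
CODEBOOK = {
    "trustworthy": {"honest", "reliable"},
    "corrupt": {"corrupt", "dishonest"},
}


def code_unit(text):
    """Return the categories whose terms appear as whole words in the text."""
    words = {word.strip(".,;:!?").lower() for word in text.split()}
    return sorted(cat for cat, terms in CODEBOOK.items() if words & terms)


# Step 4: code every text according to the rules.
tally = Counter()
for item in texts:
    for category in code_unit(item["text"]):
        tally[(item["source"], category)] += 1

# Step 5: inspect the tallies before interpreting them.
for key, count in sorted(tally.items()):
    print(key, count)
```

Matching whole words rather than substrings is itself a coding rule: it keeps “dishonest” from being counted as “honest”. Writing such decisions down is what makes the coding transparent.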


Luo, A. (2022, December 05). Content Analysis | A Step-by-Step Guide with Examples. Scribbr. Retrieved 26 May 2024, from https://www.scribbr.co.uk/research-methods/content-analysis-explained/


Chapter 17. Content Analysis

Introduction

Content analysis is a term that is used to mean both a method of data collection and a method of data analysis. Archival and historical works can be the source of content analysis, but so too can the contemporary media coverage of a story, blogs, comment posts, films, cartoons, advertisements, brand packaging, and photographs posted on Instagram or Facebook. Really, almost anything can be the “content” to be analyzed. This is a qualitative research method because the focus is on the meanings and interpretations of that content rather than strictly numerical counts or variables-based causal modeling. [1] Qualitative content analysis (sometimes referred to as QCA) is particularly useful when attempting to define and understand prevalent stories or communication about a topic of interest—in other words, when we are less interested in what particular people (our defined sample) are doing or believing and more interested in what general narratives exist about a particular topic or issue. This chapter will explore different approaches to content analysis and provide helpful tips on how to collect data, how to turn that data into codes for analysis, and how to go about presenting what is found through analysis. It is also a nice segue between our data collection methods (e.g., interviewing, observation) chapters and chapters 18 and 19, whose focus is on coding, the primary means of data analysis for most qualitative data. In many ways, the methods of content analysis are quite similar to the method of coding.


Although the body of material (“content”) to be collected and analyzed can be nearly anything, most qualitative content analysis is applied to forms of human communication (e.g., media posts, news stories, campaign speeches, advertising jingles). The point of the analysis is to understand this communication, to systematically and rigorously explore its meanings, assumptions, themes, and patterns. Historical and archival sources may be the subject of content analysis, but there are other ways to analyze (“code”) this data when not overly concerned with the communicative aspect (see chapters 18 and 19). This is why we tend to consider content analysis its own method of data collection as well as a method of data analysis. Still, many of the techniques you learn in this chapter will be helpful to any “coding” scheme you develop for other kinds of qualitative data. Just remember that content analysis is a particular form with distinct aims and goals and traditions.

An Overview of the Content Analysis Process

The First Step: Selecting Content

Figure 17.1 is a display of possible content for content analysis. The first step in content analysis is making smart decisions about what content you will want to analyze and clearly connecting this content to your research question or general focus of research. Why are you interested in the messages conveyed in this particular content? What will the identification of patterns here help you understand? Content analysis can be fun to do, but in order to make it research, you need to fit it into a research plan.

Figure 17.1. A Non-exhaustive List of "Content" for Content Analysis

To take one example, let us imagine you are interested in gender presentations in society and how presentations of gender have changed over time. There are various forms of content out there that might help you document changes. You could, for example, begin by creating a list of magazines that are coded as being for “women” (e.g., Women’s Daily Journal) and magazines that are coded as being for “men” (e.g., Men’s Health). You could then select a date range that is relevant to your research question (e.g., 1950s–1970s) and collect magazines from that era. You might create a “sample” by deciding to look at three issues for each year in the date range and a systematic plan for what to look at in those issues (e.g., advertisements? Cartoons? Titles of articles? Whole articles?). You are not just going to look at some magazines willy-nilly. That would not be systematic enough to allow anyone to replicate or check your findings later on. Once you have a clear plan of what content is of interest to you and what you will be looking at, you can begin creating a record of everything you are including as your content. This might mean a list of each advertisement you look at or each title of stories in those magazines along with its publication date. You may decide to have multiple kinds of “content” in your research plan. For each kind of content, you want a clear plan for collecting, sampling, and documenting.

The Second Step: Collecting and Storing

Once you have a plan, you are ready to collect your data. This may entail downloading from the internet, creating a Word document or PDF of each article or picture, and storing these in a folder designated by the source and date (e.g., “Men’s Health advertisements, 1950s”). Sølvberg (2021), for example, collected posted job advertisements for three kinds of elite jobs (economic, cultural, professional) in Sweden. But collecting might also mean going out and taking photographs yourself, as in the case of graffiti, street signs, or even what people are wearing. Chaise LaDousa, an anthropologist and linguist, took photos of “house signs,” which are signs, often creative and sometimes offensive, hung by college students living in communal off-campus houses. These signs were a focal point of college culture, sending messages about the values of the students living in them. Some of the names will give you an idea: “Boot ’n Rally,” “The Plantation,” “Crib of the Rib.” The students might find these signs funny and benign, but LaDousa (2011) argued convincingly that they also reproduced racial and gender inequalities. The data here already existed (they were big signs on houses), but the researcher had to collect the data by taking photographs.

In some cases, your content will be in physical form but not amenable to photographing, as in the case of films or unwieldy physical artifacts you find in the archives (e.g., undigitized meeting minutes or scrapbooks). In this case, you need to create some kind of detailed log (fieldnotes even) of the content that you can reference. In the case of films, this might mean watching the film and writing down details for key scenes that become your data. [2] For scrapbooks, it might mean taking notes on what you are seeing, quoting key passages, describing colors or presentation style. As you might imagine, this can take a lot of time. Be sure you budget this time into your research plan.

Researcher Note

A note on data scraping: Data scraping, sometimes known as screen scraping or frame grabbing, is a way of extracting data generated by another program, as when a scraping tool grabs information from a website. This may help you collect data that is on the internet, but you need to be ethical in how you employ the scraper. A student once helped me scrape thousands of stories from the Time magazine archives at once (although it took several hours for the scraping process to complete). These stories were freely available, so the scraping process simply sped up the laborious process of copying each article of interest and saving it to my research folder. Scraping tools can sometimes be used to circumvent paywalls. Be careful here!
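One practical check before scraping is to read a site’s robots.txt file, which states which paths automated tools are asked to avoid. The sketch below uses Python’s standard urllib.robotparser module on an invented robots.txt; the rules, the “research-bot” name, and the example.org URLs are all hypothetical, and nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one would be fetched from the site.
robots_txt = """\
User-agent: *
Disallow: /subscribers/
Allow: /archive/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a (hypothetical) scraper may fetch each page.
allowed = parser.can_fetch("research-bot", "https://example.org/archive/1955/story.html")
blocked = parser.can_fetch("research-bot", "https://example.org/subscribers/story.html")
print(allowed, blocked)
```

Passing this check does not settle the ethical question by itself. A site’s terms of use, paywalls, and your institution’s research ethics guidance still apply.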

The Third Step: Analysis

There is often an assumption among novice researchers that once you have collected your data, you are ready to write about what you have found. Actually, you haven’t yet found anything, and if you try to write up your results, you will probably be staring sadly at a blank page. Between the collection and the writing comes the difficult task of systematically and repeatedly reviewing the data in search of patterns and themes that will help you interpret the data, particularly its communicative aspect (e.g., What is it that is being communicated here, with these “house signs” or in the pages of Men’s Health?).

The first time you go through the data, keep an open mind on what you are seeing (or hearing), and take notes about your observations that link up to your research question. In the beginning, it can be difficult to know what is relevant and what is extraneous. Sometimes, your research question changes based on what emerges from the data. Use the first round of review to consider this possibility, but then commit yourself to following a particular focus or path. If you are looking at how gender gets made or re-created, don’t follow the white rabbit down a hole about environmental injustice unless you decide that this really should be the focus of your study or that issues of environmental injustice are linked to gender presentation. In the second round of review, be very clear about emerging themes and patterns. Create codes (more on these in chapters 18 and 19) that will help you simplify what you are noticing. For example, “men as outdoorsy” might be a common trope you see in advertisements. Whenever you see this, mark the passage or picture. In your third (or fourth or fifth) round of review, begin to link up the tropes you’ve identified, looking for particular patterns and assumptions. You’ve drilled down to the details, and now you are building back up to figure out what they all mean. Start thinking about theory: either theories you have read about and are using as a frame of your study (e.g., gender as performance theory) or theories you are building yourself, as in the Grounded Theory tradition. Once you have a good idea of what is being communicated and how, go back to the data at least one more time to look for disconfirming evidence. Maybe you thought “men as outdoorsy” was of importance, but when you look hard, you note that women are presented as outdoorsy just as often. You just hadn’t paid attention. It is very important, as any kind of researcher but particularly as a qualitative researcher, to test yourself and your emerging interpretations in this way.

The Fourth and Final Step: The Write-Up

Only after you have fully completed analysis, with its many rounds of review and analysis, will you be able to write about what you found. The interpretation exists not in the data but in your analysis of the data. Before writing your results, you will want to very clearly describe how you chose the data here and all the possible limitations of this data (e.g., historical-trace problem or power problem; see chapter 16). Acknowledge any limitations of your sample. Describe the audience for the content, and discuss the implications of this. Once you have done all of this, you can put forth your interpretation of the communication of the content, linking to theory where doing so would help your readers understand your findings and what they mean more generally for our understanding of how the social world works. [3]

Analyzing Content: Helpful Hints and Pointers

Although every data set is unique and each researcher will have a different and unique research question to address with that data set, there are some common practices and conventions. When reviewing your data, what do you look at exactly? How will you know if you have seen a pattern? How do you note or mark your data?

Let’s start with the last question first. If your data is stored digitally, there are various ways you can highlight or mark up passages. You can, of course, do this with literal highlighters, pens, and pencils if you have print copies. But there are also qualitative software programs to help you store the data, retrieve the data, and mark the data. This can simplify the process, although it cannot do the work of analysis for you.

Qualitative software can be very expensive, so the first thing to do is to find out if your institution (or program) has a universal license its students can use. If it does not, most programs have special student licenses that are less expensive. The two most used programs at this moment are probably ATLAS.ti and NVivo. Both can cost more than $500 [4] but provide everything you could possibly need for storing data, content analysis, and coding. They also have a lot of customer support, and you can find many official and unofficial tutorials on how to use the programs’ features on the web. Dedoose, created by academic researchers at UCLA, is a decent program that lacks many of the bells and whistles of the two big programs. Instead of paying all at once, you pay monthly, as you use the program. The monthly fee is relatively affordable (less than $15), so this might be a good option for a small project. HyperRESEARCH is another basic program created by academic researchers, and it is free for small projects (those that have limited cases and material to import). You can pay a monthly fee if your project expands past the free limits. I have personally used all four of these programs, and they each have their pluses and minuses.

Regardless of which program you choose, you should know that none of them will actually do the hard work of analysis for you. They are incredibly useful for helping you store and organize your data, and they provide abundant tools for marking, comparing, and coding your data so you can make sense of it. But making sense of it will always be your job alone.

So let’s say you have some software, and you have uploaded all of your content into the program: video clips, photographs, transcripts of news stories, articles from magazines, even digital copies of college scrapbooks. Now what do you do? What are you looking for? How do you see a pattern? The answers to these questions will depend partially on the particular research question you have, or at least the motivation behind your research. Let’s go back to the idea of looking at gender presentations in magazines from the 1950s to the 1970s. Here are some things you can look at and code in the content: (1) actions and behaviors, (2) events or conditions, (3) activities, (4) strategies and tactics, (5) states or general conditions, (6) meanings or symbols, (7) relationships/interactions, (8) consequences, and (9) settings. Table 17.1 lists these with examples from our gender presentation study.

Table 17.1. Examples of What to Note During Content Analysis

One thing to note about the examples in table 17.1: sometimes we note (mark, record, code) a single example, while other times, as in “settings,” we are recording a recurrent pattern. To help you spot patterns, it is useful to mark every setting, including a notation on gender. Using software can help you do this efficiently. You can then call up “setting by gender” and note this emerging pattern. There’s an element of counting here, which we normally think of as quantitative data analysis, but we are using the count to identify a pattern that will be used to help us interpret the communication. Content analyses often include counting as part of the interpretive (qualitative) process.
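The “setting by gender” tally described above can be sketched in a few lines of Python. This is only an illustration, not a feature of any particular software package: the `coded_segments` records and the `setting_by_gender` helper are hypothetical, standing in for whatever your program exports after you have marked each setting with a notation on gender.

```python
from collections import Counter

# Hypothetical coded segments: each records the setting and the gender of
# the person depicted, as a coder might mark them in a magazine sample.
coded_segments = [
    {"setting": "outdoors", "gender": "man"},
    {"setting": "kitchen", "gender": "woman"},
    {"setting": "outdoors", "gender": "man"},
    {"setting": "office", "gender": "man"},
    {"setting": "kitchen", "gender": "woman"},
    {"setting": "outdoors", "gender": "woman"},
]


def setting_by_gender(segments):
    """Count how often each (setting, gender) pair was coded."""
    return Counter((s["setting"], s["gender"]) for s in segments)


counts = setting_by_gender(coded_segments)
for (setting, gender), n in sorted(counts.items()):
    print(f"{setting:10s} {gender:6s} {n}")
```

The count itself is not the finding. It only points you toward a pattern (and toward possible disconfirming cases, such as women shown outdoors) that you still have to interpret.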

In your own study, you may not need or want to look at all of the elements listed in table 17.1. Even in our imagined example, some are more useful than others. For example, “strategies and tactics” is a bit of a stretch here. In studies that are looking specifically at, say, policy implementation or social movements, this category will prove much more salient.

Another way to think about “what to look at” is to consider aspects of your content in terms of units of analysis. You can drill down to the specific words used (e.g., the adjectives commonly used to describe “men” and “women” in your magazine sample) or move up to the more abstract level of concepts used (e.g., the idea that men are more rational than women). Counting for the purpose of identifying patterns is particularly useful here. How many times is that idea of women’s irrationality communicated? How is it communicated (in comic strips, fictional stories, editorials, etc.)? Does the incidence of the concept change over time? Perhaps the “irrational woman” was everywhere in the 1950s, but by the 1970s, it is no longer showing up in stories and comics. By tracing its usage and prevalence over time, you might come up with a theory or story about gender presentation during the period. Table 17.2 provides more examples of using different units of analysis for this work along with suggestions for effective use.
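As a rough sketch of tracing a concept’s prevalence over time, the snippet below groups coded items by decade and reports the share of items in each decade that carry a given code. The `coded_items` data and the `share_by_decade` function are invented for illustration and do not come from any actual magazine sample.

```python
from collections import defaultdict

# Hypothetical records: (publication year, set of concept codes applied).
coded_items = [
    (1952, {"irrational woman"}),
    (1955, {"irrational woman", "men as outdoorsy"}),
    (1958, set()),
    (1964, {"irrational woman"}),
    (1967, set()),
    (1973, set()),
    (1976, {"men as outdoorsy"}),
]


def share_by_decade(items, concept):
    """Return {decade: share of that decade's items coded with `concept`}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, codes in items:
        decade = (year // 10) * 10
        totals[decade] += 1
        if concept in codes:
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


shares = share_by_decade(coded_items, "irrational woman")
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} of items")
```

Reporting a share rather than a raw count guards against mistaking a larger sample in one decade for a more prevalent concept.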

Table 17.2. Examples of Unit of Analysis in Content Analysis

Every qualitative content analysis is unique in its particular focus and particular data used, so there is no single correct way to approach analysis. You should have a better idea, however, of what kinds of things to look at and what to look for. The next two chapters will take you further into the coding process, the primary analytical tool for qualitative research in general.

Further Readings

Cidell, Julie. 2010. “Content Clouds as Exploratory Qualitative Data Analysis.” Area 42(4):514–523. A demonstration of using visual “content clouds” as a form of exploratory qualitative data analysis using transcripts of public meetings and content of newspaper articles.

Hsieh, Hsiu-Fang, and Sarah E. Shannon. 2005. “Three Approaches to Qualitative Content Analysis.” Qualitative Health Research 15(9):1277–1288. Distinguishes three distinct approaches to QCA: conventional, directed, and summative. Uses hypothetical examples from end-of-life care research.

Jackson, Romeo, Alex C. Lange, and Antonio Duran. 2021. “A Whitened Rainbow: The In/Visibility of Race and Racism in LGBTQ Higher Education Scholarship.” Journal Committed to Social Change on Race and Ethnicity (JCSCORE) 7(2):174–206.* Using a “critical summative content analysis” approach, examines research published on LGBTQ people between 2009 and 2019.

Krippendorff, Klaus. 2018. Content Analysis: An Introduction to Its Methodology. 4th ed. Thousand Oaks, CA: SAGE. A very comprehensive textbook on both quantitative and qualitative forms of content analysis.

Mayring, Philipp. 2022. Qualitative Content Analysis: A Step-by-Step Guide. Thousand Oaks, CA: SAGE. Formulates an eight-step approach to QCA.

Messinger, Adam M. 2012. “Teaching Content Analysis through ‘Harry Potter.’” Teaching Sociology 40(4):360–367. This is a fun example of a relatively brief foray into content analysis using the music found in Harry Potter films.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks, CA: SAGE. Although a helpful guide to content analysis in general, be warned that this textbook definitely favors quantitative over qualitative approaches to content analysis.

Schreier, Margrit. 2012. Qualitative Content Analysis in Practice. Thousand Oaks, CA: SAGE. Arguably the most accessible guidebook for QCA, written by a professor based in Germany.

Weber, Matthew A., Shannon Caplan, Paul Ringold, and Karen Blocksom. 2017. “Rivers and Streams in the Media: A Content Analysis of Ecosystem Services.” Ecology and Society 22(3).* Examines the content of a blog hosted by National Geographic and articles published in The New York Times and the Wall Street Journal for stories on rivers and streams (e.g., water-quality flooding).

  • There are ways of handling content analysis quantitatively, however. Some practitioners therefore specify qualitative content analysis (QCA). In this chapter, all content analysis is QCA unless otherwise noted. ↵
  • Note that some qualitative software allows you to upload whole films or film clips for coding. You will still have to get access to the film, of course. ↵
  • See chapter 20 for more on the final presentation of research. ↵
  • Actually, ATLAS.ti is an annual license, while NVivo is a perpetual license, but both are going to cost you at least $500 to use. Student rates may be lower. And don’t forget to ask your institution or program if they already have a software license you can use. ↵

A method of both data collection and data analysis in which a given content (textual, visual, graphic) is examined systematically and rigorously to identify meanings, themes, patterns and assumptions.  Qualitative content analysis (QCA) is concerned with gathering and interpreting an existing body of material.    

Introduction to Qualitative Research Methods Copyright © 2023 by Allison Hurst is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.

How to do a content analysis

What is content analysis?

  • Why would you use a content analysis
  • Types of content analysis
  • Conceptual content analysis
  • Relational content analysis
  • Reliability and validity
  • The advantages and disadvantages of content analysis
  • A step-by-step guide to conducting a content analysis
  • Step 1: Develop your research questions
  • Step 2: Choose the content you’ll analyze
  • Step 3: Identify your biases
  • Step 4: Define the units and categories of coding
  • Step 5: Develop a coding scheme
  • Step 6: Code the content
  • Step 7: Analyze the results
  • Frequently asked questions about content analysis

In research, content analysis is the process of analyzing content and its features with the aim of identifying patterns and the presence of words, themes, and concepts within the content. Simply put, content analysis is a research method that aims to present the trends, patterns, concepts, and ideas in content as objective, quantitative or qualitative data , depending on the specific use case.

As such, some of the objectives of content analysis include:

  • Simplifying complex, unstructured content.
  • Identifying trends, patterns, and relationships in the content.
  • Determining the characteristics of the content.
  • Identifying the intentions of individuals through the analysis of the content.
  • Identifying the implied aspects in the content.

Typically, when doing a content analysis, you’ll gather data not only from written text sources like newspapers, books, journals, and magazines but also from a variety of other oral and visual sources of content like:

  • Voice recordings, speeches, and interviews.
  • Web content, blogs, and social media content.
  • Films, videos, and photographs.

One of content analysis’s distinguishing features is that you'll be able to gather data for research without physically gathering data from participants. In other words, when doing a content analysis, you don't need to interact with people directly.

The process of doing a content analysis usually involves categorizing or coding concepts, words, and themes within the content and analyzing the results. We’ll look at the process in more detail below.

Typically, you’ll use content analysis when you want to:

  • Identify the intentions, communication trends, or communication patterns of an individual, a group of people, or even an institution.
  • Analyze and describe the behavioral and attitudinal responses of individuals to communications.
  • Determine the emotional or psychological state of an individual or a group of people.
  • Analyze the international differences in communication content.
  • Analyze audience responses to content.

Keep in mind, though, that these are just some examples of use cases where a content analysis might be appropriate and there are many others.

The key thing to remember is that content analysis will help you quantify the occurrence of specific words, phrases, themes, and concepts in content. Moreover, it can also be used when you want to make qualitative inferences out of the data by analyzing the semantic meanings and interrelationships between words, themes, and concepts.

In general, there are two types of content analysis: conceptual and relational analysis . Although these two types follow largely similar processes, their outcomes differ. As such, each of these types can provide different results, interpretations, and conclusions. With that in mind, let’s now look at these two types of content analysis in more detail.

With conceptual analysis, you’ll determine the existence of certain concepts within the content and identify their frequency. In other words, conceptual analysis involves counting the number of times a specific concept appears in the content.

Conceptual analysis is typically focused on explicit data, which means you’ll focus your analysis on a specific concept to identify its presence in the content and determine its frequency.

However, when conducting a content analysis, you can also use implicit data. This approach is more involved and complicated, and it requires the use of a dictionary, contextual translation rules, or a combination of both.

No matter what type you use, conceptual analysis brings an element of quantitative analysis into a qualitative approach to research.
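A minimal sketch of this kind of counting is shown below. It assumes a small, hand-built coding dictionary (`concept_terms`, invented here) that maps each concept to the explicit terms, plus a few implicit stand-ins, that should count as an occurrence. Real projects would need a far richer dictionary and contextual rules for ambiguous terms.

```python
import re
from collections import Counter

# Hypothetical coding dictionary: each concept lists the terms that count
# as an occurrence of it. These entries are illustrative only.
concept_terms = {
    "rationality": {"rational", "logical", "level-headed"},
    "emotion": {"emotional", "hysterical", "moody"},
}


def count_concepts(text, dictionary):
    """Count occurrences of each concept by matching its listed terms."""
    # Lowercase the text and keep hyphenated words together as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    term_to_concept = {
        term: concept for concept, terms in dictionary.items() for term in terms
    }
    counts = Counter()
    for token in tokens:
        concept = term_to_concept.get(token)
        if concept is not None:
            counts[concept] += 1
    return counts


sample = "He stayed logical and level-headed; she was described as moody and emotional."
print(count_concepts(sample, concept_terms))
```

Matching on exact terms is the simplest possible rule. It will miss sarcasm, negation ("not logical"), and synonyms that are absent from the dictionary, which is why implicit data usually calls for human judgment as well.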

Relational content analysis takes conceptual analysis a step further. While the process starts in the same way, by identifying concepts in the content, it focuses not on the frequency of these concepts but on the relationships between them and the context in which they appear.

Before starting with a relational analysis, you’ll first need to decide on which subcategory of relational analysis you’ll use:

  • Affect extraction: With this relational content analysis approach, you’ll evaluate concepts based on their emotional attributes. You’ll typically assess these emotions on a rating scale with higher values assigned to positive emotions and lower values to negative ones. In turn, this allows you to capture the emotions of the writer or speaker at the time the content is created. The main difficulty with this approach is that emotions can differ over time and across populations.
  • Proximity analysis: With this approach, you’ll identify concepts as in conceptual analysis, but you’ll evaluate the way in which they occur together in the content. In other words, proximity analysis allows you to analyze the relationship between concepts and derive a concept matrix from which you’ll be able to develop meaning. Proximity analysis is typically used when you want to extract facts from the content rather than contextual, emotional, or cultural factors.
  • Cognitive mapping: Finally, cognitive mapping can be used with affect extraction or proximity analysis. It’s a visualization technique that allows you to create a model that represents the overall meaning of content and presents it as a graphic map of the relationships between concepts. As such, it’s also commonly used when analyzing the changes in meanings, definitions, and terms over time.
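To make the idea of a concept matrix more concrete, here is a small sketch of proximity analysis in Python. It assumes that “occurring together” means being coded in the same sentence, which is only one possible definition of proximity. The `coded_sentences` data and the `concept_matrix` function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coding result: the set of concepts identified in each sentence.
coded_sentences = [
    {"men", "outdoors"},
    {"women", "kitchen"},
    {"men", "outdoors", "strength"},
    {"women", "outdoors"},
]


def concept_matrix(sentences):
    """Count how many sentences contain each unordered pair of concepts."""
    pairs = Counter()
    for codes in sentences:
        # Sorting gives each pair one canonical order, e.g. ("men", "outdoors").
        for first, second in combinations(sorted(codes), 2):
            pairs[(first, second)] += 1
    return pairs


matrix = concept_matrix(coded_sentences)
for (first, second), n in matrix.most_common():
    print(f"{first} + {second}: {n}")
```

The resulting pair counts are the raw material for a cognitive map: each concept becomes a node, and each pair count becomes the weight of the link between two nodes.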

Now that we’ve seen what content analysis is and looked at the different types of content analysis, it’s important to understand how reliable it is as a research method . We’ll also look at what criteria impact the validity of a content analysis.

There are three criteria that determine the reliability of a content analysis:

  • Stability . Stability refers to the tendency of coders to consistently categorize or code the same data in the same way over time.
  • Reproducibility . This criterion refers to the tendency of different coders to classify category membership in the same way.
  • Accuracy . Accuracy refers to the extent to which the classification of content corresponds to a specific standard.

Keep in mind, though, that because you’ll need to code or categorize the concepts you’ll aim to identify and analyze manually, you’ll never be able to eliminate human error. However, you’ll be able to minimize it.
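One common way to put a number on reproducibility is to have two coders code the same units independently and then compare their results. The sketch below computes simple percent agreement and Cohen’s kappa, a statistic that corrects agreement for chance. Kappa is not named in the text above, so treat it as one option among several, and note that the coder data here is made up.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of units that both coders placed in the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same units.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability both coders pick the same category
    # if each assigned categories at their own observed rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined (division by zero) if both coders used one single category.
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same eight units.
coder_1 = ["outdoors", "home", "home", "work", "outdoors", "home", "work", "home"]
coder_2 = ["outdoors", "home", "work", "work", "outdoors", "home", "home", "home"]

print(f"Percent agreement: {percent_agreement(coder_1, coder_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_1, coder_2):.2f}")
```

The same functions can be used to check stability: compare one coder’s codes for the same units at two different points in time instead of comparing two coders.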

In turn, three criteria determine the validity of a content analysis:

  • Closeness of categories . This is achieved by using multiple classifiers to get an agreed-upon definition for a specific category by using either implicit variables or synonyms. In this way, the category can be broadened to include more relevant data.
  • Conclusions . Here, it’s crucial to decide what level of implication will be allowable. In other words, it’s important to consider whether the conclusions are valid based on the data or whether they can be explained using some other phenomena.
  • Generalizability of the results of the analysis to a theory . Generalizability comes down to how you determine your categories as mentioned above and how reliable those categories are. In turn, this relies on how accurate the categories are at measuring the concepts or ideas that you’re looking to measure.

Considering everything mentioned above, there are definite advantages and disadvantages when it comes to content analysis:

Let’s now look at the steps you’ll need to follow when doing a content analysis.

The first step will always be to formulate your research questions. This is simply because, without clear and defined research questions, you won’t know what question to answer and, by implication, won’t be able to code your concepts.

Based on your research questions, you’ll then need to decide what content you’ll analyze. Here, you’ll use three factors to find the right content:

  • The type of content . Here you’ll need to consider the various types of content you’ll use and their medium like, for example, blog posts, social media, newspapers, or online articles.
  • What criteria you’ll use for inclusion . Here you’ll decide what criteria you’ll use to include content. This can, for instance, be the mentioning of a certain event or advertising a specific product.
  • Your parameters . Here, you’ll decide what content you’ll include based on specified parameters in terms of date and location.
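The three factors above can be thought of as filters applied to a pool of candidate content. The following sketch shows one way to express that, assuming each piece of content has already been catalogued with its medium, date, location, and text. The `Item` class, the `select_content` function, and the sample corpus are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    title: str
    medium: str  # type of content, e.g. "blog post" or "newspaper"
    published: date
    location: str
    text: str


def select_content(items, media, keyword, start, end, locations):
    """Keep items matching the content type, inclusion criterion, and parameters."""
    selected = []
    for item in items:
        right_type = item.medium in media
        mentions_topic = keyword.lower() in item.text.lower()
        in_date_range = start <= item.published <= end
        in_location = item.location in locations
        if right_type and mentions_topic and in_date_range and in_location:
            selected.append(item)
    return selected


corpus = [
    Item("Flood hits town", "newspaper", date(2019, 5, 2), "US", "The flood closed roads."),
    Item("Weekend recipes", "blog post", date(2019, 6, 1), "US", "Try this soup."),
    Item("Flood aftermath", "newspaper", date(2015, 3, 9), "US", "After the flood, repairs began."),
    Item("Flood warnings", "blog post", date(2020, 1, 15), "UK", "Flood alerts were issued."),
]

selected = select_content(
    corpus,
    media={"newspaper", "blog post"},
    keyword="flood",
    start=date(2018, 1, 1),
    end=date(2020, 12, 31),
    locations={"US"},
)
print([item.title for item in selected])
```

Writing the criteria down this explicitly, whether in code or in a protocol document, makes it easier to report exactly how the content was chosen and for someone else to repeat the selection.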

The next step is to consider your own pre-conception of the questions and identify your biases. This process is referred to as bracketing and allows you to be aware of your biases before you start your research with the result that they’ll be less likely to influence the analysis.

Your next step would be to define the units of meaning that you’ll code. This will, for example, be the number of times a concept appears in the content or the treatment of concept, words, or themes in the content. You’ll then need to define the set of categories you’ll use for coding which can be either objective or more conceptual.

Based on the above, you’ll then organize the units of meaning into your defined categories. Apart from this, your coding scheme will also determine how you’ll analyze the data.

The next step is to code the content. During this process, you’ll work through the content and record the data according to your coding scheme. It’s also here where conceptual and relational analysis starts to deviate in relation to the process you’ll need to follow.

As mentioned earlier, conceptual analysis aims to identify the number of times a specific concept, idea, word, or phrase appears in the content. So, here, you’ll need to decide what level of analysis you’ll implement.

In contrast, with relational analysis, you’ll need to decide what type of relational analysis you’ll use. So, you’ll need to determine whether you’ll use affect extraction, proximity analysis, cognitive mapping, or a combination of these approaches.

Once you’ve coded the data, you’ll be able to analyze it and draw conclusions from the data based on your research questions.

Content analysis offers an inexpensive and flexible way to identify trends and patterns in communication content. In addition, it’s unobtrusive which eliminates many ethical concerns and inaccuracies in research data. However, to be most effective, a content analysis must be planned and used carefully in order to ensure reliability and validity.

There are two general types of content analysis: conceptual and relational analysis. Although these two types follow largely similar processes, their outcomes differ. As such, each of these types can provide different results, interpretations, and conclusions.

In qualitative research coding means categorizing concepts, words, and themes within your content to create a basis for analyzing the results. While coding, you work through the content and record the data according to your coding scheme.

Content analysis is the process of analyzing content and its features with the aim of identifying patterns and the presence of words, themes, and concepts within the content. The goal of a content analysis is to present the trends, patterns, concepts, and ideas in content as objective, quantitative or qualitative data, depending on the specific use case.

Content analysis is a qualitative method of data analysis and can be used in many different fields. It is particularly popular in the social sciences.

It is possible to do qualitative analysis without coding, but content analysis as a method of qualitative analysis requires coding or categorizing data to then analyze it according to your coding scheme in the next step.

Am J Pharm Educ. 2020 Jan; 84(1).

Demystifying Content Analysis

A. J. Kleinheksel

a The Medical College of Georgia at Augusta University, Augusta, Georgia

Nicole Rockich-Winston

Huda Tawfik

b Central Michigan University, College of Medicine, Mt. Pleasant, Michigan

Tasha R. Wyatt

Objective. In the course of daily teaching responsibilities, pharmacy educators collect rich data that can provide valuable insight into student learning. This article describes the qualitative data analysis method of content analysis, which can be useful to pharmacy educators because of its application in the investigation of a wide variety of data sources, including textual, visual, and audio files.

Findings. Both manifest and latent content analysis approaches are described, with several examples used to illustrate the processes. This article also offers insights into the variety of relevant terms and visualizations found in the content analysis literature. Finally, common threats to the reliability and validity of content analysis are discussed, along with suitable strategies to mitigate these risks during analysis.

Summary. This review of content analysis as a qualitative data analysis method will provide clarity and actionable instruction for both novice and experienced pharmacy education researchers.

INTRODUCTION

The Academy’s growing interest in qualitative research indicates an important shift in the field’s scientific paradigm. Whereas health science researchers have historically looked to quantitative methods to answer their questions, this shift signals that a purely positivist, objective approach is no longer sufficient to answer pharmacy education’s research questions. Educators who want to study their teaching and students’ learning will find content analysis an easily accessible, robust method of qualitative data analysis that can yield rigorous results for both publication and the improvement of their educational practice. Content analysis is a method designed to identify and interpret meaning in recorded forms of communication by isolating small pieces of the data that represent salient concepts and then applying or creating a framework to organize the pieces in a way that can be used to describe or explain a phenomenon. 1 Content analysis is particularly useful in situations where there is a large amount of unanalyzed textual data, such as those many pharmacy educators have already collected as part of their teaching practice. Because of its accessibility, content analysis is also an appropriate qualitative method for pharmacy educators with limited experience in educational research. This article will introduce and illustrate the process of content analysis as a way to analyze existing data, but also as an approach that may lead pharmacy educators to ask new types of research questions.

Content analysis is a well-established data analysis method that has evolved in its treatment of textual data. Content analysis was originally introduced as a strictly quantitative method, recording counts to measure the observed frequency of pre-identified targets in consumer research. 1 However, as the naturalistic qualitative paradigm became more prevalent in social sciences research and researchers became increasingly interested in the way people behave in natural settings, the process of content analysis was adapted into a more interesting and meaningful approach. Content analysis has the potential to be a useful method in pharmacy education because it can help educational researchers develop a deeper understanding of a particular phenomenon by providing structure in a large amount of textual data through a systematic process of interpretation. It also offers potential value because it can help identify problematic areas in student understanding and guide the process of targeted teaching. Several research studies in pharmacy education have used the method of content analysis. 2-7 Two studies in particular offer noteworthy examples: Wallman and colleagues employed manifest content analysis to analyze semi-structured interviews in order to explore what students learn during experiential rotations, 7 while Moser and colleagues adopted latent content analysis to evaluate open-ended survey responses on student perceptions of learning communities. 6 To elaborate on these approaches further, we will describe the two types of qualitative content analysis, manifest and latent, and demonstrate the corresponding analytical processes using examples that illustrate their benefit.

Qualitative Content Analysis

Content analysis rests on the assumption that texts are a rich data source with great potential to reveal valuable information about particular phenomena. 8 It is the process of considering both the participant and context when sorting text into groups of related categories to identify similarities and differences, patterns, and associations, both on the surface and implied within. 9-11 The method is considered high-yield in educational research because it is versatile and can be applied in both qualitative and quantitative studies. 12 While it is important to note that content analysis has application in visual and auditory artifacts (eg, an image or song), for our purposes we will largely focus on the most common application, which is the analysis of textual or transcribed content (eg, open-ended survey responses, print media, interviews, recorded observations, etc). The terminology of content analysis can vary throughout quantitative and qualitative literature, which may lead to some confusion among both novice and experienced researchers. However, there are also several agreed-upon terms and phrases that span the literature, as found in Table 1 .

Table 1. Terms and Definitions Used in Qualitative Content Analysis

There is more often disagreement on terminology in the methodological approaches to content analysis, though the most common differentiation is between the two types of content: manifest and latent. In much of the literature, manifest content analysis is defined as describing what is occurring on the surface, what is literally present, and as “staying close to the text.” 8,13 Manifest content analysis is concerned with data that are easily observable both to researchers and the coders who assist in their analyses, without the need to discern intent or identify deeper meaning. It is content that can be recognized and counted with little training. Early applications of manifest analysis focused on identifying easily observable targets within text (eg, the number of instances a certain word appears in newspaper articles), film (eg, the occupation of a character), or interpersonal interactions (eg, tracking the number of times a participant blinks during an interview). 14 This application, in which frequency counts are used to understand a phenomenon, reflects a surface-level analysis and assumes there is objective truth in the data that can be revealed with very little interpretation. The number of times a target (ie, code) appears within the text is used as a way to understand its prevalence. Quantitative content analysis is always describing a positivist manifest content analysis, in that the nature of truth is believed to be objective, observable, and measurable. Qualitative research, which favors the researcher’s interpretation of an individual’s experience, may also be used to analyze manifest content. However, the intent of the application is to describe a dynamic reality that cannot be separated from the lived experiences of the researcher. 
Although qualitative content analysis can be conducted whether knowledge is thought to be innate, acquired, or socially constructed, the purpose of qualitative manifest content analysis is to transcend simple word counts and delve into a deeper examination of the language in order to organize large amounts of text into categories that reflect a shared meaning. 15,16 The practical distinction between quantitative and qualitative manifest content analysis is the intention behind the analysis. The quantitative method seeks to generate a numerical value to either cite prevalence or use in statistical analyses, while the qualitative method seeks to identify a construct or concept within the text using specific words or phrases for substantiation, or to provide a more organized structure to the text being described.

Latent content analysis is most often defined as interpreting what is hidden deep within the text. In this method, the role of the researcher is to discover the implied meaning in participants’ experiences. 8,13 For example, in a transcribed exchange in an office setting, a participant might say to a coworker, “Yeah, here we are…another Monday. So exciting!” The researcher would apply context in order to discover the emotion being conveyed (ie, the implied meaning). In this example, the comment could be interpreted as genuine, it could be interpreted as a sarcastic comment made in an attempt at humor in order to develop or sustain social bonds with the coworker, or the context might imply that the sarcasm was meant to convey displeasure and end the interaction.

Latent content analysis acknowledges that the researcher is intimately involved in the analytical process and that their role is to actively use mental schema, theories, and lenses to interpret and understand the data. 10 Whereas manifest analyses are typically conducted in a way that the researcher is thought to maintain distance and separation from the objects of study, latent analyses underscore the importance of the researcher co-creating meaning with the text. 17 Adding nuance to this type of content, Potter and Levine‐Donnerstein argue that within latent content analysis, there are two distinct types: latent pattern and latent projective. 14 Latent pattern content analysis seeks to establish a pattern of characteristics in the text itself, while latent projective content analysis leverages the researcher’s own interpretations of the meaning of the text. While both approaches rely on codes that emerge from the content using the coder’s own perspectives and mental schema, the distinction between these two types of analyses is in their foci. 14 Though we do not agree, some researchers believe that all qualitative content analysis is latent content analysis. 11 These disagreements typically occur where there are differences in intent and where there are areas of overlap in the results. For example, both qualitative manifest and latent pattern content analyses may identify patterns as a result of their application. In their research design, however, the researcher would have approached the content differently: a manifest approach seeks only to describe what is observed, while a latent pattern approach seeks to discover an unseen pattern. At this point, these distinctions may seem too philosophical to serve a practical purpose, so we will attempt to clarify these concepts by presenting three types of analyses for illustrative purposes, beginning with a description of how codes are created and used.

Creating and Using Codes

Codes are the currency of content analysis. Researchers use codes to organize and understand their data. Through the coding process, pharmacy educators can systematically and rigorously categorize and interpret vast amounts of text for use in their educational practice or in publication. Codes themselves are short, descriptive labels that symbolically assign a summative or salient attribute to more than one unit of meaning identified in the text. 18 To create codes, a researcher must first become immersed in the data, which typically occurs when a researcher transcribes recorded data or conducts several readings of the text. This process allows the researcher to become familiar with the scope of the data, which spurs nascent ideas about potential concepts or constructs that may exist within it. If studying a phenomenon that has already been described through an existing framework, codes can be created a priori using theoretical frameworks or concepts identified in the literature. If there is no existing framework to apply, codes can emerge during the analytical process. However, emergent codes can also be created as addenda to a priori codes that were identified before the analysis begins if the a priori codes do not sufficiently capture the researcher’s area of interest.

The process of detecting emergent codes begins with identification of units of meaning. While there is no one way to decide what qualifies as a meaning unit, researchers typically define units of meaning differently depending on what kind of analysis is being conducted. As a general rule, when dialogue is being analyzed, such as interviews or focus groups, meaning units are identified as conversational turns, though a code can be as short as one or two words. In written text, such as student reflections or course evaluation data, the researcher must decide if the text should be divided into phrases or sentences, or remain as paragraphs. This decision is usually made based on how many different units of meaning are expressed in a block of text. For example, in a paragraph, if there are several thoughts or concepts being expressed, it is best to break up the paragraph into sentences. If one sentence contains multiple ideas of interest, making it difficult to separate one important thought or behavior from another, then the sentence can be divided into smaller units, such as phrases or sentence fragments. These phrases or sentence fragments are then coded as separate meaning units. Conversely, longer or more complex units of meaning should be condensed into shorter representations that still retain the original meaning in order to reduce the cognitive burden of the analytical process. This could entail removing verbal tics (eg, “well, uhm…”) from transcribed data or simplifying a compound sentence. Condensation does not ascribe interpretation or implied meaning to a unit, but only shortens a meaning unit as much as possible while preserving the original meaning identified. 18 After condensation, a researcher can proceed to the creation of codes.
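Two of the mechanical steps above, dividing a paragraph into sentence-level meaning units and stripping verbal tics, can be roughed out in Python. This is a narrow sketch, not a substitute for the researcher's judgment: the filler list (`uhm`, `um`, `uh`), the helper names, and the sample reflection are assumptions made for the example, and simplifying compound sentences is left to the human coder.

```python
import re

# Assumed list of fillers; a real project would tailor this to its transcripts.
FILLER_PATTERN = re.compile(r"\b(?:uhm|um|uh)\b[,.…]*\s*", re.IGNORECASE)


def split_into_sentences(paragraph):
    """Split a paragraph into sentence-level meaning units at ., !, or ?."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def condense(unit):
    """Remove filler words and collapse extra whitespace; wording is otherwise unchanged."""
    cleaned = FILLER_PATTERN.sub("", unit)
    return re.sub(r"\s+", " ", cleaned).strip()


# Hypothetical student reflection.
reflection = "Uhm, I liked the case. Um, the labs were hard to read."
units = [condense(sentence) for sentence in split_into_sentences(reflection)]
print(units)
```

Words such as “well” are deliberately left out of the filler list here because they often carry meaning (“the student did well”), which is exactly the kind of decision condensation must not make automatically.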

Many researchers begin their analyses with several general codes in mind that help guide their focus as defined by their research question, even in instances where the researcher has no a priori model or theory. For example, if a group of instructors are interested in examining recorded videos of their lectures to identify moments of student engagement, they may begin with using generally agreed upon concepts of engagement as codes, such as students “raising their hands,” “taking notes,” and “speaking in class.” However, as the instructors continue to watch their videos, they may notice other behaviors which were not initially anticipated. Perhaps students were seen creating flow charts based on information presented in class. Alternatively, perhaps instructors wanted to include moments when students posed questions to their peers without being prompted. In this case, the instructors would allow the codes of “creating graphic organizers” and “questioning peers” to emerge as additional ways to identify the behavior of student engagement.

Once a researcher has identified condensed units of meaning and labeled them with codes, the codes are then sorted into categories which can help provide more structure to the data. In the above example of recorded lectures, perhaps the category of “verbal behaviors” could be used to group the codes of “speaking in class” and “questioning peers.” For complex analyses, subcategories can also be used to better organize a large number of codes, but solely at the discretion of the researcher. Two or more categories of codes are then used to identify or support a broader underlying meaning which develops into themes. Themes are most often employed in latent analyses; however, they are appropriate in manifest analyses as well. Themes describe behaviors, experiences, or emotions that occur throughout several categories. 18 Figure 1 illustrates this process. Using the same videotaped lecture example, the instructors might identify two themes of student engagement, “active engagement” and “passive engagement,” where active engagement is supported by the category of “verbal behavior” and also a category that includes the code of “raising their hands” (perhaps something along the lines of “pursuing engagement”), and the theme of “passive engagement” is supported by a category used to organize the behaviors of “taking notes” and “creating graphic organizers.”

Figure 1. The Process of Qualitative Content Analysis
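For readers who organize their coding in software, the code, category, and theme hierarchy from the lecture-video example can be represented as a nested mapping. The sketch below is an assumption-laden illustration: the category name “written behaviors” is a placeholder (the text does not name the category that holds “taking notes” and “creating graphic organizers”), and the list of observed codes is invented.

```python
from collections import Counter

# Theme -> category -> codes, following the lecture-video example.
THEMES = {
    "active engagement": {
        "verbal behaviors": ["speaking in class", "questioning peers"],
        "pursuing engagement": ["raising their hands"],
    },
    "passive engagement": {
        # Placeholder category name; the source leaves this category unnamed.
        "written behaviors": ["taking notes", "creating graphic organizers"],
    },
}


def build_code_index(themes):
    """Map each code to its (category, theme) pair."""
    index = {}
    for theme, categories in themes.items():
        for category, codes in categories.items():
            for code in codes:
                index[code] = (category, theme)
    return index


def tally_by_theme(coded_units, themes):
    """Roll coded meaning units up to the theme level."""
    index = build_code_index(themes)
    return Counter(index[code][1] for code in coded_units)


# Hypothetical codes assigned while watching one lecture video.
observed = ["taking notes", "raising their hands", "taking notes", "questioning peers"]
theme_counts = tally_by_theme(observed, THEMES)
print(theme_counts)
```

Keeping the hierarchy in one structure makes it easy to add an emergent code later: it only needs to be appended to the list for the category where it belongs.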

To more fully demonstrate the process of content analysis and the generation and use of codes, categories, and themes, we present and describe examples of both manifest and latent content analysis. Given that there are multiple ways to create and use codes, our examples illustrate both processes of creating and using a predetermined set of codes. Regardless of the kind of content analysis instructors want to conduct, the initial steps are the same. The instructor must analyze the data using codes as a sense-making process.

Manifest Content Analysis

The first form of analysis, manifest content analysis, examines text for elements that exist on the surface of the text, the meaning of which is taken at face value. Schools and colleges of pharmacy may benefit from conducting manifest content analyses at a programmatic level, including analysis of student evaluations to determine the value of certain courses, or analysis of recruitment materials for addressing issues of cultural humility in a uniform manner. Such uses for manifest content analysis may help administrators make more data-based decisions about students and courses. However, for our example of manifest content analysis, we illustrate the use of content analysis in informing instruction for a single pharmacy educator ( Figure 2 ).

Figure 2. A Student’s Completed Beta-blocker Case with Codes in Underlined Bold Text

In the example, a pharmacology instructor is trying to assess students’ understanding of three concepts related to the beta-blocker class of drugs: indication of the drug, relevance of family history, and contraindications and precautions. To do so, the instructor asks the students to write a patient case in which beta-blockers are indicated. The instructor gives the students the following prompt: “Reverse-engineer a case in which beta-blockers would be prescribed to the patient. Include a history of the present illness, the patients’ medical, family, and social history, medications, allergies, and relevant lab tests.” Figure 2 is a hypothetical student’s completed assignment, in which they demonstrate their understanding of when and why a beta-blocker would be prescribed.

The student-generated cases are then treated as data and analyzed for the presence of the three previously identified indicators of understanding in order to help the instructor make decisions about where and how to focus future teaching efforts related to this drug class. Codes are created a priori out of the instructor’s interest in analyzing students’ understanding of the concepts related to beta-blocker prescriptions. A codebook ( Table 2 ) is created with the following columns: name of code, code description, and examples of the code. This codebook helps an individual researcher to approach their analysis systematically, but it can also facilitate coding by multiple coders who would apply the same rules outlined in the codebook to the coding process.

Table 2. Example Code Book Created for Manifest Content Analysis

Using multiple coders introduces complexity to the analysis process, but it is oftentimes the only practical way to analyze large amounts of data. To ensure that all coders are working in tandem, they must establish inter-rater reliability as part of their training process. This process requires that a single form of text be selected, such as one student evaluation. After reviewing the codebook and receiving instruction, everyone on the team individually codes the same piece of data. While calculating percentage agreement has sometimes been used to establish inter-rater reliability, most publication editors require more rigorous statistical analysis (eg, Krippendorff’s alpha, or Cohen’s kappa). 19 Detailed descriptions of these statistics fall outside the scope of this introduction, but it is important to note that the choice depends on the number of coders, the sample size, and the type of data to be analyzed.
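As a brief illustration of why percentage agreement alone can overstate reliability, the following Python sketch computes Cohen’s kappa for the simplest case of two coders assigning one nominal code per meaning unit. Kappa compares observed agreement p_o with the agreement p_e expected by chance from each coder’s code frequencies: kappa = (p_o − p_e) / (1 − p_e). The function name and the ten coded units are hypothetical; the code labels borrow the three beta-blocker concepts from the example above.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one nominal code per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of units.")
    n = len(coder_a)
    # Observed agreement: share of units given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both coders used a single identical code throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same ten meaning units.
coder_1 = ["indication", "indication", "family history", "contraindication",
           "indication", "family history", "contraindication", "contraindication",
           "indication", "family history"]
coder_2 = ["indication", "indication", "family history", "contraindication",
           "family history", "family history", "contraindication", "indication",
           "indication", "family history"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this invented data set the coders agree on 8 of 10 units (80%), yet kappa is about 0.70 because roughly a third of that agreement would be expected by chance. With more than two coders or with missing codes, a different statistic such as Krippendorff’s alpha would be needed.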

Latent Content Analysis

Latent content analysis is another option for pharmacy educators, especially when there are theoretical frameworks or lenses the educator proposes to apply. Such frameworks describe and provide structure to complex concepts and may often be derived from relevant theories. Latent content analysis requires that the researcher is intimately involved in interpreting and finding meaning in the text because meaning is not readily apparent on the surface. 10 To illustrate a latent content analysis using a combination of a priori and emergent codes, we will use the example of a transcribed video excerpt from a student pharmacist interaction with a standardized patient. In this example, the goal is for first-year students to practice talking to a customer about an over-the-counter medication. The case is designed to simulate a customer at a pharmacy counter, who is seeking advice on a medication. The learning objectives for the pharmacist in-training are to assess the customer’s symptoms, determine if the customer can self-treat or if they need to seek out their primary care physician, and then prescribe a medication to alleviate the patient’s symptoms.

To begin, pharmacy educators conducting educational research should first identify what they are looking for in the video transcript. In this case, because the primary outcome for this exercise is aimed at assessing the “soft skills” of student pharmacists, codes are created using the counseling rubric created by Horton and colleagues. 20 Four a priori codes are developed using the literature: empathy, patient-friendly terms, politeness, and positive attitude. However, because the original four codes are inadequate to capture all areas representing the skills the instructor is looking for during the process of analysis, four additional codes are also created: active listening, confidence, follow-up, and patient at ease. Figure 3 presents the video transcript with each of the codes assigned to the meaning units in bolded parentheses.

Figure 3. A Transcript of a Student’s (JR) Experience with a Standardized Patient (SP) in Which the Codes are Bolded in Parentheses

Following the initial coding using these eight codes, the codes are consolidated to create categories, which are depicted in the taxonomy in Figure 4 . Categories are relationships between codes that represent a higher level of abstraction in the data. 18 To reach conclusions and interpret the fundamental underlying meaning in the data, categories are then organized into themes ( Figure 1 ). Once the data are analyzed, the instructor can assign value to the student’s performance. In this case, the coding process determines that the exercise demonstrated both positive and negative elements of communication and professionalism. Under the category of professionalism, the student generally demonstrated politeness and a positive attitude toward the standardized patient, indicating to the reviewer that the theme of perceived professionalism was apparent during the encounter. However, there were several instances in which confidence and appropriate follow-up were absent. Thus, from a reviewer perspective, the student's performance could be perceived as indicating an opportunity to grow and improve as a future professional. Typically, there are multiple codes in a category and multiple categories in a theme. However, as seen in the example taxonomy, this is not always the case.

Figure 4. Example of a Latent Content Analysis Taxonomy

If the educator is interested in conducting a latent projective analysis, after identifying the construct of “soft skills,” the researcher allows for each coder to apply their own mental schema as they look for positive and negative indicators of the non-technical skills they believe a student should develop. Mental schema are the cognitive structures that provide organization to knowledge, which in this case allows coders to categorize the data in ways that fit their existing understanding of the construct. The coders will use their own judgement to identify the codes they feel are relevant. The researcher could also choose to apply a theoretical lens to more effectively conceptualize the construct of “soft skills,” such as Rogers' humanism theory, and more specifically, concepts underlying his client-centered therapy. 21 The role of theory in both latent pattern and latent projective analyses is at the discretion of the researcher, and often is determined by what already exists in the literature related to the research question. Though, typically, in latent pattern analyses theory is used for deductive coding, and in latent projective analyses underdeveloped theory is used to first deduce codes and then for induction of the results to strengthen the theory applied. For our example, Rogers describes three salient qualities to develop and maintain a positive client-professional relationship: unconditional positive regard, genuineness, and empathetic understanding. 21 For the third element, specifically, the educator could look for units of meaning that imply empathy and active listening. For our video transcript analysis, this is evident when the student pharmacist demonstrated empathy by responding, "Yeah, I understand," when discussing aggravating factors for the patient's condition. The outcome for both latent pattern and latent projective content analysis is to discover the underlying meaning in a text, such as social rules or mental models. 
In this example, both pattern and projective approaches can discover interpreted aspects of a student’s abilities and mental models for constructs such as professionalism and empathy. The difference in the approaches is where the precedence lies: in the belief that a pattern is recognizable in the content, or in the mental schema and lived experiences of the coder(s). To better illustrate the differences in the processes of latent pattern and projective content analyses, Figure 5 presents a general outline of each method beginning with the creation of codes and concluding with the generation of themes.

Figure 5. Flow Chart of the Stages of Latent Pattern and Latent Projective Content Analysis

How to Choose a Methodological Approach to Content Analysis

To determine which approach a researcher should take in their content analysis, two decisions need to be made. First, researchers must determine their goal for the analysis. Second, the researcher must decide where they believe meaning is located. 14 If meaning is located in the discrete elements of the content that are easily identified on the surface of the text, then manifest content analysis is appropriate. If meaning is located deep within the content and the researcher plans to discover context cues and make judgements about implied meaning, then latent content analysis should be applied. When designing the latent content analysis, a researcher then must also identify their focus. If the analysis is intended to identify a recognizable truth within the content by uncovering connections and characteristics that all coders should be able to discover, then latent pattern content analysis is appropriate. If, on the other hand, the researcher will rely heavily on the judgment of the coders and believes that interpretation of the content must leverage the mental schema of the coders to locate deeper meaning, then latent projective content analysis is the best choice.
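The decision sequence in this paragraph can be summarized as a small function. This is a simplification offered only as a memory aid: the function and parameter names are invented, and real design decisions will rarely reduce to two yes/no answers.

```python
def choose_approach(meaning_on_surface: bool, relies_on_coder_schema: bool = False) -> str:
    """Return the approach suggested by where the researcher believes meaning is located."""
    if meaning_on_surface:
        # Discrete elements that are easily identified on the surface of the text.
        return "manifest content analysis"
    if relies_on_coder_schema:
        # Interpretation must leverage the coders' own mental schema.
        return "latent projective content analysis"
    # A recognizable pattern that all coders should be able to discover.
    return "latent pattern content analysis"


print(choose_approach(meaning_on_surface=False, relies_on_coder_schema=True))
```

The second argument matters only when meaning is located beneath the surface, which mirrors the order of the decisions described above.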

To demonstrate how a researcher might choose a methodological approach, we have presented a third example of data in Figure 6 . In our two previous examples of content analysis, we used student data. However, faculty data can also be analyzed as part of educational research or for faculty members to improve their own teaching practices. Recall in the video data analyzed using latent content analysis, the student was tasked to identify a suitable over-the-counter medication for a patient complaining of heartburn symptoms. We have extended this example by including an interview with the pharmacy educator supervising the student who was videotaped. The goal of the interview is to evaluate the educator’s ability to assess the student’s performance with the standardized patient. Figure 6 is an excerpt of the interview between the course instructor and an instructional coach. In this conversation, the instructional coach is eliciting evidence to support the faculty member’s views, judgements, and rationale for the educator’s evaluation of the student’s performance.

Figure 6. A Transcript of an Interview in Which the Interviewer (IN) Questions a Faculty Member (FM) Regarding Their Student’s Standardized Patient Experience

Manifest content analysis would be a valid choice for this data if the researcher was looking to identify evidence of the construct of “instructor priorities” and defined discrete codes that described aspects of performance such as “communication,” “referrals,” or “accurate information.” These codes could be easily identified on the surface of the transcribed interview by identifying keywords related to each code, such as “communicate,” “talk,” and “laugh,” for the code of “communication.” This would allow coders to identify evidence of the concept of “instructor priorities” by sorting through a potentially large amount of text with predetermined targets in mind.
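A keyword-driven pass of this kind can be sketched in Python as follows. Only the keywords for “communication” (“communicate,” “talk,” “laugh”) come from the text; the keywords for the other two codes, the prefix-matching rule, and the three transcript turns are assumptions made for illustration and are not taken from Figure 6.

```python
import re

CODEBOOK = {
    "communication": ["communicate", "talk", "laugh"],  # keywords named in the text
    "referrals": ["refer", "physician"],                # hypothetical keywords
    "accurate information": ["accurate", "correct"],    # hypothetical keywords
}


def code_turn(turn, codebook):
    """Return the sorted codes whose keywords begin any word in the turn."""
    words = re.findall(r"[a-z]+", turn.lower())
    return sorted(
        code
        for code, keywords in codebook.items()
        # Prefix matching lets "talk" also catch "talked" and "talking".
        if any(word.startswith(keyword) for word in words for keyword in keywords)
    )


# Hypothetical interview turns (FM = faculty member, IN = interviewer).
transcript = [
    "FM: She talked with the patient easily and even laughed with him.",
    "FM: I wanted her to refer him to his physician sooner.",
    "IN: What did you notice first?",
]
coded = [code_turn(turn, CODEBOOK) for turn in transcript]
print(coded)
```

Prefix matching is a blunt instrument (“refer” would also match “reference”), so a coder would still review each flagged turn against the codebook description before counting it.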

To conduct a latent pattern analysis of this interview, researchers would first immerse themselves in the data to identify a theoretical framework or concepts that represent the area of interest so that coders could discover an emerging truth underneath the surface of the data. After immersion in the data, a researcher might believe it would be interesting to more closely examine the strategies the coach uses to establish rapport with the instructor as a way to better understand models of professional development. These strategies could not be easily identified in the transcripts if read literally, but by looking for connections within the text, codes related to instructional coaching tactics emerge. A latent pattern analysis would require that the researcher code the data in a way that looks for patterns, such as a code of “facilitating reflection,” that could be identified in open-ended questions and other units of meaning where the coder saw evidence of probing techniques, or a code of “establishing rapport” for which a coder could identify nonverbal cues such as “[IN leans forward in chair].”

Conducting latent projective content analysis might be useful if the researcher was interested in using a broader theoretical lens, such as Mezirow’s theory of transformative learning. 22 In this example, the faculty member is understood to have attempted to change a learner’s frame of reference by facilitating cognitive dissonance or a disorienting experience through a standardized patient simulation. To conduct a latent projective analysis, the researcher could analyze the faculty member’s interview using concepts found in this theory. This kind of analysis will help the researcher assess the level of change that the faculty member was able to perceive, or expected to witness, in their attempt to help their pharmacy students improve their interactions with patients. The units of meaning and subsequent codes would rely on the coders to apply their own knowledge of transformative learning because of the absence in the theory of concrete, context-specific behaviors to identify. For this analysis, the researcher would rely on their interpretations of what challenging educational situations look like, what constitutes cognitive dissonance, or what the faculty member is really expecting from his students’ performance. The subsequent analysis could provide evidence to support the use of such standardized patient encounters within the curriculum as a transformative learning experience and would also allow the educator to self-reflect on his ability to assess simulated activities.

OTHER ASPECTS TO CONSIDER

Navigating Terminology

Among the methodological approaches, there are other terms for content analysis that researchers may come across. Hsieh and Shannon 10 proposed three qualitative approaches to content analysis: conventional, directed, and summative. These categories were intended to explain the role of theory in the analysis process. In conventional content analysis, the researcher does not use preconceived categories because existing theory or literature is limited. In directed content analysis, the researcher attempts to further describe a phenomenon already addressed by theory, applying a deductive approach and using identified concepts or codes from existing research to validate the theory. In summative content analysis, a descriptive approach is taken, identifying and quantifying words or content in order to describe their context. These three categories roughly map to the terms of latent projective, latent pattern, and manifest content analyses respectively, though not precisely enough to suggest that they are synonyms.

Graneheim and colleagues 9 reference the inductive, deductive, and abductive methods of interpretation of content analysis, which are data-driven, concept-driven, and fluid between both data and concepts, respectively. Manifest content analysis produces phenomenological descriptions most often (but not always) through deductive interpretation, whereas latent content analysis produces interpretations most often (but not always) through inductive or abductive interpretations. Erlingsson and Brysiewicz 23 refer to content analysis as a continuum, progressing as the researcher develops codes, then categories, and then themes. We present these alternative conceptualizations of content analysis to illustrate that the literature on content analysis, while incredibly useful, presents a multitude of interpretations of the method itself. However, these complexities should not dissuade readers from using content analysis. Identifying what you want to know (ie, your research question) will effectively direct you toward your methodological approach. That said, we have found the most helpful aid in learning content analysis is the application of the methods we have presented.

Ensuring Quality

The standards used to evaluate quantitative research are seldom used in qualitative research. The terms “reliability” and “validity” are typically not used because they reflect the positivist quantitative paradigm. In qualitative research, the preferred term is “trustworthiness,” which is comprised of the concepts of credibility, transferability, dependability, and confirmability, and researchers can take steps in their work to demonstrate that they are trustworthy. 24 Though establishing trustworthiness is outside the scope of this article, novice researchers should be familiar with the necessary steps before publishing their work. This suggestion includes exploration of the concept of saturation, the idea that researchers must demonstrate they have collected and analyzed enough data to warrant their conclusions, which has been a focus of recent debate in qualitative research. 25

There are several threats to the trustworthiness of content analysis in particular. 14 We will use the terms “reliability and validity” to describe these threats, as they are conceptualized this way in the formative literature, and it may be easier for researchers with a quantitative research background to recognize them. Though some of these threats may be particular to the type of data being analyzed, in general, there are risks specific to the different methods of content analysis. In manifest content analysis, reliability is necessary but not sufficient to establish validity. 14 Because there is little judgment required of the coders, lack of high inter-rater agreement among coders will render the data invalid. 14 Additionally, coder fatigue is a common threat to manifest content analysis because the coding is clerical and repetitive in nature.

For latent pattern content analysis, validity and reliability are inversely related. 14 Greater reliability is achieved through more detailed coding rules to improve consistency, but these rules may diminish the accessibility of the coding to consumers of the research. This is defined as low ecological validity. Higher ecological validity is achieved through greater reliance on coder judgment to increase the resonance of the results with the audience, yet this often decreases the inter-rater reliability. In latent projective content analysis, reliability and validity are equivalent. 14 Consistent interpretations among coders both establish and validate the constructed norm; construction of an accurate norm is evidence of consistency. However, because of this equivalence, issues with low validity or low reliability cannot be isolated. A lack of consistency may result from coding rules, lack of a shared schema, or issues with a defined variable. Reasons for low validity cannot be isolated, but will always result in low consistency.

Any good analysis starts with a codebook and coder training. It is important for all coders to share the mental model of the skill, construct, or phenomenon being coded in the data. However, when conducting latent pattern or projective content analysis in particular, micro-level rules and definitions of codes increase the threat to ecological validity, so it is important to leave enough room in the codebook and during the training to allow for a shared mental schema to emerge in the larger group rather than being strictly directed by the lead researcher. Stability is another threat, which occurs when coders make different judgments as time passes. To reduce this risk, allowing for recoding at a later date can increase the consistency and stability of the codes. Reproducibility is not typically a goal of qualitative research, 15 but for content analysis, codes that are defined both prior to and during analysis should retain their meaning. Researchers can increase the reproducibility of their codebook by creating a detailed audit trail, including descriptions of the methods used to create and define the codes, materials used for the training of the coders, and steps taken to ensure inter-rater reliability.

In all forms of qualitative analysis, coder fatigue is a common threat to trustworthiness, even when the instructor is coding individually. Over time, the cases may start to look the same, making it difficult to refocus and look at each case with fresh eyes. To guard against this, coders should maintain a reflective journal and write analytical memos to help stay focused. Memos might include insights that the researcher has, such as patterns of misunderstanding, areas to focus on when considering re-teaching specific concepts, or specific conversations to have with students. Fatigue can also be mitigated by occasionally talking to participants (eg, meeting with students and listening for their rationale on why they included specific pieces of information in an assignment). These are just examples of potential exercises that can help coders mitigate cognitive fatigue. Most researchers develop their own ways to prevent the fatigue that can seep in after long hours of looking at data. But above all, a sufficient amount of time should be allowed for analysis, so that coders do not feel rushed, and regular breaks should be scheduled and enforced.

Qualitative content analysis is both accessible and high-yield for pharmacy educators and researchers. Though some of the methods may seem abstract or fluid, the nature of qualitative content analysis encompasses these concerns by providing a systematic approach to discover meaning in textual data, both on the surface and implied beneath it. As with most research methods, the surest path towards proficiency is through application and intentional, repeated practice. We encourage pharmacy educators to ask questions suited for qualitative research and to consider the use of content analysis as a qualitative research method for discovering meaning in their data.

  • What is content analysis?

Last updated

20 March 2023

Reviewed by

Miroslav Damyanov

When you're conducting qualitative research, you'll find yourself analyzing various texts. Perhaps you'll be evaluating transcripts from audio interviews you've conducted. Or you may find yourself assessing the results of a survey filled with open-ended questions.

Content analysis is a research method used to identify the presence of various concepts, words, and themes in different texts. Two types of content analysis exist: conceptual analysis and relational analysis. In the former, researchers determine whether and how frequently certain concepts appear in a text. In relational analysis, researchers explore how different concepts are related to one another in a text.

Both types of content analysis require the researcher to code the text. Coding the text means breaking it down into different categories that allow it to be analyzed more easily.

  • What are some common uses of content analysis?

You can use content analysis to analyze many forms of text, including:

Interview and discussion transcripts

Newspaper articles and headlines

Literary works

Historical documents

Government reports

Academic papers

Music lyrics

Researchers commonly use content analysis to draw insights and conclusions from literary works. Historians and biographers may apply this approach to letters, papers, and other historical documents to gain insight into the historical figures and periods they are writing about. Market researchers can also use it to evaluate brand performance and perception.

Some researchers have used content analysis to explore differences in decision-making and other cognitive processes. While researchers traditionally used this approach to explore human cognition, content analysis is also at the heart of machine learning approaches currently being used and developed by software and AI companies.

  • Conducting a conceptual analysis

Conceptual analysis is more commonly associated with content analysis than relational analysis. 

In conceptual analysis, you're looking for the appearance and frequency of different concepts. Why? This information can help further your qualitative or quantitative analysis of a text. It's an inexpensive and easily understood research method that can help you draw inferences and conclusions about your research subject. And while it is a relatively straightforward analytical tool, it does consist of a multi-step process that you must closely follow to ensure the reliability and validity of your study.

When you're ready to conduct a conceptual analysis, refer to your research question and the text. Ask yourself what information likely found in the text is relevant to your question. You'll need to know this to determine how you'll code the text. Then follow these steps:

1. Determine whether you're looking for explicit terms or implicit terms.

Explicit terms are those that directly appear in the text, while implicit ones are those that the text implies or alludes to or that you can infer. 

Coding for explicit terms is straightforward. For example, if you're looking to code a text for an author's explicit use of color, you'd simply code for every instance a color appears in the text. However, if you're coding for implicit terms, you'll need to determine and define how you're identifying the presence of the term first. Doing so involves a certain amount of subjectivity and may impinge upon the reliability and validity of your study.

2. Next, identify the level at which you'll conduct your analysis.

You can search for words, phrases, or sentences encapsulating your terms. You can also search for concepts and themes, but you'll need to define how you expect to identify them in the text. You must also define rules for how you'll code different terms to reduce ambiguity. For example, if, in an interview transcript, a person repeats a word one or more times in a row as a verbal tic, should you code it more than once? And what will you do with irrelevant data that appears in a term if you're coding for sentences? 

Defining these rules upfront can help make your content analysis more efficient and your final analysis more reliable and valid.
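
As a rough illustration of one such rule, the sketch below treats an immediately repeated word as a single occurrence, which is one possible answer to the verbal-tic question above. The tokenizer, function names, and sample utterance are assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def collapse_repeats(tokens):
    """Apply the rule 'an immediately repeated word counts once'."""
    collapsed = []
    for token in tokens:
        if not collapsed or collapsed[-1] != token:
            collapsed.append(token)
    return collapsed

# Hypothetical interview utterance containing a verbal tic.
utterance = "It's, um, good good good value, really good"
raw_tokens = tokenize(utterance)
clean_tokens = collapse_repeats(raw_tokens)

print(raw_tokens.count("good"))    # count before the rule is applied
print(clean_tokens.count("good"))  # count after the rule is applied
```

Only back-to-back repeats are merged, so the final "good" still counts as a separate mention. Whether that is the right rule depends on your research question.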

3. You'll need to determine whether you're coding for a concept or theme's existence or frequency.

If you're coding for its existence, you’ll only count it once, at its first appearance, no matter how many times it subsequently appears. If you're searching for frequency, you'll count the number of its appearances in the text.
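
The difference between the two options can be sketched in a few lines. This example reuses the color idea from step 1; the function name, the `mode` parameter, and the sample sentence are hypothetical.

```python
import re

def code_text(text, terms, mode="frequency"):
    """Code a text for the given explicit terms.

    mode="existence" records 1 if a term appears at all, otherwise 0.
    mode="frequency" records how many times each term appears.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {term: tokens.count(term) for term in terms}
    if mode == "existence":
        return {term: int(n > 0) for term, n in counts.items()}
    if mode == "frequency":
        return counts
    raise ValueError("mode must be 'existence' or 'frequency'")

sample = "The red door opened onto a red hallway with a blue rug."
colors = ["red", "blue", "green"]

print(code_text(sample, colors, mode="existence"))
print(code_text(sample, colors, mode="frequency"))
```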

4. You'll also want to determine the number of terms you want to code for and how you may wish to categorize them.

For example, say you're conducting a content analysis of customer service call transcripts and looking for evidence of customer dissatisfaction with a product or service. You might create categories that refer to different elements with which customers might be dissatisfied, such as price, features, packaging, technical support, and so on. Then you might look for sentences that refer to those product elements according to each category in a negative light.
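
A minimal sketch of that customer service example follows. The category keywords, the list of negative cue words, and the sample transcript are all invented for illustration. Plain keyword matching is crude, so treat this as a way to organize a first pass rather than a substitute for reading the sentences.

```python
import re
from collections import Counter

# Hypothetical categories and the keywords that signal each one.
CATEGORIES = {
    "price": {"price", "expensive", "cost"},
    "technical support": {"support", "helpdesk", "agent"},
    "packaging": {"packaging", "box"},
}
# Hypothetical cue words used to flag a sentence as negative.
NEGATIVE_CUES = {"too", "never", "damaged", "unhelpful", "disappointed"}

def code_negative_mentions(transcript):
    """Count sentences that mention a category and contain a negative cue."""
    tally = Counter()
    for sentence in re.split(r"[.!?]+", transcript):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if not words & NEGATIVE_CUES:
            continue  # skip neutral or positive sentences
        for category, keywords in CATEGORIES.items():
            if words & keywords:
                tally[category] += 1
    return tally

transcript = (
    "The price is too high for what you get. "
    "The box arrived damaged. "
    "Your support agent was friendly. "
    "I am disappointed with the cost."
)
tally = code_negative_mentions(transcript)
print(dict(tally))
```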

5. Next, you'll need to develop translation rules for your codes.

Those rules should be clear and consistent, allowing you to keep track of your data in an organized fashion.

6. After you've determined the terms for which you're searching, your categories, and translation rules, you're ready to code.

You can do so by hand or via software. Software is quite helpful when you have multiple texts. But it also becomes more vital for you to have developed clear codes, categories, and translation rules, especially if you're looking for implicit terms and concepts. Otherwise, your software-driven analysis may miss key instances of the terms you seek.

7. When you have your text coded, it's time to analyze it.

Look for trends and patterns in your results and use them to draw relevant conclusions about your research subject.

  • Conducting a relational analysis

In a relational analysis, you're examining the relationship between different terms that appear in your text(s). Doing so requires you to code your texts in a similar fashion as in a conceptual analysis. However, depending on the type of relational analysis you're trying to conduct, you may need to follow slightly different rules.

Three types of relational analyses are commonly used: affect extraction, proximity analysis, and cognitive mapping.

Affect extraction

This type of relational analysis involves evaluating the different emotional concepts found in a specific text. While the insights from affect extraction can be invaluable, conducting it may prove difficult depending on the text. For example, if the text captures people's emotional states at different times and from different populations, you may find it difficult to compare them and draw appropriate inferences.

Proximity analysis

A relatively simpler analytical approach than affect extraction, proximity analysis assesses the co-occurrence of explicit concepts in a text. You can create what's known as a concept matrix, which is a group of interrelated co-occurring concepts. Concept matrices help evaluate and determine the overall meaning of a text or the identification of a secondary message or theme.
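
One simple way to approximate a concept matrix is to count how often pairs of explicit concepts appear in the same sentence. The sketch below assumes the sentence is the co-occurrence window, which is a choice the text leaves open; the function name, concepts, and sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(text, concepts):
    """Count, per pair of concepts, the sentences in which both appear."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        present = sorted(c for c in concepts if c in words)
        for pair in combinations(present, 2):
            pairs[pair] += 1
    return pairs

text = (
    "The price went up and the support got worse. "
    "Support was slow. "
    "The price and the packaging both disappointed me, and support ignored it."
)
concepts = ["price", "support", "packaging"]
matrix = cooccurrence_counts(text, concepts)
for (first, second), count in sorted(matrix.items()):
    print(first, second, count)
```

A wider window (a paragraph, or a fixed number of words) would produce a denser matrix, so the window size belongs in your coding rules.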

Cognitive mapping

You can use cognitive mapping as a way to visualize the results of either affect extraction or proximity analysis. This technique uses affect extraction or proximity analysis results to create a graphic map illustrating the relationship between co-occurring emotions or concepts.
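
Before drawing a map, it can help to hold the relationships in a simple graph structure. The sketch below turns pair counts into a weighted adjacency map; the counts, function names, and `min_weight` threshold are hypothetical, and the drawing step itself is left to whatever graphing tool you prefer.

```python
from collections import defaultdict

# Hypothetical co-occurrence counts, such as a proximity analysis might produce.
pair_counts = {
    ("price", "support"): 2,
    ("packaging", "price"): 1,
    ("packaging", "support"): 1,
}

def build_map(pair_counts, min_weight=1):
    """Build an undirected weighted graph, dropping links below min_weight."""
    graph = defaultdict(dict)
    for (a, b), weight in pair_counts.items():
        if weight >= min_weight:
            graph[a][b] = weight
            graph[b][a] = weight
    return dict(graph)

def strongest_links(graph, concept):
    """List a concept's neighbours, strongest link first."""
    return sorted(graph.get(concept, {}).items(), key=lambda kv: (-kv[1], kv[0]))

concept_map = build_map(pair_counts)
print(strongest_links(concept_map, "price"))
```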

To conduct a relational analysis, you must start by determining the type of analysis that best fits the study: affect extraction or proximity analysis. 

Complete steps one through six as outlined above. When it comes to the seventh step, analyze the text according to the relational analysis type you've chosen. During this step, feel free to use cognitive mapping to help draw inferences and conclusions about the relationships between co-occurring emotions or concepts. And use other tools, such as mental modeling and decision mapping, as necessary to analyze the results.

  • The advantages of content analysis

Content analysis provides researchers with a robust and inexpensive method to qualitatively and quantitatively analyze a text. By coding the data, you can perform statistical analyses of the data to affirm and reinforce conclusions you may draw. And content analysis can provide helpful insights into language use, behavioral patterns, and historical or cultural conventions that can be valuable beyond the scope of the initial study.

When content analyses are applied to interview data, the approach provides a way to closely analyze data without needing interview-subject interaction, which can be helpful in certain contexts. For example, suppose you want to analyze the perceptions of a group of geographically diverse individuals. In this case, you can conduct a content analysis of existing interview transcripts rather than assuming the time and expense of conducting new interviews.

What is meant by content analysis?

Content analysis is a research method that helps a researcher explore the occurrence of and relationships between various words, phrases, themes, or concepts in a text or set of texts. The method allows researchers in different disciplines to conduct qualitative and quantitative analyses on a variety of texts.

Where is content analysis used?

Content analysis is used in multiple disciplines, as you can use it to evaluate a variety of texts. You can find applications in anthropology, communications, history, linguistics, literary studies, marketing, political science, psychology, and sociology, among other disciplines.

What are the two types of content analysis?

Content analysis may be either conceptual or relational. In a conceptual analysis, researchers examine a text for the presence and frequency of specific words, phrases, themes, and concepts. In a relational analysis, researchers draw inferences and conclusions about the nature of the relationships of co-occurring words, phrases, themes, and concepts in a text.

What's the difference between content analysis and thematic analysis?

Content analysis typically uses a descriptive approach to the data and may use either qualitative or quantitative analytical methods. By contrast, a thematic analysis only uses qualitative methods to explore frequently occurring themes in a text.

Grad Coach

What Is Qualitative Content Analysis?

QCA explained simply (with examples).

By: Jenna Crosley (PhD). Reviewed by: Dr Eunice Rautenbach (DTech) | February 2021

If you’re in the process of preparing for your dissertation, thesis or research project, you’ve probably encountered the term “qualitative content analysis” – it’s quite a mouthful. If you’ve landed on this post, you’re probably a bit confused about it. Well, the good news is that you’ve come to the right place…

Overview: Qualitative Content Analysis

  • What (exactly) is qualitative content analysis
  • The two main types of content analysis
  • When to use content analysis
  • How to conduct content analysis (the process)
  • The advantages and disadvantages of content analysis

1. What is content analysis?

Content analysis is a qualitative analysis method that focuses on recorded human artefacts such as manuscripts, voice recordings and journals. Content analysis investigates these written, spoken and visual artefacts without explicitly extracting data from participants – this is called unobtrusive research.

In other words, with content analysis, you don’t necessarily need to interact with participants (although you can if necessary); you can simply analyse the data that they have already produced. With this type of analysis, you can analyse data such as text messages, books, Facebook posts, videos, and audio (just to mention a few).

The basics – explicit and implicit content

When working with content analysis, explicit and implicit content will play a role. Explicit data is transparent and easy to identify, while implicit data is that which requires some form of interpretation and is often of a subjective nature. Sounds a bit fluffy? Here’s an example:

Joe: Hi there, what can I help you with? 

Lauren: I recently adopted a puppy and I’m worried that I’m not feeding him the right food. Could you please advise me on what I should be feeding? 

Joe: Sure, just follow me and I’ll show you. Do you have any other pets?

Lauren: Only one, and it tweets a lot!

In this exchange, the explicit data indicates that Joe is helping Lauren to find the right puppy food. Joe asks Lauren whether she has any pets aside from her puppy. This data is explicit because it requires no interpretation.

On the other hand, implicit data, in this case, includes the fact that the speakers are in a pet store. This information is not clearly stated but can be inferred from the conversation, where Joe is helping Lauren to choose pet food. An additional piece of implicit data is that Lauren likely has some type of bird as a pet. This can be inferred from the way that Lauren states that her pet “tweets”.

As you can see, explicit and implicit data both play a role in human interaction and are an important part of your analysis. However, it’s important to differentiate between these two types of data when you’re undertaking content analysis. Interpreting implicit data can be rather subjective as conclusions are based on the researcher’s interpretation. This can introduce an element of bias, which risks skewing your results.

2. The two types of content analysis

Now that you understand the difference between implicit and explicit data, let’s move on to the two general types of content analysis : conceptual and relational content analysis. Importantly, while conceptual and relational content analysis both follow similar steps initially, the aims and outcomes of each are different.

Conceptual analysis focuses on the number of times a concept occurs in a set of data and is generally focused on explicit data. For example, if you were to have the following conversation:

Marie: She told me that she has three cats.

Jean: What are her cats’ names?

Marie: I think the first one is Bella, the second one is Mia, and… I can’t remember the third cat’s name.

In this data, you can see that the word “cat” has been used three times. Through conceptual content analysis, you can deduce that cats are the central topic of the conversation. You can also perform a frequency analysis, where you assess the term’s frequency in the data. For example, in the exchange above, the word “cat” makes up 9% of the data. In other words, conceptual analysis brings a little bit of quantitative analysis into your qualitative analysis.
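
The 9% figure can be reproduced with a short script. This sketch assumes that speaker names are not counted as data, that contractions count as one word, and that “cats”, “cats’” and “cat’s” all count as the term “cat”; the tokenizer and variable names are hypothetical.

```python
import re

# The exchange above, without speaker labels (assumed not to count as data).
conversation = [
    "She told me that she has three cats.",
    "What are her cats' names?",
    "I think the first one is Bella, the second one is Mia, and... "
    "I can't remember the third cat's name.",
]

def tokenize(text):
    """Lowercase words, keeping contractions such as "can't" as one token."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

tokens = [tok for line in conversation for tok in tokenize(line)]
# Treat the singular, plural, and possessive forms as the same term.
hits = [tok for tok in tokens if re.fullmatch(r"cats?(?:'s)?", tok)]

total = len(tokens)
share = len(hits) / total
print(f"{len(hits)} of {total} words ({share:.0%})")
```

Different tokenization choices (for example, splitting “can’t” into two words) would shift the percentage slightly, which is why the counting rules belong in the coding scheme.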

As you can see, the above data is without interpretation and focuses on explicit data. Relational content analysis, on the other hand, takes a more holistic view by focusing more on implicit data in terms of context, surrounding words and relationships.

There are three types of relational analysis:

  • Affect extraction
  • Proximity analysis
  • Cognitive mapping

Affect extraction is when you assess concepts according to emotional attributes. These emotions are typically mapped on scales, such as a Likert scale or a rating scale ranging from 1 to 5, where 1 is “very sad” and 5 is “very happy”.

If participants are talking about their achievements, they are likely to be given a score of 4 or 5, depending on how good they feel about it. If a participant is describing a traumatic event, they are likely to have a much lower score, either 1 or 2.
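
Once a coder has assigned scores, summarising them is straightforward. The sketch below averages scores per topic on the 1 to 5 scale described above; the excerpts, participant labels, and topic names are hypothetical, and the scores are assumed to come from a human coder rather than from the script.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coded excerpts: (participant, topic, score from 1 to 5).
coded_excerpts = [
    ("P1", "achievement", 4),
    ("P2", "achievement", 5),
    ("P1", "traumatic event", 1),
    ("P3", "traumatic event", 2),
]

def mean_affect_by_topic(excerpts, low=1, high=5):
    """Average the affect scores per topic, rejecting off-scale values."""
    by_topic = defaultdict(list)
    for participant, topic, score in excerpts:
        if not low <= score <= high:
            raise ValueError(f"score {score} for {participant} is off the scale")
        by_topic[topic].append(score)
    return {topic: mean(scores) for topic, scores in by_topic.items()}

summary = mean_affect_by_topic(coded_excerpts)
print(summary)
```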

Proximity analysis identifies explicit terms (such as those found in a conceptual analysis) and the patterns in terms of how they co-occur in a text. In other words, proximity analysis investigates the relationship between terms and aims to group these to extract themes and develop meaning.

Proximity analysis is typically utilised when you’re looking for hard facts rather than emotional, cultural, or contextual factors. For example, if you were to analyse a political speech, you may want to focus only on what has been said, rather than implications or hidden meanings. To do this, you would make use of explicit data, discounting any underlying meanings and implications of the speech.

Lastly, there’s cognitive mapping, which can be used in addition to, or along with, proximity analysis. Cognitive mapping involves taking different texts and comparing them in a visual format – i.e. a cognitive map. Typically, you’d use cognitive mapping in studies that assess changes in terms, definitions, and meanings over time. It can also serve as a way to visualise affect extraction or proximity analysis and is often presented in a form such as a graphic map.

Example of a cognitive map

To recap on the essentials, content analysis is a qualitative analysis method that focuses on recorded human artefacts. It involves both conceptual analysis (which is more numbers-based) and relational analysis (which focuses on the relationships between concepts and how they’re connected).

3. When should you use content analysis?

Content analysis is a useful tool that provides insight into trends of communication. For example, you could use a discussion forum as the basis of your analysis and look at the types of things the members talk about as well as how they use language to express themselves. Content analysis is flexible in that it can be applied to the individual, group, and institutional level.

Content analysis is typically used in studies where the aim is to better understand factors such as behaviours, attitudes, values, emotions, and opinions. For example, you could use content analysis to investigate an issue in society, such as miscommunication between cultures. In this example, you could compare patterns of communication in participants from different cultures, which will allow you to create strategies for avoiding misunderstandings in intercultural interactions.

Another example could include conducting content analysis on a publication such as a book. Here you could gather data on the themes, topics, language use and opinions reflected in the text to draw conclusions regarding the political (such as conservative or liberal) leanings of the publication.

4. How to conduct a qualitative content analysis

Conceptual and relational content analysis differ in terms of their exact process; however, there are some similarities. Let’s have a look at these first – i.e., the generic process:

  • Recap on your research questions
  • Undertake bracketing to identify biases
  • Operationalise your variables and develop a coding scheme
  • Code the data and undertake your analysis

Step 1 – Recap on your research questions

It’s always useful to begin a project with research questions, or at least with an idea of what you are looking for. In fact, if you’ve spent time reading this blog, you’ll know that it’s useful to recap on your research questions, aims and objectives when undertaking pretty much any research activity. In the context of content analysis, it’s difficult to know what needs to be coded and what doesn’t, without a clear view of the research questions.

For example, if you were to code a conversation focused on basic issues of social justice, you may be met with a wide range of topics that may be irrelevant to your research. However, if you approach this data set with the specific intent of investigating opinions on gender issues, you will be able to focus on this topic alone, which would allow you to code only what you need to investigate.

Step 2 – Reflect on your personal perspectives and biases

It’s vital that you reflect on your own preconceptions of the topic at hand and identify the biases that you might drag into your content analysis – this is called “bracketing”. By identifying these upfront, you’ll be more aware of them and less likely to have them subconsciously influence your analysis.

For example, if you were to investigate how a community converses about unequal access to healthcare, it is important to assess your views to ensure that you don’t project these onto your understanding of the opinions put forth by the community. If you have access to medical aid, for instance, you should not allow this to interfere with your examination of unequal access.

Step 3 – Operationalise your variables and develop a coding scheme

Next, you need to operationalise your variables. But what does that mean? Simply put, it means that you have to define each variable or construct. Give every item a clear definition – what does it mean (include) and what does it not mean (exclude). For example, if you were to investigate children’s views on healthy foods, you would first need to define what age group/range you’re looking at, and then also define what you mean by “healthy foods”.

In combination with the above, it is important to create a coding scheme, which will consist of information about your variables (how you defined each variable), as well as a process for analysing the data. For this, you would refer back to how you operationalised/defined your variables so that you know how to code your data.

For example, when coding, when should you code a food as “healthy”? What makes a food choice healthy? Is it the absence of sugar or saturated fat? Is it the presence of fibre and protein? It’s very important to have clearly defined variables to achieve consistent coding – without this, your analysis will get very muddy, very quickly.
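
A coding scheme entry can be written down as a small data structure so that the include and exclude decisions are explicit. The sketch below is one possible layout; the class, field names, definition text, and food lists are hypothetical and would come from your own operationalisation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodeDefinition:
    name: str
    definition: str      # what the code means, in plain words
    include: frozenset   # mentions that count as this code
    exclude: frozenset   # near-misses that must not be coded

# Hypothetical entry for the "healthy food" example.
HEALTHY_FOOD = CodeDefinition(
    name="healthy food",
    definition="Whole fruit and vegetables named by the child (illustrative rule)",
    include=frozenset({"broccoli", "peaches", "bananas"}),
    exclude=frozenset({"fruit juice", "banana bread"}),
)

def applies(code, mention):
    """Return True only if the mention is explicitly included and not excluded."""
    normalised = mention.strip().lower()
    if normalised in code.exclude:
        return False
    return normalised in code.include

for mention in ["Broccoli", "banana bread", "pizza"]:
    print(mention, applies(HEALTHY_FOOD, mention))
```

Mentions that appear in neither list (such as "pizza" here) are left uncoded, which is a prompt to revisit the definition rather than guess.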

Step 4 – Code and analyse the data

The next step is to code the data. At this stage, there are some differences between conceptual and relational analysis.

As described earlier in this post, conceptual analysis looks at the existence and frequency of concepts, whereas a relational analysis looks at the relationships between concepts. For both types of analyses, it is important to pre-select a concept that you wish to assess in your data. Using the example of studying children’s views on healthy food, you could pre-select the concept of “healthy food” and assess the number of times the concept pops up in your data.

Here is where conceptual and relational analysis start to differ.

At this stage of conceptual analysis, it is necessary to decide on the level of analysis you’ll perform on your data, and whether this will exist on the word, phrase, sentence, or thematic level. For example, will you code the phrase “healthy food” on its own? Will you code each term relating to healthy food (e.g., broccoli, peaches, bananas, etc.) with the code “healthy food” or will these be coded individually? It is very important to establish this from the get-go to avoid inconsistencies that could result in you having to code your data all over again.

On the other hand, relational analysis looks at the type of analysis. So, will you use affect extraction? Proximity analysis? Cognitive mapping? A mix? It’s vital to determine the type of analysis before you begin to code your data so that you can maintain the reliability and validity of your research.

How to conduct conceptual analysis

First, let’s have a look at the process for conceptual analysis.

Once you’ve decided on your level of analysis, you need to establish how you will code your concepts, and how many of these you want to code. Here you can choose whether you want to code in a deductive or inductive manner. Just to recap, deductive coding is when you begin the coding process with a set of pre-determined codes, whereas inductive coding entails the codes emerging as you progress with the coding process. Here it is also important to decide what should be included and excluded from your analysis, and also what levels of implication you wish to include in your codes.

For example, if you have the concept of “tall”, can you include “up in the clouds”, derived from the sentence, “the giraffe’s head is up in the clouds” in the code, or should it be a separate code? In addition to this, you need to know what levels of words may be included in your codes or not. For example, if you say, “the panda is cute” and “look at the panda’s cuteness”, can “cute” and “cuteness” be included under the same code?
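
Whatever you decide, writing the decision down as explicit rules keeps the coding consistent. The sketch below shows one possible set of answers to the questions above (grouping “cute” with “cuteness”, and treating “up in the clouds” as “tall”); the rule table and function name are hypothetical, and a different study could reasonably decide otherwise.

```python
import re

# Hypothetical rules mapping words or phrases found in the text to codes.
TRANSLATION_RULES = {
    "cute": "CUTE",
    "cuteness": "CUTE",          # different word form, same code
    "tall": "TALL",
    "up in the clouds": "TALL",  # implied meaning; including it is a judgment call
}

def apply_rules(sentence, rules=TRANSLATION_RULES):
    """Return the sorted set of codes whose phrases occur as whole words."""
    text = sentence.lower()
    codes = set()
    for phrase, code in rules.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            codes.add(code)
    return sorted(codes)

print(apply_rules("The panda is cute"))
print(apply_rules("Look at the panda's cuteness"))
print(apply_rules("The giraffe's head is up in the clouds"))
```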

Once you’ve considered the above, it’s time to code the text. We’ve already published a detailed post about coding, so we won’t go into that process here. Once you’re done coding, you can move on to analysing your results. This is where you will aim to find generalisations in your data, and thus draw your conclusions.

How to conduct relational analysis

Now let’s return to relational analysis.

As mentioned, you want to look at the relationships between concepts. To do this, you’ll need to create categories by reducing your data (in other words, grouping similar concepts together) and then also code for words and/or patterns. These are both done with the aim of discovering whether these words exist, and if they do, what they mean.

Your next step is to assess your data and to code the relationships between your terms and meanings, so that you can move on to your final step, which is to sum up and analyse the data.

To recap, it’s important to start your analysis process by reviewing your research questions and identifying your biases. From there, you need to operationalise your variables, code your data and then analyse it.

Time to analyse

5. What are the pros & cons of content analysis?

One of the main advantages of content analysis is that it allows you to use a mix of quantitative and qualitative research methods, which results in a more scientifically rigorous analysis.

For example, with conceptual analysis, you can count the number of times that a term or a code appears in a dataset, which can be assessed from a quantitative standpoint. In addition to this, you can then use a qualitative approach to investigate the underlying meanings of these and relationships between them.

Content analysis is also unobtrusive and therefore poses fewer ethical issues than some other analysis methods. As the content you’ll analyse oftentimes already exists, you’ll analyse what has been produced previously, and so you won’t have to collect data directly from participants. When coded correctly, data is analysed in a very systematic and transparent manner, which means that issues of replicability (how possible it is to recreate research under the same conditions) are reduced greatly.

On the downside, qualitative research (in general, not just content analysis) is often critiqued for being too subjective and for not being scientifically rigorous enough. This is where reliability (how replicable a study is by other researchers) and validity (how suitable the research design is for the topic being investigated) come into play – if you take these into account, you’ll be on your way to achieving sound research results.

Recap: Qualitative content analysis

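In this post, we’ve covered a lot of ground:

  • What (exactly) is qualitative content analysis
  • The two main types of content analysis
  • When to use content analysis
  • How to conduct content analysis (the process)
  • The advantages and disadvantages of content analysis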

If you have any questions about qualitative content analysis, feel free to leave a comment below. If you’d like 1-on-1 help with your qualitative content analysis, be sure to book an initial consultation with one of our friendly Research Coaches.

This post was based on one of our popular Research Bootcamps. If you're working on a research project, you'll definitely want to check this out...

15 Comments

Abhishek

If I am having three pre-decided attributes for my research based on which a set of semi-structured questions where asked then should I conduct a conceptual content analysis or relational content analysis. please note that all three attributes are different like Agility, Resilience and AI.

Ofori Henry Affum

Thank you very much. I really enjoyed every word.

Janak Raj Bhatta

please send me one/ two sample of content analysis

pravin

send me to any sample of qualitative content analysis as soon as possible

abdellatif djedei

Many thanks for the brilliant explanation. Do you have a sample practical study of a foreign policy using content analysis?

DR. TAPAS GHOSHAL

1) It will be very much useful if a small but complete content analysis can be sent, from research question to coding and analysis. 2) Is there any software by which qualitative content analysis can be done?

Carkanirta

Common software for qualitative analysis is nVivo, and quantitative analysis is IBM SPSS

carmely

Thank you. Can I have at least 2 copies of a sample analysis study as my reference?

Yang

Could you please send me some sample of textbook content analysis?

Abdoulie Nyassi

Can I send you my research topic, aims, objectives and questions to give me feedback on them?

Bobby Benjamin Simeon

please could you send me samples of content analysis?

Obi Clara Chisom

Yes please send

Gaid Ahmed

really we enjoyed your knowledge thanks allot. from Ethiopia

Ary

can you please share some samples of content analysis(relational)? I am a bit confused about processing the analysis part

eeeema

Is it possible for you to list the journal articles and books or other sources you used to write this article? Thank you.


What is Content Analysis – Steps & Examples

Published by Alvin Nicolas on August 16th, 2021. Revised on August 29, 2023.

“The content analysis identifies specific words, patterns, concepts, themes, phrases, characters, or sentences within the recorded communication content.”

To conduct content analysis, you need to gather data from multiple sources; it can be anything or any form of data, including text, audio, or videos.

Depending on the requirements of your analysis, you may have to use a primary or secondary form of data.

The Purpose of Content Analysis

Content analysis has many objectives. Some fundamental ones are given below.

  • To simplify the content.
  • To get a clear, in-depth meaning of the language.
  • To identify the uses of language.
  • To know the impact of language on society.
  • To find out the association of the language with cultures, interpersonal relationships, and communication.
  • To gain an in-depth understanding of the concept.
  • To find out the context, behaviour, and response of the speaker.
  • To analyse the trends and association between the text and multimedia.

When to Use Content Analysis? 

Content analysis has many uses. Some of them are listed below.

Content analysis is used:

  • To represent the content precisely by condensing it into a shorter form.
  • To describe the characteristics of the content.
  • To support an argument.
  • In many walks of life, including marketing, media, and literature.
  • To extract essential information from a large amount of data.

Types of Content Analysis

Content analysis is a broad concept, and its types vary from field to field. People from all walks of life adapt it to their own needs.


Advantages and Disadvantages of Content Analysis

Content analysis has many benefits, which are given below.

Content analysis:

  • Offers both qualitative and quantitative analysis of the communication.
  • Provides an in-depth understanding of the content by making it precise.
  • Enables us to understand the context and perception of the speaker.
  • Provides insight into complex models of human thoughts and language use.
  • Provides historical/cultural insight.
  • Can be applied to any time, place, and people.
  • Helps to learn about any language, its origin, and its association with society and culture.

Disadvantages

Content analysis also has some disadvantages, which are given below.

Content analysis:

  • Is very time-consuming.
  • Cannot interpret a large amount of data accurately and is subject to increased error.
  • Cannot be computerised easily.

How to Conduct a Content Analysis?

To conduct a content analysis, follow the steps given below.

Develop a Research Question and Select the Content

It’s essential to have a research question to proceed with your study. After selecting your research question, you need to find relevant resources to analyse.

Example: If you want to find out the impact of plagiarism on the credibility of authors, you can examine the relevant materials available on the topic from the internet, newspapers, and books published during the past 5-10 years.

Read the Content Thoroughly

At this point, you have to read the content thoroughly until you understand it. 

Condensation

You should break the text into smaller portions for clear interpretation. In short, you have to create categories or smaller units of text from the large amount of given data.

The unit of analysis is the basic unit of text to be classified. It can be a word, a phrase, a theme, a plot, or a newspaper article.

Code the Content

It takes a long time to go through the textual data. Coding is a way of tagging the data and organising it into a sequence of symbols, numbers, and letters to highlight the relevant points. At this point, you have to draw meanings from the condensed parts, which requires a clear understanding of the meaning and context of the text and of the speaker.
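As a rough illustration of the tagging step described above, here is a minimal Python sketch that assigns code labels to text segments by keyword matching. The codebook, its labels, and the sample segments are hypothetical and are not taken from this guide. Keyword matching is only a crude first pass and does not replace reading the text in context.

```python
import re
from collections import defaultdict

# Hypothetical codebook: code label -> keywords that signal it.
CODEBOOK = {
    "CREDIBILITY": ["credibility", "trust", "reputation"],
    "PLAGIARISM": ["plagiarism", "plagiarised", "copied"],
}


def code_segment(segment, codebook=CODEBOOK):
    """Return the sorted code labels whose keywords appear in the segment."""
    words = set(re.findall(r"[a-z']+", segment.lower()))
    return sorted(
        label for label, keywords in codebook.items()
        if words.intersection(keywords)
    )


def code_corpus(segments, codebook=CODEBOOK):
    """Map each code label to the indices of the segments tagged with it."""
    index = defaultdict(list)
    for i, segment in enumerate(segments):
        for label in code_segment(segment, codebook):
            index[label].append(i)
    return dict(index)


# Illustrative segments (invented for this sketch).
segments = [
    "The author copied two paragraphs without citation.",
    "Readers lost trust in the journal after the scandal.",
    "Plagiarism damaged her reputation for years.",
]
coded = code_corpus(segments)
```

A segment can receive more than one code, as the third segment does here. A human coder would still need to review every assignment against the meaning and context of the text.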

Analyse and Interpret the Data

You can use statistical analysis to analyse the data. It is a method of collecting, analysing, and interpreting large amounts of data to discover underlying patterns and details. Statistics are used in every field to make better decisions. You should aim to retain the meaning of the content while making it precise.

Frequently Asked Questions

How do you perform content analysis?

To perform content analysis:

  • Define research objectives.
  • Select a representative sample.
  • Develop coding categories.
  • Analyze content systematically.
  • Apply coding to data.
  • Interpret results to draw insights about themes, patterns, and meanings.


Content Analysis

Last updated 22 Mar 2021


Content analysis is a method used to analyse qualitative data (non-numerical data). In its most common form it is a technique that allows a researcher to take qualitative data and to transform it into quantitative data (numerical data). The technique can be used for data in many different formats, for example interview transcripts, film, and audio recordings.

The researcher conducting a content analysis will use ‘coding units’ in their work. These units vary widely depending on the data used, but an example would be the number of positive or negative words used by a mother to describe her child’s behaviour or the number of swear words in a film.
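The idea of turning qualitative data into counts of coding units can be sketched in a few lines of Python. The word lists and the transcript below are hypothetical and are included only to show the mechanics. A real study would define and justify its own coding units.

```python
import re
from collections import Counter

# Hypothetical word lists; a real study would define and validate its own.
POSITIVE = {"kind", "helpful", "calm", "happy"}
NEGATIVE = {"naughty", "rude", "angry", "difficult"}


def count_coding_units(transcript):
    """Tally how many positive and negative words occur in a transcript."""
    tally = Counter()
    for word in re.findall(r"[a-z]+", transcript.lower()):
        if word in POSITIVE:
            tally["positive"] += 1
        elif word in NEGATIVE:
            tally["negative"] += 1
    return tally


# Invented example of a mother describing her child's behaviour.
transcript = (
    "He is kind and helpful at home, but he can be rude and angry "
    "when tired. Mostly he is happy."
)
tally = count_coding_units(transcript)
```

The resulting tallies are quantitative data that can be compared across transcripts or passed to a statistical test.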

The procedure for a content analysis is shown below:

[Figure: procedure for a content analysis]

Strengths of content analysis

It is a reliable way to analyse qualitative data, as the coding units are not open to interpretation and so are applied in the same way over time and by different researchers.

It is an easy technique to use and is not too time-consuming.

It allows a statistical analysis to be conducted if required, as the procedure usually produces quantitative data.
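The claim that coding units are applied in the same way by different researchers can be checked empirically. One common statistic for this is Cohen's kappa, which is not named in these notes and is introduced here as an assumption about how a researcher might test reliability. The sketch below computes it in plain Python for two coders who labelled the same items. The labels are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six coding units.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg"]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on five of six items, and half of that agreement would be expected by chance, so kappa is about 0.67. Values close to 1 indicate that the coding scheme is being applied consistently.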

Weaknesses of content analysis

Causality cannot be established, as it merely describes the data.

As it only describes the data it cannot extract any deeper meaning or explanation for the data patterns arising.

  • Open access
  • Published: 21 November 2023

Connecting with fans in the digital age: an exploratory and comparative analysis of social media management in top football clubs

  • Edgar Romero-Jara,
  • Francesc Solanellas,
  • Joshua Muñoz (ORCID: orcid.org/0000-0001-6220-6328) &
  • Samuel López-Carril (ORCID: orcid.org/0000-0001-5278-057X)

Humanities and Social Sciences Communications, volume 10, Article number: 858 (2023)


  • Business and management
  • Cultural and media studies

In a globalised society, characterised by increasingly demanding markets and the accelerated growth of the digital approach, sports organisations face the challenge of connecting with fans, generating and maintaining audiences and communicating with stakeholders creatively and efficiently. Social media has become a fundamental tool, with engagement as a critical measurement element. However, despite its popularity and use, many questions about its application, measurement and real potential in the sports sector still need to be answered. Therefore, the main objective of this study is to carry out a descriptive and comparative analysis of the engagement generated through social media posts by elite football clubs in Europe, South America and North America. For this purpose, 19,745 Facebook, Twitter and Instagram posts were analysed, through the design, validation and application of an observation instrument, using content analysis techniques. The findings show a priority focus on “Marketing” and “Sports” type messages in terms of frequency, with high engagement rates. They also show a growing stream of “ESG” type messages, with a low posting frequency but engagement rates similar to “Marketing” and “Sport”. “Institutional” messages remain constant in all football clubs. “Commercial” messages still have growth potential in both regards, posting frequency and fan engagement, representing an opportunity for digital assets. Specific format combinations that generate greater engagement were also identified: “text/image” and “text/videos” are the format combinations most used by football clubs on Facebook, Twitter and Instagram; however, they result in different engagement rates. The study also found evidence of different social media management strategies adopted according to region, which obtained similar engagement rates.
This research concludes with theoretical and practical applications that will be of interest to both academics and practitioners to maximise the potential of social media for fan engagement, social initiatives and as a marketing tool.


Introduction

In a context of booming technology and high organisational competitiveness (Ratten, 2020), digital tools have evolved from an essential add-on to crucial strategic and operational elements in sports organisations (Stegmann et al., 2021). Fans increasingly demand a connection with their favourite athletes and teams (Su et al., 2020) through digital channels such as social media, podcasts (Rohden et al., 2023) and Esports (Cuesta-Valiño et al., 2022), among others. Today’s digitised world therefore presents an opportunity for brands, sponsors, sports properties, and other stakeholders to interact in a complex and emotionally charged sector (Su et al., 2022) for fans of different generations (Sheldon et al., 2021). Understanding and getting to know fans is at the forefront of every sports organisation’s objectives.

Social media plays a fundamental role due to its ability to reach multiple audiences faster and generate a sense of connection with fans through a key measurement element: engagement (Doyle et al., 2022). Sports organisations, specifically football clubs, invest time, people and resources in managing social media to achieve their brand positioning and commercial and communication objectives (Anagnostopoulos et al., 2018; Maderer et al., 2018), with Facebook, Twitter and, more recently, Instagram being the most widely used (Abeza et al., 2019; Machado et al., 2020). However, the real potential of social media and its optimal use still poses many questions to be answered.

Although there are previous studies that have explored some aspects of social media in a sports context (e.g., Anagnostopoulos et al., 2018; Mastromartino and Naraine, 2022; Su et al., 2020), the potential impact and efficiency of content posted by football clubs on their social media channels remain unclear. For example, several studies point to various factors that contribute to fan engagement on social media depending on elements such as the type of content, the format used (e.g. photo, text or a combination of both) or the social media platform (see Einsle et al., 2023; Maderer et al., 2018; Su et al., 2020). This gap in the literature prompts a call to action from across the domains of sports marketing and sports management. Identifying the elements generated by football clubs on their official social media profiles can help them improve their marketing strategies and better support their fans. Based on this need and opportunity for management improvement, this study addresses the following research question:

RQ. What are the main characteristics of Facebook, Twitter, and Instagram posts from elite football clubs, in terms of the content type, format and social media platform that generate the highest engagement among social media consumers?

Grounded in the theoretical framework of relationship marketing, the main objective of this study is to carry out a descriptive and comparative analysis of the engagement generated through social media posts on Facebook, Twitter and Instagram by elite football clubs in Europe, South America and North America, using a categorisation approach developed from an existing model in the literature (see Solanellas et al., 2022), as well as the identification of key elements of high-impact social media posts. For this purpose, a new instrument was designed, validated and applied to analyse the use of social media as a marketing tool in sports management. By conducting this exploration, this paper contributes to the literature on sports marketing by identifying which social media and which types of content provoke the most interaction among fans. As a result, football team managers can gain a better understanding of how to target and personalise potential commercial and branding actions, thereby reinforcing the loyalty and commitment of fans to football clubs, and opening or consolidating new lines of action aligned with the strategic objectives of sport entities. Furthermore, the findings and conclusions presented in this study can assist sports managers in the decision-making process, as well as in planning, organising, directing, and effectively controlling social media platforms, thus enhancing engagement with fans in a digital environment.

The article is structured as follows. Firstly, the literature review presents the main theoretical and conceptual elements, focusing on social media and their relationship with marketing theory in sports and football. Secondly, the methodological aspects guiding the study’s process are detailed, including sample, instrument, research procedure, and data analysis. Thirdly, the study’s main results are presented. Fourthly, the discussion section critically examines the findings in the context of existing literature, offering practical and theoretical implications for both academics and practitioners. Finally, the study closes with its main conclusions and limitations.

Literature review

Social media and sports, a combination of great potential

Social media is a collective term for media tools, platforms, and applications allowing consumers to connect, communicate, and collaborate (Williams and Chinn, 2010). They encourage interaction between users and the organisation and provide information from customers and the organisation faster than through conventional media (Kümpel et al., 2015; Shilbury et al., 2014). Furthermore, social media is considered a mass phenomenon due to its ability to transmit information in an agile and interactive way (Vivar, 2009), as well as a unique form of communication that transcends geographical and social boundaries through the instantaneous communication of information (Filo et al., 2015). Social media is used in different sectors for marketing activities (Chen, 2023), brand equity and loyalty (Malarvizhi et al., 2022), and to understand consumer behaviour, brand positioning, business revenue opportunities and social communication (Ramos et al., 2019). However, although this phenomenon has begun to be studied in the sports industry, there is still a need for more evidence about its real potential, essential elements, and efficiency measurement in the sector.

Due to the highly graphic, interactive and visual content of social media, their use in the sports industry, a sector of strong emotional influence, has become more relevant and pervasive in the last decade (Hull and Abeza, 2021), where the interest of the viewer has become crucial and increasingly demanding (Nisar et al., 2018). The differences that make the sports industry unique include, among others, immediate results and changes (Davis and Hilbert, 2013) and the fact that every decision is “in the spotlight” of the public (alluding to the complexity of fans, athletes, coaches, media and other stakeholders). Thus, athletes, teams and sports organisations have been using social media as part of their public relations and communication efforts (Filo et al., 2015; Pegoraro, 2010; Yan et al., 2019) to engage with their partners and fans (Zakerian et al., 2022), promoting interactions and increasing engagement with the sport product, as well as with the team in general (Abeza et al., 2019; Parganas and Anagnostopoulos, 2015).

The linking of social media within the integrated marketing communication process has changed communication strategies and consumer outreach, where marketing managers must include these tools when developing and executing their customer-focused promotional strategies (Lee and Kahle, 2016; Rehman et al., 2022). On the other hand, social media, directly and indirectly, impacts revenue generation and favours negotiation with sponsors due to their notoriety, visibility, and reach (Mastromartino and Naraine, 2022; Parganas and Anagnostopoulos, 2015). They are therefore considered a key tool for building and enhancing a brand’s reputation (Maderer et al., 2018) and an ideal platform to advertise and increase the visibility of a brand or company, as well as to interact with and analyse the actions of their fans and followers (Abeza et al., 2017; García-Fernández et al., 2015; Herrera-Torres et al., 2017).

Social media has also been used in sports education in recent years (Sanz-Labrador et al., 2021). Moreover, their application is increasingly common in construction and dissemination related to social responsibility (López-Carril and Anagnostopoulos, 2020; Sharpe et al., 2020). In this way, they have also become a key tool for interacting with fans, addressing a strengthened social approach, and gaining engagement from athletes, sponsors, and authorities (Einsle et al., 2023; Oviedo et al., 2014; Su et al., 2020). Beyond the digital environment, Cuesta-Valiño et al. (2021) pointed out the relevance of considering the emerging sustainable management approach to measure sports organisations’ goals. One of the most relevant challenges for this industry is to issue social media posts efficiently, using the proper formatting resources and at the right time, to generate the most significant possible impact and engagement.

Relationship marketing theory applied to social media in sports

The sports industry is a fast-growing and increasingly diverse market worldwide (Kim and Andrew, 2016). Football (soccer in North America) is one of the most popular sports worldwide as well as a cultural manifestation, characterised by its high emotional level and economic, political and social relevance (Bucher and Eckl, 2022; Petersen-Wagner and Ludvigsen, 2022). In Spain alone, the sports sector generates 3.3% of the Gross Domestic Product (GDP), of which 1.37% is produced through football (PWC, 2020).

Globalisation has demanded an adaptation at all levels due to the endless search for immediacy and access to information, where the business of sports is becoming more and more relationship-based and the importance of generating engagement (Einsle et al., 2023; Fried and Mumcu, 2017; García-Fernández et al., 2017) is one of the most relevant variables in generating loyalty in sports organisations (Loranca-Valle et al., 2021; Núñez-Barriopedro et al., 2021). Sports consumers are seen as “channels” through which sports products can be promoted (O’Shea and Alonso, 2011), and sports fans have become both the consumers and the advocates of the product. This is where relationship marketing theory helps us to better understand this phenomenon. As Abeza and Sanderson (2022, p. 287) point out, relationship marketing theory “is based on the idea that a relationship between two parties creates additional value for those involved”. This theory is one of the most widely used to understand the phenomenon of social media in sports (Abeza and Sanderson, 2022), as highlighted by numerous authors who have used it in their studies (e.g., Abeza et al., 2017, 2019, 2020; Su et al., 2020; Williams and Chinn, 2010).

Merging the roots of relationship marketing theory (Möller and Halinen, 2000) and the particular characteristics of the sports sector, and taking into account the perspective of short-term transactions and immediate economic benefits (Abeza et al., 2017), social media represents opportunities for better knowledge about fans, more advanced consumer–organisation interaction, efficient fan engagement, efficient use of resources and agile evaluation of the relationship between fans and organisation (Abeza et al., 2019, 2020). In view of this, and in line with Abeza and Sanderson (2022), social media thus becomes a channel through which to establish, maintain and cultivate long-term relationships beneficial to both parties (in our study, football clubs and fans).

Previous studies have addressed the use of specific social media in the context of sports, such as Facebook (Achen, 2019; Meng et al., 2015; Pegoraro et al., 2017; Waters et al., 2009), Twitter (Blaszka et al., 2012; Hambrick et al., 2010; Lovejoy and Saxton, 2012; Winand et al., 2019; Witkemper et al., 2012) and Instagram (Anagnostopoulos et al., 2018; Machado et al., 2020; Zakerian et al., 2022), because of the relevance of these platforms in the sports sector. From a broader perspective, Solanellas et al. (2022) propose a practical analysis of multiple social media in sports organisations from a content categorisation point of view.

The results and contributions of the studies mentioned above reveal the importance of further exploring the social media fan engagement phenomenon from a strategic perspective (Tafesse and Wien, 2018) and the added value that social media can generate in sports. In this sense, it is relevant for sports managers to know which techniques, methodologies and perspectives to use. Furthermore, as stated by Abeza and Sanderson (2022), it is necessary to go deeper into the theories behind its use. Taking these aspects into account, this work presents a new instrument of observation and measurement of social media posts by football organisations, as a basis for understanding and deepening the knowledge about the digital audience and its impact on the different objectives of the organisation. Thus, the study draws on relationship marketing theory to better understand how sports managers can make the most of the possibilities offered by social media to generate added value from the interaction between fans and football clubs. In particular, the developed instrument focuses on the analysis of the type of content published by football clubs, categorising it into dimensions, as well as the engagement of the different publications according to the dimension to which they belong.

With a view to the implementation of the instrument, and to contribute to the literature related to the use of social media as a marketing tool in sports, this study analyses Facebook, Twitter and Instagram posts issued by elite football clubs from Europe, South America and North America, using a practical approach to content categorisation and taking the engagement factor as a key element for comparison.

Methodology

This study adopts an exploratory, descriptive, and comparative research design (Andrew et al., 2011) using the observational method and content analysis techniques. Content analysis involves the counting and comparison of content, followed by the interpretation of the underlying context. It has been widely used in social media communication research, specifically in sports settings (e.g., Anagnostopoulos et al., 2018; Wang and Zhou, 2015; Winand et al., 2019), to interpret textual data through systematic classification, coding, and identifying themes or patterns (Hsieh and Shannon, 2005). First, exploratory studies are particularly useful when the phenomenon under investigation is in constant evolution (such as social media as a marketing tool), as well as when there are several factors and variables at play (Andrew et al., 2011). In this study, these are linked to the engagement that can be caused by the type of content or format used by elite football clubs on their social media accounts. Second, the descriptive aspect of the research design aims to describe and quantify the engagement levels in social media for the selected football clubs. By collecting and analysing quantitative data on the interaction metrics, including likes, comments, shares, and follower counts, the study provides a comprehensive overview of the current state of engagement, and other variables, among the clubs, helping to build a foundation for further analysis and comparison. Lastly, the comparative aspect of the research design (Andrew et al., 2011) is valuable in this study because it enables a cross-regional analysis of three of the most traditional social media platforms. The study compares the engagement practices, elements, and strategies across three key regions of the football industry worldwide. Understanding potential differences can be useful for sports managers to design more optimised social media marketing strategies.
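To make the descriptive step more concrete, the following Python sketch shows one way the interaction metrics listed above could be combined into an engagement rate and averaged per content category. The formula used (interactions divided by followers) is a common convention and is an assumption here, since this passage does not state the study's own formula. The `Post` type, its field names, and the sample figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    content_type: str  # category label, e.g. "Marketing" or "ESG"
    likes: int
    comments: int
    shares: int
    followers: int     # follower count of the posting account


def engagement_rate(post):
    """Interactions per follower (assumed definition, not the study's own)."""
    if post.followers <= 0:
        raise ValueError("followers must be positive")
    return (post.likes + post.comments + post.shares) / post.followers


def mean_rate_by_type(posts):
    """Average engagement rate for each content category."""
    rates = defaultdict(list)
    for post in posts:
        rates[post.content_type].append(engagement_rate(post))
    return {ctype: sum(values) / len(values) for ctype, values in rates.items()}


# Invented figures, for illustration only.
posts = [
    Post("Marketing", likes=900, comments=50, shares=50, followers=100_000),
    Post("Marketing", likes=1800, comments=100, shares=100, followers=100_000),
    Post("ESG", likes=1400, comments=60, shares=40, followers=100_000),
]
summary = mean_rate_by_type(posts)
```

Dividing by followers lets accounts of very different sizes be compared, which matters when clubs from different regions are set side by side.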

Considering the study design and observational method applied in this research (Anguera-Argilaga et al., 2011), a non-probability sampling design (see Battaglia, 2008) was established following several steps to make the following three decisions: (1) selection of football clubs, (2) social media platforms, and (3) period of time studied.

First, a geographical criterion was used to determine the origin of the football clubs under study. This criterion was based on a comprehensive and global perspective, considering factors such as historical significance, popularity, sporting achievements, and the modernisation of football worldwide. Based on these considerations, three regions were selected for analysis: Europe and South America, renowned for their broad global relevance and football tradition (e.g., the winning national teams of the 22 editions of the FIFA World Cup so far are from Europe and South America [Venkat, 2023 ]). Next, North America was chosen for its ascending market growth potential and global efforts to promote football. This is exemplified by upcoming milestones, such as the organisation of the FIFA World Cup 2026 in the United States, Mexico, and Canada, as well as the recent arrival of Lionel Messi into Major League Soccer (see Mizrahi, 2023 ). These three regions are governed by the three most influential regional football bodies of FIFA: Europe (UEFA), South America (CONMEBOL), and North America (CONCACAF). Second, to select the most relevant football clubs in these three regions, we followed some of the selection criteria set in similar studies (e.g., Anagnostopoulos et al., 2018 ; Maderer et al., 2018 ). Therefore, the rankings of four of the most influential football organisations or websites were considered: (1) the International Federation of Football History and Statistics (IFFHS) club ranking, (2) the Football World Rankings website, (3) the FIFA club and league ranking, and (4) the Transfermarkt player ranking website (of great relevance in the player transfer market). As a result of this process, 24 teams were pre-selected (9 from Europe, 9 from South America and 6 from North America) according to the objectives and the study design and the author’s agreement (Andrew et al., 2011 ; Anguera-Argilaga et al., 2011 ; Battaglia, 2008 ; Hernández-Sampieri et al., 2014 ). 
Finally, a random draw was made resulting in a selection of six teams from Europe, six from South America and four from North America (with a limit of two teams per league). This process resulted in the 16 teams whose use of social media is analysed in this study (see Table 1 ).
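The draw described above can be illustrated with a short sketch. The text does not say how the two-teams-per-league limit was enforced, so the shuffle-and-skip approach below is only one plausible way to do it. The club and league names are hypothetical placeholders, not the clubs in Table 1.

```python
import random
from collections import Counter


def draw_clubs(candidates, quotas, max_per_league=2, seed=None):
    """Randomly draw clubs per region, capping how many share a league.

    candidates: list of (club, region, league) tuples (hypothetical data).
    quotas: mapping of region -> number of clubs to draw.
    """
    rng = random.Random(seed)
    selected = []
    for region, quota in quotas.items():
        pool = [c for c in candidates if c[1] == region]
        rng.shuffle(pool)  # random order stands in for the draw
        per_league = Counter()
        picked = []
        for club, _, league in pool:
            if len(picked) == quota:
                break
            if per_league[league] < max_per_league:
                picked.append((club, region, league))
                per_league[league] += 1
        if len(picked) < quota:
            raise ValueError(f"Cannot fill quota for {region}")
        selected.extend(picked)
    return selected


def demo_candidates():
    """Build 24 placeholder clubs: 9 + 9 + 6, three leagues per region."""
    sizes = {"Europe": 9, "South America": 9, "North America": 6}
    return [
        (f"{region} club {i + 1}", region, f"{region} league {i % 3 + 1}")
        for region, n in sizes.items()
        for i in range(n)
    ]


QUOTAS = {"Europe": 6, "South America": 6, "North America": 4}
selection = draw_clubs(demo_candidates(), QUOTAS, seed=1)
print(len(selection), "clubs drawn")
```

Fixing `seed` makes the draw reproducible, which helps when a sample has to be documented or audited afterwards.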

Next, the social media platforms to be analysed in the study were selected. The literature notes that Facebook was one of the first social media platforms used by football clubs and other sports organisations, either to connect with fans or purely for informational purposes (Achen, 2019; Waters et al., 2009). Twitter and Instagram have also become relevant platforms, not only for marketers in sports but also in other sectors (Anagnostopoulos et al., 2018; Wang and Zhou, 2015). Although the use of Facebook, Twitter and Instagram as marketing tools for football clubs has been studied (e.g., Machado et al., 2020; Maderer et al., 2018; Nisar et al., 2018), there is a lack of literature comparing their potential engagement across a sample of teams from different geographic regions. Thus, it was deemed appropriate to select these three social media platforms for our study.

Finally, the period over which the publications were to be extracted was determined. Among other authors, Ashley and Tuten (2015) point out that, in a social media environment, two to four weeks are sufficient for a wide variety of posts to be made in a regular and cyclical context, excluding exceptional milestones or events that could have an extraordinary impact on engagement and bias a regular reading. Therefore, 45 days for each club and each social media platform was set as an appropriate observation period.

Once the sample selection criteria had been defined, the links of all publications from the clubs selected in the study on the three social media were extracted through the Fanpage Karma software that allows data to be collected and interpreted (Lozano-Blasco et al., 2021 ). After prior data analysis, the final sample consisted of 19,745 publications, a very similar figure to that used in other related studies (e.g., Maderer et al., 2018 ; Yan et al., 2019 ).

Instrument and research procedure

Based on the review of the techniques and methodologies used to analyse the use of social media as a marketing tool for football clubs in previous studies, we proceeded to design and develop an observation and data collection instrument in a Microsoft Excel Spreadsheet (.xlsx format), taking as a starting point the model of content analysis proposed by Solanellas et al. ( 2022 ). Due to the nature of the study, the .xlsx data collection format was chosen for its flexibility, allowing for manual data collection and the application of the categorisation tool post-by-post. This format has been successfully used as a data collection tool in previous social media content analysis studies in football (e.g., López-Carril and Anagnostopoulos, 2020 ).

To ensure its rigour, the codebook was subsequently submitted for review to nine field experts. The selection of these experts was undertaken via judgmental nonprobability sampling, a method commonly employed in the literature due to the specialised and ever-evolving nature of the subject (Andrew et al., 2011 ). These individuals were chosen based on specific criteria, encompassing their professional roles in specialised, coordinating, managerial, or directorial positions tied to the digital domain. Moreover, their academic background, particularly in marketing, methodology, or digital tools, was considered. To ensure an extensive grasp of the subject matter, the chosen experts were required to have a minimum of five years of experience in the area and to be actively participating in their respective roles. This approach aimed to incorporate diverse viewpoints, offering insights from a spectrum of angles relevant to this research. As a result, the panel of experts was comprised of the following professionals: the Head of Digital from a prominent European professional football league (1), a Marketing Manager and an International Communications Manager from leading professional football clubs (2), Directors of digital marketing and branding agencies (2), professors specialising in marketing and sports management at Spanish universities (2), and the Vice-President of Sales along with the Head of Digital from sports business intelligence consultancies (2).

Semi-structured interviews were undertaken with these chosen experts to delve into pertinent aspects linked to the study. An interview guide was developed, following the methodological aspects indicated in specialised works in this field (see Andrew et al., 2011 ; Anguera-Argilaga et al., 2011 ). Furthermore, the interview guide encompassed critical aspects of social media management and relevant facets of football club management (e.g., post formats, observation timeframes, platforms for capturing and analysing social media posts), drawing upon the elements and variables derived from studies conducted by Parganas and Anagnostopoulos ( 2015 ) as well as Solanellas et al. ( 2022 ). Additionally, these interviews comprised discussions about the conception and execution of the observation tool, which was employed as a supplementary instrument for data collection. Further variables relevant to the research objectives were explored within these interviews.

The qualitative insights garnered from the experts’ conclusive remarks offered valuable suggestions that contributed to refining the study’s development and enhancing the observation tool. This iterative approach ensured the harmonisation of the tool with the research objectives and its effective alignment with the study’s research questions. After incorporating the modifications suggested in the experts’ evaluations, the study’s codebook adhered to the variables and categories illustrated in Table 2 .

The .xlsx instrument sheet was then pilot-tested. Seventy-five publications (25 from Facebook, 25 from Instagram and 25 from Twitter) from each of three different football clubs were randomly selected, giving a total sample of 225 publications. The data were collected in an observation sheet in .xlsx format for analysis purposes. During the analysis process, which included discussing possible discrepancies in assigning each publication to one or another of the dimensions of the study’s codebook, the authors decided that each publication would be classified in only one dimension, depending on the type of content that predominates in the post.

To measure the reliability and accuracy of the instrument (Andrew et al., 2011), the intra-observer reliability method was applied. In the first stage, the data were collected and coded post-by-post using the .xlsx sheet, incorporating 10–12 minute breaks every 40–45 min of observation to ensure the quality of the data observed and collected. In the second stage, the same posts were re-coded using the same established protocol. To ensure a more accurate application of the codebook and to avoid potential bias, a 15-day interval was established between the two data collections. The coding results between the two stages provided a Kappa coefficient of 0.949, demonstrating a very high level of agreement and reliability, following the scale of Landis and Koch (1977).
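As an illustration of the agreement measure reported above, the following minimal sketch computes Cohen's kappa between two coding passes of the same posts and maps it onto the Landis and Koch (1977) bands. The dimension labels and the toy codings are hypothetical, not the study's data.

```python
from collections import Counter


def cohen_kappa(first_pass, second_pass):
    """Cohen's kappa for two codings of the same items."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("Both passes must code the same non-empty item list")
    n = len(first_pass)
    # Proportion of items given the same code in both passes.
    observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n
    # Agreement expected by chance from the marginal code frequencies.
    c1, c2 = Counter(first_pass), Counter(second_pass)
    expected = sum(c1[code] * c2[code] for code in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # only one code was ever used, in both passes
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal label for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Toy example: four posts coded twice, 15 days apart (hypothetical labels).
stage_1 = ["Marketing", "Marketing", "Sports", "Sports"]
stage_2 = ["Marketing", "Marketing", "Sports", "Marketing"]
kappa = cohen_kappa(stage_1, stage_2)
print(round(kappa, 3), landis_koch_label(kappa))
```

On this scale, the reported coefficient of 0.949 falls in the highest band (0.81–1.00).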

Finally, based on the interaction data collected with the data collection instrument, the variable of engagement with the publications was calculated by adapting the formulas used by the Fanpage Karma ( 2022 ) and Rival IQ (Feehan, 2023 ) platforms (Fig. 1 ).

figure 1

Adapted from Fanpage Karma ( 2022 ) and Rival IQ (Feehan, 2023 ) platforms.
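The exact formula in Fig. 1 is not reproduced in this text, so the sketch below should not be read as the authors' calculation. It assumes a commonly used form of engagement rate: the sum of a post's interactions divided by the account's follower count, expressed as a percentage. The function names and the choice of interactions (likes, comments, shares) are assumptions made for illustration.

```python
def post_engagement(likes, comments, shares, followers):
    """Engagement rate of one post as a percentage of followers.

    Assumed form: (likes + comments + shares) / followers * 100.
    """
    if followers <= 0:
        raise ValueError("followers must be positive")
    return (likes + comments + shares) / followers * 100


def mean_engagement(posts, followers):
    """Average engagement rate over a list of (likes, comments, shares)."""
    if not posts:
        raise ValueError("at least one post is required")
    rates = [post_engagement(l, c, s, followers) for l, c, s in posts]
    return sum(rates) / len(rates)


# Hypothetical account with 10,000 followers and two posts.
example_posts = [(150, 30, 20), (80, 15, 5)]
print(mean_engagement(example_posts, followers=10_000))
```

Dividing by follower count is what allows clubs with very different audience sizes to be compared on the same scale.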

Therefore, after the protocol and the .xlsx observation instrument sheet were tested and validated, the final procedure was established as follows: (a) social media posts from Facebook, Twitter and Instagram of the selected football clubs were extracted automatically using the Fanpage Karma licence and added to the .xlsx observation instrument sheet; (b) following the study codebook (see Table 2), the data were collected and registered manually in the .xlsx observation instrument sheet by opening the posts one by one; (c) a database was set up, coding the variables from the data collected, to perform the statistical analyses.

Data analysis

A descriptive analysis of the engagement generated by publications on social media and their content (dimensions and formats) on Facebook, Instagram and Twitter was carried out. To analyse the differences in engagement generated by the posts on each social media according to their content, we used the t-test for independent samples and the one-factor ANOVA. The significance value established is <0.05. A chi-square test and correspondence analysis were applied to identify and visualise points of association between the key variables. Data analysis was performed using the SPSS statistical package, version 27.0.
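To make the one-factor ANOVA step concrete, the sketch below computes the F statistic by hand for engagement values grouped by content dimension. The study itself used SPSS, so this is only an illustration of the underlying arithmetic. The toy values are hypothetical. Obtaining a p-value would additionally require the F distribution, which is outside the Python standard library.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical engagement values for three content dimensions.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(toy_groups))
```

With only two groups, the F statistic equals the square of the pooled-variance independent-samples t statistic, which is why the two tests are often reported together.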

As shown in Table 3 , of the 19,745 posts observed and analysed, Twitter accounted for 64%, followed by Facebook at 22% and Instagram at 14%. However, from the point of view of engagement, Instagram reflects an average of 1.873, well above the other social media. Facebook follows it with 0.112 and Twitter with 0.045, showing an inverse behaviour to the number of posts made.

Frequency and engagement

Fig. 2 shows the strategy used by each club in terms of the frequency of posts on Facebook, Twitter and Instagram, as well as the levels of engagement obtained. On Facebook, the football clubs analysed post at different frequencies. In Europe, the clubs with the highest frequency of posts are Liverpool FC and Manchester United FC, with n = 445 and n = 486, respectively. In contrast, the Spanish clubs (Real Madrid FC and FC Barcelona) have the lowest frequency of posts (n = 195 and n = 118, respectively). Despite this difference in frequency, these clubs have very similar engagement ratios.

figure 2

Frequency of posts and level of engagement generated on Facebook, Twitter and Instagram by the football clubs selected for this study (organised by regions).

On Facebook, the club with the highest frequency of publications is CR Flamengo from Brazil (n = 644); however, SE Palmeiras, the other Brazilian club studied, despite registering fewer publications in the same period (n = 289), shows much higher levels of engagement. SE Palmeiras (Brazil), Club Olimpia and Club Cerro Porteño (Paraguay), CF America (Mexico) and Atlanta United FC (USA) show the highest levels of engagement, with similar posting frequencies (between n = 142 and n = 241). Twitter registered higher posting frequencies than Facebook and Instagram, with CR Flamengo and Atlanta United FC being the clubs that posted the most (n = 1606 and n = 2096, respectively). However, engagement levels on Twitter were similar and homogeneous in the period analysed, regardless of the frequency of publications. The highest engagement levels were observed on Instagram, with a lower frequency of publications in all cases. SE Palmeiras, CA River Plate, CF America and Atlanta United FC have the highest engagement values (between 2.5 and 3), with posting frequencies ranging from n = 91 to n = 154. European football clubs have very similar engagement ratios (around 1.00), while North American football clubs have different engagement values despite having similar posting frequencies (n = 91 and n = 154).

Content dimensions of publications

Fig. 3 shows the dimensions proposed in this study, comparing the social media analysed and the engagement generated by each category. In terms of frequency, “Marketing” and “Sports” are the dimensions most used by football clubs, followed by “Institutional”, “Commercial” and, finally, “ESG”. This order of frequency applies to Facebook, Twitter and Instagram.

figure 3

Categorisation in the posts’ dimensions and their relationship with the engagement generated by Facebook, Twitter and Instagram of the football clubs analysed.

In terms of engagement, Instagram registers considerably higher values than the other social media analysed, with the “Marketing” dimension generating the highest engagement (2.03). It is followed by the “Institutional” dimension (1.78) and the “Sports” dimension (1.74), and then by the “Commercial” and “ESG” dimensions, with values of 1.54 and 1.41, respectively. Facebook generates the next-highest engagement.

In the case of Facebook (see Supplementary Table S1), the findings show significant differences in mean engagement between the “Commercial” dimension and the “Sports” (p < 0.001), “Institutional” (p = 0.001) and “Marketing” dimensions.

On the other hand, Twitter (see Supplementary Table S2) generates the lowest engagement, with very similar values across the different dimensions, despite being the platform with the highest frequency of publications (Fig. 3). Unlike on the other platforms, the “Institutional”, “ESG”, and “Commercial” dimensions have the highest engagement values (0.07), followed by the “Marketing” and “Sports” dimensions (both with 0.04). On this platform, engagement with “Institutional” content differs significantly from “Sports” (p < 0.001), “Commercial” (p < 0.001) and “Marketing” (p < 0.001). There is also a significant difference in engagement between the “ESG” and “Commercial” dimensions (p = 0.033).

On Instagram (see Supplementary Table S3), the “Marketing” dimension has the highest engagement value, as does the “Institutional” dimension (both with 0.12). It is followed by the “Sports” dimension (0.11), “ESG” (0.10) and finally, “Commercial” (0.07) (Fig. 3). Unlike on Facebook and Twitter, the findings show a strong relevance of “Marketing” posts (Supplementary Table S3), whose engagement differs significantly from “Sports” (p < 0.001), “Commercial” (p < 0.001) and “Institutional” (p = 0.002).

Types of formats in publications

Nine combinations of the most relevant formats have been identified in the publications analysed (Table 4 ), both in the frequency of use and engagement they generate.

On Facebook, the most frequent formats are “Text/Image” and “Text/Video” (n = 2031 and n = 1265, respectively). However, the format with the highest engagement is “Image” (0.23), followed by “Text/Image” (0.13), “Text/Video” (0.12) and “Text/Link” (0.07). On Twitter, on the other hand, the “Text/Image” format is the most used (n = 4412), followed by “Text” (n = 2499), “Text/Video” (n = 2239) and “Image” (n = 1534), with the “Text/Video” and “Text/Image” combinations registering the highest engagement (0.07). On Instagram, due to the nature of the platform, the most frequent format is “Text/Image” (n = 1986). In terms of engagement, the formats “Image” (2.20), “Text/Image” (1.95), “Text/Image/Polls” (1.93) and “Video” (1.84) have the highest values.

The correspondence analysis (Fig. 4 ) shows the degree of association between the variables and the categorisation dimensions proposed in this study in a relative position map. The chi-squared test yielded a result of 1027.65. The “Marketing” dimension shows a closer relationship with the “video” and “image” format resources. The “ESG” and “Institutional” content type shows an association with the “Image” and “Text” formats. The “Commercial” dimension, based on the characteristics of the categorisation, shows a relationship with the “Link” format as ideal points of association, considering the frequency and engagement analysed.
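For readers unfamiliar with the chi-squared statistic that underlies a correspondence analysis, the following minimal sketch computes it for a contingency table of content dimensions (rows) by formats (columns). The table values are hypothetical and much smaller than the study's data. The correspondence map itself requires a further decomposition step that is not shown here.

```python
def chi_square_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if dimension and format were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = two dimensions, columns = two formats.
toy_table = [[10, 20], [30, 40]]
chi2, dof = chi_square_statistic(toy_table)
print(round(chi2, 4), dof)
```

A large statistic relative to its degrees of freedom indicates that formats are not distributed evenly across dimensions, which is the association the relative position map visualises.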

figure 4

Correspondence analysis (dimensions and formats).

Nowadays, sports organisations and athletes dedicate effort and resources to social media for communication purposes, brand positioning, visibility (Maderer et al., 2018; Winand et al., 2019; Zakerian et al., 2022) and even for potential business (Parganas and Anagnostopoulos, 2015). Previous studies reinforce the need to categorise the message delivered to understand this phenomenon according to the objective (Filo et al., 2015) and content analysis for effect (Meng et al., 2015). However, its optimal use still leaves many questions. The complexity of the market is evolving towards the need to understand the fan as a premise in a sector characterised by its high emotional charge. In the past, strategies focused on attracting and retaining fans. However, the current trend shows the increased relevance of generating engagement (Oviedo et al., 2014) to build links with fans. The sports industry, especially in the digital environment, is in an era where the goal is not just to gain new followers and post social media content but to interact and engage “to know the users better”.

First, this study provides evidence of relevant frequency-engagement relationships according to the dimensions of the study, depending on the type of social media used (Facebook, Twitter and Instagram). Regarding the dimensions of the content published, the posts related to “Marketing” and “Sport” are the most frequent due to the natural and traditional use of these tools as communicative, brand positioning and informative elements (Lee and Kahle, 2016 ; Rehman et al., 2022 ; Winand et al., 2019 ). This is attributable to the need for clubs to generate emotional content (such as videos or images of past iconic matches or campaigns involving athletes), on the one hand, and to broadcast messages alluding to sporting performance and results. Nevertheless, the findings show different engagement impacts not directly linked to the frequency of the posts but influenced by other elements, such as the social media platform, the dimension of the content and the format. The evidence shows there are specific content dimensions that statistically generate more engagement in each platform.

Facebook, the most traditional platform football clubs use, provides a more balanced frequency-engagement ratio, with strong engagement with “Commercial” content. It was one of the social media platforms that began monetising in other industries and is characterised by its high brand impact; its know-how and interface are friendlier to this type of post (and, in some cases, to joint posts with brands). Despite the positive engagement impact of this platform, commercial efforts in the digital sphere remain scarce compared with the other dimensions, making this a relevant area for growth and an opportunity to explore, especially with the new assets appearing in the market and the growth of e-commerce.

On Twitter, on the other hand, the dimension that works best for engagement is “Institutional”, which differs significantly from “Sports”, “Marketing” and “Commercial” content, but not from “ESG”. The difference between the “ESG” and “Commercial” dimensions is also statistically significant on this platform. The “ESG” dimension is emerging as this platform is used for promoting socio-political activities and more altruistic purposes, as previous authors such as López-Carril and Anagnostopoulos (2020) and Sharpe et al. (2020) noted. This strategy shows a possible intention to use social media not only for marketing (communication) or sporting purposes but also as an element with socio-political aspects. Twitter, as a microblogging site with the highest number of posts and the lowest mean engagement, is more attractive for audiences looking for quick and summarised information because of its ability to increase the visibility and awareness of fans (Abeza et al., 2017). Sports managers can focus on this type of message for potentially higher engagement on Twitter.

In contrast, on Instagram, the focus is on “Marketing” content. This platform shows the lowest post frequency together with high mean engagement, attributable to the platform’s audio-visual formats and more interactive content, confirming its growing popularity among users. On this fast-growing platform, “Marketing” content is significantly linked with the “Sports”, “Institutional” and “Commercial” dimensions, which makes it an ideal platform for emotional content and for connecting with brands, athletes, and sports properties, with a larger and more varied audience looking mainly, as the evidence suggests, for entertainment and a perception of closeness to the club. Therefore, like Anagnostopoulos et al. (2018), we recommend that sports managers use Instagram for marketing purposes, considering the context as a relevant factor.

Finally, this study reveals the post format’s relevance as another key element. On Facebook, the highest engagement values are generated by the “Image” and “Text/Image” formats, as on Instagram and Twitter; however, the frequencies of these formats differ on each social media platform. In any case, the power of the image as valuable content in marketing stands out, as has also been highlighted in previous studies (e.g., Anagnostopoulos et al., 2018; Doyle et al., 2022; Machado et al., 2020). Nevertheless, the results obtained regarding the engagement triggered by video format posts on Facebook, Twitter and Instagram are not as conclusive, as other studies have pointed out (e.g., Su et al., 2020), probably because these social media are not as focused on that format as others such as TikTok or YouTube may be. Regardless, based on the results obtained, sports managers and academics need to continue to explore appropriate combinations of the content dimensions categorised in this study, the publication format, and the social media used to channel them.

Theoretical implications

Built upon the framework of relationship marketing, this study brings theoretical value to the realms of sports marketing, sports management, and fan engagement, spanning across four distinct lines of action.

Firstly, the research introduces a novel theoretical approach to social media strategies by employing a 5-dimensional content categorisation system aligned with the strategic pillars of football organisations. Previous studies have predominantly approached the role of social media in sports reactively, primarily focusing on communication and branding aspects. In contrast, this study contributes to the literature by adopting a strategic perspective towards social media, establishing a linkage between the study dimensions and football club strategies. This foundation paves the way for future research to delve deeper into each proposed dimension, potentially identifying sub-groups and exploring them in greater detail. The proposed dimensions serve to systematically organise the primary facets of football organisations for digital context analysis, a realm of increasing importance within the sports industry. As such, this work marks a pioneering step towards a novel approach in this area of study.

Secondly, this study establishes a fresh frequency-engagement approach for social network management, dispelling the notion that post frequency directly correlates with generated engagement. In doing so, this work highlights additional pivotal factors beyond post frequency that influence engagement among users of football-related social media. This perspective is aligned with the ethos of Web 2.0, underscoring the significance of engaging and connecting with fans.

Thirdly, from a theoretical perspective, this study introduces an innovative analytical proposition focusing on prominent international football clubs. This innovation is realised through the calculation and translation of engagement ratios, facilitating cross-entity comparisons independent of geographical location and follower count. The instrument developed and applied in this study acts as a tool to identify valuable digital practices within the industry.

Finally, this study stands out by conducting simultaneous analyses of posts across three prominent social media platforms (Facebook, Twitter, and Instagram), adopting a distinctive multi-platform approach that is seldom observed in comparable studies which often focus on a single social media platform. Gaining insights into the effects of cross-platform and cross-format postings can empower sports managers to make strategic decisions with a comprehensive perspective.

Practical implications

This study introduces a novel practical tool designed for the computation of fan engagement across the Facebook, Twitter, and Instagram accounts of football clubs globally. Consequently, sports managers can employ this instrument to gain a more realistic comprehension of the performance of social media accounts belonging to clubs. Furthermore, the developed tool facilitates the assessment of fan engagement in relation to the content type being published. This capability can aid sports managers in fortifying the bond between clubs and their followers by generating heightened value through strategic social media initiatives.

It is important to note that sports managers should consider both internal factors (club tradition, organisational culture) and external factors (competition, fan behaviour, sports results) within the context of clubs. This consideration is essential for developing and planning optimal digital strategies and for generating the best possible engagement with the audience. This research furnishes empirical evidence for understanding, in a practical and actionable manner, the pivotal components of a social media post. This understanding permits the visualisation of optimal combinations of these elements, thereby increasing the likelihood of sports managers guiding the club toward success and fostering substantial user engagement. Therefore, football team managers can apply the findings of this study to plan, monitor, and evaluate the club’s social media content for increased engagement and “closeness” with digital fans. They can combine various formats based on individual post requirements to achieve the desired results. Additionally, football team managers can analyse club identity and overall strategies more practically and coherently, facilitating the planning and execution of more effective commercial, brand positioning, institutional, and other relevant digital goals, with engagement serving as a key metric.

Conclusions

Social media plays a key role in today’s sports management, especially in football clubs, due to its global reach and ability to interact and connect with fans in an industry of great popularity, emotional charge, and economic, political and social impact. This exploratory research grounded in relationship marketing theory provided a comparison of the engagement generated by elite football clubs under a unique categorisation proposal, derived and adapted from existing literature, which addresses dimensions linked to strategic areas of football organisations and takes into consideration key elements such as frequency and format combinations used to analyse the efficiency of posts on Facebook, Twitter and Instagram.

Based on the results obtained, three lines of action stand out. First, concerning the type of content of the post, the “Marketing” and “Sports” dimensions are the preferred categories for football clubs in terms of post frequency. Regarding engagement rates, on Facebook, the “Commercial” dimension shows an opportunity for growth and development due to its good engagement impact, the technological boom and the emergence of new digital assets. On Twitter, the “Institutional” dimension and the emerging “ESG” dimension, linked to the “Commercial” perspective, have a significant impact. On Instagram, the “Marketing” dimension, linked to “Sports”, “Institutional” and “Commercial”, makes this platform ideal for emotional and marketing purposes. Second, concerning social media sources, this study provides evidence that Instagram is the social media platform that generates the most engagement using the lowest frequency of posts, followed by Facebook and Twitter. There is no direct evidence that links post frequency with the engagement generated. Finally, concerning the format of the post, the formats that generate the most engagement in all cases are “Image”, “Text/Image”, and “Text/Video”.

In short, this research stimulates a practical reflection for professionals and academics on the exploration, analysis, and evaluation of the management of social media in football clubs, using the observation method and content analysis techniques, applying elements of reliability and scientific rigour. The results obtained in this study offer practical and managerial implications in sports management, fan engagement, digital marketing, and social media, among others, through a proposal for categorisation and unique variables, taking engagement and its influence within the context of analysis as the axis.

The above conclusions should be considered in light of a series of limitations of the study. Firstly, the sample is limited to one sport (football) and to a small number of football clubs from different regions of the world. Secondly, despite the high number of posts analysed, they are concentrated in a short period of time, and it may be relevant to analyse the engagement of posts at different times of the season, as these can influence the type of content and the engagement of fans with the posts. Thirdly, the study is limited to analysing engagement on Facebook, Twitter and Instagram, leaving aside the possibilities that other booming social media, such as TikTok or Twitch, offer in the field of marketing. Nevertheless, these limitations can be a starting point for future research lines including, among others: (a) to assess the application and feasibility of the technique for measuring social media engagement included in this work in other football organisations (e.g., leagues) or social media platforms (e.g., TikTok, Twitch); (b) to incorporate new variables of study (e.g., size of the social mass of sports clubs, financial budget, trophies won); (c) to conduct the study considering different phases of the sports season (e.g., preseason, season, playoffs, postseason); (d) to analyse the relation between fan engagement and geographical regions to understand digital users’ behaviours; (e) to conduct the study adding engagement prediction models in social media; and (f) to incorporate this model into an AI language to suggest and predict digital user engagement in a simulated context.

Data availability

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

Abeza G, O’Reilly N, Finch D, Séguin B, Nadeau J (2020) The role of social media in the co-creation of value in relationship marketing: a multi-domain study. J Strateg Mark 28(6):472–493. https://doi.org/10.1080/0965254X.2018.1540496

Abeza G, O’Reilly N, Seguin B (2019) Social media in relationship marketing: the perspective of professional sport managers in the MLB, NBA, NFL, and NHL. Commun Sport 7(1):80–109. https://doi.org/10.1177/2167479517740343

Abeza G, O’Reilly N, Seguin B, Nzindukiyimana O (2017) Social media as a relationship marketing tool in professional sport: a netnographical exploration. Int J Sport Commun 10(3):325–358. https://doi.org/10.1123/ijsc.2017-0041

Abeza G, Sanderson J (2022) Theory and social media in sport studies. Int J Sport Commun 15(4):284–292. https://doi.org/10.1123/ijsc.2022-0108

Achen RM (2019) Re-examining a model for measuring Facebook interaction and relationship quality. Sport Bus Manag 9(3):255–272. https://doi.org/10.1108/SBM-10-2018-0082

Anagnostopoulos C, Parganas P, Chadwick S, Fenton A (2018) Branding in pictures: using Instagram as a brand management tool in professional team sport organisations. Eur Sport Manag Q 18(4):413–438. https://doi.org/10.1080/16184742.2017.1410202

Andrew D, Pedersen P, McEvoy C (2011) Research methods and design in sport management. Human Kinetics

Anguera-Argilaga MT, Blanco-Villaseñor A, Hernández-Mendo A, Losada JL (2011) Observational designs: their suitability and application in sports psychology. Cuad Psicol Deporte 11:63–76

Google Scholar  

Ashley C, Tuten T (2015) Creative strategies in social media marketing: an exploratory study of branded social content and consumer engagement. Psychol Mark 32(1):15–27. https://doi.org/10.1002/mar.20761

Battaglia MP (2008) Nonprobability sampling. In: Lavrakas PJ (ed) Encyclopedia of survey research methods. SAGE Publications, pp. 523–526

Blaszka M, Burch L, Frederick E, Clavio G, Walsh P (2012) #WorldSeries: an empirical examination of a Twitter hashtags during a major sport event. Int J Sport Commun 5:435–453. https://doi.org/10.1123/ijsc.5.4.435

Bucher B, Eckl J (2022) Football’s contribution to international order: the ludic and festive reproduction of international society by world societal actors. Int Theory 14(2):311–337. https://doi.org/10.1017/S1752971920000676

Chen Y (2023) Comparing content marketing strategies of digital brands using machine learning. Humanit Soc Sci Commun 10(1):57. https://doi.org/10.1057/s41599-023-01544-x

Cuesta-Valiño P, Gutiérrez-Rodríguez P, Loranca-Valle C (2022) Sponsorship image and value creation in E-sports. J Bus Res 145:198–209. https://doi.org/10.1016/j.jbusres.2022.02.084

Cuesta-Valiño P, Gutiérrez-Rodríguez P, Loranca-Valle C (2021) Sustainable management of sports federations: the indirect effects of perceived service on member’s loyalty. Sustainability 13(2):458. https://doi.org/10.3390/su13020458

Davis J, Hilbert JZ (2013) Sports marketing—creating long term value. Edward Elgar Publishing

Doyle JP, Su Y, Kunkel T (2022) Athlete branding via social media: examining the factors influencing consumer engagement on Instagram. Eur Sport Manag Q 22(4):506–526. https://doi.org/10.1080/16184742.2020.1806897

Einsle C-S, Escalera-Izquierdo G, García-Fernández J (2023) Social media hook sports events: a systematic review of engagement. Commun Soc 36(3):133–151. https://doi.org/10.15581/003.36.3.133-151

Fanpage Karma (2022) Metrics overview. https://academy.fanpagekarma.com/en/metrics/

Feehan B (2023) Social media industry benchmark report. Rival IQ. https://www.rivaliq.com/blog/social-media-industry-benchmark-report/

Filo K, Lock D, Karg A (2015) Sport and social media research: a review. Sport Manag Rev 18(2):166–181. https://doi.org/10.1016/j.smr.2014.11.001

Fried G, Mumcu C (2017) Sport analytics: a data-driven approach to sport business and management. Routledge

García-Fernández J, Elasri A, Pérez-Tur F, Triadó-Ivern X, Herrera-Torres L, Aparicio-Chueca P (2017) Social networks in fitness centres: the impact of fan engagement on annual turnover. J Phys Educ Sport 17:1068–1077. https://doi.org/10.7752/jpes.2017.03164

García-Fernández J, Fernández-Gavira J, Durán-Muñoz J, Vélez-Colon L (2015) La actividad en las redes sociales: un estudio de caso en la industria del fitness. Retos 28:44–48. https://doi.org/10.47197/retos.v0i28.34839

Hambrick M, Simmons J, Greenhalgh G, Greenwell C (2010) Understanding professional athletes’ use of Twitter. Int J Sport Commun 3(4):454–471. https://doi.org/10.1123/ijsc.3.4.454

Hernández-Sampieri R, Fernández-Collado C, Baptista-Lucio P (2014) Metodología de la investigación, 6th edn. McGraw-Hill

Herrera-Torres L, Pérez-Tur F, García-Fernández J, Fernández-Gavira J (2017) El uso de las redes sociales y el engagement de los clubes de la Liga Endesa ACB. Cuad Psicol Deporte 17(3):175–182

Hsieh H-F, Shannon ES (2005) Three approaches to qualitative content analysis. Nord J Digit Lit 15(9):1147–1288. https://doi.org/10.1177/1049732305276687

Hull K, Abeza G (2021) Introduction to social media in sport. In: Abeza G, O’Reilly N, Sanderson J, Frederick E (eds) Emerging issues and trends in sport business: vol. 2. Social media in sport theory and practice. World Scientific Publishing Company, pp. 1–28

Kim S, Andrew DPS (2016) Understanding sport organizations: the application of organization theory (2nd Edition). J Sport Manag 21(3):455–457. https://doi.org/10.1123/jsm.21.3.455

Kümpel AS, Karnowski V, Keyling T (2015) News sharing in social media: a review of current research on news sharing users, content, and networks. Soc Media Soc 1(2):1–14. https://doi.org/10.1177/2056305115610141

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174. https://doi.org/10.2307/2529310

Article   CAS   MATH   PubMed   Google Scholar  

Lee C, Kahle L (2016) The linguistics of social media: communication of emotions and values in sport. Sport Mark Q 25(4):201–211

López-Carril S, Anagnostopoulos C (2020) COVID-19 and soccer teams on Instagram: the case of corporate social responsibility. Int J Sport Commun 13(3):447–457. https://doi.org/10.1123/ijsc.2020-0230

Loranca-Valle C, Cuesta-Valiño P, Núnez-Barriopedro E, Gutiérrez-Rodríguez P (2021) Management of loyalty and its main antecedents in sport organizations: a systematic analysis review. Front Psychol 12:783781. https://doi.org/10.3389/fpsyg.2021.783781

Article   PubMed   PubMed Central   Google Scholar  

Lovejoy K, Saxton GD (2012) Information, community, and action: how nonprofit organizations use social media. J Comput-Mediat Commun 17(3):337–353. https://doi.org/10.1111/j.1083-6101.2012.01576.x

Lozano-Blasco R, Quílez-Robres A, Delgado-Bujedo D, Latorre-Martínez MP (2021) YouTube’s growth in use among children 0–5 during COVID19: the occidental European case. Technol Soc 66:101648. https://doi.org/10.1016/j.techsoc.2021.101648

Machado JC, Martins CC, Ferreira FC, Silva SC, Duarte PA (2020) Motives to engage with sports brands on Facebook and Instagram—the case of a Portuguese football club. Int J Sports Mark Spons 21(2):325–349. https://doi.org/10.1108/IJSMS-06-2019-0066

Maderer D, Parganas P, Anagnostopoulos C (2018) Brand-image communication through social media: the case of European professional football clubs. Int J Sport Commun 11(3):319–338. https://doi.org/10.1123/ijsc.2018-0086

Malarvizhi CA, Al Mamun A, Jayashree S, Naznen F, Abir T (2022) Modelling the significance of social media marketing activities, brand equity and loyalty to predict consumers’ willingness to pay premium price for portable tech gadgets. Heliyon 8(8):e10145. https://doi.org/10.1016/j.heliyon.2022.e10145

Mastromartino B, Naraine ML (2022) (Dis)innovative digital strategy in professional sport: examining sponsor leveraging through social media. Int J Sports Mark Spons 23(5):934–949. https://doi.org/10.1108/IJSMS-02-2021-0032

Meng MD, Stavros C, Westberg K (2015) Engaging fans through social media: implications for team identification. Sport Bus Manag 5(3):199–217. https://doi.org/10.1108/SBM-06-2013-0013

Mizrahi I (2023) The Messi effect—how one single player will impact soccer in America. Forbes. https://www.forbes.com/sites/isaacmizrahi/2023/06/20/the-messi-effect--how-one-single-player-will-impact-soccer-in-america/?sh=45e4bda86ecf

Möller K, Halinen A (2000) Relationship marketing theory: its roots and direction. J Mark Manag 16(1–3):29–54. https://doi.org/10.1362/026725700785100460

Nisar TM, Prabhakar G, Patil PP (2018) Sports clubs’ use of social media to increase spectator interest. Int J Inf Manag 43:188–195. https://doi.org/10.1016/j.ijinfomgt.2018.08.003

Núñez-Barriopedro E, Cuesta-Valiño P, Gutiérrez-Rodríguez P, Ravina-Ripoll R (2021) How does happiness influence the loyalty of karate athletes? A model of structural equations from the constructs: consumer satisfaction, engagement, and meaningful. Front Psychol 12:653034. https://doi.org/10.3389/fpsyg.2021.653034

O’Shea M, Alonso AD (2011) Opportunity or obstacle? A preliminary study of professional sport organisations in the age of social media. Int J Sport Manag Mark 10(3-4):196–212. https://doi.org/10.1504/IJSMM.2011.044790

Oviedo MÁ, Muñoz M, Castellanos M, Sancho-Mejías M (2014) Metric proposal for customer engagement in Facebook. J Res Interact Mark 8(4):327–344. https://doi.org/10.1108/JRIM-05-2014-0028

Parganas P, Anagnostopoulos C (2015) Social media strategy in professional football: the case of Liverpool FC. Choregia 11(2):61–75. https://doi.org/10.4127/ch.2015.0102

Pegoraro A (2010) Look who’s talking-athletes on Twitter: a case study. Int J Sport Commun 3(4):501–514. https://doi.org/10.1123/ijsc.3.4.501

Pegoraro A, Scott O, Burch LM (2017) Strategic use of Facebook to build brand awareness. Int J Public Adm Digit Age 4(1):69–87. https://doi.org/10.4018/ijpada.2017010105

Petersen-Wagner R, Ludvigsen JAL (2022) Digital transformations in a platform society: a comparative analysis of European football leagues as YouTube complementors. Convergence 29(5):1330–1351. https://doi.org/10.1177/13548565221132705

PWC (2020) La industria deportiva aporta el 3,3% del PIB español y genera 414.000 puestos de trabajo. https://www.pwc.es/es/sala-prensa/notas-prensa/2020/industria-deportiva-pib-espanol.html

Ramos RF, Rita P, Moro S (2019) From institutional websites to social media and mobile applications: A usability perspective Eur Res Manag Bus. Econ 25:138–143. https://doi.org/10.1016/j.iedeen.2019.07.001

Ratten V (2020) Sport technology: a commentary. J High Technol Manag Res 31(1):100383. https://doi.org/10.1016/j.hitech.2020.100383

Rehman S ul, Gulzar R, Aslam W (2022) Developing the integrated marketing communication (IMC) through social media (SM): the modern marketing communication approach. SAGE Open 12(2) https://doi.org/10.1177/21582440221099936

Rohden SF, Tassinari G, Netto CF (2023) Listen as much as you want: the antecedents of the engagement of podcast consumers. Int J Internet Mark Advert 18(1):82–97. https://doi.org/10.1504/IJIMA.2023.128152

Sanz-Labrador I, Cuerdo-Mir M, Doncel-Pedrera LM (2021) The use of digital educational resources in times of COVID-19. Soc Media Soc 7(3) https://doi.org/10.1177/20563051211049246

Stegmann P, Nagel S, Ströbel T (2021) The digital transformation of value co-creation: a scoping review towards an agenda for sport marketing research. Eur Sport Manag Q 23(3):1221–1248. https://doi.org/10.1080/16184742.2021.1976241

Sharpe S, Mountifield C, Filo K (2020) The social media response from athletes and sport organizations to COVID-19: an altruistic tone. Int J Sport Commun 13(3):474–483. https://doi.org/10.1123/ijsc.2020-0220

Sheldon P, Antony MG, Ware LJ (2021) Baby Boomers’ use of Facebook and Instagram: uses and gratifications theory and contextual age indicators. Heliyon 7(4):e06670. https://doi.org/10.1016/j.heliyon.2021.e06670

Shilbury D, Westerbeek H, Quick S, Funk D, Karg A (2014) Strategic sport marketing. Sport Manag Rev 18(4):627–628. https://doi.org/10.1016/j.smr.2014.09.004

Solanellas F, Muñoz J, Romero-Jara E (2022) Redes sociales y el caso de las ligas deportivas durante el COVID-19. Movimento 28:e28049. https://doi.org/10.22456/1982-8918.123802

Su Y, Baker BJ, Doyle JP, Yan M (2020) Fan engagement in 15 seconds: Athletes’ relationship marketing during a pandemic via TikTok. Int J Sport Commun 13(3):436–446. https://doi.org/10.1123/ijsc.2020-0238

Su Y, Du J, Biscaia R, Inoue Y (2022) We are in this together: sport brand involvement and fans’ well-being. Eur Sport Manag Q 22(1):92–119. https://doi.org/10.1080/16184742.2021.1978519

Tafesse W, Wien A (2018) Implementing social media marketing strategically: an empirical assessment. J Mark Manag 34(9–10):732–749. https://doi.org/10.1080/0267257X.2018.1482365

Venkat R (2023) FIFA World Cup winners: why Brazilians are unique and Germany, Italy relentless—full roll of honour. International Olympic Committee. https://olympics.com/en/news/fifa-world-cup-winners-list-champions-record

Vivar JMF (2009) Nuevos modelos de comunicación, perfiles y tendencias en las redes sociales. Comunicar 16(33):73–81. https://doi.org/10.3916/c33-2009-02-007

Wang Y, Zhou S (2015) How do sports organizations use social media to build relationships? A content analysis of NBA clubs’ Twitter use. Int J Sport Commun 8(2):133–148. https://doi.org/10.1123/ijsc.2014-0083

Waters R, Burnett E, Lamm A, Lucas J (2009) Engaging stakeholders through Social networking: how nonprofit organizations are using Facebook. Public Relat Rev 35(2):102–106. https://doi.org/10.1016/j.pubrev.2009.01.006

Williams J, Chinn SJ (2010) Meeting relationship-marketing goals through social media: a conceptual model for sport marketers. Int J Sport Commun 3(4):422–437. https://doi.org/10.1123/ijsc.3.4.422

Winand M, Belot M, Merten S, Kolyperas D (2019) International sport federations’ social media communication: a content analysis of FIFA’s Twitter account. Int J Sport Commun 12(2):209–233. https://doi.org/10.1123/ijsc.2018-0173

Witkemper C, Lim C, Waldburger A (2012) Social media and sports marketing: examining the motivations and constraints of Twitter users. Sport Mark Q 21(3):170–183

Yan G, Watanabe NM, Shapiro SL, Naraine ML, Hull K (2019) Unfolding the Twitter scene of the 2017 UEFA Champions League Final: social media networks and power dynamics. Eur Sport Manag Q 19(4):419–436. https://doi.org/10.1080/16184742.2018.1517272

Zakerian A, Sarkoohi P, Ghafouri F, Keshkar S (2022) Developing a model for athletes’ personal brands on social networks (case study: Instagram). Int J Sport Manag Mark 22(1–2):142–160. https://doi.org/10.1504/IJSMM.2022.121253

Download references

Acknowledgements

The authors would like to acknowledge the experts who contributed their excellent technical knowledge and valuable inputs to the development of this work and the Fanpage Karma platform for providing the software licence to support this research. Edgar Romero-Jara would like to acknowledge the funding support of the pre-doctoral scholarship “National Academic Excellence Scholarship Programme Carlos Antonio López (BECAL)”, granted by the Government of Paraguay. Samuel López-Carril would like to acknowledge the funding support of the postdoctoral contract “Juan de la Cierva-formación 2021” (FJC2021-0477779-I), granted by the Spanish Ministry of Science and Innovation and by the European Union through the NextGenerationEU Funds (Plan de Recuperación, Transformación y Resiliencia).

Author information

Authors and affiliations.

National Institute of Physical Education of Catalonia (INEFC), University of Barcelona (UB), Barcelona, Spain

Edgar Romero-Jara

Grup d’Investigació Social i Educativa de l’Activitat Física i de l’Esport (GISEAFE), National Institute of Physical Education of Catalonia (INEFC), University of Barcelona (UB), Barcelona, Spain

Francesc Solanellas & Joshua Muñoz

Universidad de Castilla–La Mancha (UCLM), Ciudad Real, Spain

Samuel López-Carril

Contributions

ER-J (corresponding author) and FS: conception and design of the work. ER-J and JM: analysis and methodology. ER-J and SL-C: literature review, interpretation of data, drafting of the work. FS: supervised this work. All authors made substantial contributions, discussed the results, revised critically for important intellectual content, and approved the final version of the work.

Corresponding author

Correspondence to Edgar Romero-Jara.

Ethics declarations

Ethical statement.

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Romero-Jara, E., Solanellas, F., Muñoz, J. et al. Connecting with fans in the digital age: an exploratory and comparative analysis of social media management in top football clubs. Humanit Soc Sci Commun 10, 858 (2023). https://doi.org/10.1057/s41599-023-02357-8

Received: 18 April 2023

Accepted: 06 November 2023

Published: 21 November 2023

DOI: https://doi.org/10.1057/s41599-023-02357-8

  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic History
  • Economic Methodology
  • Economic Systems
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Theory
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Politics and Law
  • Politics of Development
  • Public Policy
  • Public Administration
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Qualitative Research

18 Content Analysis

Lindsay Prior, School of Sociology, Social Policy, and Social Work, Queen's University

  • Published: 04 August 2014

In this chapter, the focus is on ways in which content analysis can be used to investigate and describe interview and textual data. The chapter opens with a contextualization of the method and then proceeds to an examination of the role of content analysis in relation to both quantitative and qualitative modes of social research. Following the introductory sections, four kinds of data are subjected to content analysis. These include data derived from a sample of qualitative interviews (N = 54), textual data derived from a sample of health policy documents (N = 6), data derived from a single interview relating to a “case” of traumatic brain injury, and data gathered from 54 abstracts of academic papers on the topic of “well-being.” Using a distinctive and somewhat novel style of content analysis that calls upon the notion of semantic networks, the chapter shows how the method can be used either independently or in conjunction with other forms of inquiry (including various styles of discourse analysis) to analyze data, and also how it can be used to verify and underpin claims that arise out of analysis. The chapter ends with an overview of the different ways in which the study of “content”—especially the study of document content—can be positioned in social scientific research projects.

What is Content Analysis?

In his 1952 text on the subject of content analysis, Bernard Berelson traces the origins of the method to communication research and then lists what he calls six distinguishing features of the approach. As one might expect, the six defining features reflect the concerns of social science as taught in the 1950s, an age in which the calls for an “objective,” “systematic,” and “quantitative” approach to the study of communication data were first heard. The reference to the field of “communication” was of course nothing less than a reflection of a substantive social scientific interest over the previous decades in what was called public opinion, and specifically attempts to understand why and how a potential source of critical, rational judgement on political leaders (i.e., the views of the public) could be turned into something to be manipulated by dictators and demagogues. In such a context, it is perhaps not so surprising that in one of the more popular research methods texts of the decade, the terms content analysis and communication analysis are used interchangeably (see Goode & Hatt, 1952:325).

Academic fashions and interests naturally change with available technology, and these days we are more likely to focus on the individualization of communications through Twitter and the like, rather than on mass newspaper readership or mass radio audiences, yet the prevailing discourse on content analysis has remained much the same as it was in Berelson’s day. Thus Neuendorf (2002:1), for example, continues to define content analysis as “the systematic, objective, quantitative analysis of message characteristics.” Clearly the centrality of communication as a basis for understanding and using content analysis continues to hold, but in this article I will try to show that, rather than locate the use of content analysis in disembodied “messages” and distantiated “media,” we would do better to focus on the fact that communication is a building block of social life itself and not merely a system of messages that are transmitted, in whatever form, from sender to receiver. To put that statement in another guise, we need to note that communicative action (to use the phraseology of Habermas, 1987) rests at the very base of the lifeworld, and one very important way of coming to grips with that world is to study the content of what people say and write in the course of their everyday lives.

My aim is to demonstrate various ways in which content analysis (henceforth CTA) can be used and developed to analyze social scientific data as derived from interviews and documents. It is not my intention to cover the history of CTA or to venture into forms of literary analysis or to demonstrate each and every technique that has ever been deployed by content analysts. (Many of the standard textbooks deal with those kinds of issues much more fully than is possible here. See, for example, Babbie, 2013; Berelson, 1952; Bryman, 2008; Krippendorff, 2004; Neuendorf, 2002; and Weber, 1990.) Instead I seek to recontextualize the use of the method in a framework of network thinking and to link the use of CTA to specific problems of data analysis. As will become evident, my exposition of the method is grounded in real world problems. Those problems are drawn from my own research projects and tend to reflect my particular academic interests, which are almost entirely related to the analysis of the ways in which people talk and write about aspects of health, illness, and disease. However, lest the reader be deterred from going any further, I should emphasise that the substantive issues that I elect to examine are secondary if not tertiary to my main objective, which is to demonstrate how CTA can be integrated into a range of research designs and add depth and rigour to the analysis of interview and inscription data. To that end, in the next section I aim to clear our path to analysis by dealing with some issues that touch on the general position of CTA in the research armory, and especially its location in the schism that has developed between quantitative and qualitative modes of inquiry.

The Methodological Context of Content Analysis

Content analysis is usually associated with the study of inscription contained in published reports, newspapers, adverts, books, web pages, journals, and other forms of documentation. Hence, nearly all of Berelson’s (1952) illustrations and references to the method relate to the analysis of written records of some kind, and where speech is mentioned it is almost always in the form of broadcast and published political speeches (such as State of the Union addresses). This association of content analysis with text and documentation is further underlined in modern textbook discussions of the method. Thus Bryman (2008) for example, defines content analysis as “an approach to the analysis of documents and texts , that seek to quantify content in terms of pre-determined categories” (2008:274, emphasis in original), while Babbie (2013) states that content analysis is “the study of recorded human communications” (2013:295), and Weber refers to it as a method to make “valid inferences from text” (1990:9). It is clear then that CTA is viewed as a text-based method of analysis, though extensions of the method to other forms of inscriptional material are also referred to in some discussions. Thus Neuendorf (2002) , for example, rightly refers to analyses of film and television images as legitimate fields for the deployment of CTA, and by implication analyses of still—as well as moving—images such as photographs and billboard adverts. Oddly, in the traditional or standard paradigm of content analysis, the method is solely used to capture the “message” of a text or speech; it is not used for the analysis of a recipient’s response to or understanding of the message (which is normally accessed via interview data and analyzed in other and often less rigorous ways; see, e.g., Merton, 1968 ). So in this article I suggest that we can take things at least one small step further by using CTA to analyse speech (especially interview data) as well as text.

Standard textbook discussions of CTA usually refer to it as a “non-reactive” or “unobtrusive” method of investigation (see, e.g., Babbie, 2013:294), and a large part of the reason for that designation is its focus on already existing text (i.e., text gathered without intrusion into a research setting). More importantly, however (and to underline the obvious), CTA is primarily a method of analysis rather than of data collection. Its use therefore has to be integrated into wider frames of research design that embrace systematic forms of data collection as well as forms of data analysis. Thus routine strategies for sampling data are often required in designs that call upon CTA as a method of analysis. These latter can be built around random sampling methods, or even techniques of “theoretical sampling” (Glaser & Strauss, 1967), so as to identify a suitable range of materials for content analysis. CTA can also be linked to styles of ethnographic inquiry and to the use of various purposive or non-random sampling techniques. For an example, see Altheide (1987).

Of course, the use of CTA in a research design does not preclude the use of other forms of analysis in the same study, for it is a technique that can be deployed either in parallel or in sequence with other methods. For example, and as I will demonstrate in the following sections, one might use CTA as a preliminary analytical strategy to get a grip on the available data before moving into specific forms of discourse analysis. In this respect it can be as well to think of using CTA in, say, the frame of a priority/sequence model of research design as described by Morgan (1998).

As I shall explain, there is a sense in which content analysis rests at the base of all forms of qualitative data analysis, yet the paradox is that the analysis of content is usually considered to be a quantitative (numerically based) method. In terms of the qualitative/quantitative divide, however, it is probably best to think of CTA as a hybrid method, and some writers have in the past argued that it is necessarily so (Kracauer, 1952). That was probably easier to do in an age when many recognised the strictly drawn boundaries between qualitative and quantitative styles of research to be inappropriate. Thus in their widely used text on “Methods in Social Research,” Goode and Hatt (1952:313), for example, asserted that, “[M]odern research must reject as a false dichotomy the separation between ‘qualitative’ and ‘quantitative’ studies, or between the ‘statistical’ and the ‘non-statistical’ approach.” It was a position advanced on the grounds that all good research must meet adequate standards of validity and reliability whatever its style, and it is a message well worth preserving. However, there is a more fundamental reason why it is nonsensical to draw a division between the qualitative and the quantitative. It is simply this: all acts of social observation depend on the deployment of qualitative categories, whether gender, class, race, or even age; there is no descriptive category in use in the social sciences that connects to a world of “natural kinds.” In short, all categories are made, and therefore when we seek to count “things” in the world, we are dependent on the existence of socially constructed divisions. How the categories take the shape that they do (how definitions are arrived at, how inclusion and exclusion criteria are decided upon, and how taxonomic principles are deployed) is an interesting set of research questions in itself. From our starting point, however, we need only note that “sorting things out” (to use a phrase from Bowker & Star, 1999) and acts of “counting,” whether it be of chromosomes or people (Martin and Lynch, 2009), are activities that connect to the social world of organized interaction rather than to unsullied observation of the external world.

Of course, some writers deny the strict division between the qualitative and quantitative on grounds of empirical practice rather than of ontological reasoning. For example, Bryman (2008) argues that qualitative researchers also call upon quantitative thinking but tend to use somewhat vague, imprecise terms rather than numbers and percentages, referring to frequencies via the use of phrases such as “more than” and “less than.” Kracauer (1952) advanced various arguments against the view that CTA was strictly a quantitative method, suggesting that very often we wished to assess content as being negative or positive with respect to some political, social, or economic thesis and that such evaluations could never be merely statistical. He further argued that we often wished to study “underlying” messages or latent content of documentation and that in consequence we needed to interpret content as well as count items of content. Morgan (1993) has argued that, given the emphasis that is placed on “coding” in almost all forms of qualitative data analysis, the deployment of counting techniques is essential and that we ought therefore to think in terms of what he calls qualitative as well as quantitative content analysis. Naturally, some of these positions create more problems than they seemingly solve (as is the case with considerations of “latent content”), but given the twenty-first-century predilection for “mixed-methods” research (Creswell, 2007), it is clear that CTA has a role to play in integrating quantitative and qualitative modes of analysis in a systematic rather than merely an ad hoc and piecemeal fashion. In the sections that follow, I will provide some examples of the ways in which “qualitative” analysis can be combined with systematic modes of counting. First, however, we need to focus on what is analyzed in CTA.

Units of analysis

So what is the unit of analysis in CTA? A brief answer to that question is that analysis can be focused on words, sentences, grammatical structures, tenses, clauses, ratios (of, say, nouns to verbs), or even “themes.” Berelson (1952) gives some examples of all of the above and also recommends a form of thematic analysis (cf. Braun and Clarke, 2006) as a viable option. Other possibilities include counting column length (of speeches and newspaper articles), amounts of (advertising) space, or frequency of images. For our purposes, however, it might be useful to consider a specific (and somewhat traditional) example. Here it is. It is an extract from what has turned out to be one of the most important political speeches of the current century.

“Iraq continues to flaunt its hostility toward America and to support terror. The Iraqi regime has plotted to develop anthrax and nerve gas and nuclear weapons for over a decade. This is a regime that has already used poison gas to murder thousands of its own citizens, leaving the bodies of mothers huddled over their dead children. This is a regime that agreed to international inspections then kicked out the inspectors. This is a regime that has something to hide from the civilized world. States like these, and their terrorist allies, constitute an axis of evil, arming to threaten the peace of the world. By seeking weapons of mass destruction, these regimes pose a grave and growing danger. They could provide these arms to terrorists, giving them the means to match their hatred. They could attack our allies or attempt to blackmail the United States. In any of these cases, the price of indifference would be catastrophic.” (George W. Bush, State of the Union address, January 29, 2002)

A number of possibilities arise for analysing the content of a speech such as the one above. Clearly, words and sentences must play a part in any such analysis, but in addition to words there are structural features of the speech that could also figure. For example, the extract takes the form of a simple narrative—pointing to a past, a present, and an ominous future (catastrophe)—and could therefore be analysed as such. There are, in addition, a number of interesting oppositions in the speech (such as those between “regimes” and the “civilised” world), as well as a set of interconnected present participles such as “plotting,” “hiding,” “arming,” and “threatening” that are associated both with Iraq and with other states that “constitute an axis of evil.” Evidently, simple word counts would fail to capture the intricacies of a speech of this kind. Indeed, our example serves another purpose—to highlight the difficulty that often arises in dissociating content analysis from discourse analysis (of which narrative analysis and the analysis of rhetoric and trope are subspecies). So how might we deal with these problems?
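
To make the limits of simple counting concrete, the following minimal Python sketch tallies word frequencies in three sentences of the extract quoted above. It is an illustration only and not a procedure used in this chapter; the stop list and the function name word_frequencies are assumptions introduced for the example.

```python
import re
from collections import Counter

# Three consecutive sentences taken from the extract quoted above.
extract = (
    "This is a regime that has already used poison gas to murder thousands "
    "of its own citizens, leaving the bodies of mothers huddled over their "
    "dead children. This is a regime that agreed to international "
    "inspections then kicked out the inspectors. This is a regime that has "
    "something to hide from the civilized world."
)

# Hypothetical stop list; a real study would have to justify its own.
STOP_WORDS = {"this", "is", "a", "that", "has", "to", "of", "its", "the",
              "over", "their", "then", "out", "from", "own", "already"}


def word_frequencies(text, stop_words):
    """Count lower-cased alphabetic tokens that are not in the stop list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tok for tok in tokens if tok not in stop_words)


freq = word_frequencies(extract, STOP_WORDS)
print(freq.most_common(1))  # [('regime', 3)]
```

The tally singles out “regime” as the most frequent content word, yet it records nothing of the repeated frame “This is a regime that ...” nor of the movement from past to ominous future, which is the difficulty raised in the paragraph above.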

One approach that can be adopted is to focus on what is referenced in text and speech. That is, to concentrate on the characters or elements that are recruited into the text and to examine the ways in which they are connected or co-associated. I shall provide some examples of this form of analysis shortly. Let us merely note for the time being that in the previous example we have a speech in which various “characters”—including weapons in general, specific weapons (such as nerve gas), threats, plots, hatred, evil and mass destruction—play a role. Be aware that we need not be concerned with the veracity of what is being said—whether it is true or false—but simply with what is in the speech and how what is in there is associated. (We may leave the task of assessing truth and falsity to the jurists). Be equally aware that it is a text that is before us and not an insight into the ex-President’s mind, nor his thinking, nor his beliefs, nor any other subjective property that he may have possessed.
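
The idea of examining how “characters” are co-associated can be sketched in a few lines of Python. The example below counts how often pairs of terms occur in the same sentence of the extract. The list of terms, the choice of the sentence as the unit of co-occurrence, and the function name co_associations are all assumptions made for illustration; they are not the procedure used in the studies discussed in this chapter.

```python
from collections import Counter
from itertools import combinations

# Four sentences taken from the extract quoted earlier.
sentences = [
    "The Iraqi regime has plotted to develop anthrax and nerve gas and "
    "nuclear weapons for over a decade.",
    "States like these, and their terrorist allies, constitute an axis of "
    "evil, arming to threaten the peace of the world.",
    "By seeking weapons of mass destruction, these regimes pose a grave "
    "and growing danger.",
    "They could provide these arms to terrorists, giving them the means to "
    "match their hatred.",
]

# Hypothetical list of "characters" recruited into the text.
CHARACTERS = ["regime", "weapons", "nerve gas", "evil",
              "mass destruction", "terrorist", "hatred"]


def co_associations(sentences, terms):
    """Count pairs of terms that appear together in the same sentence."""
    pairs = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Substring matching, so "regime" also matches "regimes".
        present = sorted(term for term in terms if term in lowered)
        pairs.update(combinations(present, 2))
    return pairs


pairs = co_associations(sentences, CHARACTERS)
for pair, count in pairs.most_common():
    print(pair, count)
```

Each counted pair can be read as a weighted link between two nodes, which is one simple way of turning a text into a network of co-associated elements. Note that nothing in the count depends on whether what is said is true.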

In the introductory paragraph, I made brief reference to some ideas of the German philosopher Jürgen Habermas (1987). It is not my intention to expand here on the detailed twists and turns of his claims with respect to the role of language in the “lifeworld.” However, I do intend to borrow what I regard as some particularly useful ideas from his work. The first is his claim, influenced by a strong line of twentieth-century philosophical thinking, that language and culture are constitutive of the lifeworld (1987:125), and in that sense we might say that things (including individuals and societies) are made in language. That of course is a simple justification for focusing on what people say rather than what they “think” or “believe” or “feel” or “mean” (all of which have been suggested at one time or another as points of focus for social inquiry and especially qualitative forms of inquiry). Second, Habermas argues that speakers and therefore hearers (and one might add writers and therefore readers), in what he calls their speech acts, necessarily adopt a pragmatic relation to one of three worlds: entities in the objective world, things in the social world, and elements of a subjective world. In practice, Habermas (1987:120) suggests all three worlds are implicated in any speech act but that there will be a predominant orientation to one of these. To rephrase this in a crude form, when speakers engage in communication, they refer to things and facts and observations relating to external nature, to aspects of interpersonal relations, and to aspects of private inner subjective worlds (thoughts, feelings, beliefs, etc.). One of the problems with locating CTA in “communication research” has been that the communications referred to are but a special and limited form of action (often what Habermas would call strategic acts). In other words, television, newspaper, video, and internet communications are just particular forms (with particular features) of action in general. Again we might note in passing that the adoption of the Habermasian perspective on speech acts implies that much of qualitative analysis in particular has tended to focus only on one dimension of communicative action: the subjective and private. In this respect, I would argue that it is much better to look at speeches such as George W. Bush’s 2002 State of the Union address as an “account” and to examine what has been recruited into the account, and how what has been recruited is connected or co-associated, rather than to use the data to form insights into his (or his adviser’s) thoughts, feelings, and beliefs.

In the sections that follow, and with an emphasis on the ideas that I have just expounded, I intend to demonstrate how CTA can be deployed to advantage in almost all forms of inquiry that call upon either interview (or speech-based) data or textual data. In my first example, I will show how CTA can be used to analyze a group of interviews. In the second example, I will show how it can be used to analyze a group of policy documents. In the third, I shall focus on a single interview (a “case”), and in the fourth and final example, I will show how CTA can be used to track the biography of a concept. In each instance, I shall briefly introduce the context of the “problem” on which the research was based, outline the methods of data collection, discuss how the data were analyzed and presented, and underline the ways in which content analysis has sharpened the analytical strategy.

Analyzing a Sample of Interviews: Looking at Concepts and Their Co-Associations in a Semantic Network

My first example of using CTA is based on a research study that was initially undertaken in the early 2000s. It was a project aimed at understanding why older people might reject the offer to be immunized against influenza (at no cost to them). The ultimate objective was to improve rates of immunization in the study area. The first phase of the research was based on interviews with 54 older people in South Wales. The sample included people who had never been immunized, some who had refused immunization, and some who had accepted immunization. Within each category, respondents were randomly selected from primary care physician patient lists, and the data were initially analyzed “thematically” and published accordingly ( Evans, Prout, Prior, et al., 2007 ). A few years later, however, I returned to the same data set to look at a different question—how (older) lay people talked about colds and flu, especially how they distinguished between the two illnesses and how they understood the causes of the two illnesses (see Prior, Evans, & Prout, 2011 ). Fortunately, in the original interview schedule, we had asked people about how they saw the “differences between cold and flu” and what caused flu, so it was possible to reanalyze the data with such questions in mind. In that frame, the example that follows demonstrates not only how CTA might be used on interview data, but also how it might be used to undertake a secondary analysis of a pre-existing data set ( Bryman, 2008 ).

As with all talk about illness, talk about colds and flu is routinely set within a mesh of concerns about causes, symptoms, and consequences. Such talk comprises the base elements of what has at times been referred to as the “explanatory model” of an illness (Kleinman, Eisenberg, & Good, 1978). In what follows, I shall focus almost entirely on issues of causation as understood from the viewpoint of older people; the analysis is based on the answers that respondents gave to the question, “How do you think people catch flu?”

Semi-structured interviews of the kind undertaken for a study such as this are widely used and are often characterized as akin to “a conversation with a purpose” ( Kahn & Cannell, 1957 :97). One of the problems of analyzing the consequent data is that, although the interviewer holds to a planned schedule, the respondents often reflect in a somewhat unstructured way about the topic of investigation, so it is not always easy to unravel the web of talk about, say, “causes” that occurs in the interview data. In this example, causal agents of flu, inhibiting agents, and means of transmission were often conflated by the respondents. Nevertheless, in their talk people did answer the questions that were posed, and in the study referred to here, that talk made reference to things such as “bugs” (and “germs”) as well as viruses; but the most commonly referred to causes were “the air” and the “atmosphere.” The interview data also pointed toward means of transmission as “cause”—so coughs and sneezes and mixing in crowds figured in the causal mix. Most interesting perhaps was the fact that lay people made a nascent distinction between facilitating factors (such as bugs and viruses) and inhibiting factors (such as being resistant, immune, or healthy), so that in the presence of the latter, the former are seen to have very little effect. Here are some shorter examples of typical question-response pairs from the original interview data.

(R:32): “How do you catch it [the flu]? Well, I take it its through ingesting and inhaling bugs from the atmosphere. Not from sort of contact or touching things. Sort of airborne bugs. Is that right?”

(R:3): “I suppose it’s [the cause of flu] in the air. I think I get more diseases going to the surgery than if I stayed home. Sometimes the waiting room is packed and you’ve got little kids coughing and spluttering and people sneezing, and air conditioning I think is a killer by and large I think air conditioning in lots of these offices.”

(R:46): “I think you catch flu from other people. You know in enclosed environments in air conditioning which in my opinion is the biggest cause of transferring diseases is air conditioning. Worse thing that was ever invented that was. I think so, you know. It happens on aircraft exactly the same you know.”

Alternatively, it was clear that for some people being cold, wet, or damp could also serve as a direct cause of flu; thus:

Interviewer: “OK, good. How do you think you catch the flu?” (R:39): “Ah. The 65 dollar question. Well, I would catch it if I was out in the rain and I got soaked through. Then I would get the flu. I mean my neighbour up here was soaked through and he got pneumonia and he died. He was younger than me: well, 70. And he stayed in his wet clothes and that’s fatal. Got pneumonia and died, but like I said, if I get wet, especially if I get my head wet, then I can get a nasty head cold and it could develop into flu later.”

As I suggested earlier, despite the presence of bugs and germs, viruses, the air, and wetness or dampness, “catching” the flu is not a matter of simple exposure to causative agents. Thus some people hypothesized that within each person there is a measure of immunity or resistance or healthiness that comes into play and that is capable of counteracting the effects of external agents. For example, being “hardened” to germs and harsh weather can prevent a person getting colds and flu. Being “healthy” can itself negate the effects of any causative agents, and healthiness is often linked to aspects of “good” nutrition and diet and not smoking cigarettes. These mitigating and inhibiting factors can either moderate the effects of infection or prevent a person “catching” the flu entirely. Thus (R:45) argued that it was almost impossible for him to catch flu or cold “[c]os I got all this resistance.” Interestingly, respondents often used possessive pronouns in their discussion of immunity and resistance (“my immunity” and “my resistance”) and tended to view them as personal assets (or capital) that might be compromised by mixing with crowds.

By implication, having a weak immune system can heighten the risk of contracting cold and flu and might therefore spur one on to take preventive measures such as accepting a flu jab. There are some, of course, who believe that it is the flu jab that can cause the flu and other illnesses. An example of what might be called lay “epidemiology” ( Davison, Davey-Smith, & Frankel, 1991 ) is evident in the following extract.

(R:4): “Well, now it’s coincidental you know that [my brother] died after the jab, but another friend of mine, about 8 years ago, the same happened to her. She had the jab and about six months later, she died, so I know they’re both coincidental, but to me there’s a pattern.”

Normally, results from studies such as this are presented in exactly the same way as has just been set out. Thus the researcher highlights given themes that are said to have emerged out of the data and then provides appropriate extracts from the interviews to illustrate and substantiate the relevant themes. However, one very reasonable question that any critic might ask about the selected data extracts concerns the extent to which they are “representative” of the material in the data set as a whole. Maybe, for example, the author has been unduly selective in his or her use of both themes and quotations. Perhaps, as a consequence, the author has ignored or left out talk that does not fit their arguments or extracts that might be considered dull and uninteresting compared to more exotic material. And these kinds of issues and problems are certainly common to the reporting of almost all forms of qualitative research. However, the adoption of CTA techniques can help to mitigate such problems. This is so because by using CTA we can indicate the extent to which we have used all or just some of the data, and we can provide a view of the content of the entire sample of interviews rather than just the content and flavor of merely one or two interviews. In this light, we need to consider Figure 18.1. The figure is based on counting the number of references in the 54 interviews to the various “causes” of the flu (though references to the flu jab, i.e., inoculation, as a cause of flu have been ignored for the purpose of this discussion). The node sizes reflect the relative importance of each cause as determined by the concept count (frequency of occurrence). The links between nodes reflect the degree to which causes are co-associated in interview talk and are calculated according to a co-occurrence index (see, e.g., SPSS, 2007:183).
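The counting behind a diagram of this kind can be sketched in a few lines of Python. This is only an illustrative sketch: the coded interviews are hypothetical, and because the text does not give the formula for the co-occurrence index, the Jaccard index is used here as a stand-in assumption and should not be read as the index used by the SPSS software.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded data: one list of "cause" mentions per interview.
interviews = [
    ["air", "air", "bugs", "crowds"],
    ["air", "coughs and sneezes", "crowds"],
    ["damp", "resistance"],
    ["air", "coughs and sneezes", "resistance"],
]

def reference_counts(coded):
    """Total number of references to each concept (would drive node size)."""
    counts = Counter()
    for mentions in coded:
        counts.update(mentions)
    return counts

def cooccurrence(coded):
    """Jaccard index per concept pair (assumed stand-in for the link strength)."""
    present = Counter()   # interviews mentioning each concept
    together = Counter()  # interviews mentioning both concepts of a pair
    for mentions in coded:
        concepts = sorted(set(mentions))
        present.update(concepts)
        together.update(combinations(concepts, 2))
    return {
        (a, b): n / (present[a] + present[b] - n)
        for (a, b), n in together.items()
    }

counts = reference_counts(interviews)
links = cooccurrence(interviews)
```

In this sketch `counts` would set the node sizes and `links` the line thicknesses; pairs that never appear in the same interview are simply absent from `links`.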

Given this representation, we can immediately assess the relative importance of the different causes as referred to in the interview data. Thus we can see that such things as (poor) “hygiene” and “foreigners” were mentioned as a potential cause of flu—but mention of hygiene and foreigners was nowhere near so important as references to “the air” or to “crowds” or to “coughs and sneezes.” In addition, we can also determine the strength of the connections that interviewees made between one cause and another. Thus there are relatively strong links between “resistance” and “coughs and sneezes,” for example.

In fact, Figure 18.1 divides causes into the “external” and the “internal,” or the facilitating and the impeding (lighter and darker nodes). Among the former I have placed such things as crowds, coughs, sneezes, and the air while among the latter I have included “resistance,” “immunity,” and “health.” That division, of course, is a product of my conceptualizing and interpreting the data, but whichever way we organize the findings, it is evident that talk about the causes of flu belongs in a web or mesh of concerns that would be difficult to represent by the use of individual interview extracts alone. Indeed, it would be impossible to demonstrate how the semantics of causation belong to a culture (rather than to individuals) in any other way. In addition I would argue that the counting involved in the construction of the diagram functions as a kind of check on researcher interpretations and provides a source of visual support for claims that an author might make about, say, the relative importance of “damp” and “air” as perceived causes of disease. Finally, the use of CTA techniques allied with aspects of conceptualization and interpretation has enabled us to approach the interview data as a set and to consider the respondents as belonging to a community rather than regarding them merely as isolated and disconnected individuals, each with their own views. It has also enabled us to squeeze some new findings out of old data, and I would argue that it has done so with advantage. There are of course other advantages to using CTA to explore data sets, which I highlight in the next section.

Figure 18.1. What causes flu? A lay perspective. Factors listed as causes of colds and flu in 54 interviews. Node size is proportional to number of references “as causes.” Line thickness is proportional to co-occurrence of any two “causes” in the set of interviews.

Analyzing a Sample of Documents: Using Content Analysis to Verify Claims

Policy analysis is a difficult business. For a start, it is never entirely clear where (social, health, economic, environmental) policy actually is. Is it in documents (as published by governments, think tanks, and research centres), in action (what people actually do), or in speech (what people say)? Perhaps it rests in a mixture of all three realms. Yet wherever it may be, it is always possible, at the very least, to identify a range of policy texts and to focus on the conceptual or semantic webs in terms of which government officials and other agents (such as politicians) talk about the relevant policy issues. Furthermore, in so far as policy is recorded—in speeches, pamphlets, and reports—we may begin to speak of specific policies as having a history or a pedigree that unfolds through time (think, e.g., of US or UK health policies during the Clinton years or the Obama years). And in so far as we consider “policy” as having a biography or a history, we can also think of studying policy narratives.

Though firmly based in the world of literary theory, narrative method has been widely used for both the collection and the analysis of data concerning ways in which individuals come to perceive and understand various states of health, ill health, and disability (Frank, 1995; Hydén, 1997). Narrative techniques have also been adapted for use in clinical contexts and allied to concepts of healing (Charon, 2006). In both social scientific and clinical work, however, the focus is invariably on individuals and on how individuals “tell” stories of health and illness. Yet narratives can also belong to collectives (such as political parties and ethnic and religious groups) just as much as to individuals, and in the case of collectives there is a need to collect and analyse data that are dispersed across a much wider range of materials than can be obtained from the personal interview. In this context, Roe (1994) has demonstrated how narrative method can be applied to an analysis of national budgets, animal rights, and environmental policies.

An extension of the concept of narrative to policy discourse is undoubtedly useful (Newman & Vidler, 2006), but how might such narratives be analyzed? What strategies can be used to unravel the form and content of a narrative, especially in circumstances where the narrative might be contained in multiple (policy) documents, authored by numerous individuals, and published across a span of time rather than in a single, unified text such as a novel? Roe (1994), unfortunately, is not in any way specific about analytical procedures apart from offering the useful rule to “never stray too far from the data” (1994:xii). So in this example I will outline a strategy for tackling such complexities. In essence, it is a strategy that combines techniques of linguistically (rule) based content analysis with a theoretical and conceptual frame that enables us to unravel and identify the core features of a policy narrative. My substantive focus is on documents concerning health service delivery policies published 2000–2009 in the constituent countries of the UK (that is, England, Scotland, Wales, and Northern Ireland, all of which have different political administrations).

Narratives can be described and analyzed in various ways, but for our purposes we can say that they have three key features: they point to a chronology, they have a plot and they contain “characters.”

Chronology: All narratives have beginnings; they also have middles and endings, and these three stages are often seen as comprising the fundamental structure of narrative text. Indeed, in his masterly analysis of time and narrative, Ricoeur (1984) argues that it is in the unfolding chronological structure of a narrative that one finds its explanatory (and not merely descriptive) force. By implication, one of the simplest strategies for the examination of policy narratives is to locate and then divide a narrative into its three constituent parts: beginning, middle, and end.

Unfortunately, while it can sometimes be relatively easy to locate or choose a beginning to a narrative, it can be much more difficult to locate an end point. Thus in any illness narrative, a narrator might be quite capable of locating the start of an illness process (in an infection, accident, or other event) but unable to see how events will be resolved in an ongoing and constantly unfolding life. As a consequence, both narrators and researchers usually find themselves in the midst of an emergent present—a present without a known and determinate end (see, e.g., Frank, 1995 ). Similar considerations arise in the study of policy narratives where chronology is perhaps best approached in terms of (past) beginnings, (present) middles, and projected futures.

Plot: According to Ricoeur (1984), our basic ideas about narrative are best derived from the work and thought of Aristotle who in his Poetics sought to establish “first principles” of composition. For Ricoeur, as for Aristotle, plot ties things together. It “brings together factors as heterogeneous as agents, goals, means, interactions, circumstances, unexpected results” (1984:65) into the narrative frame. For Aristotle, it is the ultimate untying or unraveling of the plot that releases the dramatic energy of the narrative.

Character: Characters are most commonly thought of as individuals, but they can be considered in much broader terms. Thus the French semiotician A. J. Greimas (1970), for example, suggested that, rather than think of characters as people, it would be better to think in terms of what he called “actants” and of the functions that such actants fulfill within a story. In this sense geography, climate, and capitalism can be considered as characters every bit as much as aggressive wolves and Little Red Riding Hood. Further, he argued that the same character (actant) can be considered to fulfill many functions and the same function can be performed by many characters. Whatever else, the deployment of the term actant certainly helps us to think in terms of narratives as functioning and creative structures. It also serves to widen our understanding of the ways in which concepts, ideas, and institutions, as well as “things” in the material world, can influence the direction of unfolding events every bit as much as conscious human subjects. Thus, for example, the “American people,” “the nation,” “the constitution,” “the West,” “tradition,” and “Washington” can all serve as characters in a policy story.

As I have already suggested, narratives can unfold across many media and in numerous arenas—speech and action, as well as text. Here, however, my focus is solely on official documents—all of which are UK government policy statements as listed in Table 18.1 . The question is how might CTA help us unravel the narrative frame?

It might be argued that a simple reading of any document should familiarize the researcher with elements of all three policy narrative components (plot, chronology, and character). However, in most policy research, we are rarely concerned with a single and unified text as is the case with a novel, but rather with multiple documents written at distinctly different times by multiple (usually anonymous) authors that notionally can range over a wide variety of issues and themes. In the full study, some 19 separate publications were analyzed across England, Wales, Scotland, and Northern Ireland.

Naturally, listing word frequencies, still less identifying co-occurrences and semantic webs in large data sets (covering hundreds of thousands of words and footnotes), cannot be done manually but rather requires the deployment of complex algorithms and text-mining procedures. To this end I analyzed the 19 documents using “Text Mining for Clementine” (SPSS, 2007).

Text-mining procedures begin by providing an initial list of concepts based on the lexicon of the text but which can be weighted according to word frequency and which take account of elementary word associations. For example, learning disability, mental health, and performance management indicate three concepts, not six words. Using such procedures on the aforementioned documents gives the researcher an initial grip on the most important concepts in the document set of each country. Note that this is much more than a straightforward concordance analysis of the text and is more akin to what Ryan & Bernard (2000) have referred to as “semantic analysis” and Carley (1993) has referred to as “concept” and “mapping” analysis.
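The difference between counting words and counting concepts can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the hand-written phrase list, the function name `extract_concepts`, and the sample sentence are hypothetical, and a real text-mining package would derive its concepts from linguistic resources rather than from a fixed list.

```python
import re
from collections import Counter

# Hypothetical lexicon of multi-word concepts (illustrative only).
MULTIWORD_CONCEPTS = [
    "learning disability",
    "mental health",
    "performance management",
]

def extract_concepts(text, multiword=MULTIWORD_CONCEPTS):
    """Count concepts, treating each listed phrase as one unit, not two words."""
    text = text.lower()
    counts = Counter()
    # Remove longer phrases first so they are not split into component words.
    for phrase in sorted(multiword, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text, n = re.subn(pattern, " ", text)
        if n:
            counts[phrase] += n
    # Whatever remains is counted as single-word concepts.
    counts.update(re.findall(r"[a-z]+", text))
    return counts

sample = ("Mental health services and learning disability services "
          "need performance management.")
concepts = extract_concepts(sample)
```

Here “mental health” is counted once as a single concept, and neither “mental” nor “health” appears as a separate entry. The sketch does no weighting and removes no function words, both of which a fuller procedure would need.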

So the first task was to identify and then extract the core concepts, thus identifying what might be called “key” characters or actants in each of the policy narratives. For example, in the Scottish documents such actants included “Scotland” and the “Scottish people,” as well as “health” and the “NHS,” among others; while in the Welsh documents it was “the people of Wales” and “Wales” that figured largely—thus emphasizing how national identity can play every bit as important a role in a health policy narrative as concepts such as “health,” “hospitals,” and “wellbeing.”

Having identified key concepts it was then possible to track concept clusters in which particular actants or characters are embedded. Such cluster analysis is dependent on the use of co-occurrence rules and the analysis of synonyms, whereby it is possible to get a grip on the strength of the relationships between the concepts, as well as the frequency with which the concepts appear in the collected texts. In Figure 18.2 , I provide an example of a concept cluster. The diagram indicates the nature of the conceptual and semantic web in which various actants are discussed. The diagrams further indicate strong (solid line) and weaker (dotted line) connections between the various elements in any specific mix, and the numbers indicate frequency counts for the individual concepts. Using Clementine , the researcher is unable to specify in advance which clusters will emerge from the data. One cannot, for example, choose to have an NHS cluster. In that respect, these diagrams not only provide an array in terms of which concepts are located, but also serve as a check on and to some extent validation of the interpretations of the researcher. Of course none of this tells us what the various narratives contained within the documents might be. They merely point to key characters and relationships both within and between the different narratives. So having indicated the techniques used to identify the essential parts of the four policy narratives, it is now time to sketch out their substantive form.
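A rough sense of how co-occurrence rules and synonym handling combine can be given in a short Python sketch. It is not a reconstruction of the Clementine procedure: the synonym table, the choice of the sentence as the co-occurrence window, the function name `focal_links`, and the sample text are all assumptions introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical synonym table mapping surface words to one canonical concept.
SYNONYMS = {"patients": "patient", "choices": "choice", "choose": "choice"}

def focal_links(text, focal, synonyms=SYNONYMS):
    """Count how often each concept shares a sentence with the focal concept."""
    links = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Normalize each word to its canonical concept before comparing.
        words = {synonyms.get(w, w) for w in re.findall(r"[a-z]+", sentence)}
        if focal in words:
            links.update(words - {focal})
    return links

text = ("Patients want more choice in their care. "
        "Care must respond to what patients choose. "
        "Waiting times remain long.")
links = focal_links(text, "care")
```

In this toy text “choice” and “patient” each share two sentences with “care,” whereas “waiting” shares none. Function words such as “in” and “to” are not filtered out here, so a practical version would add a stop list.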

It may be useful to note that Aristotle recommended brevity in matters of narrative, deftly summarising the whole of the Odyssey in just seven lines. In what follows, I attempt (albeit somewhat weakly) to emulate that example by summarising a key narrative of English health services policy in just four paragraphs. The citations are of Department of Health publications (by year) as listed in Table 18.1. Note how the narrative unfolds in relation to the dates of publication. In the English case (though not so much in the other UK countries), it is a narrative that is concerned to introduce market forces into what is and has been a state-managed health service. Market forces are justified in terms of improving opportunities for the consumer (i.e., the patients in the service), and the pivot of the newly envisaged system is something called “patient choice” or “choice.” This is how the story unfolds as told through the policy documents between 2000 and 2008 (see Table 18.1).

The advent of the NHS in 1948 was a “seminal event” (2000:8), but under successive Conservative administrations the NHS was seriously underfunded (2006:3). The (New Labour) government will invest (2000) or already has (2003:4) invested extensively in infrastructure and staff, and the NHS is now on a “journey of major improvement” (2004:2). But “more money is only a starting point” (2000:2), and the journey is far from finished. Continuation requires some fundamental changes of “culture” (2003:6). In particular, the NHS remains unresponsive to patient need, and “[a]ll too often, the individual needs and wishes are secondary to the convenience of the services that are available. This ‘one size fits all’ approach is neither responsive, equitable nor person-centred” (2003:17). In short, the NHS is a 1940s system operating in a twenty-first-century world (2000:26). Change is therefore needed across the “whole system” (2005:3) of care and treatment.

Above all, we have to recognize that we “live in a consumer age” (2000:26). People’s expectations have changed dramatically (2006:129), and people want more choice, more independence, and more control (2003:12) over their affairs. Patients are no longer, and should not be considered as, “passive recipients” of care (2003:62), but wish to be and should be (2006:81) actively “involved” in their treatments (2003:38, 2005:18)—indeed, engaged in a partnership (2003:22) of respect with their clinicians. Furthermore, most people want a personalized service “tailor made to their individual needs” (2000:17, 2003:15, 2004:1, 2006:83)—“[a] service which feels personal to each and every individual within a framework of equity and good use of public money” (2003:6).

To advance the necessary changes, “patient choice” needs to be and “will be strengthened” (2000:89). “Choice” must be made to “happen” (2003), and it must be “real” (2003:3, 2004:5, 2005:20, 2006:4). Indeed, it must be “underpinned” (2003:7) and “widened and deepened” (2003:6) throughout the entire system of care.

If “we” expand and underpin patient choice in appropriate ways and engage patients in their treatment systems, then levels of patient satisfaction will increase (2003:39), and their choices will lead to a more “efficient” (2003:5, 2004:2, 2006:16) and effective (2003:62, 2005:8) use of resources. Above all, the promotion of choice will help to drive up “standards” of care and treatment (2000:4, 2003:12, 2004:3, 2005:7, 2006:3). Furthermore, the expansion of choice will serve to negate the effects of the “inverse care law,” whereby those who need services most tend to get catered for the least (2000:107, 2003:5, 2006:63), and it will thereby help in moderating the extent of health inequalities in the society in which we live. “The overall aim of all our reforms,” therefore, “is to turn the NHS from a top down monolith into a responsive service that gives the patient the best possible experience. We need to develop an NHS that is both fair to all of us, and personal to each of us” (2003:5).

Figure 18.2. Concept cluster for “care” in six English policy documents, 2000–2007. Line thickness is proportional to the strength of the co-occurrence coefficient. Node size reflects the relative frequency of each concept, and the numbers in parentheses give its frequency count. Solid lines indicate relationships between terms within the same cluster, and dotted lines indicate relationships between terms in different clusters.

We can see how most—though not all—of the elements of this story are represented in Figure 18.2 . In particular we can see strong (co-occurrence) links between “care” and “choice” and how partnership, performance, control, and improvement have a prominent profile. There are of course some elements of the web that have a strong profile (in terms of node size and links) but to which we have not referred; access, information, primary care, and waiting times are four. As anyone well versed in English health care policy would know, these have important roles to play in the wider, consumer-driven narrative. However, by rendering the excluded as well as included elements of that wider narrative visible, the concept web provides a degree of verification on the content of the policy story as told herein and on the scope of its “coverage.”

In following through on this example, we have of course moved from content analysis to a form of discourse analysis (in this instance narrative analysis). That shift underlines aspects of both the versatility of CTA and some of its weaknesses: versatility in the sense that CTA can be readily combined with other methods of analysis and in the way in which the results of the CTA help us to check and verify the claims of the researcher. The weakness of the diagram compared to the narrative is that CTA on its own is a somewhat one-dimensional and static form of analysis, and while it is possible to introduce time and chronology into the diagrams, the diagrams themselves remain lifeless in the absence of some form of discursive overview. (For a fuller analysis of these data, see Prior, Hughes, & Peckham, 2012.)

Analyzing a Single Interview: The Role of Content Analysis in a Case Study

So far I have focused on using content analysis on a sample of interviews and on a sample of documents. In the first instance, I recommended CTA for its capacity to tell us something about what is seemingly central to interviewees and for demonstrating how what is said is linked (in terms of a concept network). In the second instance, I reaffirmed the virtues of co-occurrence and network relations, but this time in the context of a form of discourse analysis. I also suggested that CTA can serve an important role in the process of verification of a narrative and its academic interpretation. In this section, however, I am going to link the use of CTA to another style of research—case study—to show how CTA might be used to analyze a single “case.”

Case study is a term used in multiple and often ambiguous ways. However, Gerring (2004:342) defines it as “an intensive study of a single unit for the purpose of understanding a larger class of (similar) units.” As Gerring points out, case study does not necessarily imply a focus on N = 1, although that is indeed the most logical number for case study research (Ragin & Becker, 1992). Naturally, an N of 1 can be immensely informative, and whether we like it or not we often have only one N to study (think, e.g., of the 1986 Challenger shuttle disaster, or of the 9/11 attack on the World Trade Center). In the clinical sciences, of course, case studies are widely used to represent the “typical” features of a wider class of phenomena, and are often used to define a kind or syndrome (as in the field of clinical genetics). Indeed, at the risk of mouthing a tautology, one can say that the distinctive feature of case study is its focus on a case in all of its complexity, rather than on individual variables and their inter-relationships, which tends to be a point of focus for large N research.

There was a time when case study was central to the science of psychology. Breuer and Freud’s (2001) famous studies of “hysteria” (orig. 1895) provide an early and outstanding example of the genre in this respect, but as with many of the other styles of social science research, the influence of case studies waned with the rise of much more powerful investigative techniques (including experimental methods) driven by the deployment of new statistical technologies. Idiographic studies consequently gave way to the current fashion for statistically driven forms of analysis that focus on causes and cross-sectional associations between variables rather than idiographic complexity.

In the example that follows, we will look at the consequences of a traumatic brain injury (TBI) on just one individual. The analysis is based on an interview with a person suffering from such an injury, and it was one of 32 interviews carried out with people who had experienced a TBI. The objective of the original research was to develop an outcome measure for TBI that was sensitive to the sufferer’s (rather than the health professional’s) point of view. In our original study (see Morris, Prior, Deb et al., 2005), interviews were also undertaken with 27 carers of the injured with the intention of comparing their perceptions of TBI to those of the people for whom they cared. A sample survey was also undertaken to elicit views about TBI from a much wider population of patients than was studied via interview.

In the introduction, I referred to Habermas and the concept of the “lifeworld.” Lifeworld ( Lebenswelt ) is a concept that first arose out of twentieth-century German philosophy. It constituted a specific focus for the work of Alfred Schutz (see, e.g., Schutz and Luckman, 1974 ). Schutz described the lifeworld as “that province of reality which the wide-awake and normal adult simply takes-for-granted in an attitude of common sense” (1974:3). Indeed, it was the routine and taken-for-granted quality of such a world that fascinated Schutz. As applied to the worlds of those with head injuries, the concept has particular resonance because head injuries often result in that taken-for-granted quality being disrupted and fragmented, ending in what Russian neuropsychologist A.R. Luria once described as “shattered” worlds ( Luria, 1975 ). As well as providing another excellent example of a case study, Luria’s work is also pertinent because he sometimes argued for a “romantic science” of brain injury—that is, a science that sought to grasp the world view of the injured patient by paying attention to an unfolding and detailed personal “story” of the head injured as well as to the neurological changes and deficits associated with the injury itself. In what follows, I shall attempt to demonstrate how CTA might be used to underpin such an approach.

In the original research, we began analysis by a straightforward reading of the interview transcripts. Unfortunately, a simple reading of a text or an interview can, strangely, mislead the reader into thinking that some issues or themes are actually more important than is warranted by the actual contents of the text. How that comes about is not always clear, but it probably has something to do with a desire to develop “findings” and our natural capacity to overlook the familiar in favor of the unusual. For that reason alone, it is always useful to subject any text to some kind of concordance analysis—that is, generating a simple frequency list of words used in an interview or text. Given the current state of technology, one might even speak these days of using text-mining procedures such as the aforementioned Clementine to undertake such a task. By using Clementine, and as we have seen, it is also possible to measure the strength of co-occurrence links between elements (i.e., words and concepts) in the entire data set (in this example, 32 interviews), though for a single interview these aims can just as easily be achieved using much simpler, low-tech strategies.
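For a single transcript, the kind of low-tech frequency list mentioned above needs nothing more than the Python standard library. The following is a minimal sketch; the function name `frequency_list` and the sample sentence are hypothetical and are not drawn from the interview data.

```python
import re
from collections import Counter

def frequency_list(text, top=10):
    """Return the `top` most frequent words in a text, most frequent first."""
    # Lowercase the text and keep words, allowing one internal apostrophe.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    return Counter(words).most_common(top)

# Hypothetical snippet standing in for an interview transcript.
sample = "I was a driver. I was fit. Now I am tired."
top_two = frequency_list(sample, top=2)
```

A list of this sort only shows which words dominate a text; measuring how strongly words are linked to one another requires a separate co-occurrence count.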

By putting all 32 interviews into the database, a number of common themes emerged. For example, it was clear that “time” entered into the semantic web in a prominent manner, and it was clearly linked to such things as “change,” “injury,” “the body,” and what can only be called the “I was.” Indeed, time runs through the 32 stories in many guises, and the centrality of time is of course a reflection of storytelling and narrative recounting in general; chronology, as we have noted, is a defining feature of all storytelling (Ricoeur, 1984). Thus sufferers both recounted the events surrounding their injury and provided accounts as to how the injuries affected their present life and future hopes. As to time present, much of the patient story circled around activities of daily living: walking, working, talking, looking, feeling, remembering, and so forth.

Understandably, the word and the concept of “injury” featured largely in the interviews, though it was a word most commonly associated with discussions of physical consequences of injury. There were many references in that respect to injured arms, legs, hands, and eyes. There were also references to “mind,” though with far less frequency than references to the body and to body parts. Perhaps none of this is surprising. However, one of the most frequent concepts in the semantic mix was the “I was” (716 references). The statement “I was,” or “I used to” was in turn strongly connected to terms such as “the accident” and “change.” Interestingly, the “I was” overwhelmingly eclipsed the “I am” in the interview data (the latter with just 63 references). This focus on the “I was” appears in many guises. For example, it is often associated with the use of the passive voice: “I was struck by a car;” “I was put on the toilet;” “I was shipped from there then, transferred to [Cityville];” “I got told that I would never be able...;” “I was sat in a room,” and so forth. In short, the “I was” is often associated with things, people, and events acting upon the injured person. More importantly, however, the appearance of the “I was” is often used to preface statements signifying a state of loss or change in the person’s course of life, that is, as an indicator for talk about the patient’s shattered world. For example, Patient 7122 stated, “The main (effect) at the moment is I’m not actually with my children, I can’t really be their mum at the moment. I was a caring Mum, but I can’t sort of do the things that I want to be able to do like take them to school. I can’t really do a lot on my own. Like crossing the roads.”

Another patient stated, “Everything is completely changed. The way I was... I can’t really do anything at the moment. I mean my German, my English, everything’s gone. Job possibilities is out the window. Everything is just out of the window... I just think about it all the time actually every day you know. You know it has destroyed me anyway, but if I really think about what has happened I would just destroy myself.”

Each of these quotations in its own way serves to emphasize how life has changed and how the patient’s world has changed. In that respect, we can say that one of the major outcomes arising from TBI may be substantial “biographical disruption” (Bury, 1982), whereupon key features of an individual’s life course are radically altered forever. Indeed, as Becker (1997: 37) argues in relation to a wide array of life events, “When their health is suddenly disrupted, people are thrown into chaos. Illness challenges one’s knowledge of one’s body. It defies orderliness. People experience the time before their illness and its aftermath as two separate entities.” Indeed, this notion of a cusp in personal biography is particularly well illustrated by Luria’s patient Zasetsky; the latter often refers to being a “newborn creature” (Luria, 1975: 24, 88), a shadow of a former self (1975: 25), and as having his past “wiped out” (1975: 116).
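The phrase counts reported above (the “I was” against the “I am”) rest on a simple frequency tally across transcripts. The following is a minimal sketch of such a tally, not the procedure or software used in the study: the function name `count_phrases`, the phrase list, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical phrase list; the text names only "I was", "I used to" and "I am".
PHRASES = ["i was", "i used to", "i am"]


def count_phrases(transcripts, phrases=PHRASES):
    """Count case-insensitive, whole-word occurrences of each phrase."""
    counts = Counter({phrase: 0 for phrase in phrases})
    for text in transcripts:
        # Lower-case the text and collapse runs of whitespace so that
        # phrases split across line breaks are still found.
        normalised = re.sub(r"\s+", " ", text.lower())
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[phrase] += len(re.findall(pattern, normalised))
    return counts


# Invented sample sentences, loosely modelled on the quotations above.
sample = [
    "I was a caring Mum, but I can't do the things I used to do.",
    "I was struck by a car. I am tired.",
]
print(count_phrases(sample))
```

Note that a literal match of this kind does not treat contractions such as “I’m” as instances of “I am”; whether they should count is an analytic decision that the sketch leaves open.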

However, none of this tells us about how these factors come together in the life and experience of one individual. When we focus on an entire set of interviews, we necessarily lose the rich detail of personal experience and tend instead to rely on a conceptual rather than a graphic description of effects and consequences (to focus on, say, “memory loss,” rather than loss of memory about family life). The contents of Figure 18.3 attempt to correct that vision. The figure records all of the things that a particular respondent (Patient 7011) used to do and liked doing. It records all of the things that he says he can no longer do (at one year after injury), and it records all of the consequences that he suffered from his head injury at the time of interview. Thus we see references to epilepsy (his “fits”), paranoia (the patient spoke of his suspicions concerning other people, people scheming behind his back, and his inability to trust others), deafness, depression, and so forth. Note that, although I have inserted a future tense into the web (“I will”), such a statement never appeared in the transcript. I have set it there for emphasis and to show how for this person the future fails to connect to any of the other features of his world except in a negative way. Thus he states at one point that he cannot think of the future because it makes him feel depressed (see Fig. 18.3). The line thickness of the arcs reflects the emphasis that the subject placed on the relevant “outcomes” in relation to the “I was” and the “now” during the interview. Thus we see that factors affecting his concentration and balance loom large but that he is also concerned about his being dependent on others, his epileptic fits, and his being unable to work and drive a vehicle. The schism in his life between what he used to do, what he cannot now do, and his current state of being is nicely represented in the CTA diagram.

What have we gained from executing this kind of analysis? For a start, we have moved away from a focus on variables, frequencies, and causal connections (e.g., a focus on the proportion of people with TBI who suffer from memory problems or memory problems and speech problems) and refocused on how the multiple consequences of a TBI link together in one person. In short, instead of developing a narrative of acting variables, we have emphasized a narrative of an acting individual (Abbott, 1992: 62). Second, it has enabled us to see how the consequences of a TBI connect to an actual lifeworld (and not simply an injured body). So the patient is not viewed just as having a series of discrete problems such as balancing, or staying awake, which is the usual way of assessing outcomes, but is seen as someone struggling to come to terms with an objective world of changed things, people, and activities (missing work is not, for example, routinely considered an “outcome” of head injury). Third, by focusing on what the patient was saying, we gain insight into something that is simply not visible by concentrating on single outcomes or symptoms alone—namely, the void that rests at the center of the interview, what I have called the “I was.” Fourth, we have contributed to understanding a type, for the case that we have read about is not simply a case of “John” or “Jane” but a case of TBI, and in that respect it can add to many other accounts of what it is like to experience head injury—including one of the most well documented of all TBI cases, that of Zasetsky. Finally, we have opened up the possibility of developing and comparing cognitive maps (Carley, 1993) for different individuals, and thereby gained insight into how alternative cognitive frames of the world arise and operate.

Figure 18.3. The shattered world of Patient 7011. Thickness of lines (arcs) is proportional to the frequency of reference to the “outcome” by the patient during interview.

Tracing the biography of a concept

In the previous sections, I emphasised the virtues of CTA for its capacity to link into a data set in its entirety—and how the use of CTA can counter any tendency of a researcher to be selective and partial in the presentation and interpretation of information contained in interviews and documents. However, that does not mean that we always have to take an entire document or interview as the data source. Indeed, it is possible to select (on rational and explicit grounds) sections of documentation and to conduct the CTA on the chosen portions. In the example that follows, I do just that. The sections that I chose to concentrate on are titles and abstracts of academic papers—rather than the full texts. The research on which the following is based is concerned with a biography of a concept and is being conducted in conjunction with a PhD student of mine, Joanne Wilson. Joanne thinks of this component of the study more in terms of a “scoping study” than of a biographical study, and that too is a useful framework for structuring the context in which CTA can be used. Scoping studies ( Arksey & O’Malley, 2005 ) are increasingly used in health related research to “map the field” and to get a sense of the range of work that has been conducted on a given topic. Such studies can also be used to refine research questions and research designs. In our investigation the scoping study was centred on the concept of “well-being.” During the past decade or so, “well-being” has emerged as an important research target for governments and corporations as well as for academics, yet it is far from clear to what the term refers. Given the ambiguity of meaning, it is clear that a scoping review, rather than either a systematic review or a narrative review of available literature, would be best suited to our goals.

The origins of the concept of well-being can be traced at least as far back as the fourth century B.C., when philosophers produced normative explanations of the good life (e.g., eudaimonia, hedonia, and harmony). However, contemporary interest in the concept seems to have been regenerated by the concerns of economists and most recently psychologists. These days governments are equally concerned with measuring well-being to inform policy and conduct surveys of well-being to assess the state of the nation (see, e.g., Office for National Statistics [ONS], 2012)—but what are they assessing?

We adopted a two-step process to address the research question, “What is the meaning of ‘well-being’ in the context of public policy?” First, we explored the existing thesauri of eight databases to establish those higher-order headings (if any) under which articles with relevance to well-being might be catalogued. Thus we searched the following databases: Cumulative Index of Nursing and Allied Health Literature [CINAHL], EconLit, Health Management Information Consortium [HMIC], MEDLINE, Philosopher’s Index, PsycINFO, Sociological Abstracts, and Worldwide Political Science Abstracts (WPSA). Each of these databases adopts keyword-controlled vocabularies. In other words, they use inbuilt statistical procedures to link core terms to a set lexis of phrases that depict the concepts contained in the database. Table 18.2 shows each database and its associated taxonomy. The contents of the table point toward a linguistic infrastructure in terms of which academic discourse is conducted, and our task was to extract from this infrastructure the semantic web wherein the concept of “well-being” is situated. We limited the thesaurus terms to “well-being” and its variants (i.e., wellbeing or well being). If the term was returned, it was then exploded to identify any associated terms.

CINAHL = Cumulative Index of Nursing and Allied Health Literature; HMIC = Health Management Information Consortium; WPSA = Worldwide Political Science Abstracts.

Second, to develop the conceptual map, we conducted a free-text search for well-being and its variants within the context of public policy across the same databases. We orchestrated these searches across five separate timeframes: January 1990 to December 1994, January 1995 to December 1999, January 2000 to December 2004, January 2005 to December 2009, and January 2010 to October 2011. Naturally, different disciplines use different words to refer to well-being, each of which may wax and wane in usage over time. The searches thus sought to quantitatively capture any changes in the use and subsequent prevalence of well-being and any referenced terms (i.e., to trace a biography).
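As an illustration of how term prevalence might be tallied across the five timeframes listed above, here is a minimal sketch. It assumes each retrieved abstract has already been reduced to a publication date and a list of terms; that record shape, and the names `timeframe_for` and `term_counts_by_timeframe`, are hypothetical and not taken from the study.

```python
from collections import Counter
from datetime import date

# The five search windows given in the text (inclusive start and end dates).
TIMEFRAMES = [
    ("1990-1994", date(1990, 1, 1), date(1994, 12, 31)),
    ("1995-1999", date(1995, 1, 1), date(1999, 12, 31)),
    ("2000-2004", date(2000, 1, 1), date(2004, 12, 31)),
    ("2005-2009", date(2005, 1, 1), date(2009, 12, 31)),
    ("2010-2011", date(2010, 1, 1), date(2011, 10, 31)),
]


def timeframe_for(published):
    """Return the label of the window containing the date, or None."""
    for label, start, end in TIMEFRAMES:
        if start <= published <= end:
            return label
    return None


def term_counts_by_timeframe(records):
    """Tally terms per window.

    records: iterable of (publication_date, list_of_terms) pairs.
    Each term is counted at most once per abstract.
    """
    table = {label: Counter() for label, _, _ in TIMEFRAMES}
    for published, terms in records:
        label = timeframe_for(published)
        if label is not None:
            table[label].update(set(terms))
    return table
```

Counting each term once per abstract measures how many abstracts use a term rather than how often it is repeated; either choice could be defended, and the text does not say which was used.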

It is important to note that we did not intend to provide an exhaustive, systematic search of all the relevant literature. Rather we wanted to establish the prevalence of well-being and any referenced (i.e., allied) terms within the context of public policy. This has the advantage of ensuring that any identified words are grounded in the literature (i.e., they represent words actually used by researchers to talk and write about well-being in policy settings). The searches were limited to abstracts to increase specificity, albeit at some expense to sensitivity, with which we could identify relevant articles.

We also employed inclusion/exclusion criteria to facilitate the process by which we selected articles, thereby minimizing any potential bias arising from our subjective interpretations. We included independent, standalone investigations relevant to the study’s objectives (i.e., concerned with well-being in the context of public policy), which focused on well-being as a central outcome or process and which made explicit reference to “well-being” and “public policy” in either the title or the abstract. We excluded articles that were irrelevant to the study’s objectives, that used noun adjuncts to focus on the well-being of specific populations (e.g., children, the elderly, women) and contexts (e.g., retirement village), or that focused on deprivation or poverty unless poverty indices were used to understand well-being as opposed to social exclusion. We also excluded book reviews and abstracts describing a compendium of studies.
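Only part of these criteria can be checked mechanically: the explicit reference to “well-being” (in any of its variants) and “public policy” in the title or abstract, and the exclusion of book reviews. The sketch below covers that part alone and is an assumption about how a first-pass screen could look; the function name `passes_mechanical_screen` and the `is_book_review` flag are hypothetical, and judgments of relevance or centrality would still fall to the reviewer.

```python
import re

# Matches "well-being", "wellbeing" and "well being", the variants named in the text.
WELL_BEING = re.compile(r"\bwell[\s-]?being\b", re.IGNORECASE)
PUBLIC_POLICY = re.compile(r"\bpublic\s+policy\b", re.IGNORECASE)


def passes_mechanical_screen(title, abstract, is_book_review=False):
    """Return True if the record meets the mechanically checkable criteria."""
    if is_book_review:
        return False
    # Both phrases must appear somewhere in the title or the abstract.
    text = f"{title} {abstract}"
    return bool(WELL_BEING.search(text)) and bool(PUBLIC_POLICY.search(text))


print(passes_mechanical_screen("Wellbeing and public policy", ""))
print(passes_mechanical_screen("Well-being in a retirement village", "A case study."))
```

A screen like this can only narrow the pool. It would, for example, pass an abstract on children’s well-being that mentions public policy, which the stated criteria exclude.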

Using these criteria, Joanne Wilson conducted the review and recorded the results on a template developed specifically for the project, organized chronologically across each database and timeframe. Results were scrutinized by two other colleagues to ensure the validity of the search strategy and the findings. Any concerns regarding the eligibility of studies for inclusion were discussed amongst the research team. I then analyzed the co-occurrence of the key terms in the database. The resultant conceptual map is shown in Figure 18.4.

The diagram can be interpreted as a visualization of a conceptual space. So when academics write about “well-being” in the context of public policy, they tend to connect the discussion to the other terms in the matrix. “Happiness,” “health,” “economic,” and “subjective,” for example, are relatively dominant terms in the matrix. The node sizes of these words suggest that references to such entities are only slightly less frequent than references to well-being itself. However, when we come to analyse how well-being is talked about in detail, we see specific connections come to the fore. Thus the data imply that talk of “subjective well-being” far outweighs discussion of “social well-being,” or “economic well-being.” Happiness tends to act as an independent node (there is only one occurrence of happiness and well-being), probably suggesting that “happiness” is acting as a synonym for well-being. Quality of life (QoL) is poorly represented in the abstracts, and its connection to most of the other concepts in the space is very weak—confirming, perhaps, that QoL is unrelated to contemporary discussions of well-being and happiness. The existence of “measures” points to a distinct concern to assess and to quantify expressions of happiness, well-being, economic growth, and gross domestic product. More important and underlying this detail, there are grounds for suggesting that there are in fact a number of tensions in the literature on well-being.

On one hand, the results point toward an understanding of well-being as a property of individuals—as something that they feel or experience. Such a discourse is reflected through the use of words like “happiness,” “subjective,” and “individual.” This individualistic and subjective frame has grown in influence over the past decade in particular, and one of the problems with it is that it tends toward a somewhat content-free conceptualisation of well-being. To feel a sense of well-being one merely states that one is in a state of well-being; to be happy, one merely proclaims that one is happy (cf. ONS, 2012 ). It is reminiscent of the conditions portrayed in Aldous Huxley’s Brave New World , wherein the rulers of a closely managed society gave their priority to maintaining order and ensuring the happiness of the greatest number—in the absence of attention to justice or freedom of thought or any sense of duty and obligation to others, many of whom were systematically bred in “the hatchery” as slaves.

Figure 18.4. The position of a concept in a network—a study of “well-being.” Node size is proportional to the frequency of terms in 54 selected abstracts. Line thickness is proportional to the co-occurrence of two terms in any phrase of three words (e.g., subjective well-being, economics of well-being, well-being and development).
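The two quantities that drive such a diagram, term frequency (node size) and co-occurrence within a phrase of three words (line thickness), can be computed along the following lines. This is a sketch under stated assumptions rather than the procedure actually used: the key-term list, the tokenizer, and the function names are hypothetical, and “within three words” is read here as two key terms lying no more than two token positions apart.

```python
import re
from collections import Counter

# Hypothetical key-term list, drawn from terms mentioned in the text.
KEY_TERMS = {"well-being", "subjective", "happiness", "health", "economic", "development"}


def tokenize(text):
    """Lower-case the text, fold well-being variants together, split into words."""
    text = text.lower()
    text = re.sub(r"\bwell[\s-]?being\b", "well-being", text)
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text)


def build_network(abstracts, key_terms=KEY_TERMS, window=3):
    """Return (node_freq, edge_weight) counters for the given abstracts."""
    node_freq = Counter()    # term -> number of occurrences (node size)
    edge_weight = Counter()  # (term_a, term_b) -> co-occurrences (line thickness)
    for abstract in abstracts:
        tokens = tokenize(abstract)
        hits = [(i, t) for i, t in enumerate(tokens) if t in key_terms]
        node_freq.update(t for _, t in hits)
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                (i, s), (j, t) = hits[a], hits[b]
                if j - i >= window:
                    break  # later hits lie even further away
                if s != t:
                    edge_weight[tuple(sorted((s, t)))] += 1
    return node_freq, edge_weight


# Invented sample abstracts, for illustration only.
nodes, edges = build_network([
    "Subjective wellbeing and happiness matter.",
    "Well being and development; economic well-being.",
])
print(nodes.most_common())
print(edges.most_common())
```

Pairs are counted by position rather than by sliding window, so an adjacent pair such as “subjective well-being” contributes once rather than once per overlapping window. The exact-match term list also means that “economics” would not count as “economic” without stemming or an expanded list.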

On the other hand, there is some intimation in our web that the notion of well-being cannot be captured entirely by reference to individuals alone and that there are other dimensions to the concept—that well-being is the outcome or product of, say, access to reasonable incomes, to safe environments, to “development,” and to health and welfare. It is a vision hinted at by the inclusion of those very terms in the network. These different concepts necessarily give rise to important differences concerning how well-being is identified and measured and therefore what policies are most likely to advance well-being. In the first kind of conceptualization, we might improve well-being merely by dispensing what Huxley referred to as “soma” (a super drug that ensured feelings of happiness and elation); in the other case, however, we would need to invest in economic, human, and social capital as the infrastructure for well-being. In any event and even at this nascent level, we can see how content analysis can begin to tease out conceptual complexities and theoretical positions in what is otherwise routine textual data.

Putting the Content of Documents in Their Place

I suggested in my introduction that CTA was a method of analysis—not a method of data collection nor a form of research design. As such, it does not necessarily inveigle us into any specific forms of either design or of data collection, though designs and methods that rely on quantification are dominant. In this closing section, however, I want to raise the issue as to how we should position a study of content in our research strategies as a whole. For we need to keep in mind that documents and records always exist in a context, and that while what is “in” the document may be considered central, a good research plan can often encompass a variety of ways of looking at how content links to context. Hence in what follows I intend to outline how an analysis of content might be combined with other ways of looking at a record or text, and even how the analysis of content might be positioned as secondary to an examination of a document or record. The discussion calls upon a much broader analysis as presented in Prior (2011).

I have already stated that basic forms of CTA can serve as an important point of departure for many different types of data analysis—for example, as discourse analysis. Naturally, whenever “discourse” is invoked, there is at least some recognition of the notion that words might actually play a part in structuring the world rather than merely reporting on it or describing it (as is the case with the 2002 State of the Nation address that was quoted in Section “Units of Analysis”). Thus, for example, there is a considerable tradition within social studies of science and technology for examining the place of scientific rhetoric in structuring notions of “nature” and the position of human beings (especially as scientists) within nature (see, e.g., work by Bazerman, 1988; Gilbert & Mulkay, 1984; and Kay, 2000). Nevertheless, little if any of that scholarship situates documents as anything other than inert objects, either constructed by or waiting patiently to be activated by scientists.

However, in the tradition of the ethnomethodologists (Heritage, 1991) and some adherents of discourse analysis, it is also possible to argue that documents might be more fruitfully approached as a “topic” (Zimmerman and Pollner, 1971) rather than a “resource” (to be scanned for content), in which case the focus would be on the ways in which any given document came to assume its present content and structure. In the field of documentation, these latter approaches are akin to what Foucault (1970) might have called an “archaeology of documentation” and are well represented in studies of such things as how crime, suicide, and other statistics and associated official reports and policy documents are routinely generated. That too is a legitimate point of research focus, and it can often be worth examining the genesis of, say, suicide statistics or statistics about the prevalence of mental disorder in a community as well as using such statistics as a basis for statistical modeling.

Unfortunately, the distinction between topic and resource is not always easy to maintain—especially in the hurly-burly of doing empirical research (see, e.g., Prior, 2003). Putting an emphasis on “topic,” however, can open up a further dimension of research, and that concerns the ways in which documents function in the everyday world. And as I have already hinted, when we focus on function, it becomes apparent that documents serve not merely as containers of content but very often as active agents in episodes of interaction and schemes of social organization. In this vein, one can begin to think of an ethnography of documentation. Therein, the key research questions revolve around the ways in which documents are used and integrated into specific kinds of organizational settings, as well as how documents are exchanged and how they circulate within such settings. Clearly, documents carry content—words, images, plans, ideas, patterns, and so forth—but the manner in which such material is actually called upon and manipulated, and the way in which it functions, cannot be determined (though it may be constrained) by an analysis of content. Thus, Harper’s (1998) study of the use of economic reports inside the International Monetary Fund provides various examples of how “reports” can function to both differentiate and cohere work groups. In the same way, Henderson (1995) illustrates how engineering sketches and drawings can serve as what she calls conscription devices on the workshop floor.

Of course, documents constitute a form of what Latour (1986) would refer to as “immutable mobiles,” and with an eye on the mobility of documents, it is worth noting an emerging interest in histories of knowledge that seek to examine how the same documents have been received and absorbed quite differently by different cultural networks (see, e.g., Burke, 2000 ). A parallel concern has arisen with regard to the newly emergent “geographies of knowledge” (see, e.g., Livingstone, 2005 ). In the history of science, there has also been an expressed interest in the biography of scientific objects ( Latour, 1987 :262) or of “epistemic things” ( Rheinberger, 2000 )—tracing the history of objects independent of the “inventors” and “discoverers” to which such objects are conventionally attached. It is an approach that could be easily extended to the study of documents and is partly reflected in the earlier discussion concerning the meaning of the concept of well-being. Note how in all of these cases a key consideration is how words and documents as “things” circulate and translate from one culture to another; issues of content are secondary.

Clearly, studying how documents are used and how they circulate can constitute an important area of research in its own right. Yet even those who focus on document use can be overly anthropocentric and subsequently overemphasize the potency of human action in relation to written text. In that light, it is interesting to consider ways in which we might reverse that emphasis and instead study the potency of text and the manner in which documents can influence organizational activities as well as reflect them. Thus Dorothy Winsor (1999) has, for example, examined the ways in which work orders drafted by engineers not only shape and fashion the practices and activities of engineering technicians but construct “two different worlds” on the workshop floor.

In light of this, I will suggest a typology (Table 18.3) of the ways in which documents have come to be and can be considered in social research.

While accepting that no form of categorical classification can capture the inherent fluidity of the world, its actors, and its objects, Table 18.3 aims to offer some understanding of the various ways in which documents have been dealt with by social researchers. Thus approaches that fit into cell 1 have been dominant in the history of social science generally. Therein documents (especially as text) have been analyzed and coded for what they contain in the way of descriptions, reports, images, representations, and accounts. In short, they have been scoured for evidence. Data-analysis strategies concentrate almost entirely on what is in the “text” (via various forms of content analysis). This emphasis on content is carried over into cell 2 type approaches, with the key difference that analysis is concerned with how document content comes into being. The attention here is usually on the conceptual architecture and socio-technical procedures by means of which written reports, descriptions, statistical data, and so forth are generated. Various kinds of discourse analysis have been used to unravel the conceptual issues, while a focus on socio-technical and rule-based procedures by means of which clinical, police, social work, and other forms of records and reports are constructed has been well represented in the work of ethnomethodologists (see Prior, 2011). In contrast, and in cell 3, the research focus is on the ways in which documents are called upon as a resource by various and different kinds of “user.” Here concerns with document content or how a document has come into being are marginal, and the analysis concentrates on the relationship between specific documents and their use or recruitment by identifiable human actors for purposeful ends. I have already pointed to some studies of the latter kind in earlier paragraphs (e.g., Henderson, 1995). Finally, the approaches that fit into cell 4 also position content as secondary. The emphasis here is on how documents as “things” function in schemes of social activity and on how such things can drive, rather than be driven by, human actors. In short, the spotlight is on the vita activa of documentation, and I have provided numerous examples of documents as actors in other publications (see Prior, 2003; 2008; 2011).

Content analysis was a method originally developed to analyze mass media “messages” in an age of radio and newspaper print, and well before the digital age. Unfortunately, it struggles to break free of its origins and continues to be associated with the quantitative analysis of “communication.” Yet as I have argued, there is no rational reason why its use has to be restricted to such a narrow field, for it can be used to analyze printed text and interview data (as well as other forms of inscription) in various settings. What it cannot overcome is the fact that it is a method of analysis and not a method of data collection. However, as I have shown, it is an analytical strategy that can be integrated into a variety of research designs and approaches—cross-sectional and longitudinal survey designs, ethnography and other forms of qualitative design, and secondary analysis of pre-existing data sets. Even as a method of analysis it is flexible and can be used either independent of other methods or in conjunction with them. As we have seen, it is easily merged with various forms of discourse analysis and can be used as an exploratory method or as a means of verification. Above all, perhaps, it crosses the divide between “quantitative” and “qualitative” modes of inquiry in social research and offers a new dimension to the meaning of mixed-methods research. I recommend it.

Source: Prior (2008).

Abbott, A. (1992). What do cases do? In C. C. Ragin and H. S. Becker (Eds.). What is a case? Exploring the foundations of social inquiry. Cambridge: Cambridge University Press, 53–82.

Altheide, D. L. (1987). Ethnographic Content Analysis. Qualitative Sociology, 10(1): 65–77.

Arksey, H., O’Malley, L. (2005). Scoping studies: Towards a Methodological Framework. International Journal of Social Research Methodology, 8: 19–32.

Babbie, E. (2013). The practice of social research. 13th ed. Belmont, CA: Wadsworth.

Bazerman, C. (1988). Shaping written knowledge. The genre and activity of the experimental article in science. Madison, WI: University of Wisconsin Press.

Becker, G. (1997). Disrupted lives. How people create meaning in a chaotic world. London: University of California Press.

Berelson, B. (1952). Content analysis in communication research. Glencoe, IL: Free Press.

Bowker, G. C. and Star, S. L. (1999). Sorting things out. Classification and its consequences. Cambridge, MA: MIT Press.

Braun, V., Clarke, V. (2006). Using Thematic Analysis in Psychology. Qualitative Research in Psychology, 3: 77–101.

Breuer, J., Freud, S. (2001). Studies on Hysteria. In Strachey, J. (Ed.). The standard edition of the complete psychological works of Sigmund Freud. Vol. 2. London: Vintage.

Bryman, A. (2008). Social research methods. 3rd ed. Oxford: Oxford University Press.

Burke, P. (2000). A social history of knowledge. From Gutenberg to Diderot. Cambridge: Polity Press.

Bury, M. (1982). Chronic illness as biographical disruption. Sociology of Health and Illness, 4: 167–182.

Carley, K. (1993). Coding choices for textual analysis. A comparison of content analysis and map analysis. Sociological Methodology, 23: 75–126.

Charon, R. (2006). Narrative medicine. Honoring the stories of illness. New York: Oxford University Press.

Creswell, J. W. (2007). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage.

Davison, C., Davey-Smith, G., Frankel, S. (1991). Lay epidemiology and the prevention paradox. Sociology of Health & Illness, 13(1): 1–19.

Evans, M., Prout, H., Prior, L., Tapper-Jones, L., Butler, C. (2007). A qualitative Study of Lay Beliefs about Influenza. British Journal of General Practice, 57: 352–358.

Foucault, M. (1970). The Order of things. An archaeology of the human sciences. London: Tavistock.

Frank, A. (1995). The wounded storyteller: Body, illness, and ethics. Chicago: University of Chicago Press.

Gerring, J. (2004). What is a case study, and what is it good for? The American Political Science Review, 98(2): 341–354.

Gilbert, G. N., Mulkay, M. (1984). Opening Pandora’s box. A sociological analysis of scientists’ discourse. Cambridge: Cambridge University Press.

Glaser, B. G., Strauss, A. L. (1967). The discovery of grounded theory. Strategies for qualitative research. New York: Aldine De Gruyter.

Goode, W. J., Hatt, P. K. (1952). Methods in social research. New York: McGraw-Hill.

Greimas, A. J. (1970). Du sens. Essais sémiotiques. Paris: Éditions du Seuil.

Habermas, J. (1987). The theory of communicative action. Vol. 2. A critique of functionalist reason. (T. McCarthy, trans.). Cambridge: Polity Press.

Harper, R. (1998). Inside the IMF. An ethnography of documents, technology, and organizational action. London: Academic Press.

Henderson, K. (1995). The political career of a prototype. Visual representation in design engineering. Social Problems, 42(2): 274–299.

Heritage, J. (1991). Garfinkel and ethnomethodology. Cambridge: Polity Press.

Hydén, L-C. (1997). Illness and narrative. Sociology of Health & Illness, 19(1): 48–69.

Kahn, R., Cannell, C. (1957). The dynamics of interviewing. Theory, technique and cases. New York: Wiley.

Kay, L. E. (2000). Who wrote the book of life? A history of the genetic code. Stanford, CA: Stanford University Press.

Kleinman, A., Eisenberg, L., Good, B. (1978). Culture, illness & care, clinical lessons from anthropologic and cross-cultural research. Annals of Internal Medicine, 88(2): 251–258.

Kracauer, S. (1952). The Challenge of Qualitative Content Analysis. Public Opinion Quarterly, Special Issue on International Communications Research (1952–53), 16(4): 631–642.

Krippendorff, K. (2004). Content Analysis: An introduction to its methodology. 2nd ed. Thousand Oaks, CA: Sage Publications.

Latour, B. (1986). Visualization and Cognition: Thinking with Eyes and Hands. Knowledge and Society, Studies in Sociology of Culture, Past and Present, 6: 1–40.

Latour, B. (1987). Science in Action. How to Follow Scientists and Engineers through Society. Milton Keynes: Open University Press.

Livingstone, D. N. (2005). Text, talk, and testimony: geographical reflections on scientific habits. An afterword. British Journal for the History of Science, 38(1): 93–100.

Luria, A. R. (1975). The man with the shattered world. A history of a brain wound. (Trans. L. Solotaroff). Harmondsworth: Penguin.

Martin, A., and Lynch, M. (2009). Counting things and counting people: The practices and politics of counting. Social Problems, 56(2): 243–266.

Merton, R. K. (1968). Social theory and social structure. New York: Free Press.

Morgan, D. L. (1993). Qualitative content analysis. A guide to paths not taken. Qualitative Health Research, 2: 112–121.

Morgan, D. L. (1998). Practical Strategies for combining qualitative and quantitative methods. Qualitative Health Research, 8(3): 362–376.

Morris, P. G., Prior, L., Deb, S., Lewis, G., et al. (2005). Patients’ views on outcome following head injury: a qualitative study. BMC Family Practice, 6: 30.

Neuendorf, K. A. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage.

Newman, J., and Vidler, E. (2006). Discriminating customers, responsible patients, empowered users: consumerism and the modernisation of health care. Journal of Social Policy, 35(2): 193–210.

Office for National Statistics (2012). First ONS Annual Experimental Subjective Well-being Results. London: ONS. Available at: http://www.ons.gov.uk/ons/dcp171766_272294.pdf. Accessed July 2013.

Prior, L. ( 2003 ). Using documents in social research . London: Sage.

Prior, L. ( 2008 ). Repositioning Documents in Social Research.   Sociology. Special Issue on Research Methods , 42 : 821–836.

Prior, L. ( 2011 ). Using documents and records in social research . 4 Vols . London: Sage.

Prior, L.   Hughes, D. , Peckham, S. ( 2012 ) The discursive turn in policy analysis and the validation of policy stories,   Journal of Social Policy , 41 (2): 271–289.

Prior, L. , Evans, M. , Prout, H. ( 2011 ). Talking about colds and flu: The lay diagnosis of two common illnesses among older British people,   Social Science and Medicine , 73 : 922–928.

Ragin, C. C. , Becker, H. S. ( 1992 ). What is a case? Exploring the foundations of social inquiry . Cambridge: Cambridge University Press.

Rheinberger H.-J. , ( 2000 ). Cytoplasmic Particles. The Trajectory of a Scientific Object. In Daston, L. (Ed.). Biographies of scientific objects . Chicago: Chicago University Press, 270–294.

Ricoeur, P. ( 1984 ). Time and narrative . Vol. 1 . ( McLaughlin K. , Pellauer D. trans.) Chicago: University of Chicago Press.

Roe, E. ( 1994 ). Narrative policy analysis, theory and practice . Durham, NC: Duke University Press.

Ryan, G.W. , Bernard, H. R. ( 2000 ). Data management and analysis methods. In Denzin, N.K. , Lincoln, Y.S. (Eds.). Handbook of qualitative research . 2nd ed . Thousand Oaks, CA: Sage, 769–802.

Schutz, A. , Luckman, T. ( 1974 ). The structures of the life-world . ( Zaner, R. M. , Engelhardt, H.T. , trans.). London: Heinemann.

SPSS. ( 2007 ). Text Mining for Clementine . 12.0 User’s Guide. Chicago: SPSS.

Weber, R.P. ( 1990 ). Basic content analysis . Newbury Park: CA: Sage.

Winsor, D. ( 1999 ). Genre and activity systems. The role of documentation in maintaining and changing engineering activity systems.   Written Communication , 16 (2): 200–224.

Zimmerman, D. H. , Pollner, M. ( 1971 ). The everyday world as a phenomenon. In Douglas, J. D. (Ed). Understanding everyday life . London: Routledge and Kegan Paul, 80–103.

When Online Content Disappears

38% of webpages that existed in 2013 are no longer accessible a decade later

Pew Research Center conducted the analysis to examine how often online content that once existed becomes inaccessible. One part of the study looks at a representative sample of webpages that existed over the past decade to see how many are still accessible today. For this analysis, we collected a sample of pages from the Common Crawl web repository for each year from 2013 to 2023. We then tried to access those pages to see how many still exist.

A second part of the study looks at the links on existing webpages to see how many of those links are still functional. We did this by collecting a large sample of pages from government websites, news websites and the online encyclopedia Wikipedia.

We identified relevant news domains using data from the audience metrics company comScore and relevant government domains (at multiple levels of government) using data from get.gov, the official administrator for the .gov domain. We collected the news and government pages via Common Crawl and the Wikipedia pages from an archive maintained by the Wikimedia Foundation. For each collection, we identified the links on those pages and followed them to their destination to see what share of those links point to sites that are no longer accessible.

A third part of the study looks at how often individual posts on social media sites are deleted or otherwise removed from public view. We did this by collecting a large sample of public tweets on the social media platform X (then known as Twitter) in real time using the Twitter Streaming API. We then tracked the status of those tweets for a period of three months using the Twitter Search API to monitor how many were still publicly available. Refer to the report methodology for more details.

The internet is an unimaginably vast repository of modern life, with hundreds of billions of indexed webpages. But even as users across the world rely on the web to access books, images, news articles and other resources, this content sometimes disappears from view.

A new Pew Research Center analysis shows just how fleeting online content actually is:

  • A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible, as of October 2023. In most cases, this is because an individual page was deleted or removed on an otherwise functional website.

A line chart showing that 38% of webpages from 2013 are no longer accessible

  • For older content, this trend is even starker. Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.

This “digital decay” occurs in many different online spaces. We examined the links that appear on government and news websites, as well as in the “References” section of Wikipedia pages as of spring 2023. This analysis found that:

  • 23% of news webpages contain at least one broken link, as do 21% of webpages from government sites. News sites with a high level of site traffic and those with less are about equally likely to contain broken links. Local-level government webpages (those belonging to city governments) are especially likely to have broken links.
  • 54% of Wikipedia pages contain at least one link in their “References” section that points to a page that no longer exists.

To see how digital decay plays out on social media, we also collected a real-time sample of tweets during spring 2023 on the social media platform X (then known as Twitter) and followed them for three months. We found that:

  • Nearly one-in-five tweets are no longer publicly visible on the site just months after being posted. In 60% of these cases, the account that originally posted the tweet was made private, suspended or deleted entirely. In the other 40%, the account holder deleted the individual tweet, but the account itself still existed.
  • Certain types of tweets tend to go away more often than others. More than 40% of tweets written in Turkish or Arabic are no longer visible on the site within three months of being posted. And tweets from accounts with the default profile settings are especially likely to disappear from public view.

How this report defines inaccessible links and webpages

There are many ways of defining whether something on the internet that used to exist is now inaccessible to people trying to reach it today. For instance, “inaccessible” could mean that:

  • The page no longer exists on its host server, or the host server itself no longer exists. Someone visiting this type of page would typically receive a variation on the “404 Not Found” server error instead of the content they were looking for.
  • The page address exists but its content has been changed – sometimes dramatically – from what it was originally.
  • The page exists but certain users – such as those with blindness or other visual impairments – might find it difficult or impossible to read.

For this report, we focused on the first of these: pages that no longer exist. The other definitions of accessibility are beyond the scope of this research.

Our approach is a straightforward way of measuring whether something online is accessible or not. But even so, there is some ambiguity.

First, there are dozens of status codes indicating a problem that a user might encounter when they try to access a page. Not all of them definitively indicate whether the page is permanently defunct or just temporarily unavailable. Second, for security reasons, many sites actively try to prevent the sort of automated data collection that we used to test our full list of links.

For these reasons, we used the most conservative estimate possible for deciding whether a site was actually accessible or not. We counted pages as inaccessible only if they returned one of nine error codes that definitively indicate that the page and/or its host server no longer exist or have become nonfunctional – regardless of how they are being accessed, and by whom. The full list of error codes that we included in our definition is in the methodology.
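The conservative rule described above can be sketched as a small classifier. This is a minimal illustration, not the report's implementation: the nine error codes are listed only in the methodology, so the set `ASSUMED_DEAD_CODES` below is a hypothetical stand-in, and treating a failed DNS lookup as a dead host is likewise an assumption.

```python
from typing import Optional

# Hypothetical stand-in for the report's nine "definitive" error codes.
# The real list is in the report's methodology and is not reproduced here.
ASSUMED_DEAD_CODES = frozenset({404, 410})


def is_inaccessible(status: Optional[int], dns_failed: bool = False,
                    dead_codes: frozenset = ASSUMED_DEAD_CODES) -> bool:
    """Count a page as inaccessible only when the signal is definitive."""
    if dns_failed:
        # Assumption: an unresolvable host means the server no longer exists.
        return True
    if status is None:
        # No response recorded (for example a timeout): ambiguous, so not counted.
        return False
    # Codes outside the definitive set (403, 503, ...) may only reflect
    # bot blocking or a temporary outage, so they are not counted either.
    return status in dead_codes
```

Keeping the code set as a parameter makes it easy to swap in the actual list from the methodology.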

Here are some of the findings from our analysis of digital decay in various online spaces.

To conduct this part of our analysis, we collected a random sample of just under 1 million webpages from the archives of Common Crawl, an internet archive service that periodically collects snapshots of the internet as it exists at different points in time. We sampled pages collected by Common Crawl each year from 2013 through 2023 (approximately 90,000 pages per year) and checked to see if those pages still exist today.

We found that 25% of all the pages we collected from 2013 through 2023 were no longer accessible as of October 2023. This figure is the sum of two different types of broken pages: 16% of pages are individually inaccessible but come from an otherwise functional root-level domain; the other 9% are inaccessible because their entire root domain is no longer functional.
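The split into page-level and domain-level failures can be illustrated with a short sketch. The record layout (a pair of booleans per page) and the toy sample are assumptions chosen to mirror the reported 16% and 9% shares; they are not the report's data.

```python
from collections import Counter


def breakdown(records):
    """records: iterable of (page_ok, root_ok) booleans, one pair per sampled page.

    Returns the share of pages that are accessible, broken on a working
    root domain ("page_only"), or broken because the root domain is down.
    """
    counts = Counter()
    total = 0
    for page_ok, root_ok in records:
        total += 1
        if page_ok:
            counts["accessible"] += 1
        elif root_ok:
            counts["page_only"] += 1
        else:
            counts["root_down"] += 1
    if total == 0:
        raise ValueError("no records to summarize")
    return {key: counts[key] / total
            for key in ("accessible", "page_only", "root_down")}


# Toy sample of 100 pages constructed to match the reported shares.
sample = [(True, True)] * 75 + [(False, True)] * 16 + [(False, False)] * 9
shares = breakdown(sample)
inaccessible = shares["page_only"] + shares["root_down"]  # 16% + 9% of pages
```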

Not surprisingly, the older snapshots in our collection had the largest share of inaccessible links. Of the pages collected from the 2013 snapshot, 38% were no longer accessible in 2023. But even for pages collected in the 2021 snapshot, about one-in-five were no longer accessible just two years later.

A bar chart showing that Around 1 in 5 government webpages contain at least one broken link

We sampled around 500,000 pages from government websites using the Common Crawl March/April 2023 snapshot of the internet, including a mix of different levels of government (federal, state, local and others). We found every link on each page and followed a random selection of those links to their destination to see if the pages they refer to still exist.

Across the government websites we sampled, there were 42 million links. The vast majority of those links (86%) were internal, meaning they link to a different page on the same website. An explainer resource on the IRS website that links to other documents or forms on the IRS site would be an example of an internal link.
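As a rough sketch of the internal versus external distinction, a link can be resolved against its page and the two hostnames compared. Treating "same website" as "same hostname, ignoring a leading www." is an assumption made here for illustration; the report may define site boundaries differently (for example by root domain).

```python
from urllib.parse import urljoin, urlparse


def _host(url: str) -> str:
    """Lowercased hostname without a leading 'www.' (empty string if none)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_internal(page_url: str, href: str) -> bool:
    """True if href, resolved relative to page_url, stays on the same host."""
    target = urljoin(page_url, href)  # handles relative links such as "/forms/x.pdf"
    return _host(page_url) == _host(target)
```

For example, a relative link on an IRS page resolves to the IRS host and counts as internal, while a link to another agency's site counts as external.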

Around three-quarters of government webpages we sampled contained at least one on-page link. The typical (median) page contains 50 links, but many pages contain far more. A page in the 90th percentile contains 190 links, and a page in the 99th percentile (that is, the top 1% of pages by number of links) has 740 links.
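Percentile figures like these can be computed from per-page link counts. The sketch below uses the nearest-rank definition, which is an assumption; the report does not say which percentile method it used, and other methods interpolate between values.

```python
def percentile(values, p: int):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest value such that at least p% of the data
    is less than or equal to it.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Toy link counts for 100 pages: 1, 2, ..., 100 links.
link_counts = list(range(1, 101))
median_links = percentile(link_counts, 50)
p90_links = percentile(link_counts, 90)
p99_links = percentile(link_counts, 99)
```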

Other facts about government webpage links:

  • The vast majority go to secure HTTP pages (and have a URL starting with “https://”).
  • 6% go to a static file, like a PDF document.
  • 16% now redirect to a different URL than the one they originally pointed to.

When we followed these links, we found that 6% point to pages that are no longer accessible. Similar shares of internal and external links are no longer functional.

Overall, 21% of all the government webpages we examined contained at least one broken link. Across every level of government we looked at, there were broken links on at least 14% of pages; city government pages had the highest rates of broken links.
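The gap between the link-level rate (6%) and the page-level rate (21%) follows from pages carrying many links: one broken link is enough to flag a page. A minimal sketch of both metrics, using an assumed data layout of one list of booleans per page:

```python
def broken_link_rates(pages):
    """pages: list of lists, one bool per checked link (True = broken).

    Returns (share of links that are broken,
             share of pages with at least one broken link).
    Pages with no links count toward the page denominator (an assumption).
    """
    all_links = [broken for links in pages for broken in links]
    if not pages or not all_links:
        raise ValueError("need at least one page and one link")
    link_rate = sum(all_links) / len(all_links)
    page_rate = sum(any(links) for links in pages) / len(pages)
    return link_rate, page_rate


# Toy data: five pages with ten links each, a single broken link overall.
toy_pages = [[False] * 10 for _ in range(4)] + [[True] + [False] * 9]
link_rate, page_rate = broken_link_rates(toy_pages)
```

In this toy case only 2% of links are broken, yet 20% of pages contain a broken link.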

A bar chart showing that 23% of news webpages have at least one broken link

For this analysis, we sampled 500,000 pages from 2,063 websites classified as “News/Information” by the audience metrics firm comScore. The pages were collected from the Common Crawl March/April 2023 snapshot of the internet.

Across the news sites sampled, this collection contained more than 14 million links pointing to an outside website.[1] Some 94% of these pages contain at least one external-facing link. The median page contains 20 links, and pages in the top 10% by link count have 56 links.

Like government websites, the vast majority of these links go to secure HTTP pages (those with a URL beginning with “https://”). Around 12% of links on these news sites point to a static file, like a PDF document. And 32% of links on news sites redirected to a different URL than the one they originally pointed to – slightly less than the 39% of external links on government sites that redirect.

When we tracked these links to their destination, we found that 5% of all links on news site pages are no longer accessible. And 23% of all the pages we sampled contained at least one broken link.

Broken links are about as prevalent on the most-trafficked news websites as they are on the least-trafficked sites. Some 25% of pages on news websites in the top 20% by site traffic have at least one broken link. That is nearly identical to the 26% of pages on sites in the bottom 20% by site traffic.

For this analysis, we collected a random sample of 50,000 English-language Wikipedia pages and examined the links in their “References” section. The vast majority of these pages (82%) contain at least one reference link – that is, one that directs the reader to a webpage other than Wikipedia itself.

In total, there are just over 1 million reference links across all the pages we collected. The typical page has four reference links.

The analysis indicates that 11% of all references linked on Wikipedia are no longer accessible. On about 2% of source pages containing reference links, every link on the page was broken or otherwise inaccessible, while another 53% of pages contained at least one broken link.

A pie chart showing that Around 1 in 5 tweets disappear from public view within months

For this analysis, we collected nearly 5 million tweets posted from March 8 to April 27, 2023, on the social media platform X, which at the time was known as Twitter. We did this using Twitter’s Streaming API, collecting 3,000 public tweets every 30 minutes in real time. This provided us with a representative sample of all tweets posted on the platform during that period. We monitored those tweets until June 15, 2023, and checked each day to see if they were still available on the site or not.

At the end of the observation period, we found that 18% of the tweets from our initial collection window were no longer publicly visible on the site. In a majority of cases, this was because the account that originally posted the tweet was made private, suspended or deleted entirely. For the remaining tweets, the account that posted the tweet was still visible on the site, but the individual tweet had been deleted.

Which tweets tend to disappear?

A bar chart showing that Inaccessible tweets often come from accounts with default profile settings

Tweets were especially likely to be deleted or removed over the course of our collection period if they were:

  • Written in certain languages. Nearly half of all the Turkish-language tweets we collected – and a slightly smaller share of those written in Arabic – were no longer available at the end of the tracking period.
  • Posted by accounts using the site’s default profile settings. More than half of tweets from accounts using the default profile image were no longer available at the end of the tracking period, as were more than a third from accounts with a default bio field. Tweets from these accounts tend to disappear because the entire account has been deleted or made private, as opposed to the individual tweet being deleted.
  • Posted by unverified accounts.

We also found that removed or deleted tweets tended to come from newer accounts with relatively few followers and modest activity on the site. On average, tweets that were no longer visible on the site were posted by accounts around eight months younger than those whose tweets stayed on the site.

And when we analyzed the types of tweets that were no longer available, we found that retweets, quote tweets and original tweets did not differ much from the overall average. But replies were relatively unlikely to be removed – just 12% of replies were inaccessible at the end of our monitoring period.

Most tweets that are removed from the site tend to disappear soon after being posted. In addition to looking at how many tweets from our collection were still available at the end of our tracking period, we conducted a survival analysis to see how long these tweets tended to remain available. We found that:

  • 1% of tweets are removed within one hour
  • 3% within a day
  • 10% within a week
  • 15% within a month

Put another way: Half of tweets that are eventually removed from the platform are unavailable within the first six days of being posted. And 90% of these tweets are unavailable within 46 days.
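Figures of this kind can be approximated with simple cumulative shares. The sketch below is not a full survival analysis (it ignores censoring, which a method such as Kaplan-Meier would handle) and the report does not describe its exact procedure; the data layout and toy numbers are assumptions for illustration.

```python
import statistics


def share_removed_within(delays, horizon_days: float) -> float:
    """delays: days from posting to removal per tweet, or None if never removed.

    Returns the share of all tweets removed within horizon_days.
    """
    if not delays:
        raise ValueError("delays must not be empty")
    removed = sum(1 for d in delays if d is not None and d <= horizon_days)
    return removed / len(delays)


def median_removal_delay(delays) -> float:
    """Median time to removal among tweets that were eventually removed."""
    removed = [d for d in delays if d is not None]
    return statistics.median(removed)


# Toy data: 20 tweets, 4 of which were removed after the given number of days.
toy_delays = [0.5, 2, 6, 30] + [None] * 16
within_day = share_removed_within(toy_delays, 1)
within_week = share_removed_within(toy_delays, 7)
median_delay = median_removal_delay(toy_delays)
```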

Tweets don’t always disappear forever, though. Some 6% of the tweets we collected disappeared and then became available again at a later point. This could be due to an account going private and then returning to public status, or to the account being suspended and later reinstated. Of those “reappeared” tweets, the vast majority (90%) were still accessible on Twitter at the end of the monitoring period.
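Given one visibility check per day, a "reappeared" tweet is one that was unavailable on some day and visible on a later day. A minimal sketch, assuming the daily checks are stored as a list of booleans in chronological order (a hypothetical layout):

```python
def reappeared(daily_visible) -> bool:
    """True if the tweet was missing on some day and visible on a later day."""
    seen_missing = False
    for visible in daily_visible:
        if not visible:
            seen_missing = True
        elif seen_missing:
            return True
    return False


def visible_at_end(daily_visible) -> bool:
    """True if the tweet was visible at the final check."""
    return bool(daily_visible) and bool(daily_visible[-1])
```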

[1] For our analysis of news sites, we did not collect or check the functionality of internal-facing on-page links – those that point to another page on the same root domain.

Publishing global health research in Nature Medicine

Department and Center Events 

A brief talk from a senior editor at Nature Medicine sharing insights into the editorial and review process.

Join the Department of International Health for a brief talk discussing the priorities of Nature Medicine in the global health field, editorial perspectives as well as insights into the editorial and peer review processes.

The talk will be followed by a Q&A session.

Speaker: Ming Yang, PhD

Senior Editor, Nature Medicine

Ming completed his undergraduate degree in Physiology at King’s College London, followed by a master’s degree in Pharmacology at the University of Bristol. He completed his PhD and postdoctoral research at the University of Cambridge. During his PhD, he did a secondment at Science magazine and Science Translational Medicine. Ming was an editor at BMC Medicine in 2020 before joining Nature Medicine in March 2022, handling papers in the areas of non-communicable diseases and public health.

Technological advances and challenges of reclaimed asphalt pavement (RAP) application in road engineering—a bibliometric analysis from 2000 to 2022

  • Research Article
  • Published: 11 May 2024

Qi Jiang, Wei Liu & Shaopeng Wu

Reclaimed asphalt pavement (RAP) is a valuable material that can be recycled and reused in road engineering to reduce environmental impact, resource utilization, and economic costs. However, the application of RAP in road engineering presents both opportunities and challenges. This study visually analyzes the knowledge background, research status, and latest knowledge structure of the RAP literature using scientometric tools such as VOSviewer and CiteSpace. A search of the Web of Science (WoS) Core Collection database identified 2963 research publications from 2000 to 2022. The study analyzes collaborative networks among highly cited references, journals, authors, academic institutions, countries, and funding organizations, along with a co-occurrence analysis of keywords for the RAP research publications. Results showed that the USA has long been a leader in RAP research. China, whose annual output grew from 2 publications in 2002 to 177 publications in 2022, surpassed the USA in annual publication output in 2019 and has made significant investments in technological aspects. Chang’an University ranked first in total publication output (131 publications, 4.4%). Current major research themes include road performance, recycling technology, regeneration mechanisms, and the life cycle assessment of RAP. In addition, based on cluster analysis of keywords, text content analysis, and SWOT analysis, this study discusses the challenges and future development directions of RAP in road engineering. These findings give scholars valuable insight into technological advances and challenges in the field of RAP.

Data availability

All the data used in this article are in the manuscript.

Aguirre MA, Hassan MM, Shirzad S, Daly WH, Mohammad LN (2016) Micro-encapsulation of asphalt rejuvenators using melamine-formaldehyde. Constr Build Mater 114:29–39

Article   CAS   Google Scholar  

Ahmadinia E, Zargar M, Karim MR, Abdelaziz M, Ahmadinia E (2012) Performance evaluation of utilization of waste polyethylene terephthalate (PET) in stone mastic asphalt. Constr Build Mater 36:984–989

Article   Google Scholar  

Akbulut H, Gurer C (2007) Use of aggregates produced from marble quarry waste in asphalt pavements. Build Environ 42:1921–1930

Andersen MS (2007) An introductory note on the environmental economics of the circular economy. Sustain Sci 2:133–140

Androjic I, Kaluder G (2013) Cold recycling of asphalt pavements using foamed bitumen and cement. Gradevinar 65:463–471

Google Scholar  

Antwi-Afari P, Ng ST, Hossain MU (2021) A review of the circularity gap in the construction industry through scientometric analysis. J Clean Prod 298:126870

Anuardo RG, Espuny M, Costa ACF, Oliveira OJ (2022) Toward a cleaner and more sustainable world: a framework to develop and improve waste management through organizations, governments and academia. Heliyon 8. https://doi.org/10.1016/j.heliyon.2022.e09225

Apeagyei AK, Diefenderfer BK (2013) Evaluation of cold in-place and cold central-plant recycling methods using laboratory testing of field-cored specimens. J Mater Civ Eng 25:1712–1720

Aria M, Cuccurullo C (2017) Bibliometrix: an R-tool for comprehensive science mapping analysis. J Informet 11:959–975

Arulrajah A, Piratheepan J, Disfani MM, Bo MW (2013) Geotechnical and geoenvironmental properties of recycled construction and demolition materials in pavement subbase applications. J Mater Civ Eng 25:1077–1088

Arulrajah A, Disfani MM, Horpibulsuk S, Suksiripattanapong C, Prongmanee N (2014) Physical properties and shear strength responses of recycled construction and demolition materials in unbound pavement base/subbase applications. Constr Build Mater 58:245–257

Ashtiani MZ, Mogawer WS, Austerman AJ (2018) A mechanical approach to quantify blending of aged binder from recycled materials in new hot mix asphalt mixtures. Transp Res Rec 2672:107–118

Aurangzeb Q, Al-Qadi IL (2014) Asphalt pavements with high reclaimed asphalt pavement content economic and environmental perspectives. Transp Res Rec 2456(1):161–169

Baghaee Moghaddam T, Baaj H (2016) The use of rejuvenating agents in production of recycled hot mix asphalt: a systematic review. Constr Build Mater 114:805–816

Balaguera A, Carvajal GI, Alberti J, Fullana-i-Palmer P (2018) Life cycle assessment of road construction alternative materials: a literature review. Resour Conserv Recycl 132:37–48

Behnood A (2019) Application of rejuvenators to improve the rheological and mechanical properties of asphalt binders and mixtures: a review. J Clean Prod 231:171–182

Bhatt Y, Ghuman K, Dhir A (2020) Sustainable manufacturing. Bibliometrics and content analysis. J Clean Prod:260

Bonaquist R (2007) Can I run more RAP? HMAT: hot mix asphalt technology 12

Bonoli A, Degli Esposti A, Magrini C (2020) A case study of industrial symbiosis to reduce GHG emissions: performance analysis and LCA of asphalt concretes made with RAP aggregates and steel slags. Front Mater 7:572955

Bowers BF, Huang BS, Shu X, Miller BC (2014) Investigation of reclaimed asphalt pavement blending efficiency through GPC and FTIR. Constr Build Mater 50:517–523

Brand AS, Roesler JR (2015) Ternary concrete with fractionated reclaimed asphalt pavement. ACI Mater J 112:155–163

Brand AS, Roesler JR (2017) Bonding in cementitious materials with asphalt-coated particles: Part I – The interfacial transition zone. Constr Build Mater 130:171–181

Cao ZL, Chen MZ, Han XB, Wang RY, Yu JY, Xu X et al (2020) Influence of characteristics of recycling agent on the early and long-term performance of regenerated SBS modified bitumen. Constr Build Mater 237:12

Cavalli MC, Griffa M, Bressi S, Partl MN, Tebaldi G, Poulikakos LD (2016) Multiscale imaging and characterization of the effect of mixing temperature on asphalt concrete containing recycled components. J Microsc 264:22–33

Cavalli MC, Zaumanis M, Mazza E, Partl MN, Poulikakos LD (2018) Effect of ageing on the mechanical and chemical properties of binder from RAP treated with bio-based rejuvenators. Composites Part B-Eng 141:174–181

Chen JS, Wang CH, Huang CC (2009) Engineering properties of bituminous mixtures blended with second reclaimed asphalt pavements (R2AP). Road Mater Pavement Des 10:129–149

Chen J, Dan H, Ding Y, Gao Y, Guo M, Guo S et al (2021) New innovations in pavement materials and engineering: a review on pavement engineering research 2021. J Traffic Transp Eng (English Edition) 8:815–999

China MoTotPsRo (2012) Guidance of the Ministry of Transportation on accelerating the recycling of highway pavement materials. Highway Bureau

Chiu C-T, Lee M-G (2006) Effectiveness of seal rejuvenators for bituminous pavement surfaces. J Test Eval 34(5):390–394

Chung SS, Lo CWH (2003) Evaluating sustainability in waste management: the case of construction and demolition, chemical and clinical wastes in Hong Kong. Resour Conserv Recycl 37:119–145

Copeland A (2011) Reclaimed asphalt pavement in asphalt mixtures: state of the practice. Office of Research, Development, and Technology

Costa JO, Borges PHR, dos Santos FA, Bezerra ACS, Van den bergh W, Blom J. (2020) Cementitious binders and reclaimed asphalt aggregates for sustainable pavement base layers: potential, challenges and research needs. Construct Build Mater 265:120325

Daim TU, Rueda G, Martin H, Gerdsri P (2006) Forecasting emerging technologies: use of bibliometrics and patent analysis. Technol Forecast Soc Chang 73:981–1012

Ding YJ, Huang BS, Shu X, Zhang YZ, Woods ME (2016a) Use of molecular dynamics to investigate diffusion between virgin and aged asphalt binders. Fuel 174:267–273

Ding ZK, Wang YF, Zou PXW (2016b) An agent based environmental impact assessment of building demolition waste management: conventional versus green management. J Clean Prod 133:1136–1153

Ding L, Wang X, Zhang M, Chen Z, Meng J, Shao X (2021) Morphology and properties changes of virgin and aged asphalt after fusion. Constr Build Mater 291:123284

Du XY, Meng CH, Guo ZH, Yan H (2023) An improved approach for measuring the efficiency of low carbon city practice in China. Energy 268

EAPA (2022) Asphalt in Figures 2020. European Asphalt Pavement Association, Brussels

Farahzadi L, Kioumarsi M (2023) Application of machine learning initiatives and intelligent perspectives for CO2 emissions reduction in construction. J Clean Prod 384:135504

Farina A, Zanetti MC, Santagata E, Blengini GA (2017) Life cycle assessment applied to bituminous mixtures containing recycled materials: crumb rubber and reclaimed asphalt pavement. Resour Conserv Recycl 117:204–212

Garcia A, Jelfs J, Austin CJ (2015) Internal asphalt mixture rejuvenation using capsules. Constr Build Mater 101:309–316

Garcia-Morales M, Partal P, Navarro FJ, Martinez-Boza F, Gallegos C (2004) Linear viscoelasticity of recycled EVA-modified bitumens. Energy Fuel 18:357–364

García-Morales M, Partal P, Navarro FJ, Gallegos C (2006) Effect of waste polymer addition on the rheology of modified bitumen. Fuel 85:936–943

Giani MI, Dotelli G, Brandini N, Zampori L (2015) Comparative life cycle assessment of asphalt pavements using reclaimed asphalt, warm mix technology and cold in-place recycling. Resour Conserv Recycl 104:224–238

Grinys A, Sivilevicius H, Dauksys M (2012) tyre rubber additive effect on concrete mixture strength. J Civ Eng Manag 18:393–401

Gu J, Liu X, Zhang Z (2023) Road base materials prepared by multi-industrial solid wastes in China: a review. Constr Build Mater 373:130860

Hajj EY, Sebaaly PE, Kandiah P (2010) Evaluation of the use of reclaimed asphalt pavement in airfield HMA pavements. J Transp Eng-Asce 136:181–189

Hansen K, Copeland A (2013) 2nd Annual asphalt pavement industry survey on reclaimed asphalt pavement, reclaimed asphalt shingles, and warm-mix Asphalt Usage: 2009–2011. Reclaimed Asphalt Pavements

Hasan U, Whyte A, Al JH (2020) Life cycle assessment of roadworks in United Arab Emirates: recycled construction waste, reclaimed asphalt pavement, warm-mix asphalt and blast furnace slag use against traditional approach. J Clean Prod 257:120531

He Q, Wang G, Luo L, Shi Q, Xie J, Meng X (2017) Mapping the managerial areas of Building Information Modeling (BIM) using scientometric analysis. Int J Proj Manag 35:670–685

Hou H, Su L, Guo D, Xu H (2023) Resource utilization of solid waste for the collaborative reduction of pollution and carbon emissions: case study of fly ash. J Clean Prod 383:135449

Hoy M, Horpibulsuk S, Arulrajah A (2016) Strength development of recycled asphalt pavement – fly ash geopolymer as a road construction material. Constr Build Mater 117:209–219

Huang SC, Turner TF (2014) Aging characteristics of RAP blend binders: rheological properties. J Mater Civ Eng 26:966–973

Huang B, Li G, Vukosavljevic D, Shu X, Egan BK (2005a) Laboratory investigation of mixing hot-mix asphalt with reclaimed asphalt pavement. Transp Res Rec 1929:37–45

Huang BS, Li GQ, Vukosavjevic D, Shu X, Egan BK, Trb. (2005b) Laboratory investigation of mixing hot-mix asphalt with reclaimed asphalt pavement. Bituminous Paving Mixtures 2005:37–45

Huang BS, Shu X, Li GQ (2005c) Laboratory investigation of portland cement concrete containing recycled asphalt pavements. Cem Concr Res 35:2008–2013

Huang Y, Bird RN, Heidrich O (2007) A review of the use of recycled solid waste materials in asphalt pavements. Resour Conserv Recycl 52:58–73

Inti S, Tandon V (2021) Towards precise sustainable road assessments and agreeable decisions. J Clean Prod 323:129167

Jahanbakhsh H, Karimi MM, Naseri H, Nejad FM (2020) Sustainable asphalt concrete containing high reclaimed asphalt pavements and recycling agents: performance assessment, cost analysis, and environmental impact. J Clean Prod 244:118837

Jia XY, Huang BS, Bowers BF, Zhao S (2014) Infrared spectra and rheological properties of asphalt cement containing waste engine oil residues. Constr Build Mater 50:683–691

Jiang W, Huang Y, Sha A (2018) A review of eco-friendly functional road materials. Constr Build Mater 191:1082–1092

Jin RY, Chen Q (2019) Overview of concrete recycling legislation and practice in the United States. J Constr Eng Manag 145(4):05019004

Kalantar ZN, Karim MR, Mahrez A (2012) A review of using waste and virgin polymer in pavement. Constr Build Mater 33:55–62

Kandhal PS, Mallick RB (1998) Pavement recycling guidelines for state and local governments: participant's reference book. ROSA P

Karki P, Zhou F (2016) Effect of rejuvenators on rheological, chemical, and aging properties of asphalt binders containing recycled binders. Transp Res Rec 2574:74–82

Kirchherr J, van Santen R (2019) Research on the circular economy: a critique of the field. Resour Conserv Recycl 151:104480

Kucukvar M, Tatari O (2012) Ecologically based hybrid life cycle analysis of continuously reinforced concrete and hot-mix asphalt pavements. Transp Res Part D-Transp Environ 17:86–90

Lee HVWC, Carlson R et al (2015) Development of quality standards for inclusion of high recycled asphalt pavement content in asphalt mixtures-phase II. University of Iowa

Lee N, Chou C-P, Chen K-Y (2012) Benefits in energy savings and CO 2 reduction by using reclaimed asphalt pavement. TRID

Li J, Xiao F, Zhang L, Amirkhanian SN (2019) Life cycle assessment and life cycle cost analysis of recycled solid waste materials in highway pavement: a review. J Clean Prod 233:1182–1206

Li HB, Zhang MM, Temitope AA, Guo XY, Sun JM, Yombah M et al (2022) Compound reutilization of waste cooking oil and waste engine oil as asphalt rejuvenator: performance evaluation and application. Environ Sci Pollut Res 29:90463–90478

Li J, Yang L, He L, Guo R, Li X, Chen Y et al (2023) Research progresses of fibers in asphalt and cement materials: a review. J Road Eng 3(1):35–70

Liang X, Kurniawan TA, Goh HH, Zhang DD, Dai W, Liu H et al (2022) Conversion of landfilled waste-to-electricity (WTE) for energy efficiency improvement in Shenzhen (China): a strategy to contribute to resource recovery of unused methane for generating renewable energy on-site. J Clean Prod 369:133078

Lin J, Hong J, Huang C, Liu J, Wu S (2014) Effectiveness of rejuvenator seal materials on performance of asphalt pavement. Constr Build Mater 55:63–68

Liu K, Da Y, Wang F, Ding W, Xu P, Pang H et al (2022a) An eco-friendly asphalt pavement deicing method by microwave heating and its comprehensive environmental assessments. J Clean Prod 373:133899

Liu N, Wang Y, Bai Q, Liu Y, Wang P, Xue S et al (2022b) Road life-cycle carbon dioxide emissions and emission reduction technologies: a review. J Traffic Transp Eng (English Edition) 9:532–555

Liu Y, Ali A, Chen Y, She X (2023) The effect of transport infrastructure (road, rail, and air) investments on economic growth and environmental pollution and testing the validity of EKC in China, India, Japan, and Russia. Environ Sci Pollut Res 30:32585–32599

Long HY, Liu HY, Li XW, Chen LJ (2020) An evolutionary game theory study for construction and demolition waste recycling considering green development performance under the Chinese Government's reward-penalty mechanism. Int J Environ Res Public Health 17(17):6303

Luan Y, Ma T, Wang S, Ma Y, Xu G, Wu M (2022) Investigating mechanical performance and interface characteristics of cold recycled mixture: promoting sustainable utilization of reclaimed asphalt pavement. J Clean Prod 369:133366

Luo Z, Xiao FP, Hu SW, Yang YS (2013) Probabilistic analysis on fatigue life of rubberized asphalt concrete mixtures containing reclaimed asphalt pavement. Constr Build Mater 41:401–410

Ma MX, Tam VWY, Le KN, Li WG (2020) Challenges in current construction and demolition waste recycling: a China study. Waste Manag 118:610–625

MacLeod D, Ho S, Wirth R, Zanzotto L (2007) Study of crumb rubber materials as paving asphalt modifiers. Can J Civ Eng 34:1276–1288

Mannan UA, Islam MR, Tarefder RA (2015) Effects of recycled asphalt pavements on the fatigue life of asphalt under different strain levels and loading frequencies. Int J Fatigue 78:72–80

Meyer DE, Li M, Ingwersen WW (2020) Analyzing economy-scale solid waste generation using the United States environmentally-extended input-output model. Resour Conserv Recycl 157:104795

Mogawer WS, Austerman AJ, Bonaquist R (2012) Determining the influence of plant type and production parameters on performance of plant-produced reclaimed asphalt pavement mixtures. Transp Res Rec 2268:71–81

Mogawer WS, Booshehrian A, Vahidi S, Austerman AJ (2013) Evaluating the effect of rejuvenators on the degree of blending and performance of high RAP, RAS, and RAP/RAS mixtures. Road Mater Pavement Des 14:193–213

Mohamed AS, Cao ZL, Xu XY, Xiao FP, Abdel-Wahed T (2022) Bonding, rheological, and physiochemical characteristics of reclaimed asphalt rejuvenated by crumb rubber modified binder. J Clean Prod 373:133896

Ng CP, Law TH, Wong SV, Kulanthayan S (2017) Relative improvements in road mobility as compared to improvements in road accessibility and economic growth: a cross-country analysis. Transp Policy 60:24–33

Norin M, Stromvall AM (2004) Leaching of organic contaminants from storage of reclaimed asphalt pavement. Environ Technol 25:323–340

Offenbacker D, Mehta Y (2022) Assessing the life-cycle costs of pavement rehabilitation strategies used in long-term pavement performance program. J Transp Eng Part B-Pavements 148(1):04022002

Oner J, Sengoz B (2015) Utilization of recycled asphalt concrete with warm mix asphalt and cost-benefit analysis. PLoS One 10(1):e116180

Oreto C, Veropalumbo R, Viscione N, Biancardo SA, Russo F (2021) Investigating the environmental impacts and engineering performance of road asphalt pavement mixtures made up of jet grouting waste and reclaimed asphalt pavement. Environ Res 198:111277

Ozer H, Al-Qadi IL, Lambros J, El-Khatib A, Singhvi P, Doll B (2016) Development of the fracture-based flexibility index for asphalt concrete cracking potential using modified semi-circle bending test parameters. Constr Build Mater 115:390–401

Pei J, Guo F, Zhang J, Zhou B, Bi Y, Li R (2021) Review and analysis of energy harvesting technologies in roadway transportation. J Clean Prod 288:125338

Pompigna A, Mauro R (2022) Smart roads: a state of the art of highways innovations in the Smart Age. Eng Sci Technol Int J 25:100986

Pradyumna TA, Mittal A, Jain P (2013) Characterization of reclaimed asphalt pavement (RAP) for use in bituminous road construction. Procedia Soc Behav Sci 104:1149–1157

Pranav S, Lahoti M, Shan X, Yang EH, Muthukumar G (2022) Economic input-output LCA of precast corundum-blended ECC overlay pavement. Resour Conserv Recycl 184:106385

Puccini M, Leandri P, Tasca AL, Pistonesi L, Losa M (2019) Improving the environmental sustainability of low noise pavements: comparative life cycle assessment of reclaimed asphalt and crumb rubber based warm mix technologies. Coatings 9(5):343

Puppala AJ, Hoyos LR, Potturi AK (2011) Resilient moduli response of moderately cement-treated reclaimed asphalt pavement aggregates. J Mater Civ Eng 23:990–998

Qiao YN, Dave E, Parry T, Valle O, Mi LY, Ni GD et al (2019) Life cycle costs analysis of reclaimed asphalt pavement (RAP) under future climate. Sustainability 11(19):5414

Rafiq W, Napiah M, Habib NZ, Sutanto MH, Alaloul WS, Khan MI et al (2021) Modeling and design optimization of reclaimed asphalt pavement containing crude palm oil using response surface methodology. Constr Build Mater 291:123288

Rahman MA, Imteaz MA, Arulrajah A, Piratheepan J, Disfani MM (2015) Recycled construction and demolition materials in permeable pavement systems: geotechnical and hydraulic characteristics. J Clean Prod 90:183–194

Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496

Rodríguez-Fernández I, Lastra-González P, Indacoechea-Vega I, Castro-Fresno D (2019) Technical feasibility for the replacement of high rates of natural aggregates in asphalt mixtures. Int J Pavement Eng 22(8):940–949

Roja KL, Masad E, Vajipeyajula B, Yiming W, Khalid E, Shunmugasamy VC (2020) Chemical and multi-scale material properties of recycled and blended asphalt binders. Constr Build Mater 261:119689

Roja KL, Masad E, Mogawer W (2021) Performance and blending evaluation of asphalt mixtures containing reclaimed asphalt pavement. Road Mater Pavement Des 22:2441–2457

Sabouri M, Kim YR (2014) Development of a failure criterion for asphalt mixtures under different modes of fatigue loading. Transp Res Rec 2447:117–125

Sanchez X, Tighe SL (2019) Steps towards the detection of reclaimed asphalt pavement in superpave mixtures. Road Mater Pavement Des 20:1201–1214

Santero NJ, Masanet E, Horvath A (2011) Life-cycle assessment of pavements. Part I: Critical review. Resour Conserv Recycl 55:801–809

Sarah Mariam A, Ransinchung GDRN (2020) Laboratory research on reclaimed asphalt pavement-inclusive cementitious mixtures. ACI Mater J 117:193

Sha A, Liu Z, Jiang W, Qi L, Hu L, Jiao W et al (2021) Advances and development trends in eco-friendly pavements. J Road Eng 1:1–42

Shao-peng W, Xiao-ming H, Yong-li Z (2002) The development of recycling agent for asphalt pavement. J Wuhan Univ Technol-Mater Sci Ed 17:63–65

Shen DH, Du JC (2005) Application of gray relational analysis to evaluate HMA with reclaimed building materials. J Mater Civ Eng 17:400–406

Shi C, Meyer C, Behnood A (2008) Utilization of copper slag in cement and concrete. Resour Conserv Recycl 52:1115–1120

Shirodkar P, Mehta Y, Nolan A, Sonpal K, Norton A, Tomlinson C et al (2011) A study to determine the degree of partial blending of reclaimed asphalt pavement (RAP) binder for high RAP hot mix asphalt. Constr Build Mater 25:150–155

Shu X, Huang BS (2014) Recycling of waste tire rubber in asphalt and portland cement concrete: an overview. Constr Build Mater 67:217–224

Shu X, Huang B, Vukosavljevic D (2008) Laboratory evaluation of fatigue characteristics of recycled asphalt mixture. Constr Build Mater 22:1323–1330

Shu X, Huang BS, Shrum ED, Jia XY (2012) Laboratory evaluation of moisture susceptibility of foamed warm mix asphalt containing high percentages of RAP. Constr Build Mater 35:125–130

Silva H, Oliveira JRM, Jesus CMG (2012) Are totally recycled hot mix asphalts a sustainable alternative for road paving? Resour Conserv Recycl 60:38–48

Singh S, Ransinchung GD, Kumar P (2017) An economical processing technique to improve RAP inclusive concrete properties. Constr Build Mater 148:734–747

Sivilevicius H, Braziunas J, Prentkovskis O (2017) Technologies and principles of hot recycling and investigation of preheated reclaimed asphalt pavement batching process in an asphalt mixing plant. Applied Sciences-Basel 7:20

Song W, Huang B, Shu X (2018) Influence of warm-mix asphalt technology and rejuvenator on performance of asphalt mixtures containing 50% reclaimed asphalt pavement. J Clean Prod 192:191–198

Su J-F, Qiu J, Schlangen E (2013) Stability investigation of self-healing microcapsules containing rejuvenator for bitumen. Polym Degrad Stab 98:1205–1215

Sudarsanan N, Kim YR (2022) A critical review of the fatigue life prediction of asphalt mixtures and pavements. J Traffic Transp Eng (English Edition) 9:808–835

Sun LC, Wang QW, Zhang JJ (2017) Inter-industrial carbon emission transfers in China: economic effect and optimization strategy. Ecol Econ 132:55–62

Sun Y, Zheng L, Cheng Y, Chi F, Liu K, Zhu T (2023) Research on maintenance equipment and maintenance technology of steel fiber modified asphalt pavement with microwave heating. Case Stud Constr Mater 18:e01965

Thakur JK, Han J, Pokharel SK, Parsons RL (2012) Performance of geocell-reinforced recycled asphalt pavement (RAP) bases over weak subgrade under cyclic plate loading. Geotext Geomembr 35:14–24

Townsend TG, Ingwersen WW, Niblick B, Jain P, Wally J (2019) CDDPath: a method for quantifying the loss and recovery of construction and demolition debris in the United States. Waste Manag 84:302–309

Tran NP, Nguyen TN, Ngo TD (2022) The role of organic polymer modifiers in cementitious systems towards durable and resilient infrastructures: a systematic review. Constr Build Mater 360:129562

Umer A, Hewage K, Haider H, Sadiq R (2017) Sustainability evaluation framework for pavement technologies: an integrated life cycle economic and environmental trade-off analysis. Transp Res Part D-Transp Environ 53:88–101

Vignisdottir HR, Ebrahimi B, Booto GK, O'Born R, Brattebø H, Wallbaum H et al (2019) A review of environmental impacts of winter road maintenance. Cold Reg Sci Technol 158:143–153

Vislavicius K, Sivilevicius H (2013) Effect of reclaimed asphalt pavement gradation variation on the homogeneity of recycled hot-mix asphalt. Arch Civil Mech Eng 13:345–353

Waltman L, van Eck NJ, Noyons ECM (2010) A unified approach to mapping and clustering of bibliometric networks. J Informet 4:629–635

Wang C, Lim MK, Zhang X, Zhao L, Lee PT-W (2020) Railway and road infrastructure in the Belt and Road Initiative countries: estimating the impact of transport infrastructure on economic growth. Transp Res A Policy Pract 134:288–307

Wang FS, Xie J, Wu SP, Li JS, Barbieri DM, Zhang L (2021) Life cycle energy consumption by roads and associated interpretative analysis of sustainable policies. Renew Sustain Energy Rev 141:110823

Wang L, Wei J, Wu W, Zhang X, Xu X, Yan X (2022) Technical development and long-term performance observations of long-life asphalt pavement: a case study of Shandong Province. J Road Eng 2:369–389

Wei M, Wu S, Zhu L, Li N, Yang C (2021) Environmental impact on VOCs emission of a recycled asphalt mixture with a high percentage of RAP. Materials 14

Williams B, Willis J (2022) Asphalt pavement industry survey on recycled materials and warm-mix asphalt usage 2020 Information Series 138 11th Annual Survey

Willis J, Williams B (2022) Asphalt pavement industry survey on recycled materials and warm-mix asphalt usage 2021 Information Series 138 12th Annual Survey

Wu SP, Xue YJ, Ye QS, Chen YC (2007) Utilization of steel slag as aggregates for stone mastic asphalt (SMA) mixtures. Build Environ 42:2580–2585

Wu M, Xu GJ, Luan YC, Zhu YJ, Ma T, Zhang WG (2022) Molecular dynamics simulation on cohesion and adhesion properties of the emulsified cold recycled mixtures. Constr Build Mater 333:127403

Xiang C, Wang Y, Liu H (2017) A scientometrics review on nonpoint source pollution research. Ecol Eng 99:400–408

Xiao FP, Amirkhanian S, Juang CH (2007) Rutting resistance of rubberized asphalt concrete pavements containing reclaimed asphalt pavement mixtures. J Mater Civ Eng 19:475–483

Xiao F, Su N, Yao S, Amirkhanian S, Wang J (2019) Performance grades, environmental and economic investigations of reclaimed asphalt pavement materials. J Clean Prod 211:1299–1312

Xiao F, Xu L, Zhao Z, Hou X (2023) Recent applications and developments of reclaimed asphalt pavement in China, 2010–2021. Sustain Mater Technol 37:e00697

CAS   Google Scholar  

Xiao FP, Yao SL, Wang JG, Li XH, Amirkhanian S (2018) A literature review on cold recycling technology of asphalt pavement. Constr Build Mater 180:579–604

Xie ZX, Tran N, Taylor A, Julian G, West R, Welch J (2017) Evaluation of foamed warm mix asphalt with reclaimed asphalt pavement: field and laboratory experiments. Road Mater Pavement Des 18:328–352

Xing C, Li M, Liu L, Lu R, Liu N, Wu W et al (2023) A comprehensive review on the blending condition between virgin and RAP asphalt binders in hot recycled asphalt mixtures: mechanisms, evaluation methods, and influencing factors. J Clean Prod 398:136515

Xu B, Ding R, Yang Z, Sun Y, Zhang J, Lu K et al (2023) Investigation on performance of mineral-oil-based rejuvenating agent for aged high viscosity modified asphalt of porous asphalt pavement. J Clean Prod 395:136285

Yao LY, Leng Z, Lan JT, Chen RQ, Jiang JW (2022) Environmental and economic assessment of collective recycling waste plastic and reclaimed asphalt pavement into pavement construction: a case study in Hong Kong. J Clean Prod 336:130405

Yao Y, Yang J, Gao J, Zheng M, Xu J, Zhang W et al (2023) Strategy for improving the effect of hot in-place recycling of asphalt pavement. Constr Build Mater 366:130054

Yousefi A, Behnood A, Nowruzi A, Haghshenas H (2021) Performance evaluation of asphalt mixtures containing warm mix asphalt (WMA) additives and reclaimed asphalt pavement (RAP). Constr Build Mater 268:121200

Yu XK, Zaumanis M, dos Santos S, Poulikakos LD (2014) Rheological, microscopic, and chemical characterization of the rejuvenating effect on asphalt binders. Fuel 135:162–171

Yu B, Wang SY, Gu XY (2018) Estimation and uncertainty analysis of energy consumption and CO 2 emission of asphalt pavement maintenance. J Clean Prod 189:326–333

Yuan HP (2017) Barriers and countermeasures for managing construction and demolition waste: a case of Shenzhen in China. J Clean Prod 157:84–93

Zaumanis M, Mallick RB (2015) Review of very high-content reclaimed asphalt use in plant-produced pavements: state of the art. Int J Pavement Eng 16:39–55

Zaumanis M, Mallick RB, Frank R (2013) Evaluation of rejuvenator's effectiveness with conventional mix testing for 100% reclaimed asphalt pavement mixtures. Transp Res Rec:17–25

Zaumanis M, Mallick RB, Frank R (2014a) 100% recycled hot mix asphalt: a review and analysis. Resour Conserv Recycl 92:230–245

Zaumanis M, Mallick RB, Poulikakos L, Frank R (2014b) Influence of six rejuvenators on the performance properties of reclaimed asphalt pavement (RAP) binder and 100% recycled asphalt mixtures. Constr Build Mater 71:538–550

Zhang J, Guo C, Chen T, Zhang W, Yao K, Fan C et al (2021) Evaluation on the mechanical performance of recycled asphalt mixtures incorporated with high percentage of RAP and self-developed rejuvenators. Constr Build Mater 269:121337

Zhang Y, Wang J, Deng H, Zhang D, Wang Y (2023) Developing a multidimensional assessment framework for clean technology transfer potential and its application on the belt and road initiative countries. J Clean Prod 401:136769

Zhao S, Huang B, Shu X, Woods M (2013) Comparative evaluation of warm mix asphalt containing high percentages of reclaimed asphalt pavement. Constr Build Mater 44:92–100

Zhao S, Huang BS, Shu X, Woods ME (2016) Quantitative evaluation of blending and diffusion in high RAP and RAS mixtures. Mater Des 89:1161–1170

Zheng XW, Xu WY, Xu HP, Wu SX, Cao K (2022) Research on the ability of bio-rejuvenators to disaggregate oxidized asphaltene nanoclusters in aged asphalt. Acs Omega 7:21736–21749

Download references

This work was supported by the National Natural Science Foundation of China (No. 51778515 and No. 71961137010), the Technological Innovation Major Project of Hubei Province (2019AEE023), the Key R&D Program of Hubei Province (2020BCB064), and the State Key Laboratory of Silicate Materials for Architectures (Wuhan University of Technology, No. SYSJJ2019-20).

Author information

Authors and affiliations

State Key Laboratory of Silicate Materials for Architectures, Wuhan University of Technology, Wuhan, 430070, China

Qi Jiang, Wei Liu & Shaopeng Wu

Contributions

All authors contributed to the study’s conception and design. Visualization and supervision were performed by Wei Liu and Shaopeng Wu. Review and editing were performed by Qi Jiang and Wei Liu. Project administration was performed by Shaopeng Wu. The first draft of the manuscript was written by Qi Jiang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Qi Jiang.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

All authors agreed to participate in this study.

Consent for publication

All authors agree to publish.

Conflict of interest

The authors declare no conflict of interest.

Additional information

Responsible Editor: Philippe Garrigues

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Jiang, Q., Liu, W. & Wu, S. Technological advances and challenges of reclaimed asphalt pavement (RAP) application in road engineering—a bibliometric analysis from 2000 to 2022. Environ Sci Pollut Res (2024). https://doi.org/10.1007/s11356-024-33635-w

Received: 16 August 2023

Accepted: 06 May 2024

Published: 11 May 2024

DOI: https://doi.org/10.1007/s11356-024-33635-w

  • Reclaimed asphalt pavement
  • Road engineering
  • Visual analysis
  • Knowledge structure. SWOT analysis
Daily marijuana use outpaces daily drinking in the U.S., a new study says

Daily and near-daily marijuana use is now more common than similar levels of drinking in the U.S., according to an analysis of national survey data over four decades.

Alcohol is still more widely used, but 2022 was the first time this intensive level of marijuana use overtook high-frequency drinking, said the study’s author, Jonathan Caulkins, a cannabis policy researcher at Carnegie Mellon University.

“A good 40% of current cannabis users are using it daily or near daily, a pattern that is more associated with tobacco use than typical alcohol use,” Caulkins said.

The research, based on data from the National Survey on Drug Use and Health, was published Wednesday in the journal Addiction. The survey is a highly regarded source of estimates of tobacco, alcohol and drug use in the United States.

In 2022, an estimated 17.7 million people used marijuana daily or near-daily compared to 14.7 million daily or near-daily drinkers, according to the study. From 1992 to 2022, the per capita rate of reporting daily or near-daily marijuana use increased 15-fold.

The trend reflects changes in public policy. Most states now allow medical or recreational marijuana, though it remains illegal at the federal level. In November, Florida voters will decide on a constitutional amendment allowing recreational cannabis, and the federal government is moving to reclassify marijuana as a less dangerous drug.

Research shows that high-frequency users are more likely to become addicted to marijuana, said Dr. David A. Gorelick, a psychiatry professor at the University of Maryland School of Medicine, who was not involved in the study.

The number of daily users suggests that more people are at risk for developing problematic cannabis use or addiction, Gorelick said.

“High frequency use also increases the risk of developing cannabis-associated psychosis,” a severe condition where a person loses touch with reality, he said.

The Associated Press

RSC Advances

Rapid quantitative analysis of petroleum coke properties by laser-induced breakdown spectroscopy combined with random forest based on a variable selection strategy

* Corresponding authors

a Key Laboratory of Synthetic and Natural Functional Molecular Chemistry of Ministry of Education, College of Chemistry & Material Science, Northwest University, Xi'an, China E-mail: [email protected] , [email protected]

b China Certification & Inspection Group Shan Dong Co., Ltd, Qing Dao, China

c College of Chemistry and Chemical Engineering, Xi'an Shiyou University, Xi'an, China

Driven by the “double carbon” strategy, short-term demand for petroleum coke as a negative electrode material for artificial graphite is growing rapidly. Analysis of the physicochemical properties of petroleum coke, including key indicators such as ash content, volatile matter and calorific value, has always been an important part of research on the material. A strategy combining laser-induced breakdown spectroscopy (LIBS) with chemometrics is proposed for the rapid and accurate quantification of these properties. LIBS spectra of 46 petroleum coke samples were collected, and an original random forest (RF) calibration model was constructed by optimizing the pretreatment parameters. The RF calibration model was further optimized using variable importance measures (VIM) and variable importance in projection (VIP) methods. After variable selection, the elemental spectral lines related to the modeling of ash content, volatile matter and calorific value were screened out, providing an initial exploration of the correlation between these properties and the elements. Under the optimized spectral pretreatment method, VI threshold and model parameters, the mean relative errors of the prediction set (MRE_P) for ash content, volatile matter and calorific value were 0.0881, 0.0527 and 0.006, the root mean square errors of the prediction set (RMSE_P) were 0.0471%, 0.6178% and 0.2697 MJ kg⁻¹, and the determination coefficients of the prediction set (R²_P) were 0.9187, 0.9820 and 0.9510, respectively. The combination of LIBS and chemometric methods can provide a powerful technical means for analyzing and evaluating the physicochemical properties of petroleum coke.
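The three prediction-set figures quoted in the abstract (mean relative error, root mean square error and determination coefficient) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are the conventional ones and are an assumption here. The function names and sample values are hypothetical and are not data from the paper.

```python
import math

def mean_relative_error(y_true, y_pred):
    # Average of |predicted - reference| / |reference| (assumed definition).
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_square_error(y_true, y_pred):
    # Square root of the mean squared residual; same units as the property.
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def determination_coefficient(y_true, y_pred):
    # 1 - SS_res / SS_tot (assumed definition; some authors instead report
    # the squared correlation between reference and predicted values).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and predicted values for a small prediction set.
reference = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.0]

mre_p = mean_relative_error(reference, predicted)
rmse_p = root_mean_square_error(reference, predicted)
r2_p = determination_coefficient(reference, predicted)
print(f"MRE_P={mre_p:.4f}  RMSE_P={rmse_p:.4f}  R2_P={r2_p:.4f}")
```

Because the relative error is dimensionless while the root mean square error carries the units of the property, the abstract reports the former as plain numbers and the latter in % or MJ kg⁻¹.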

S. Hu, J. Ding, Y. Dong, T. Zhang, H. Tang and H. Li, RSC Adv., 2024, 14, 16358, DOI: 10.1039/D4RA02873B

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

IMAGES

  1. Content Analysis For Research

    what is content analysis in the research

  2. 10 Content Analysis Examples (2024)

    what is content analysis in the research

  3. What it is Content Analysis and How Can you Use it in Research

    what is content analysis in the research

  4. Content Analysis

    what is content analysis in the research

  5. Content Analysis

    what is content analysis in the research

  6. Content Analysis For Research

    what is content analysis in the research

VIDEO

  1. Definitions / Levels of Measurement . 3/10 . Quantitative Analysis . 21st Sep. 2020 . #AE-QN/QL-201

  2. Content Analysis || Research Methodology || Dr.vivek pragpura || sociology with vivek ||

  3. Guide to Data Analytics for Social Media Monitoring Webinar Walkthrough

  4. Sampling plans in content analysis

  5. Content Analysis and Item Generation | Dr Muhammad Sarwar

  6. How to do content analysis in Excel and the concept of content analysis (Amharic tutorial)

COMMENTS

  1. Content Analysis

    Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual: books, newspapers, and magazines; speeches and interviews; web content and social media posts; photographs and films.

  2. Content Analysis Method and Examples

    Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e. text). Using content analysis, researchers can quantify and analyze the presence, meanings, and relationships of such certain words, themes, or concepts.

  3. Content Analysis

    Content analysis is a research method used to analyze and interpret the characteristics of various forms of communication, such as text, images, or audio. It involves systematically analyzing the content of these materials, identifying patterns, themes, and other relevant features, and drawing inferences or conclusions based on the findings.

  4. Content Analysis

    Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual: Books, newspapers, and magazines; Speeches and interviews;

  5. Chapter 17. Content Analysis

    Introduction. Content analysis is a term that is used to mean both a method of data collection and a method of data analysis. Archival and historical works can be the source of content analysis, but so too can the contemporary media coverage of a story, blogs, comment posts, films, cartoons, advertisements, brand packaging, and photographs posted on Instagram or ...

  6. A hands-on guide to doing content analysis

    Keywords: Qualitative research, Qualitative data analysis, Content analysis. ... Content analysis, as in all qualitative analysis, is a reflective process. There is no "step 1, 2, 3, done!" linear progression in the analysis. This means that identifying and condensing meaning units, coding, and categorising are not one-time events. ...

  7. How to do a content analysis [7 steps]

    In research, content analysis is the process of analyzing content and its features with the aim of identifying patterns and the presence of words, themes, and concepts within the content. Simply put, content analysis is a research method that aims to present the trends, patterns, concepts, and ideas in content as objective, quantitative or ...

  8. Content Analysis

    Content analysis was a method originally developed to analyze mass media "messages" in an age of radio and newspaper print, well before the digital age. Unfortunately, CTA struggles to break free of its origins and continues to be associated with the quantitative analysis of "communication."

  9. Content Analysis

    Content analysis is a research method that has been used increasingly in social and health research. Content analysis has been used either as a quantitative or a qualitative research method. Over the years, it expanded from being an objective quantitative description of manifest content to a subjective interpretation of text data dealing with ...

  10. Content analysis

    Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, ... Content analysis is research using the categorization and classification of speech, written text, interviews, images, or other forms of communication. In its beginnings, using the first newspapers at the end of the 19th ...

  11. Demystifying Content Analysis

    Quantitative content analysis always describes a positivist manifest content analysis, in that the nature of truth is believed to be objective, observable, and measurable. Qualitative research, which favors the researcher's interpretation of an individual's experience, may also be used to analyze manifest content.

  12. What is Content Analysis? Uses, Types & Advantages

    Content analysis is a research method that helps a researcher explore the occurrence of and relationships between various words, phrases, themes, or concepts in a text or set of texts. The method allows researchers in different disciplines to conduct qualitative and quantitative analyses on a variety of texts.

  13. UCSF Guides: Qualitative Research Guide: Content Analysis

    "Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e. text). Using content analysis, researchers can quantify and analyze the presence, meanings, and relationships of such certain words, themes, or concepts." Source: Columbia Public Health

  14. Three Approaches to Qualitative Content Analysis

    Content analysis is a widely used qualitative research technique. Rather than being a single method, current applications of content analysis show three distinct approaches: conventional, directed, or summative. All three approaches are used to interpret meaning from the content of text data and, hence, adhere to the naturalistic paradigm.

  15. Qualitative Content Analysis 101 (+ Examples)

    Content analysis is a qualitative analysis method that focuses on recorded human artefacts such as manuscripts, voice recordings and journals. Content analysis investigates these written, spoken and visual artefacts without explicitly extracting data from participants - this is called unobtrusive research. In other words, with content ...

  16. What is Content Analysis

    Content analysis: Offers both qualitative and quantitative analysis of the communication. Provides an in-depth understanding of the content by making it precise. Enables us to understand the context and perception of the speaker. Provides insight into complex models of human thoughts and language use.

  17. Qualitative Content Analysis

    It is a flexible research method (Anastas, 1999). Qualitative content analysis may use either newly collected data, existing texts and materials, or a combination of both. It may be used in exploratory, descriptive, comparative, or explanatory research designs, though its primary use is descriptive.

  18. (PDF) Content Analysis: a short overview

    Inductive content analysis listed all the tweets and each frequent word in two coding books (Appendix: Tables 1 and 2). Content analysis is a research methodology; numerous other analytic ...

  19. Qualitative Content Analysis 101: The What, Why & How (With ...

    Learn about content analysis in qualitative research. We explain what it is, the strengths and weaknesses of content analysis, and when to use it. This video...

  20. Reflexive Content Analysis: An Approach to Qualitative Data Analysis

    If the goal of the analysis is the reduction and description of a dataset in relation to a research question about manifest content, further analysis is unnecessary and may even be counterproductive. It would be unhelpful for data reduction purposes to have an identical code for a distinct concept in multiple places.

  21. Content Analysis

    Content analysis is a method used to analyse qualitative data (non-numerical data). In its most common form it is a technique that allows a researcher to take qualitative data and to transform it into quantitative data (numerical data). The technique can be used for data in many different formats, for example interview transcripts, film, and audio recordings.

  22. (PDF) Content Analysis

    Content analysis is the study of recorded human communications such as diary entries, books, newspapers, videos, text messages, tweets, Facebook updates, etc. Being the scientific study of the ...

  23. Connecting with fans in the digital age: an exploratory and ...

    This study adopts an exploratory, descriptive, and comparative research design (Andrew et al., 2011) using the observational method and content analysis techniques. Content analysis involves the ...

  24. Content Analysis

    Abstract. In this chapter, the focus is on ways in which content analysis can be used to investigate and describe interview and textual data. The chapter opens with a contextualization of the method and then proceeds to an examination of the role of content analysis in relation to both quantitative and qualitative modes of social research.

  25. When Online Content Disappears

    About Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions.

  26. Digital

    Content analysis was employed to scrutinize the data obtained from these interviews. The outcomes of this analysis shed light on the assistive technology acknowledged, utilized, or desired by students with disabilities in both academic and domestic settings. ... Previous research has highlighted that access to appropriate assistive technology ...

  27. Publishing global health research in Nature Medicine

    A brief talk from a senior editor at Nature Medicine sharing insights into the editorial and review process. Join the Department of International Health for a brief talk discussing the priorities of Nature Medicine in the global health field, editorial perspectives as well as insights into the ...

  28. Technological advances and challenges of reclaimed asphalt ...

    Through keywords bursting analysis and keywords-title-abstract clustering analysis, 10 research themes were identified, and four key research themes were selected for text content analysis. Based on the visual and text content analysis results, recommendations were made to bridge the gap in the application of RAP in road engineering.

  29. Daily marijuana use outpaces daily drinking in the U.S., a new study says

    The research, based on data from the National Survey on Drug Use and Health, was published Wednesday in the journal Addiction. The survey is a highly regarded source of estimates of tobacco ...

  30. Rapid quantitative analysis of petroleum coke properties by laser

    Driven by the "double carbon" strategy, petroleum coke short-term demand is growing rapidly as a negative electrode material for artificial graphite. The analysis of petroleum coke physicochemical properties has always been an important part of its research, encompassing significant indicators such as ash content,
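Several of the entries above describe content analysis as coding text into categories and then counting how often each category occurs, which turns qualitative data into quantitative data. The following is a minimal sketch of that idea in Python, not any particular author's procedure. The codebook, its two category names, and the indicator words are hypothetical and chosen only for illustration; a real study would develop and validate its own coding scheme.

```python
import re
from collections import Counter

# Hypothetical codebook: each code category maps to a set of indicator words.
# These categories and words are illustrative assumptions, not from any source.
CODEBOOK = {
    "method": {"method", "technique", "tool"},
    "pattern": {"pattern", "patterns", "theme", "themes"},
}


def code_text(text, codebook=CODEBOOK):
    """Count how many words in `text` fall under each code category."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for code, indicators in codebook.items():
            if word in indicators:
                counts[code] += 1
    return counts


sample = (
    "Content analysis is a research method used to identify patterns "
    "in recorded communication."
)
counts = code_text(sample)
print(dict(counts))
```

Exact word matching of this kind only captures manifest content. Interpreting latent meaning, or checking that two coders apply the same codes consistently, needs human judgment that a word count alone does not provide.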